Earth Resources Data Cloud: Dataset Details

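The "Cyberbullying Classification" dataset is intended for text classification. The data consist mainly of text (tweets), and the application is content moderation on social media: automatically flagging potentially harmful tweets. Description: 47k tweets belonging to 6 balanced classes.
Task type: multi-class text classification (6 classes).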
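Suggested workflow: clean and tokenize the text first, then compare a TF-IDF + linear model baseline against a pretrained language model.
Evaluation advice: use a stratified split or cross-validation, and prioritize classification metrics such as F1, recall, and AUC (macro-averaged over the classes).
Available files: cyberbullying_tweets.csv.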
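The workflow and evaluation advice above can be sketched with the standard library alone. This is a minimal illustration, not the dataset authors' method: the helper names (`clean_tweet`, `tokenize`, `stratified_split`, `macro_f1`) and the cleaning rules (dropping URLs and @mentions, keeping hashtag words) are assumptions made for this example. The TF-IDF features and the classifier itself are left out; in practice they would probably come from a library such as scikit-learn.

```python
import random
import re
from collections import defaultdict

# Assumed cleaning rules: drop URLs and @mentions, keep hashtag words.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and '#' signs."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return " ".join(text.split())  # collapse runs of whitespace


def tokenize(text):
    """Clean a tweet, then split it into simple alphanumeric tokens."""
    return TOKEN_RE.findall(clean_tweet(text))


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so both parts keep roughly the same class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights all six classes equally, which fits a balanced dataset. Whichever model is trained, scoring it on the held-out indices from `stratified_split` keeps the class mix of the test set close to that of the full data.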
Abstract
With the rise of social media coupled with the COVID-19 pandemic, cyberbullying has reached all-time highs. We can combat this by creating models to automatically flag potentially harmful tweets as well as break down the patterns of hatred.
About this dataset
As social media usage becomes increasingly prevalent in every age group, a vast majority of citizens rely on this essential medium for day-to-day communication. Social media’s ubiquity means that cyberbullying can effectively impact anyone at any time or anywhere, and the relative anonymity of the internet makes such personal attacks more difficult to stop than traditional bullying.