
The dataset "Fake News Content Detection" is intended for multi-class classification; the data is primarily text, and the application scenario is text content analysis. Topic description: NLP, Sentiment Analysis using TF-IDF, CountVectorizer, Transformers, BERT.
Task type: multi-class text classification.
Suggested workflow: start with text cleaning and tokenization, then compare a TF-IDF + linear model baseline against a pretrained language model.
Evaluation advice: use stratified splits or cross-validation, and focus on classification metrics such as F1, Recall, and AUC.
Available files: sample submission.csv, test.csv, train.csv.
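The suggested workflow above (clean text, TF-IDF + linear model, stratified cross-validation scored with F1) can be sketched as follows. This is a minimal illustration using scikit-learn with a tiny synthetic stand-in corpus; the real competition data lives in train.csv, whose column names are not specified here, so the texts and labels below are assumptions for demonstration only.

```python
# Sketch of the TF-IDF + linear-model baseline with stratified
# cross-validation and macro-F1 scoring. The texts/labels below are
# synthetic placeholders, not the actual train.csv contents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Tiny synthetic stand-in for the training data (text + class label).
texts = [
    "government confirms new policy in official statement",
    "shocking miracle cure doctors don't want you to know",
    "local team wins championship after close final",
    "insider claims aliens secretly control the stock market",
] * 10
labels = [0, 1, 0, 2] * 10

# TF-IDF features (unigrams + bigrams) feeding a linear classifier.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)

# Stratified 5-fold CV so each fold preserves the class distribution;
# macro-F1 weights every class equally, which matters for multi-class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A pretrained-model alternative (e.g. fine-tuning BERT via the transformers library) would replace the pipeline above but can be evaluated with exactly the same stratified CV and F1 setup, making the two approaches directly comparable.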
Overview
Welcome to another weekend hackathon. This weekend we are giving the machinehackers a great opportunity to flex their NLP muscles again by building a fake-content detection algorithm. Fake content is everywhere, from social media platforms to news platforms, and the list goes on.
Given the advances in NLP, research institutes are putting a lot of sweat, blood, and tears into detecting the fake content generated across these platforms.
Fake news, defined by the New York Times as "a made-up story with an intention to deceive", often for a secondary gain, is arguably one of the most serious challenges facing the news industry today. In a December Pew Research poll, 64% of US adults said that "made-up news" has caused a "great deal of confusion" about the facts of current events.
In this hackathon, your goal as a data scientist is to create an NLP model to combat the fake-content problem. We believe these AI technologies hold promise for significantly automating parts of the procedure human fact-checkers use today to determine whether a story is real or a hoax.