
The dataset "Fake News Content Detection" is intended for multi-class classification; the data is primarily text, and the application scenario is text content analysis. Topic description: NLP, Sentiment Analysis using TF-IDF, CountVectorizer, Transformers, BERT.
Task type: multi-class text classification.
Suggested workflow: start with text cleaning and tokenization, then compare a TF-IDF + linear model baseline against a pretrained language model.
Evaluation advice: use stratified splits or cross-validation, and focus on classification metrics such as F1, Recall, and AUC.
Available files: sample submission.csv, test.csv, train.csv.
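The suggested workflow above (clean text, TF-IDF + linear model, stratified cross-validation scored with F1) can be sketched as follows. This is a minimal illustration using scikit-learn with a tiny synthetic stand-in corpus; the real competition data lives in train.csv, whose column names are not specified here, so the texts and labels below are assumptions for demonstration only.

```python
# Sketch of the TF-IDF + linear-model baseline with stratified
# cross-validation and macro-F1 scoring. The texts/labels below are
# synthetic placeholders, not the actual train.csv contents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Tiny synthetic stand-in for the training data (text + class label).
texts = [
    "government confirms new policy in official statement",
    "shocking miracle cure doctors don't want you to know",
    "local team wins championship after close final",
    "insider claims aliens secretly control the stock market",
] * 10
labels = [0, 1, 0, 2] * 10

# TF-IDF features (unigrams + bigrams) feeding a linear classifier.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)

# Stratified 5-fold CV so each fold preserves the class distribution;
# macro-F1 weights every class equally, which matters for multi-class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A pretrained-model alternative (e.g. fine-tuning BERT via the transformers library) would replace the pipeline above but can be evaluated with exactly the same stratified CV and F1 setup, making the two approaches directly comparable.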
Overview
Welcome to another weekend hackathon. This weekend we are giving the machinehackers a great opportunity to flex their NLP muscles again by building a fake-content detection algorithm. Fake content is everywhere, from social media platforms to news platforms, and the list goes on.
Given the advances in NLP, research institutes are putting a lot of sweat, blood, and tears into detecting the fake content generated across these platforms.
Fake news, defined by the New York Times as "a made-up story with an intention to deceive", often for a secondary gain, is arguably one of the most serious challenges facing the news industry today. In a December Pew Research poll, 64% of US adults said that "made-up news" has caused a "great deal of confusion" about the facts of current events.
In this hackathon, your goal as a data scientist is to create an NLP model to combat the fake-content problem. We believe these AI technologies hold promise for significantly automating parts of the procedure human fact-checkers use today to determine whether a story is real or a hoax.