Earth Resources Data Cloud: Dataset Details

NLP with Disaster Tweets - Cleaning Data

Published: 2026-03-17 14:31:03  Resource ID: 2032003092038717441  Resource type: Free



Overview

The dataset "NLP with Disaster Tweets - cleaning data" is intended for supervised learning tasks; the data consists of tweet text, and the intended application is text content analysis. Description: Cleaning dataset for Kaggle Competition "Real or Not? NLP with Disaster Tweets"

Task type: supervised text classification.

Suggested workflow: first inspect the class distribution and dirty samples, then build a baseline (e.g., TF-IDF features with logistic regression, or a fine-tuned pretrained transformer).
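The baseline step above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the dataset author's method; the column names follow the schema described below, and the model choices (bigram TF-IDF, logistic regression) are assumptions:

```python
# Hypothetical baseline for the tweet-classification task:
# TF-IDF features feeding a logistic-regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """Return a TF-IDF + logistic-regression pipeline for tweet text."""
    return make_pipeline(
        # Unigrams and bigrams; drop terms seen in fewer than 2 tweets.
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
```

The pipeline can then be fit on the `text` column and the `target` labels, e.g. `build_baseline().fit(train_df["text"], train_df["target"])`.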

Evaluation advice: use a stratified split or cross-validation, and prioritize classification metrics such as F1, Recall, and AUC.
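A minimal sketch of the stratified split and F1 scoring suggested above, assuming the labels live in a `target`-like sequence of 0/1 values:

```python
# Stratified holdout split plus F1 evaluation with scikit-learn.
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score


def stratified_split(texts, targets, test_size=0.2, seed=42):
    """Split data while preserving the 0/1 class ratio of `targets`."""
    return train_test_split(
        texts, targets,
        test_size=test_size,
        stratify=targets,   # keep the class balance in both halves
        random_state=seed,
    )
```

After fitting a model on the training half, `f1_score(y_test, model.predict(X_test))` gives the competition-style F1 metric.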

Available files: test_data_cleaning.csv, test_data_cleaning2.csv, train_data_cleaning.csv, and one more (4 files in total).

Context

This dataset was obtained by cleaning the data of the Getting Started Prediction Competition "Real or Not? NLP with Disaster Tweets"; the cleaning is the result of the public notebook "NLP with Disaster Tweets - EDA and Cleaning data". In the future, I plan to improve the cleaning and update the dataset.

Content

id - a unique identifier for each tweet
text - the text of the tweet
location - the location the tweet was sent from (may be blank)
keyword - a particular keyword from the tweet (may be blank)
target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)
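Given this schema, a quick class-balance check (the "inspect the class distribution" step from the workflow above) can be sketched as follows; the `target` column name comes from the schema, while the helper itself is illustrative:

```python
# Check the 0/1 class balance of the "target" column with pandas.
import pandas as pd


def class_distribution(df, label_col="target"):
    """Return the fraction of rows in each class, sorted by label."""
    return df[label_col].value_counts(normalize=True).sort_index()
```

Running it on the training frame (e.g. `class_distribution(pd.read_csv("train_data_cleaning.csv"))`) shows whether the disaster/non-disaster classes are imbalanced enough to justify the stratified split recommended above.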

Acknowledgements