Earth Resources Data Cloud — Dataset Details

The dataset "NLP with Disaster Tweets - cleaning data" is intended for supervised learning tasks; the data consist of tweet text, and the application domain is text content analysis. Description: Cleaning dataset for Kaggle Competition "Real or Not? NLP with Disaster Tweets"
Task type: supervised text classification.
Suggested workflow: first check the class distribution and look for dirty samples, then build a baseline, e.g. TF-IDF with a linear classifier, or a fine-tuned pretrained language model such as BERT.
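The baseline step above can be sketched as follows. This is a minimal illustration, not the dataset author's pipeline; the tweets below are invented stand-ins for train_data_cleaning.csv, whose real columns are those listed under "Content" (id, text, location, keyword, target).

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for train_data_cleaning.csv (invented rows, same schema
# as the "Content" section: a text column and a 0/1 target column).
df = pd.DataFrame({
    "text": ["forest fire near la ronge sask canada",
             "i love this new song so much",
             "earthquake hits the city center",
             "what a beautiful sunny day"],
    "target": [1, 0, 1, 0],
})

# Step 1: check the class balance before modelling.
print(df["target"].value_counts(normalize=True))

# Step 2: TF-IDF features plus logistic regression as a linear baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(df["text"], df["target"])
preds = model.predict(df["text"])
```

On the real files, `pd.read_csv("train_data_cleaning.csv")` would replace the toy DataFrame.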
Evaluation advice: use a stratified split or cross-validation, and prioritize classification metrics such as F1, recall, and AUC.
Available files: 4 files in total, including test_data_cleaning.csv, test_data_cleaning2.csv, and train_data_cleaning.csv.
Context
The data were obtained by cleaning the Getting Started prediction competition "Real or Not? NLP with Disaster Tweets" data; the cleaning is the result of the public notebook "NLP with Disaster Tweets - EDA and Cleaning data". In the future, I plan to improve the cleaning and update the dataset.
Content
id - a unique identifier for each tweet
text - the text of the tweet
location - the location the tweet was sent from (may be blank)
keyword - a particular keyword from the tweet (may be blank)
target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)
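A quick way to check this schema after downloading is to load a file and inspect the columns and blanks. The two rows below are invented for illustration, not taken from the dataset; blank keyword/location cells show up as NaN in pandas.

```python
import io
import pandas as pd

# Invented sample mirroring the documented schema; keyword and location
# may be blank, which pandas parses as NaN.
csv = io.StringIO(
    "id,keyword,location,text,target\n"
    "1,fire,California,forest fire near la ronge,1\n"
    "2,,,what a lovely day,0\n"
)
train = pd.read_csv(csv)

print(list(train.columns))
print(train.isna().sum())  # blanks in keyword/location appear as NaN
```

With the real file, `pd.read_csv("train_data_cleaning.csv")` replaces the in-memory sample.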
Acknowledgements