Earth Resources Data Cloud: Dataset Details

NLP with Disaster Tweets - Cleaning Data

Published: 2026-03-17 14:31:03  Resource ID: 2032003092038717441  Resource type: Free



Overview

The dataset "NLP with Disaster Tweets - cleaning data" is intended for supervised learning tasks; the data consists of tweet text, and the intended application is text content analysis. Description: Cleaning dataset for Kaggle Competition "Real or Not? NLP with Disaster Tweets"

Task type: supervised text classification.

Suggested workflow: first inspect the class distribution and dirty samples, then build a baseline (e.g., TF-IDF features with logistic regression, or a fine-tuned pretrained transformer).
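The baseline step above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the dataset author's method; the column names follow the schema described below, and the model choices (bigram TF-IDF, logistic regression) are assumptions:

```python
# Hypothetical baseline for the tweet-classification task:
# TF-IDF features feeding a logistic-regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """Return a TF-IDF + logistic-regression pipeline for tweet text."""
    return make_pipeline(
        # Unigrams and bigrams; drop terms seen in fewer than 2 tweets.
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
```

The pipeline can then be fit on the `text` column and the `target` labels, e.g. `build_baseline().fit(train_df["text"], train_df["target"])`.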

Evaluation advice: use a stratified split or cross-validation, and prioritize classification metrics such as F1, Recall, and AUC.
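A minimal sketch of the stratified split and F1 scoring suggested above, assuming the labels live in a `target`-like sequence of 0/1 values:

```python
# Stratified holdout split plus F1 evaluation with scikit-learn.
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score


def stratified_split(texts, targets, test_size=0.2, seed=42):
    """Split data while preserving the 0/1 class ratio of `targets`."""
    return train_test_split(
        texts, targets,
        test_size=test_size,
        stratify=targets,   # keep the class balance in both halves
        random_state=seed,
    )
```

After fitting a model on the training half, `f1_score(y_test, model.predict(X_test))` gives the competition-style F1 metric.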

Available files: test_data_cleaning.csv, test_data_cleaning2.csv, train_data_cleaning.csv, and one more (4 files in total).

Context

This dataset was obtained by cleaning the data of the Getting Started Prediction Competition "Real or Not? NLP with Disaster Tweets"; the cleaning is the result of the public notebook "NLP with Disaster Tweets - EDA and Cleaning data". In the future, I plan to improve the cleaning and update the dataset.

Content

id - a unique identifier for each tweet
text - the text of the tweet
location - the location the tweet was sent from (may be blank)
keyword - a particular keyword from the tweet (may be blank)
target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)
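Given this schema, a quick class-balance check (the "inspect the class distribution" step from the workflow above) can be sketched as follows; the `target` column name comes from the schema, while the helper itself is illustrative:

```python
# Check the 0/1 class balance of the "target" column with pandas.
import pandas as pd


def class_distribution(df, label_col="target"):
    """Return the fraction of rows in each class, sorted by label."""
    return df[label_col].value_counts(normalize=True).sort_index()
```

Running it on the training frame (e.g. `class_distribution(pd.read_csv("train_data_cleaning.csv"))`) shows whether the disaster/non-disaster classes are imbalanced enough to justify the stratified split recommended above.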

Acknowledgements