Earth Resources Data Cloud: Data Resource Details

Spam Classification for Basic NLP

Published: 2026-03-17 15:38:31
Resource ID: 2033810174387851265
Resource type: Free

Summary Overview

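The dataset "Spam Classification for Basic NLP" is intended mainly for binary classification, and its data is primarily text. Dataset description: the data is provided in raw format so that all NLP pre-processing steps can be carried out on it.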

Task type: binary text classification.

Suggested workflow: clean and tokenize the text first, then compare a TF-IDF + linear model baseline against a pretrained language model.
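The TF-IDF + linear model half of that workflow can be sketched with the standard library alone. This is a minimal illustration and not the dataset author's method: the toy messages, the helper names (`tokenize`, `fit_idf`, `tfidf`, `train_logreg`, `predict_proba`), the smoothed IDF formula, and the hyperparameters are all assumptions made for the example. In practice a library such as scikit-learn (`TfidfVectorizer`, `LogisticRegression`) would typically replace these hand-written helpers.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; tokens unseen in training are ignored."""
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def predict_proba(vec, weights):
    """Probability of the positive (spam) class under a logistic model."""
    score = sum(weights.get(tok, 0.0) * v for tok, v in vec.items())
    return 1.0 / (1.0 + math.exp(-score))

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Logistic regression without an intercept, fitted by batch gradient descent."""
    weights = {}
    for _ in range(epochs):
        grad = {}
        for vec, y in zip(vectors, labels):
            p = predict_proba(vec, weights)
            for tok, v in vec.items():
                grad[tok] = grad.get(tok, 0.0) + (p - y) * v
        for tok, g in grad.items():
            weights[tok] = weights.get(tok, 0.0) - lr * g / len(vectors)
    return weights

# Toy stand-in data (hypothetical); 1 = spam, 0 = not spam.
texts = [
    "win free cash now",
    "free prize win cash",
    "claim free prize now",
    "meeting tomorrow at noon",
    "project report for lunch meeting",
    "lunch tomorrow with project team",
]
labels = [1, 1, 1, 0, 0, 0]

idf = fit_idf(texts)
weights = train_logreg([tfidf(t, idf) for t in texts], labels)

for message in ("free cash prize", "project meeting tomorrow"):
    print(message, "->", round(predict_proba(tfidf(message, idf), weights), 3))
```

A baseline like this gives a reference score that any pretrained language model has to beat to justify its extra cost.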

Evaluation advice: use a stratified split or cross-validation, and prioritize classification metrics such as F1, recall, and AUC.
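To make the evaluation advice concrete, the sketch below implements a stratified hold-out split and the three metrics in plain Python. The function names, the 25% test fraction, the 0.5 decision threshold, and the example labels and scores are assumptions chosen for illustration; they do not come from the dataset.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

def recall_and_f1(y_true, y_pred):
    """Recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Requires at least one example of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced label list: 4 spam, 12 non-spam.
all_labels = [1] * 4 + [0] * 12
train_idx, test_idx = stratified_split(all_labels)
print("test size:", len(test_idx), "spam in test:", sum(all_labels[i] for i in test_idx))

# Hypothetical model scores for eight held-out messages.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
recall, f1 = recall_and_f1(y_true, y_pred)
print("recall:", recall, "f1:", f1, "auc:", roc_auc(y_true, scores))
```

Stratifying matters when spam is the minority class: a plain random split can leave very few spam messages in the test part, which makes recall and F1 unstable.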

Available file: Spam Email raw text for NLP.csv.

This data consists of raw mail messages, which makes it suitable for NLP pre-processing steps such as tokenizing, removing stop words, stemming, and parsing HTML tags. All of these steps are important for anyone entering the NLP world. The dataset also pairs well with tools from NLP libraries, such as vectorizers.
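The pre-processing steps named above can be chained as in the following sketch, which uses only the Python standard library. The stop-word list is a tiny hand-picked sample and `naive_stem` is a crude suffix stripper written for this example, not a real stemming algorithm such as Porter's; the sample message is also made up.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "in", "for", "your", "you", "this"}

class _TextExtractor(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_html(raw):
    """Parse HTML and return only its text content."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.chunks)

def naive_stem(token):
    """Strip one common suffix if at least three letters remain."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(raw):
    """HTML parsing -> tokenizing -> stop-word removal -> stemming."""
    text = strip_html(raw)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

sample = ("<html><body><p>You are <b>winning</b> free prizes!</p>"
          "<a href='http://example.com'>Click here</a></body></html>")
print(preprocess(sample))
# prints ['winn', 'free', 'prize', 'click', 'here']
```

The output shows why the stemmer is only a placeholder: "winning" becomes "winn" instead of "win", which a proper stemmer would handle.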