Earth Resources Data Cloud (地球资源数据云): Dataset Details

Email Spam Classification Dataset CSV

Name: Email Spam Classification Dataset CSV (垃圾邮件分类数据集 CSV)
Published: 2026-03-17 15:33:32

Resource ID: 2033808913286467585
Resource type: Free

Overview

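The Email Spam Classification Dataset CSV is intended mainly for binary classification. Its data are derived from email text, and its applications lean toward text content analysis. Task description: CSV file containing spam/not spam information about 5172 emails.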

Task type: binary text classification.

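Suggested workflow: clean and tokenize the text, then compare a TF-IDF + linear model baseline against a pretrained language model. Note that emails.csv stores per-email word counts rather than raw text, so cleaning and tokenization have effectively already been applied: TF-IDF weighting can be computed directly from the count columns, whereas a pretrained language model would need the original email text, which the file does not include.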
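Because the word-count columns can be fed straight into TF-IDF weighting, the TF-IDF + linear model baseline can be sketched without any tokenizer. The block below is a minimal pure-Python illustration, assuming the counts are already loaded as lists of integers. It computes smoothed idf weights, ln((1 + n) / (1 + df)) + 1 (the same default formula scikit-learn's `TfidfTransformer` uses), L2-normalizes each row, and fits a logistic regression by batch gradient descent. The function names, toy counts, learning rate, epoch count and regularization strength are hypothetical choices for illustration, not values from this page.

```python
import math

def fit_idf(rows):
    """Smoothed inverse document frequency per word column: ln((1 + n) / (1 + df)) + 1."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_terms)]
    return [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]

def tfidf(rows, idf):
    """Weight raw counts by idf, then L2-normalize each row (an all-zero row stays zero)."""
    out = []
    for r in rows:
        v = [c * w for c, w in zip(r, idf)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        out.append([x / norm for x in v])
    return out

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Batch gradient descent on L2-regularized log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Estimated probability that each row is spam (label 1)."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Hypothetical toy counts for the words ["the", "free", "meeting"]; label 1 = spam.
train_rows = [[0, 3, 0], [1, 2, 0], [2, 0, 3], [3, 0, 2]]
train_labels = [1, 1, 0, 0]
idf = fit_idf(train_rows)
X_train = tfidf(train_rows, idf)
w, b = train_logistic(X_train, train_labels)
probs = predict_proba(X_train, w, b)
print([round(p, 2) for p in probs])
```

Pure-Python loops are far too slow for the full 5172 × 3000 matrix; in practice scikit-learn's `TfidfTransformer` followed by `LogisticRegression` performs the same steps. Held-out rows should be transformed with the idf weights fitted on the training rows only.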

Evaluation: use a stratified split or cross-validation, and focus on classification metrics such as F1, recall, and AUC.
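A stratified split keeps the spam/not-spam ratio roughly the same in every fold, and the metrics named above can be computed directly from labels, hard predictions and scores. The helpers below are a dependency-free sketch; the function names are hypothetical, fold assignment is a simple round-robin within each class, and AUC is computed as the probability that a randomly chosen spam email receives a higher score than a randomly chosen non-spam email (ties count one half), which equals the area under the ROC curve.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets its share.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Pairwise AUC: share of (spam, non-spam) pairs where the spam email scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one spam and one non-spam email")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels, hard predictions and spam scores for six emails.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
print(precision_recall_f1(y_true, y_pred))
print(roc_auc(y_true, scores))
```

In cross-validation, train on k - 1 folds and evaluate on the remaining one, rotating through all k. scikit-learn provides the equivalent `StratifiedKFold`, `f1_score`, `recall_score` and `roc_auc_score`; the pairwise AUC loop here is quadratic in fold size, which is acceptable for a dataset of 5172 emails but not for much larger ones.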

Available files: emails.csv.
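Given the layout described under About the Dataset (first column: email name; last column: label; every column in between: a word count), emails.csv can be read with Python's standard `csv` module. This is a sketch that relies only on column positions. The function name `load_email_matrix`, the header names in the in-memory sample ("Email No.", "Prediction") and its three word columns are hypothetical stand-ins for the real 3002-column file.

```python
import csv
import io

def load_email_matrix(f):
    """Split the emails.csv layout into ids, word names, count rows and labels.

    Assumes: first column = email name, last column = label (1 spam, 0 not spam),
    every column in between = an integer word count.
    """
    reader = csv.reader(f)
    header = next(reader)
    words = header[1:-1]  # word columns between the id and the label
    ids, rows, labels = [], [], []
    for record in reader:
        if not record:
            continue  # skip blank trailing lines
        ids.append(record[0])
        rows.append([int(v) for v in record[1:-1]])
        labels.append(int(record[-1]))
    return ids, words, rows, labels

# Hypothetical two-email sample with 3 word columns instead of 3000.
sample = io.StringIO(
    "Email No.,the,free,meeting,Prediction\n"
    "Email 1,4,0,2,0\n"
    "Email 2,1,3,0,1\n"
)
ids, words, rows, labels = load_email_matrix(sample)
print(words, rows, labels)
```

For the real file, open it with `open("emails.csv", newline="")` and pass the file object to `load_email_matrix`; `len(words)` should then be 3000 and `len(rows)` 5172.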

Introduction

This is a CSV file containing information on 5172 randomly selected emails and their labels for spam/not-spam classification.

About the Dataset

The CSV file contains 5172 rows, one per email, and 3002 columns. The first column holds the email name; to protect privacy, names have been replaced with numbers rather than recipients' names. The last column holds the prediction label: 1 for spam, 0 for not spam.

The remaining 3000 columns correspond to the 3000 most common words across all the emails, after non-alphabetical characters and words are excluded. Each cell stores how many times the word in that column occurs in the email in that row. As a result, all 5172 emails are represented in a single compact dataframe rather than as separate text files.
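The compact representation described above is a bag-of-words count matrix over a fixed vocabulary. The sketch below shows how such rows could be produced from raw email text. It is an assumption about the preprocessing, since the page does not say exactly how words were tokenized or how ties among equally frequent words were broken: here a token is a lowercase run of ASCII letters, so digits and punctuation are discarded in the spirit of excluding non-alphabetical characters. The helper names and sample emails are hypothetical.

```python
import re
from collections import Counter

# Assumed tokenization: lowercase runs of ASCII letters; digits and punctuation are dropped.
TOKEN = re.compile(r"[a-z]+")

def tokenize(text):
    return TOKEN.findall(text.lower())

def build_vocabulary(texts, size):
    """The `size` most frequent tokens across all texts (ties keep first-seen order)."""
    total = Counter()
    for text in texts:
        total.update(tokenize(text))
    return [word for word, _ in total.most_common(size)]

def count_vector(text, vocabulary):
    """One dataset row: how often each vocabulary word occurs in this text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

# Hypothetical raw emails; the real dataset ships only the resulting counts.
emails = ["FREE offer!! The free prize, 100% free.", "The meeting moved to Monday."]
vocabulary = build_vocabulary(emails, 3)  # 3 here; the dataset uses 3000
rows = [count_vector(e, vocabulary) for e in emails]
print(vocabulary, rows)
```

To turn a new email into a row compatible with a model trained on emails.csv, pass the 3000 word column names from the file's header as `vocabulary` to `count_vector` rather than rebuilding the vocabulary from new text.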

FAQ

What is the Email Spam Classification Dataset CSV?

It is a dataset intended mainly for binary classification. Its data are derived from email text, and its applications lean toward text content analysis.

What data format does the Email Spam Classification Dataset CSV use? What is its coordinate system?

The data are in CSV format. The dataset contains no spatial data, so no coordinate system applies.

How do I obtain and cite the Email Spam Classification Dataset CSV?

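Log in on this page to download it. Suggested citation: 地球资源数据云 (Earth Resources Data Cloud). 垃圾邮件分类数据集 CSV (Email Spam Classification Dataset CSV). https://www.gis5g.com/dataset/2033808913286467585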