Earth Resource Data Cloud (地球资源数据云): Dataset Details

Spam URL Classification Dataset

Name: Spam URL Classification Dataset
Published: 2026-03-17 15:35:27

Resource ID: 2033809407392256001
Resource type: Free


Summary

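The Spam URLs Classification Dataset is intended mainly for binary classification. The data is primarily text, and the suggested application area is financial risk control. Task description: classify a URL as spam or not spam.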

Task type: binary text classification.

Suggested workflow: clean and tokenize the text first, then compare a TF-IDF + linear model baseline against a pretrained language model.
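The cleaning, tokenization, and TF-IDF steps can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset authors' method: the helper names `tokenize_url` and `tfidf`, the choice to drop the URL scheme, and the two sample URLs are all assumptions made for the example. The linear model and the pretrained-model comparison are left out.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def tokenize_url(url):
    """Lowercase a URL, drop the scheme, and split the rest on non-alphanumerics."""
    parts = urlsplit(url.strip().lower())
    # Keep host, path, and query; the scheme (http/https) is discarded (an assumption).
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [tok for tok in re.split(r"[^a-z0-9]+", text) if tok]


def tfidf(docs):
    """Return one {token: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        weights.append({
            tok: (count / len(doc)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return weights


# Hypothetical sample URLs, not taken from the dataset.
urls = [
    "https://example.com/newsletter/unsubscribe?id=42",
    "https://example.org/articles/data-science",
]
tokens = [tokenize_url(u) for u in urls]
vectors = tfidf(tokens)
```

Tokens that occur in every URL (here `example`) receive a lower weight than tokens specific to one URL (here `unsubscribe`), which is the property a linear model on top of these features would rely on.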

Evaluation: use a stratified split or cross-validation, and prioritize classification metrics such as F1, Recall, and AUC.
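A stratified split and two of the metrics named above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the function names `stratified_split` and `recall_and_f1`, the 20% test fraction, and the toy labels (1 = spam, 0 = not spam, one third positive) are hypothetical. AUC is omitted because it needs predicted scores rather than hard labels.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction of each class for the test set.
        cut = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:cut])
        train_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)


def recall_and_f1(y_true, y_pred, positive=1):
    """Compute recall and F1 for the positive class from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Toy labels with one third positives, loosely mirroring the dataset's class ratio.
labels = [1] * 10 + [0] * 20
train_idx, test_idx = stratified_split(labels)
recall, f1 = recall_and_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

Splitting per class keeps the spam ratio the same in the training and test sets, which matters when the classes are imbalanced.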

Available file: url_spam_classification.csv.

URL - Spam or Not Spam - Classification Dataset

This dataset contains about 87.5K URLs, of which one third are flagged as spam and the rest are not. It can be used to build a binary classification model.

Credits:

The dataset was created by The Pudding. It consists of every link found in a set of newsletters. The flagging system determines whether a link is spam as it parses links from over 100 newsletters every 30 minutes. A link is programmatically flagged if it appears 3+ times in a single newsletter or contains a likely subscribe/unsubscribe URL.
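The flagging rule described above can be illustrated roughly as follows. This is only a sketch under assumptions: the function name `flag_links` is hypothetical, and a plain substring check for "subscribe" (which also matches "unsubscribe") stands in for whatever the original system uses to detect a "likely subscribe/unsubscribe URL", since that detail is not documented here.

```python
from collections import Counter


def flag_links(links, min_repeats=3, keywords=("subscribe",)):
    """Map each distinct link in one newsletter to True (flagged) or False."""
    counts = Counter(links)
    return {
        # Flag on repetition within the newsletter OR on a keyword match.
        link: count >= min_repeats or any(k in link.lower() for k in keywords)
        for link, count in counts.items()
    }


# Hypothetical links from a single newsletter.
newsletter = [
    "https://a.example/promo",
    "https://a.example/promo",
    "https://a.example/promo",
    "https://b.example/unsubscribe",
    "https://c.example/post",
]
flags = flag_links(newsletter)
```

In this example the repeated promo link and the unsubscribe link are flagged, while the link that appears once with no keyword match is not.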

If you use this dataset, don't forget to cite the author.

FAQ

What is the Spam URL Classification Dataset?

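The Spam URLs Classification Dataset is intended mainly for binary classification. The data is primarily text, and the suggested application area is financial risk control.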

What is the data format of the Spam URL Classification Dataset? What is its coordinate system?

The data format is CSV. No coordinate system is specified.

How do I obtain and cite the Spam URL Classification Dataset?

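Log in on this page to download the dataset. Suggested citation: Earth Resource Data Cloud (地球资源数据云). Spam URL Classification Dataset (垃圾邮件 URL 分类数据集). https://www.gis5g.com/dataset/2033809407392256001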