Earth Resource Data Cloud: Data Resource Details

Users vs. Bots Classification

Published: 2026-03-17 14:32:28
Resource ID: 2031260524841701378
Resource type: Free



Overview

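The dataset "Users vs bots classification" is intended for binary classification. The data is tabular (numerical and categorical profile features in a single CSV file), and the application area is social network account analysis, specifically bot detection. Description: Vkontakte (vk.com) fake and real users classification dataset.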

Task type: binary classification on tabular data.

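Suggested workflow: first check the class distribution and look for dirty samples (e.g., duplicates or malformed values), then build a baseline. Because the data is tabular rather than images, a tabular model such as logistic regression or gradient-boosted trees is a more suitable baseline than image transfer learning (e.g., ResNet/EfficientNet).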
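A minimal, standard-library-only sketch of the first step: count the labels, count missing markers per column, and count exact duplicate rows. The label column name `target`, the set of missing markers, and the column name `posts_per_week` in the example are assumptions (the description does not name them); check the real CSV header before use.

```python
import csv
import io
from collections import Counter

# Assumptions (not stated in the dataset description): the label column is
# named "target", and missing values appear as "unknown" (categorical) or as
# an empty string / "nan" (numerical). Adjust after inspecting the header.
LABEL_COL = "target"
MISSING_MARKERS = {"", "nan", "NaN", "unknown"}


def audit(lines, label_col=LABEL_COL):
    """Return (class counts, missing-value counts per column, duplicate rows)."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    label_idx = header.index(label_col)

    # Class balance: the description says 50/50, which this lets you verify.
    class_counts = Counter(row[label_idx] for row in rows)

    missing = Counter()
    for row in rows:
        for col, value in zip(header, row):
            if value.strip() in MISSING_MARKERS:
                missing[col] += 1

    # Exact duplicate rows are one simple kind of "dirty sample".
    duplicates = len(rows) - len({tuple(row) for row in rows})
    return class_counts, missing, duplicates


# Tiny in-memory example; "posts_per_week" and "target" are hypothetical names.
sample = io.StringIO(
    "has_photo,posts_per_week,target\n"
    "1,3.5,0\n"
    "unknown,,1\n"
    "1,3.5,0\n"
)
print(audit(sample))
```

On the real file, pass an open handle instead, e.g. `with open("bots_vs_users.csv", newline="", encoding="utf-8") as f: counts, missing, dups = audit(f)` (the UTF-8 encoding is also an assumption).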

Evaluation: use a stratified split or cross-validation, and focus on classification metrics such as F1, recall, and AUC.
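A dependency-free sketch of this evaluation advice: stratified k-fold indices plus precision/recall/F1 and ROC AUC. Treating bots as the positive class encoded as `1` is an assumption, since the description does not state the label encoding. In practice, scikit-learn's `StratifiedKFold`, `f1_score`, `recall_score`, and `roc_auc_score` cover the same ground; this version only shows what they compute.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each test fold keeps the class ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets about 1/k of it.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive=1):
    """ROC AUC as P(random positive scores above random negative), ties = 1/2.

    This pairwise version is O(n_pos * n_neg): fine for a test fold of this
    dataset's size, but a rank-based formula is faster on large data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Recall on the bot class is worth reporting separately from F1: a detector can score well on F1 while still missing a share of bots, and with the stated 50/50 balance, accuracy alone gives a misleading picture less often but still hides that trade-off.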

Available file: bots_vs_users.csv.

This dataset contains profile data collected from VK.com (VKontakte), Russia's largest social network, for distinguishing between genuine users and automated bots. The data includes both numerical and categorical features extracted from user profiles.

Data collection: the profiles were collected from public VK.com pages and include both verified human users and verified bot accounts. Many profiles are incomplete, which reflects realistic social network conditions.

Feature types:
Numerical features (NaN values preserved): activity metrics (average posts per week, hashtag usage, etc.) and friend/follower counts.
Categorical features (missing values marked as 'unknown'): profile attributes (has_photo, has_mobile, etc.), privacy settings (is_closed_profile, etc.), and binary flags (can_post, can_message, etc.).
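Because the two feature types mark missing values differently, column types can be recovered from the raw CSV strings. The heuristic below is an assumption, not part of the dataset: a column is treated as categorical if any cell is 'unknown' or cannot be parsed as a float. The column name `posts_per_week` is hypothetical; `has_photo` and `has_mobile` come from the description.

```python
def infer_column_types(header, rows):
    """Label each column 'numerical' or 'categorical' from its raw CSV strings.

    Heuristic (an assumption): a column is categorical if any cell is
    'unknown' or is not parseable as a float; empty cells are skipped
    because numerical gaps are stored as NaN/empty.
    """
    types = {}
    for j, col in enumerate(header):
        kind = "numerical"
        for row in rows:
            value = row[j].strip() if j < len(row) else ""
            if value == "":
                continue
            if value == "unknown":
                kind = "categorical"
                break
            try:
                float(value)  # also accepts "nan" / "NaN"
            except ValueError:
                kind = "categorical"
                break
        types[col] = kind
    return types


header = ["posts_per_week", "has_photo", "has_mobile"]  # first name hypothetical
rows = [["1.5", "1", "0"], ["", "unknown", "1"]]
print(infer_column_types(header, rows))
```

A 0/1 flag with no 'unknown' cells comes out as numerical, which is harmless because it is already binary.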

Data processing: missing values are handled differently by feature type.
Categorical: filled with the string 'unknown'.
Numerical: preserved as NaN.
Boolean values are converted to binary (0/1) where applicable.
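These conventions leave the modeller to decide what to do with NaN and 'unknown'. One common option, sketched here and not prescribed by the dataset, is median imputation with a "was missing" indicator for numerical columns and one-hot encoding that keeps 'unknown' as its own level for categorical columns. That way a model can still use missingness as a signal, which may matter here because many profiles are incomplete.

```python
import math
import statistics


def parse_float(text):
    """Parse a numerical cell; empty cells become NaN (float() also reads 'nan')."""
    text = text.strip()
    return float(text) if text else math.nan


def encode_numeric(values):
    """Median-impute NaN and return a 0/1 'was missing' indicator alongside."""
    present = [v for v in values if not math.isnan(v)]
    median = statistics.median(present) if present else 0.0
    filled = [median if math.isnan(v) else v for v in values]
    was_missing = [1 if math.isnan(v) else 0 for v in values]
    return filled, was_missing


def encode_categorical(values):
    """One-hot encode, keeping 'unknown' as a category of its own."""
    levels = sorted(set(values))
    onehot = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, onehot


nums = [parse_float(s) for s in ["1.0", "", "3.0"]]
print(encode_numeric(nums))
print(encode_categorical(["1", "unknown", "0"]))
```

In a real pipeline, compute the median and the category levels on the training fold only and reuse them on the test fold, so that no information leaks from evaluation data into preprocessing.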

Potential use cases: binary classification (user vs. bot detection), social media behavior analysis, anomaly detection in social networks, and feature engineering exercises.

Dataset size: 5874 rows × 60 features (balanced 50/50).