Earth Resource Data Cloud: Data Resource Details

US Adult Income

Published: 2026-03-17 14:30:08 | Resource ID: 2033785774158548994 | Resource type: Free



Overview

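The US Adult Income dataset is intended mainly for binary classification. The data are tabular: structured census records rather than free text. Task description: Data set of adult income.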

Task type: tabular binary classification.

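Suggested workflow: first handle missing values and outliers and encode the features, then compare logistic regression, random forest, and XGBoost.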
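
The sketch below covers the first model in that comparison: a pure-Python logistic regression fitted by full-batch gradient descent on log loss. It assumes the features have already been imputed and encoded into numeric lists. Scaling the continuous columns would also help gradient descent converge. The function names and the hyperparameters `lr` and `epochs` are illustrative choices, not part of this dataset. In practice, scikit-learn's `LogisticRegression` and `RandomForestClassifier` and the `xgboost` package's `XGBClassifier` are the usual tools for the full comparison.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and a bias by full-batch gradient descent on mean log loss.

    X: list of numeric feature lists (already encoded); y: list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss with respect to the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        for j in range(d):
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    """Return the estimated probability of the positive class (">50K") for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
```

Thresholding `predict_proba` at 0.5 gives hard labels. The same encoded features can then be fed to the other candidate models for a like-for-like comparison.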

Evaluation advice: use a stratified split or cross-validation, and focus primarily on classification metrics such as F1, recall, and AUC.
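
A stratified split keeps the share of ">50K" rows roughly equal in every fold. This matters because the two income brackets are not equally frequent. Below is a minimal standard-library sketch of stratified k-fold index generation. The function name, `k`, and `seed` are illustrative assumptions. scikit-learn's `StratifiedKFold` implements the same idea for production use.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class proportions track those of `labels`."""
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's indices round-robin so every fold gets a near-equal share.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)

    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

With 80 "<=50K" rows and 20 ">50K" rows and `k=5`, every validation fold receives 16 and 4 of them respectively.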

Available files: adult - test.csv, adult - training.csv.

US Adult Census data relating income to social factors such as age, education, and race.

The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census database. It consists of anonymized information such as occupation, age, native country, race, capital gain, capital loss, education, work class, and more. Each row is labelled with its income bracket, either ">50K" or "<=50K".

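The dataset is split into two comma-separated files. The original description names them adult - training.txt and adult - test.txt; this page lists them as adult - training.csv and adult - test.csv.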
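
Below is a minimal loader sketch using Python's `csv` module. Several details are assumptions to verify against the actual files:

- The column order in `COLUMNS` follows the attributes and column names mentioned in this description.
- The files may or may not start with a header row, so `read_rows` takes a `has_header` flag.
- Fields may be separated by a comma followed by a space.
- `parse_label` tolerates a trailing period on the label as a defensive measure, not as a documented property of these files.

```python
import csv

# Hypothetical column order, assumed from the census attributes and the
# income_bracket / categorical column names given in this description.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "gender",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income_bracket",
]


def parse_label(raw: str) -> int:
    """Map ">50K" to 1 and "<=50K" to 0, tolerating surrounding whitespace and a trailing period."""
    value = raw.strip().rstrip(".")
    if value == ">50K":
        return 1
    if value == "<=50K":
        return 0
    raise ValueError(f"unexpected income_bracket value: {raw!r}")


def read_rows(lines, columns=COLUMNS, has_header=False):
    """Read comma-separated records (any iterable of lines, e.g. an open file) into dicts."""
    # skipinitialspace drops the blank that may follow each comma.
    reader = csv.reader(lines, skipinitialspace=True)
    if has_header:
        next(reader, None)
    rows = []
    for record in reader:
        if not record or len(record) != len(columns):
            continue  # skip blank or malformed lines
        row = dict(zip(columns, (field.strip() for field in record)))
        row["income_bracket"] = parse_label(row["income_bracket"])
        rows.append(row)
    return rows
```

A typical call would be `read_rows(open("adult - training.csv", newline=""))`. All values except the label stay as strings until the encoding step.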

The goal is to train a binary classifier on the training file to predict the income_bracket column. This column takes one of two values, ">50K" or "<=50K". The classifier's accuracy is then evaluated on the test file.
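
The description asks for test-set accuracy, while the evaluation advice favors F1, recall, and AUC. Because ">50K" is the minority class, accuracy alone can look good even for a classifier that rarely predicts it.

Below is a minimal standard-library sketch of these metrics, treating ">50K" as the positive class (label 1). AUC is computed from average ranks, the Mann-Whitney formulation, so tied scores count as half. The function names are illustrative. `sklearn.metrics` offers tested equivalents: `accuracy_score`, `recall_score`, `f1_score`, and `roc_auc_score`.

```python
def classification_report(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels, with 1 (">50K") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC AUC via the rank-sum statistic; tied scores share their average rank."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    n_pos = sum(1 for _, t in pairs if t == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Assign 1-based ranks, averaging over blocks of equal scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, (_, t) in zip(ranks, pairs) if t == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

`classification_report` expects hard 0/1 predictions, for example probabilities thresholded at 0.5. `roc_auc` takes the raw probabilities, so it is independent of the threshold.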

Note that the dataset contains both categorical and continuous features, as well as missing values. The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country.
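
One common way to handle these categorical columns and missing values, not one prescribed by this dataset, works in two steps. First, fill a missing category with its training-set mode. Then, one-hot encode each column using only the categories seen in training.

The sketch below makes several assumptions:

- Rows are dicts of strings, as a CSV reader would produce.
- Missing entries appear as "?" or empty strings.
- The numeric columns passed in `numeric` are complete.

All names are illustrative.

```python
from collections import Counter

CATEGORICAL = [
    "workclass", "education", "marital_status", "occupation",
    "relationship", "race", "gender", "native_country",
]
# Assumption: missing values are encoded as "?" or an empty string; check the files.
MISSING = {"?", ""}


def fit_encoder(rows, categorical=CATEGORICAL):
    """Learn each categorical column's mode and sorted vocabulary from training rows only."""
    modes, vocab = {}, {}
    for col in categorical:
        counts = Counter(r[col] for r in rows if r[col] not in MISSING)
        modes[col] = counts.most_common(1)[0][0]
        vocab[col] = sorted(counts)
    return modes, vocab


def encode_row(row, modes, vocab, numeric):
    """Convert numeric columns to floats, impute missing categoricals with the mode, then one-hot encode."""
    features = [float(row[col]) for col in numeric]
    for col, categories in vocab.items():
        value = row[col]
        if value in MISSING:
            value = modes[col]
        # A category never seen during fitting yields an all-zero block for this column.
        features.extend(1.0 if value == c else 0.0 for c in categories)
    return features
```

Fitting on the training file alone and reusing `modes` and `vocab` for the test file keeps test-set information out of preprocessing.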