Earth Resource Data Cloud (地球资源数据云): Data Resource Details

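The dataset "Dirty data to clean What's wrong with this dataset" is mainly intended for regression/prediction tasks, and its data is primarily tabular. Description: Animal data for data cleaning, visualization and geospatial analysis
Task type: tabular regression/prediction.
Suggested workflow: handle missing values and outliers and encode features first, then compare logistic regression, random forest, and XGBoost.
Evaluation: use a stratified split or cross-validation, and prioritize classification metrics such as F1, recall, and AUC.
Available files: animal_data_dirty1.csv, animal_data_reworked.csv.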
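The suggested workflow and evaluation above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's method: the data is synthetic, the column names (`height_cm`, `body_length_cm`, `country`) and the binary target are assumptions rather than the real schema, and XGBoost is left out to keep the example dependent on scikit-learn only. Because the recommended metrics are classification metrics, the sketch assumes a binary classification target.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are assumptions, not the real schema.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "height_cm": rng.normal(80, 20, n),
    "body_length_cm": rng.normal(120, 30, n),
    "country": rng.choice(["PL", "CZ", "HU"], n),
})
# Hypothetical binary target, loosely tied to height so the models have signal.
y = (df["height_cm"] + rng.normal(0, 10, n) > 80).astype(int)
# Inject missing values to mimic the dirty file.
df.loc[rng.choice(n, 20, replace=False), "height_cm"] = np.nan

numeric = ["height_cm", "body_length_cm"]
categorical = ["country"]


def make_pipeline(model):
    """Impute and scale numeric columns, one-hot encode categoricals, then fit."""
    preprocess = ColumnTransformer([
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified 5-fold cross-validation, scored by F1.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(make_pipeline(model), df, y, cv=cv, scoring="f1").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
```

Keeping the imputation and encoding inside the pipeline means they are refit on each training fold, which avoids leaking statistics from the validation fold into preprocessing.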
This dataset contains ~1000 lines of data about animals spotted in Central/Eastern in 2024 (animal type, country, geolocation as latitude/longitude, gender, estimated height and body length).
The data was artificially generated.
The primary purpose of this dataset is data cleaning; it can also be used for data visualization and geospatial analysis (e.g. with folium). This dataset has multiple issues, including:
duplicates,
missing data,
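
The two issues named above, duplicates and missing data, can be surfaced with a few pandas calls. The sketch below runs on a tiny made-up sample rather than the real file, and the column names are assumptions; the actual headers in animal_data_dirty1.csv may differ.

```python
import io

import pandas as pd

# Tiny made-up sample; the real columns in animal_data_dirty1.csv may differ.
raw = io.StringIO(
    "animal,country,latitude,longitude,gender\n"
    "lynx,Poland,52.1,21.0,female\n"
    "lynx,Poland,52.1,21.0,female\n"
    "bison,,50.3,19.9,male\n"
    "fox,Hungary,,19.0,\n"
)
df = pd.read_csv(raw)

# Count rows that exactly repeat an earlier row.
n_duplicates = int(df.duplicated().sum())

# Count missing values per column (empty CSV fields are read as NaN).
missing_per_column = df.isna().sum()

# Drop exact duplicates; how to treat missing values is left as a separate decision.
cleaned = df.drop_duplicates().reset_index(drop=True)

print("duplicate rows:", n_duplicates)
print(missing_per_column)
print("rows after de-duplication:", len(cleaned))
```

To run this against the real data, replace the `io.StringIO` sample with `pd.read_csv("animal_data_dirty1.csv")`. Exact-match de-duplication will not catch near-duplicates such as rows that differ only in casing or whitespace, so normalizing text columns first may be worthwhile.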