Earth Resources Data Cloud: Data Resource Details

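The dataset "Mushroom Edibility Classification" is intended for binary classification; the data consists mainly of text-valued fields. Task description: predicting whether a mushroom is edible or poisonous.
Task type: binary classification on text-valued fields.
Suggested workflow: clean and tokenize the text first, then compare a TF-IDF + linear model baseline against a pretrained language model.
Evaluation: use a stratified split or cross-validation, and prioritize classification metrics such as F1, recall, and AUC.
Available file: secondary_data.csv.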
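The evaluation advice above (stratified splitting, then recall and F1) can be illustrated with a small dependency-free sketch. Everything here is an assumption made for illustration: the helper names `stratified_split` and `recall_and_f1`, the toy label list, and the choice of `p` (poisonous) as the positive class. None of it is read from secondary_data.csv, and a library such as scikit-learn would normally be used instead. AUC is left out because it needs predicted scores rather than hard labels.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately so that
    both parts keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def recall_and_f1(y_true, y_pred, positive="p"):
    """Recall and F1 for the positive class (assumed here: 'p' = poisonous)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy labels for illustration only, not the real class distribution.
labels = ["p"] * 60 + ["e"] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
test_labels = [labels[i] for i in test_idx]

# Hypothetical trivial baseline: predict "poisonous" for every mushroom.
predictions = ["p"] * len(test_labels)
recall, f1 = recall_and_f1(test_labels, predictions)
print(f"recall={recall:.2f} f1={f1:.2f}")
```

The always-poisonous baseline reaches perfect recall on the poisonous class while its F1 stays well below 1, which is one reason to report both metrics rather than either alone.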
Context
This dataset includes 61069 hypothetical mushrooms with caps based on 173 species (353 mushrooms per species). Each mushroom is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended (the latter class was combined with the poisonous class). Of the 20 variables, 17 are nominal and 3 are metrical.
Attribute Information:
One binary class divided into edible=e and poisonous=p (the latter also containing mushrooms of unknown edibility). Twenty remaining variables (n: nominal, m: metrical):
1. cap-diameter (m): float number in cm