Earth Resources Data Cloud: Data Resource Details

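The dataset "AqSolDB: A curated aqueous solubility dataset" is intended mainly for supervised learning. It pairs text-form molecular representations (such as SMILES strings) with numeric descriptors, and the platform files it under the environmental-protection category. Description: Aqueous solubility and 2D descriptors for a diverse set of compounds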
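Task type: supervised learning. The target, aqueous solubility, is a continuous measured value, so this is primarily a regression task, with the molecular text as one possible input.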
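Suggested workflow: first clean and tokenize the text (for example, the SMILES strings), then compare TF-IDF features with a linear model (a regressor such as ridge regression, given the continuous target) against a pretrained language model.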
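Evaluation: use a stratified split or cross-validation. Because solubility is continuous, stratification requires binning the target first, and classification metrics such as F1, Recall, and AUC apply only after splitting it into classes (for example, soluble vs. poorly soluble); for the raw values, regression metrics such as RMSE, MAE, and R² are the natural choice.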
Available file: curated-solubility-dataset.csv.
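A minimal loader for the CSV, using only the standard library. It assumes the file has columns named SMILES and Solubility; those header names are an assumption here, so check them against your copy. The function name is hypothetical. Rows with an empty SMILES or a missing or non-numeric solubility are skipped rather than guessed.

```python
import csv


def load_solubility_csv(path, smiles_col="SMILES", target_col="Solubility"):
    """Return parallel lists of SMILES strings and solubility values.

    The default column names are assumptions about the CSV header; rows whose
    SMILES is empty or whose target is missing or non-numeric are skipped.
    """
    smiles, targets = [], []
    # utf-8-sig also copes with a leading byte-order mark
    with open(path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        missing = {smiles_col, target_col} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing expected columns: {sorted(missing)}")
        for row in reader:
            smi = (row.get(smiles_col) or "").strip()
            try:
                value = float(row.get(target_col))
            except (TypeError, ValueError):
                continue  # empty or non-numeric target
            if smi:
                smiles.append(smi)
                targets.append(value)
    return smiles, targets
```

Failing loudly on a missing column, instead of silently skipping every row, makes a header mismatch obvious on the first run.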
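The suggested text workflow can be sketched without external dependencies, assuming the text being tokenized is the SMILES column. The tokenizer regex is adapted from the pattern commonly used in SMILES language-model work; TF-IDF uses the smoothed form log((1 + N) / (1 + df)) + 1; and the linear model is an L2-penalized least-squares regressor trained by SGD, since solubility is a continuous value. All function names are hypothetical, and in practice scikit-learn's TfidfVectorizer and Ridge would replace the hand-written parts. The pretrained-language-model side of the comparison is not shown.

```python
import math
import re
from collections import Counter

# SMILES tokenizer regex, adapted from the pattern widely used in SMILES
# language-model work: bracket atoms, Br/Cl, ring-closure digits and bond
# symbols each become a single token. Characters it does not cover are dropped.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Clean (strip whitespace) and split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles.strip())


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen tokens are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def predict(x, w, b):
    """Linear model output for one sparse feature vector."""
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def fit_ridge_sgd(X, y, alpha=1e-3, lr=0.1, epochs=200):
    """Least squares with an L2 penalty, trained by plain SGD on sparse dicts.

    The penalty is applied only to weights active in the current sample,
    a common shortcut for sparse inputs.
    """
    w = {}
    b = sum(y) / len(y)  # start the bias at the mean target
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = predict(x, w, b) - target
            b -= lr * err
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * (err * v + alpha * w.get(t, 0.0))
    return w, b
```

Usage: tokenize every SMILES string, call fit_idf on the training tokens only (so no information leaks from the test split), map both splits through tfidf, then train fit_ridge_sgd on the training vectors and score held-out vectors with predict.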
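The evaluation advice covers two routes, and both can be sketched with the standard library. Stratified folds for a continuous target are built by ranking samples and dealing them round-robin, a simple stand-in for stratifying on quantile bins; regression metrics are computed directly; and for the classification route, values are binarized before Recall and F1. The threshold of -4.0 assumes a log-scale target and is purely hypothetical, as are the function names. AUC needs a ranking score and is omitted; scikit-learn's metrics module provides tested versions of all of these.

```python
import math

# Hypothetical cut-off for turning solubility into two classes, assuming the
# target is on a log scale; pick a threshold that suits the application.
THRESHOLD = -4.0


def stratified_folds(y, n_splits=5):
    """Fold id per sample, stratified on a continuous target.

    Samples are ranked by target and dealt round-robin into folds, so every
    fold covers the whole solubility range.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    for rank, idx in enumerate(order):
        folds[idx] = rank % n_splits
    return folds


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 for continuous predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "R2": 1.0 - sse / ss_tot if ss_tot else float("nan"),
    }


def binarize(values, threshold=THRESHOLD):
    """True where a value is at or above the threshold (the 'soluble' class)."""
    return [v >= threshold for v in values]


def recall_f1(y_true_cls, y_pred_cls):
    """Recall and F1 for the positive class of two boolean label lists."""
    pairs = list(zip(y_true_cls, y_pred_cls))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1
```

Dealing ranked samples round-robin keeps each fold's target distribution close to the overall one without choosing bin edges by hand.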
Context
AqSolDB was created by the Autonomous Energy Materials Discovery (AMD) research group. It consists of aqueous solubility values for 9,982 unique compounds, curated from 9 publicly available aqueous solubility datasets.
This openly accessible dataset, the largest of its kind, is intended both as a reference source of measured solubility data and as an improved, more generalizable training set for building data-driven models.
Content
In addition to the curated experimental solubility values, AqSolDB contains relevant topological and physicochemical 2D descriptors calculated with RDKit, as well as validated molecular representations of each compound.
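Predicting solubility for a compound outside the dataset requires the same descriptors for that compound. A sketch, assuming RDKit is installed and that the CSV's descriptor columns correspond to RDKit's Descriptors functions of the same name; the selected subset and the function name are hypothetical. Descriptor values can shift slightly between RDKit versions, so compare against the CSV before mixing freshly computed values with the shipped ones.

```python
from typing import Dict, Optional

from rdkit import Chem
from rdkit.Chem import Descriptors

# Hypothetical subset of descriptor names; each is a function in
# rdkit.Chem.Descriptors that takes a Mol and returns a number.
DESCRIPTOR_NAMES = [
    "MolWt", "MolLogP", "MolMR", "HeavyAtomCount", "NumHAcceptors",
    "NumHDonors", "NumRotatableBonds", "RingCount", "TPSA",
]


def descriptors_for(smiles: str) -> Optional[Dict[str, float]]:
    """Compute 2D descriptors for one SMILES string, or None if it fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        return None
    return {name: float(getattr(Descriptors, name)(mol)) for name in DESCRIPTOR_NAMES}
```

Returning None for unparsable input lets a caller drop or flag bad molecules instead of crashing mid-batch.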