Earth Resources Data Cloud: Data Resource Details

AqSolDB: A Curated Aqueous Solubility Dataset

Published: 2026-03-17 14:32:25; Resource ID: 2031260891096715265; Resource type: Free

Overview

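The dataset "AqSolDB: A curated aqueous solubility dataset" is intended for supervised learning tasks. Its data is mainly in text form, and its application scenario leans toward environmental classification. Description: Aqueous solubility and 2D descriptors for a diverse set of compounds.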

Task type: supervised learning on text.

Suggested workflow: clean and tokenize the text first, then compare a TF-IDF + linear model baseline against a pretrained language model.
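
The cleaning, tokenization, and TF-IDF steps of this workflow can be sketched with the standard library alone. This is a minimal illustration, not the portal's prescribed method: character bigrams as tokens, the smoothed IDF formula, and the example strings are all assumptions made here, and the linear model and pretrained language model are left out.

```python
import math
from collections import Counter


def tokenize(text, n=2):
    """Clean a string and split it into overlapping character n-grams."""
    cleaned = "".join(text.split())  # cleaning step: drop all whitespace
    if len(cleaned) < n:
        return [cleaned] if cleaned else []
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def tfidf_vectors(documents, n=2):
    """Return one sparse {token: weight} dict per document.

    Weight = raw term count * (ln((1 + N) / (1 + df)) + 1),
    followed by L2 normalization of each document vector.
    """
    tokenized = [tokenize(doc, n) for doc in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each token once per document
    total = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {
            tok: count * (math.log((1 + total) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({tok: w / norm for tok, w in vec.items()} if norm else {})
    return vectors


# Hypothetical example strings (SMILES-like text, not taken from the file).
example_docs = ["CCO", "CCCO", "c1ccccc1"]
example_vectors = tfidf_vectors(example_docs)
```

The resulting sparse vectors could then be fed to any linear classifier; which text column to tokenize is not stated on this page and would need to be chosen from the file itself.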

Evaluation: use a stratified split or cross-validation, and prioritize classification metrics such as F1, recall, and AUC.
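
A stratified split and two of the metrics named here can be sketched as follows. The function names, the round-robin fold assignment, and the toy labels are assumptions for illustration. The page does not say how class labels are defined, so integer labels are used as placeholders, and AUC is omitted to keep the sketch short.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign every index to one of k folds with similar class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def recall_and_f1(y_true, y_pred, positive=1):
    """Compute recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return recall, 0.0
    return recall, 2 * precision * recall / (precision + recall)


# Placeholder labels: six negatives and four positives.
toy_labels = [0] * 6 + [1] * 4
toy_folds = stratified_folds(toy_labels, k=2)
```

Each fold in turn serves as the held-out set while the remaining folds are used for training, which gives a simple form of stratified cross-validation.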

Available file: curated-solubility-dataset.csv.
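
Reading the file might look like the sketch below. The column names `SMILES` and `Solubility` are assumptions, since this page does not list the file's columns, and should be checked against the actual CSV header. The in-memory sample uses made-up values so the code runs without the file.

```python
import csv
import io


def load_rows(handle, text_col="SMILES", target_col="Solubility"):
    """Read (text, value) pairs from an open CSV file handle.

    Column names are assumed, not confirmed by the dataset page.
    Rows with a missing or non-numeric target value are skipped.
    """
    rows = []
    for record in csv.DictReader(handle):
        try:
            value = float(record[target_col])
        except (KeyError, TypeError, ValueError):
            continue
        rows.append((record.get(text_col) or "", value))
    return rows


# Tiny stand-in for the real file, with invented values.
sample = io.StringIO("SMILES,Solubility\nCCO,0.5\nCCC,not-a-number\n")
sample_rows = load_rows(sample)

# With the real file, the call would be along these lines:
# with open("curated-solubility-dataset.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
```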

Context

AqSolDB was created by the Autonomous Energy Materials Discovery [AMD] research group. It consists of aqueous solubility values for 9,982 unique compounds, curated from 9 different publicly available aqueous solubility datasets.

This openly accessible dataset, which is the largest of its kind, will serve not only as a useful reference source of measured solubility data but also as a much improved and generalizable training data source for building data-driven models.

Content

In addition to curated experimental solubility values, AqSolDB contains relevant topological and physicochemical 2D descriptors calculated with RDKit, as well as validated molecular representations of each compound.