地球资源数据云——数据资源详情

Flipkart 产品评论与情绪数据集

发布时间:2026-03-17 15:41:20资源ID:2033810878234005505资源类型:免费

该数据集《Flipkart Product reviews with sentiment Dataset》主要用于二分类任务,数据形态以文本为主,应用场景偏向文本内容分析。 题目说明:Flipkart Product Customer reviews dataset for sentiment analysis 任务类型:文本二分类。 建议流程:先做文本清洗与分词,再比较 TF - IDF+线性模型 与 预训练语言模型。 评估建议:使用分层切分或交叉验证,优先关注 F1、Recall、AUC 等分类指标。 可用文件:Dataset - SA.csv。 This dataset contains information about Product name, Product price, Rate, Reviews, Summary and Sentiment in csv format. There are 104 different types of products of flipkart.com such as electronics items, clothing of men, women and kids, Home decor items, Automated systems, so on. It has 205053 rows and 6 columns. Also, if any product doesn't have any review but summary is present then Nan value already added to its blank space. This dataset has multiclass label as sentiment such as positive, neutral amd negative.The sentiment given was based on column called Summary using NLP and Vader model. Also, after that we manually check the label and put it into the appropriate categories like if summary has text like okay, just ok or one positive and negative we labeled as neutral for better understanding while using this dataset for human languages. On the summary and price column, data cleaning method is already performed using python module called NumPy and Pandas which are famous.You can learn it also through any online resource.

Flipkart 产品评论与情绪数据集

摘要概览

该数据集《Flipkart Product reviews with sentiment Dataset》主要用于二分类任务,数据形态以文本为主,应用场景偏向文本内容分析。 题目说明:Flipkart Product Customer reviews dataset for sentiment analysis

任务类型:文本二分类。

建议流程:先做文本清洗与分词,再比较 TF - IDF+线性模型 与 预训练语言模型。

评估建议:使用分层切分或交叉验证,优先关注 F1、Recall、AUC 等分类指标。

可用文件:Dataset - SA.csv。

This dataset contains information about Product name, Product price, Rate, Reviews, Summary and Sentiment in csv format. There are 104 different types of products of flipkart.com such as electronics items, clothing of men, women and kids, Home decor items, Automated systems, so on. It has 205053 rows and 6 columns.

Also, if any product doesn't have any review but summary is present then Nan value already added to its blank space.

This dataset has multiclass label as sentiment such as positive, neutral amd negative.The sentiment given was based on column called Summary using NLP and Vader model.

Also, after that we manually check the label and put it into the appropriate categories like if summary has text like okay, just ok or one positive and negative we labeled as neutral for better understanding while using this dataset for human languages.

On the summary and price column, data cleaning method is already performed using python module called NumPy and Pandas which are famous.You can learn it also through any online resource.