Earth Resources Data Cloud: Data Resource Details

Roman Urdu Stop Words

Published: 2026-03-17 14:30:43
Resource ID: 2032009216469143554
Resource type: Free

Overview

The "Roman Urdu StopWords" dataset is intended mainly for multi-class classification tasks and consists primarily of text. Task description: Urdu NLP sentiment analysis.

Task type: multi-class text classification.

Suggested workflow: clean and tokenize the text first, then compare a TF-IDF + linear model baseline against a pretrained language model.
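The first half of that workflow (cleaning, tokenizing, and TF-IDF weighting) can be sketched with the standard library alone. This is a minimal illustration, not the dataset author's method: the cleaning rule (keep only ASCII letters), the function names, and the two example sentences are assumptions, and the TF-IDF formula is the unsmoothed textbook form, so the weights will differ from those produced by libraries that apply smoothing. The linear model and the pretrained model are left out.

```python
import math
import re
from collections import Counter


def clean_and_tokenize(text):
    """Lowercase, replace everything except ASCII letters, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits, punctuation, emoji
    return text.split()


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (term count / document length) * log(N / document frequency)
    """
    n_docs = len(docs)
    # Count in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical example sentences, not taken from the dataset.
raw_texts = ["Yeh movie bohat achi hai!!", "Yeh movie bilkul bekar hai :("]
tokenized = [clean_and_tokenize(t) for t in raw_texts]
tfidf_weights = tfidf(tokenized)
```

Terms that occur in every document, such as "yeh" and "hai" here, receive a weight of zero, which is the same intuition that motivates removing stop words before classification.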

Evaluation advice: use stratified splitting or cross-validation, and focus on classification metrics such as F1, recall, and AUC.
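To make the evaluation advice concrete, the sketch below shows one way a stratified split and a macro-averaged F1 score could be computed without external libraries. The function names, the 20% test fraction, and the label values are hypothetical choices for illustration; recall and AUC are not covered here.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test example per class, even for very small classes.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting every class equally."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only.
labels = ["pos"] * 10 + ["neg"] * 5 + ["neu"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```

Macro averaging is used here because it treats rare classes the same as frequent ones, which matters when sentiment classes are imbalanced. Note that a class with a single example ends up entirely in the test part under this rule.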

Available files: romanEng2.csv, RomanUrdu_stopwords.csv.

This CSV consists of a single column containing over 200 stop words in Roman Urdu. The stop words are designed mainly for sentiment analysis but can be used in any NLP task. The list may not be sufficient, since Urdu is a context-rich language, so I will try my best to add new stop words in future versions.

I have also attached a notebook showing how to use these stop words via the stammer library in Python. You can use them in various NLP tasks involving Roman Urdu, which is more widely used in communication than Urdu in its original script.
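As a rough illustration of how a single-column stop word file might be applied, the sketch below uses only Python's standard `csv` module rather than the approach in the attached notebook. The header row, the column name, and the sample words are assumptions standing in for the real contents of RomanUrdu_stopwords.csv, which are not shown on this page.

```python
import csv
import io


def load_stopwords(file_obj, has_header=True):
    """Read a single-column CSV into a set of lowercase stop words."""
    reader = csv.reader(file_obj)
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    return {row[0].strip().lower() for row in reader if row and row[0].strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stop word set."""
    return [token for token in tokens if token.lower() not in stopwords]


# Inline stand-in for RomanUrdu_stopwords.csv; the header name and the
# words are illustrative assumptions, not the file's verified contents.
sample_csv = io.StringIO("stopword\nhai\nka\nki\nyeh\n")
stopwords = load_stopwords(sample_csv)
filtered = remove_stopwords("yeh movie bohat achi hai".split(), stopwords)
```

To run this against the real file, the `io.StringIO` object could be replaced with `open("RomanUrdu_stopwords.csv", newline="", encoding="utf-8")`, setting `has_header=False` if the file turns out to have no header row.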