Earth Resources Data Cloud: Data Resource Details

Student Performance (PIP)

Published: 2026-03-17 14:31:30 | Resource ID: 2031995022998933505 | Resource type: Free

Overview

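The dataset "Student performance (PIP)" is intended mainly for multi-class classification. The data is tabular, and the application domain is higher education. Description: Student performance in higher education.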

Task type: tabular multi-class classification.

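Suggested workflow: handle missing values and outliers and encode the features first, then compare logistic regression, random forest, and XGBoost.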
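The suggested workflow can be sketched roughly as follows, assuming scikit-learn and pandas are available. The column names, class labels, and `make_toy_frame` helper are hypothetical stand-ins, because the actual CSV schema is not listed on this page. XGBoost is left out so that the sketch depends on scikit-learn only, and outlier handling is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_frame(n=300, seed=0):
    """Synthetic stand-in for the CSV; all column and class names are hypothetical."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(17, 40, n).astype(float),
        "grade_sem1": rng.normal(12, 3, n),
        "course": rng.choice(["nursing", "design", "management"], n),
    })
    # Knock out a few values so the imputation step has something to do.
    df.loc[rng.choice(n, 15, replace=False), "grade_sem1"] = np.nan
    # Three imbalanced classes, mirroring the three-category task.
    y = pd.Series(rng.choice(["class_a", "class_b", "class_c"], n, p=[0.6, 0.25, 0.15]))
    return df, y


def build_models(numeric_cols, categorical_cols):
    """Return candidate pipelines that share the same preprocessing."""
    def preprocess():
        return ColumnTransformer([
            ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                              ("scale", StandardScaler())]), numeric_cols),
            ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                              ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
             categorical_cols),
        ])
    return {
        "logistic_regression": Pipeline([("prep", preprocess()),
                                         ("clf", LogisticRegression(max_iter=1000))]),
        "random_forest": Pipeline([("prep", preprocess()),
                                   ("clf", RandomForestClassifier(n_estimators=100,
                                                                  random_state=0))]),
    }


def compare_models(X, y):
    """Mean macro-F1 per model under stratified 5-fold cross-validation."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean())
        for name, model in build_models(numeric_cols, categorical_cols).items()
    }


if __name__ == "__main__":
    X, y = make_toy_frame()
    for name, score in compare_models(X, y).items():
        print(f"{name}: macro-F1 = {score:.3f}")
```

Keeping imputation and encoding inside each pipeline means they are refit on every training fold, which avoids leaking validation data into the preprocessing. The toy labels are random, so the printed scores say nothing about the real dataset.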

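Notes: the classes appear to be imbalanced (the dataset description reports a strong imbalance toward one class), so use stratified sampling, class weights, and F1/recall metrics.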
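To make the three recommendations concrete, here is a minimal standard-library sketch of a stratified split, "balanced" class weights, and macro-averaged recall and F1. The function names are illustrative, and the weight formula n_samples / (n_classes * class_count) is one common heuristic, not something the dataset authors prescribe.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


def macro_recall_f1(y_true, y_pred):
    """Unweighted mean of per-class recall and per-class F1."""
    classes = sorted(set(y_true))
    recalls, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / len(classes), sum(f1s) / len(classes)
```

For example, with 60/30/10 labels across three classes, a model that always predicts the majority class reaches 0.60 accuracy but only about 0.33 macro recall and 0.25 macro F1, which is why accuracy alone is a poor guide on imbalanced data.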

Available file: Student performance (Polytechnic Institute of Portalegre).csv

A dataset created at a higher education institution, the Polytechnic Institute of Portalegre (Portugal), and derived from several disparate databases. It covers students enrolled in various undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social services, and technology.

The dataset includes information known at the time of student enrollment (academic path, demographics, and socio-economic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models that predict student dropout and academic success.

The problem is formulated as a three-category classification task with a strong imbalance toward one of the classes.

For what purpose was the dataset created?

The dataset was created in a project that aims to contribute to the reduction of academic dropout and failure in higher education, by using machine learning techniques to identify students at risk at an early stage of their academic path, so that strategies to support them can be put into place.