
Publication

A flexible feature selection approach for predicting students’ academic performance in online courses

Journal Contribution - Journal Article

One of the main challenges of online and blended learning is that educators lose the ability to gauge students' comprehension during class through quick questions or nonverbal communication. Many researchers have recently tackled this problem by proposing frameworks for predicting students' academic performance. However, previous work relies heavily on feature engineering, the process of selecting, transforming, manipulating, and constructing new variables from raw data using domain knowledge. A disadvantage of feature engineering is that the features are tailored to a specific dataset, which makes the constructed models inflexible when applied to new datasets. As a direct consequence, features need to be rebuilt for each course. This paper proposes a more flexible framework for predicting students' academic performance. In this framework, the raw data are used directly to construct the prediction model, without a feature engineering step; feature selection is instead based on model interpretability. The framework is applied to the Open University Learning Analytics Dataset (OULAD) with two types of classifiers: random forest and artificial neural networks. The results show that the feature engineering step can be dropped without affecting the models' prediction performance: the flexible feature selection framework either outperforms, or comes within 1% accuracy of, other work in the literature that relies on manual feature engineering. Without feature engineering, both random forest and artificial neural networks achieve high accuracy in identifying students at risk of failing: 86% and 88% when these students are compared with all students with pass grades and with students with distinction grades, respectively. The models reach their highest accuracy, 93%, in predicting students who drop out. However, the prediction models in both the proposed framework and previous research perform poorly in predicting high-achieving students, with a maximum accuracy of 81%, a precision of 69%, and a recall of 57%.
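The gap between accuracy, precision, and recall reported for high-achieving students can be illustrated with a minimal sketch of how the three metrics are computed from a binary confusion matrix. The counts below are hypothetical and chosen only for illustration; they are not taken from the paper or from OULAD, and the names `ConfusionCounts`, `accuracy`, `precision`, and `recall` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; the positive class is 'distinction'."""
    tp: int  # distinction students predicted as distinction
    fp: int  # other students predicted as distinction
    fn: int  # distinction students predicted as other
    tn: int  # other students predicted as other


def accuracy(c: ConfusionCounts) -> float:
    """Share of all students whose class was predicted correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Share of predicted distinctions that are real distinctions."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of real distinctions that the model found."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


# Hypothetical counts (NOT from the paper): the minority positive class
# is often missed, while the majority class is mostly classified correctly.
counts = ConfusionCounts(tp=40, fp=20, fn=30, tn=110)

print(f"accuracy:  {accuracy(counts):.3f}")
print(f"precision: {precision(counts):.3f}")
print(f"recall:    {recall(counts):.3f}")
```

With counts of this shape, accuracy is dominated by the many correctly classified students outside the distinction class, so it can stay fairly high even when a large share of the actual distinction students are missed. This is one reason recall and precision are reported alongside accuracy for the smaller class.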
Journal: Computers and Education: Artificial Intelligence
ISSN: 2666-920X
Volume: 3
Publication year: 2022
Keywords: Learning analytics, Feature selection, Random forest classifier, Deep learning, Virtual learning environment
Accessibility: Open