Hybrid Soft Voting Ensemble of XGBoost and DNN for At-Risk Student Performance Prediction
- 1 Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Malaysia
- 2 Faculty of Engineering, Computing and Science, Swinburne University of Technology Sarawak Campus, 93350 Kuching, Malaysia
Abstract
Early identification of at-risk students in higher education is important for timely academic intervention, yet conventional prediction methods often struggle with class imbalance and limited precision. This study proposes a hybrid soft voting ensemble model that integrates Extreme Gradient Boosting (XGBoost) and a Deep Neural Network (DNN) to improve multi-class student grade prediction (A-F classification) and at-risk student identification. The proposed approach was evaluated on two datasets: a publicly available Kaggle Student Performance Dataset and a real-world dataset collected from a Database Concept and Design course at Universiti Malaysia Sarawak (UNIMAS). Both datasets underwent pre-processing, including class imbalance handling using the Synthetic Minority Over-sampling Technique (SMOTE) and feature normalization using StandardScaler. The ensemble was compared against four baseline models, k-Nearest Neighbours (KNN), Support Vector Machine (SVM), XGBoost and DNN, all optimised via hyperparameter tuning. Experimental results show that the proposed hybrid ensemble outperforms the baseline models, achieving an accuracy of 77.37% and a macro F1-score of 74.50% on Dataset 1, and an accuracy of 74.13% with a macro F1-score of 81.53% on Dataset 2. In particular, the ensemble is more sensitive in detecting the minority "at-risk" categories (Grades F and D). These results highlight the effectiveness of hybrid ensemble learning in improving predictive performance and supporting data-driven decision-making for early intervention in higher education.
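To illustrate the soft voting step named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each base model (XGBoost and DNN) outputs one probability per grade class and that the two are combined by a weighted average with equal weights by default; the abstract does not state the weighting. The grade labels, the function names `soft_vote` and `predict_grade`, and the probability values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Assumed label set for the A-F classification; the exact classes are not
# listed in the abstract, so this list is illustrative only.
GRADES: List[str] = ["A", "B", "C", "D", "F"]


def soft_vote(p_xgb: Sequence[float], p_dnn: Sequence[float],
              w_xgb: float = 0.5) -> List[float]:
    """Weighted average of two class-probability vectors (soft voting).

    w_xgb is the weight given to the first model; 0.5 (equal weighting)
    is an assumption, not a value taken from the paper.
    """
    if len(p_xgb) != len(p_dnn):
        raise ValueError("both models must score the same number of classes")
    if not 0.0 <= w_xgb <= 1.0:
        raise ValueError("w_xgb must lie in [0, 1]")
    return [w_xgb * a + (1.0 - w_xgb) * b for a, b in zip(p_xgb, p_dnn)]


def predict_grade(p_xgb: Sequence[float], p_dnn: Sequence[float],
                  labels: Sequence[str] = GRADES, w_xgb: float = 0.5) -> str:
    """Return the label whose averaged probability is highest."""
    avg = soft_vote(p_xgb, p_dnn, w_xgb)
    if len(avg) != len(labels):
        raise ValueError("one probability is required per label")
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Made-up probabilities for a single student (not results from the study).
example_xgb = [0.05, 0.10, 0.40, 0.35, 0.10]  # this model alone favours "C"
example_dnn = [0.05, 0.05, 0.20, 0.50, 0.20]  # this model alone favours "D"
example_grade = predict_grade(example_xgb, example_dnn)
```

In this made-up case the two models disagree, and averaging their probabilities lets the more confident model decide the final grade. This is the general mechanism by which soft voting differs from majority (hard) voting.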
DOI: https://doi.org/10.3844/jcssp.2026.1620.1635
Copyright: © 2026 Eugene Wan, Po Chan Chiu, Mohammad bin Hossin, Hamizan Sharbini, King Kuok Kuok, Noor Hazlini Borhan and Chih How Bong. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- At-Risk Student Performance Prediction
- Machine Learning
- Predictive Analytics
- Hybrid Soft Voting Ensemble