Improving Diabetes Risk Prediction Using Ensemble Boosting and SMOTE-Based Class Balancing
- 1 Department of Technology and Business Information System Unit, MSU Research Laboratory of Blockchain and Artificial Intelligence for Interdisciplinary Innovation, Mahasarakham Business School, Mahasarakham University, Mahasarakham, Thailand
- 2 Department of Engineering Management, Suan Sunandha Rajabhat University, 1 U-Thong nok Road, Dusit, Bangkok 10300, Thailand
Abstract
Accurate diabetes prediction is vital for early intervention, optimal resource allocation, and the reduction of long-term complications. This study presents a comparative evaluation of traditional and advanced machine learning models for diabetes classification using a structured clinical dataset. Seven baseline algorithms were compared with five advanced ensemble methods: CatBoost, LightGBM, XGBoost, a Voting Ensemble, and a Stacking Ensemble. To address class imbalance and improve model training, the Synthetic Minority Over-sampling Technique (SMOTE) and feature normalization were applied. Model performance was evaluated using accuracy, precision, recall, and F1 score. The advanced models substantially outperformed the traditional ones, with CatBoost achieving the highest F1 score (0.7625). Feature importance analysis identified glucose, BMI, and age as the most influential predictors, consistent with clinical evidence. These findings demonstrate the potential of ensemble learning and boosting strategies for building interpretable, scalable, and effective diagnostic support tools in healthcare settings.
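The abstract names its preprocessing and evaluation steps but does not give the normalization scheme, the SMOTE parameters, or the voting rule. The following is a minimal pure-Python sketch under stated assumptions, not the authors' code: z-score standardization is assumed for "feature normalization", SMOTE is shown in a simplified form (a random minority sample is interpolated toward one of its k nearest minority-class neighbours), hard majority voting is assumed as one possible voting-ensemble rule, and the four reported metrics are computed for a binary label. All function names (`zscore_normalize`, `smote`, `hard_vote`, `binary_metrics`) are hypothetical. A production pipeline would more likely use imbalanced-learn's `SMOTE` and scikit-learn's `VotingClassifier`.

```python
import random
from collections import Counter


def zscore_normalize(rows):
    """Standardize each column to mean 0 and std 1 (assumed normalization scheme)."""
    n_rows, n_cols = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n_rows
        stds.append(var ** 0.5 or 1.0)  # constant column: avoid division by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic minority samples, each lying on the
    segment between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: every other minority sample, nearest first.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def hard_vote(predictions_per_model):
    """Majority vote across models; each inner list holds one model's predictions.
    Ties go to the label that appears first, i.e. the earliest model's vote."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions_per_model)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])` gives 2 true positives, 1 false positive, 1 false negative and 2 true negatives, so accuracy is 4/6 and precision, recall and F1 are all 2/3. In a real pipeline, the normalization statistics and the oversampling should be fitted on the training split only, so that synthetic samples never leak into the evaluation data.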
DOI: https://doi.org/10.3844/jcssp.2026.61.74
Copyright: © 2026 Kittipol Wisaeng, Pankom Sriboonlue and Benchalak Muangmeesri. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Diabetes Prediction
- Ensemble Learning
- Voting Classifier
- SMOTE