Diabetes Prediction Using Traditional Machine Learning Techniques

Authors

  • Ayobami ADEMOROTI Yaba College of Technology, Lagos
  • Oluwadarasimi O. OLOWE Lead City University, Ibadan, Oyo State, Nigeria
  • John I. AFE Lead City University, Ibadan, Oyo State, Nigeria
  • Oluwaseyi F. AFE Lead City University, Ibadan, Oyo State, Nigeria
  • Akintayo M. AYOADE Lead City University, Ibadan, Oyo State, Nigeria

Keywords:

Logistic Regression, Stratified Sampling, Pima Indians Diabetes Dataset, Diabetes Prediction.

Abstract

This study examines the capability of traditional machine learning (ML) algorithms to predict
the onset of diabetes using the Pima Indians diabetes dataset. Decision tree, naive Bayes,
k-Nearest Neighbors (kNN), and logistic regression classifiers were evaluated using accuracy,
precision, recall, F1 score, and ROC AUC. The data were preprocessed to correct implausible
values, and stratified sampling was used to preserve the class proportions when splitting the
data. The naive Bayes algorithm achieved the best accuracy (72.7%), while logistic regression
obtained the best class separability (ROC AUC of 0.813). The study shows that interpretable
models can provide actionable insights for early identification, supporting Sustainable
Development Goal 3 (Good Health and Well-Being), particularly by promoting preventive
healthcare and informed decision-making in resource-constrained environments.
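
As an illustration of two steps named in the abstract, the sketch below shows one way a stratified train/test split and the precision, recall, and F1 metrics could be computed. It is not the authors' code: the function names, the 20% test fraction, the random seed, and the synthetic labels (200 negatives, 100 positives) are all assumptions made for the example, and it uses only the Python standard library.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class is split in the same proportion.

    Hypothetical helper for illustration; test_fraction and seed are assumed values.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Synthetic, imbalanced labels (assumption: not the study's data).
labels = [0] * 200 + [1] * 100
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

# Share of positive cases in each split; stratification keeps them aligned.
train_pos_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_pos_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Because each class is sampled separately, the share of positive cases is the same in both splits, so metrics measured on the test set are not distorted by a split that happens to under-represent the minority class.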

Published

2025-08-05