Diabetes Prediction Using the K-Nearest Neighbor (KNN) Algorithm with the SMOTE-ENN Technique

Authors

  • Zaenul Amri, Universitas Hamzanwadi
  • Muhammad Rodi, STMIK Lombok
  • M. Nurul Wathani, Universitas Hamzanwadi
  • Amir Bagja, Universitas Hamzanwadi
  • Zulkipli, Universitas Hamzanwadi

DOI:

https://doi.org/10.29408/jit.v8i1.27975

Keywords:

KNN, Prediction, SMOTE-ENN

Abstract

Diabetes is a common disease that affects millions of people worldwide and is generally more prevalent among women. Recent health research has adopted a range of advanced technologies to diagnose patients and predict disease from clinical data. One such technology is Machine Learning (ML), which enables more accurate diagnosis and prediction. This study uses the Pima Indians Diabetes dataset of Pima Indian women, obtained from Kaggle and the UCI Machine Learning Repository. It predicts diabetes with the K-Nearest Neighbor (KNN) algorithm after resampling the dataset with the SMOTE-ENN technique (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbours) to improve prediction accuracy for Pima Indian women. The model was trained and tested on five different train/test splits in Jupyter Notebook to determine which split gives the KNN model its best accuracy. Classification accuracy, classification error, and the ROC curve were evaluated, and the variables that influence diabetes risk were identified. The results show that applying SMOTE-ENN to the research dataset significantly improved the prediction accuracy of the KNN model. With a 70% training and 30% testing split, the model achieved a classification accuracy of 0.96, a classification error of 0.04, and an AUC of 0.95. The results indicate that the factors raising the likelihood of diabetes among Pima Indian women include age above 33 years, the number of pregnancies, high plasma glucose levels, blood pressure, skin thickness, insulin levels, BMI (Body Mass Index), and a genetic predisposition to diabetes.
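SMOTE-ENN is only named in the abstract, so the following is a minimal plain-Python sketch of what the technique does, not the authors' code. It assumes binary labels with 1 as the minority (diabetic) class and feature rows as lists of numbers. SMOTE first creates synthetic minority rows, each at a random point between a minority row and one of its k nearest minority neighbours, until the classes are balanced; Edited Nearest Neighbours (ENN) then drops every row whose label disagrees with the majority label of its nearest neighbours. The function names, the k values (5 and 3), and the seed are illustrative assumptions; in practice a library implementation such as imbalanced-learn's `SMOTEENN` (in `imblearn.combine`) would normally be used, and its ENN step defaults to a stricter removal rule than the majority vote shown here.

```python
import math
import random
from collections import Counter


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic minority rows, each placed at a random
    point on the segment between a minority row and one of its k nearest
    minority neighbours. Requires at least 2 minority rows."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        # Other minority rows, nearest first.
        others = sorted((j for j in range(len(X_min)) if j != i),
                        key=lambda j: math.dist(X_min[i], X_min[j]))
        j = rng.choice(others[:k])
        gap = rng.random()  # 0 <= gap < 1
        synthetic.append([a + gap * (b - a) for a, b in zip(X_min[i], X_min[j])])
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote rule): keep a row only if its
    label matches the most common label among its k nearest other rows."""
    keep_X, keep_y = [], []
    for i in range(len(X)):
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(X[i], X[j]))[:k]
        majority = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if majority == y[i]:
            keep_X.append(X[i])
            keep_y.append(y[i])
    return keep_X, keep_y


def smote_enn(X, y, minority=1, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class to parity with SMOTE, then clean with ENN."""
    rng = random.Random(seed)
    X_min = [x for x, label in zip(X, y) if label == minority]
    # Non-minority count minus minority count (binary labels assumed).
    n_new = max(0, (len(y) - len(X_min)) - len(X_min))
    X_syn = smote(X_min, n_new, k_smote, rng)
    return enn(list(X) + X_syn, list(y) + [minority] * len(X_syn), k_enn)
```

Oversampling first and cleaning afterwards is what separates SMOTE-ENN from plain SMOTE: the ENN pass can remove synthetic rows that landed inside the other class's region as well as noisy original rows. In a typical pipeline the resampling is fitted on the training portion of a split only, so that no synthetic or edited rows reach the test set.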
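The evaluation described in the abstract (a KNN classifier, a 70/30 train/test split, classification accuracy, classification error, and AUC) can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the authors' Jupyter Notebook: k = 5, Euclidean distance, a stratified split (so both classes appear in the test set), and a 0.5 probability threshold are hypothetical choices, and the helper names `knn_proba`, `stratified_split`, `roc_auc`, and `evaluate` are invented for the example. AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
import math
import random


def knn_proba(X_train, y_train, x, k=5):
    """Share of the k nearest training rows (Euclidean distance) labelled 1."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / k


def stratified_split(X, y, test_size=0.3, seed=0):
    """Shuffle each class separately and hold out `test_size` of its rows."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def take(rows, ids):
        return [rows[i] for i in ids]

    return take(X, train_idx), take(X, test_idx), take(y, train_idx), take(y, test_idx)


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (ties count half); this equals the area under the ROC curve."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(X, y, k=5, test_size=0.3, seed=0):
    """Train and test KNN on one split; report accuracy, error and AUC."""
    X_tr, X_te, y_tr, y_te = stratified_split(X, y, test_size, seed)
    probs = [knn_proba(X_tr, y_tr, x, k) for x in X_te]
    preds = [int(p >= 0.5) for p in probs]  # majority vote when k is odd
    accuracy = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    return {"accuracy": accuracy, "error": 1 - accuracy, "auc": roc_auc(y_te, probs)}
```

To mimic the comparison of five splits, `evaluate` could be called in a loop such as `for ts in (0.1, 0.2, 0.3, 0.4, 0.5): print(ts, evaluate(X, y, test_size=ts))`; these five test fractions are an assumption, since the abstract only names the 70/30 split.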
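Classification error is the complement of classification accuracy, which is why the reported accuracy of 0.96 pairs with an error of 0.04. Assuming the standard confusion-matrix definitions (TP, TN, FP, FN for true positives, true negatives, false positives, and false negatives; the abstract does not spell them out), the three reported metrics are:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Error} = 1 - \text{Accuracy} = \frac{FP + FN}{TP + TN + FP + FN}

\text{Error} = 1 - 0.96 = 0.04

\text{AUC} = \int_{0}^{1} \text{TPR} \, d(\text{FPR}), \qquad
\text{TPR} = \frac{TP}{TP + FN}, \quad \text{FPR} = \frac{FP}{FP + TN}
```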

Author Biography

Zaenul Amri, Universitas Hamzanwadi

Informatics Education Study Program, Faculty of Teacher Training and Education, Universitas Hamzanwadi

References

R. Zolfaghari, “Diagnosis of Diabetes in Female Population of Pima Indian Heritage with Ensemble of BP Neural Network and SVM,” Int. J. Comput. Eng. Manag., vol. 15, no. 4, ISSN 2230-7893, 2012.

H. N. A. Pham and E. Triantaphyllou, “Prediction of Diabetes by Employing a New Data Mining Approach Which Balances Fitting and Generalization,” in Computer and Information Science, Springer, 2008, pp. 11–26.

J. Wu, Y.-B. Diao, M.-L. Li, Y.-P. Fang, and D.-C. Ma, “A Semi-Supervised Learning Based Method: Laplacian Support Vector Machine Used in Diabetes Disease Diagnosis,” Interdiscip. Sci. Comput. Life Sci., vol. 1, pp. 151–155, 2009.

S. K. Dey, A. Hossain, and M. M. Rahman, “Implementation of a Web Application to Predict Diabetes Disease: An Approach Using Machine Learning Algorithm,” in 2018 21st International Conference of Computer and Information Technology (ICCIT), 2018, pp. 1–5.

S. Nithya, M. Sangeetha, K. N. A. Prethi, K. S. Sahoo, S. K. Panda, and A. H. Gandomi, “SDCF: A Software-Defined Cyber Foraging Framework for Cloudlet Environment,” IEEE Trans. Netw. Serv. Manag., vol. 17, no. 4, pp. 2423–2435, 2020.

Q. Zou, K. Qu, Y. Luo, D. Yin, Y. Ju, and H. Tang, “Predicting Diabetes Mellitus with Machine Learning Techniques,” Front. Genet., vol. 9, p. 515, 2018.

S. Srivastava, L. Sharma, V. Sharma, A. Kumar, and H. Darbari, “Prediction of Diabetes Using Artificial Neural Network Approach,” in Engineering Vibration, Communication and Information Processing: ICoEVCI 2018, India, 2019, pp. 679–687.

V. Karthikeyani and I. P. Begum, “Comparison a Performance of Data Mining Algorithms (CPDMA) in Prediction of Diabetes Disease,” Int. J. Comput. Sci. Eng., vol. 5, no. 3, p. 205, 2013.

D. Martens, J. Huysmans, R. Setiono, J. Vanthienen, and B. Baesens, “Rule Extraction from Support Vector Machines: An Overview of Issues and Application in Credit Scoring,” in Rule Extraction from Support Vector Machines, 2008, pp. 33–63.

Y. Guo, G. Bai, and Y. Hu, “Using Bayes Network for Prediction of Type-2 Diabetes,” in 2012 International Conference for Internet Technology and Secured Transactions, 2012, pp. 471–472.

L. O. Schulz et al., “Effects of Traditional and Western Environments on Prevalence of Type 2 Diabetes in Pima Indians in Mexico and the U.S.,” Diabetes Care, vol. 29, no. 8, pp. 1866–1871, 2006.

M. Maniruzzaman, M. J. Rahman, B. Ahammed, and M. M. Abedin, “Classification and Prediction of Diabetes Disease Using Machine Learning Paradigm,” Health Inf. Sci. Syst., vol. 8, pp. 1–14, 2020.

J. Han, J. C. Rodriguez, and M. Beheshti, “Diabetes Data Analysis and Prediction Model Discovery Using RapidMiner,” in 2008 Second International Conference on Future Generation Communication and Networking, 2008, vol. 3, pp. 96–99.

S. A. Saji and K. Balachandran, “Performance Analysis of Training Algorithms of Multilayer Perceptrons in Diabetes Prediction,” in 2015 International Conference on Advances in Computer Engineering and Applications, 2015, pp. 201–206.

M. Qusyairi, “Analisi Prediksi Tingkat Kesejahteraan Masyarakat Nelayan Lombok Timur dengan Algoritma Naïve Bayes,” Infotek: J. Inform. dan Teknol., vol. 7, no. 2, pp. 563–574, 2024.

Suyanto, Data Mining untuk Klasifikasi dan Klasterisasi Data. Bandung: Informatika, 2017.

M. Saiful, H. Bahtiar, and M. T. Hidayat, “Penerapan Algoritma K-Means Clustering dalam Mengelompokkan Smartphone yang Rekomendasi Berdasarkan Spesifikasi,” Infotek: J. Inform. dan Teknol., vol. 7, no. 2, pp. 478–488, 2024.

Z. Amri, K. Kusrini, and K. Kusnawi, “Prediksi Tingkat Kelulusan Mahasiswa Menggunakan Algoritma Naïve Bayes, Decision Tree, ANN, KNN, dan SVM,” Edumatic: J. Pendidik. Inform., vol. 7, no. 2, pp. 187–196, 2023.

S. K. Bhoi, “Prediction of Diabetes in Females of Pima Indian Heritage: A Complete Supervised Learning Approach,” Turkish J. Comput. Math. Educ., vol. 12, no. 10, pp. 3074–3084, 2021.

A. Perdana, A. Hermawan, and D. Avianto, “Analyze Important Features of Pima Indian Database for Diabetes Prediction Using KNN,” J. Sisfokom (Sistem Inf. dan Komputer), vol. 12, no. 1, pp. 70–75, 2023.

M. Benarbia, “A Machine Learning Approach to Predicting the Onset of Type II Diabetes in a Sample of Pima Indian Women,” 2022.

M. Abedini, A. Bijari, and T. Banirostam, “Classification of Pima Indian Diabetes Dataset Using Ensemble of Decision Tree, Logistic Regression and Neural Network,” Int. J. Adv. Res. Comput. Commun. Eng., vol. 9, no. 7, pp. 7–10, 2020.

V. Chang, J. Bailey, Q. A. Xu, and Z. Sun, “Pima Indians Diabetes Mellitus Classification Based on Machine Learning (ML) Algorithms,” Neural Comput. Appl., vol. 35, no. 22, pp. 16157–16173, 2023.

UCI Machine Learning, “Pima Indians Diabetes Database,” kaggle.com/uciml/pima-indians-diabetes-database, 2016.

D. Nofriansyah and G. W. Nurcahyo, Algoritma Data Mining dan Pengujian. Deepublish, 2015.

Published

20-01-2025

How to Cite

Amri, Z., Rodi, M., Wathani, M. N., Bagja, A., & Zulkipli. (2025). Prediksi Diabetes Menggunakan Algoritma K-Nearest (KNN) Teknik SMOTE-ENN. Infotek: Jurnal Informatika dan Teknologi, 8(1), 193–204. https://doi.org/10.29408/jit.v8i1.27975
