Diabetes Prediction Using the K-Nearest Neighbors (KNN) Algorithm with the SMOTE-ENN Technique
DOI: https://doi.org/10.29408/jit.v8i1.27975
Keywords: KNN, Prediction, SMOTE-ENN
Abstract
Diabetes is a common disease that affects millions of people worldwide and is generally more prevalent among women. Recent health research has adopted a range of advanced technologies to diagnose individuals and predict diseases from clinical data. One of these is Machine Learning (ML), which enables more accurate diagnosis and prediction. This study uses the Pima Indian women diabetes dataset, obtained from Kaggle and the UCI data repository. It focuses on predicting diabetes with a KNN model after resampling the dataset with the SMOTE-ENN technique, with the aim of improving prediction accuracy for Pima Indian women. The dataset was trained and tested in Jupyter Notebook under five different train/test splits to find the best accuracy for the KNN model. Classification accuracy, classification error, and the ROC curve were evaluated, and the variables that influence diabetes risk were identified. The results show that applying SMOTE-ENN to the dataset significantly improved the prediction accuracy of the KNN model. With a 70% training and 30% testing split, the model achieved a classification accuracy of 0.96, a classification error of 0.04, and an AUC of 0.95. The predictions indicate that Pima Indian women are more likely to develop diabetes due to factors such as age above 33 years, number of pregnancies, excessive sugar consumption, blood pressure, skin thickness, insulin level, BMI (Body Mass Index), and genetic predisposition to diabetes.
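The workflow described in the abstract (a 70/30 split, SMOTE-ENN resampling, then KNN classification scored by accuracy and error) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the synthetic two-feature data, the choice k = 3, the random seed, the majority-vote ENN rule, and the helper names `knn_indices`, `knn_predict`, `smote`, and `enn` are all assumptions made for illustration, and resampling only the training portion is likewise an assumption, since the abstract does not say at which stage SMOTE-ENN was applied. The AUC is not computed here.

```python
import math
import random

def knn_indices(point, data, k, skip=None):
    """Indices of the k rows of data closest to point (Euclidean distance)."""
    dists = [(math.dist(point, row), i) for i, row in enumerate(data) if i != skip]
    return [i for _, i in sorted(dists)[:k]]

def knn_predict(point, X, y, k=3, skip=None):
    """Majority label among the k nearest neighbours of point."""
    votes = [y[i] for i in knn_indices(point, X, k, skip)]
    return max(set(votes), key=votes.count)

def smote(X, y, minority, k=3, rng=None):
    """Oversample the minority class by interpolating between minority neighbours."""
    rng = rng or random.Random(0)
    rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(rows)) - len(rows)  # majority count minus minority count
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        i = rng.randrange(len(rows))
        j = rng.choice(knn_indices(rows[i], rows, k, skip=i))
        gap = rng.random()  # position along the segment between the two rows
        X_new.append([a + gap * (b - a) for a, b in zip(rows[i], rows[j])])
        y_new.append(minority)
    return X_new, y_new

def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop rows whose neighbours outvote their label."""
    keep = [i for i in range(len(X)) if knn_predict(X[i], X, y, k, skip=i) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical imbalanced data: 80 negatives, 20 positives, two features each.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 80 + [1] * 20

# 70% training / 30% testing split after shuffling.
order = list(range(len(X)))
rng.shuffle(order)
cut = int(0.7 * len(order))
X_train, y_train = [X[i] for i in order[:cut]], [y[i] for i in order[:cut]]
X_test, y_test = [X[i] for i in order[cut:]], [y[i] for i in order[cut:]]

# SMOTE-ENN on the training portion only: oversample, then clean.
X_bal, y_bal = smote(X_train, y_train, minority=1, k=3, rng=rng)
X_res, y_res = enn(X_bal, y_bal, k=3)

# Classify the untouched test rows with KNN fitted on the resampled data.
predictions = [knn_predict(x, X_res, y_res, k=3) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
error = 1 - accuracy
print(f"accuracy={accuracy:.2f} error={error:.2f}")
```

Classification error is the complement of accuracy, which is consistent with the 0.96 and 0.04 reported for the 70/30 split. In practice a library implementation would normally replace these helpers.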
License
Copyright (c) 2025 Infotek: Jurnal Informatika dan Teknologi
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles in this journal are the full responsibility of their authors. Jurnal Infotek provides open access to everyone so that the information and findings in its articles can benefit all readers. Jurnal Infotek can be accessed and downloaded free of charge, in accordance with the Creative Commons license used. Jurnal Infotek is licensed under a Creative Commons Attribution 4.0 International License.