Comparison of Distance Measures in K-Medoids Clustering for Grouping ISPA (Acute Respiratory Infection) Cases

Mia Nuranti; Mia Nuur Aini; Ultach Enri

doi:10.29408/edumatic.v5i1.3359

Authors

Mia Nuranti, Informatics Engineering Study Program, Universitas Singaperbangsa Karawang, http://orcid.org/0000-0003-2389-7749
Mia Nuur Aini, Informatics Engineering Study Program, Universitas Singaperbangsa Karawang, http://orcid.org/0000-0002-0987-5642
Ultach Enri, Informatics Engineering Study Program, Universitas Singaperbangsa Karawang, http://orcid.org/0000-0002-9516-5478

Keywords:

Chebyshev Distance, Davies Bouldin Index, Euclidean Distance, K-Medoids

Abstract

K-Medoids is an unsupervised clustering algorithm that uses a distance measure to group data. The distance measure quantifies how similar two records are based on their variables, and therefore determines which records end up in the same cluster. Several studies have shown that choosing an appropriate distance measure can improve clustering performance; Euclidean and Chebyshev distance are two of the measures that can be used. In 2016, the Karawang Health Office reported that 175,891 Karawang residents were suffering from ISPA (Infeksi Saluran Pernapasan Akut, acute respiratory infection). The figure continued to rise in the following years, reaching 181,945 people in 2019. To help the government address this problem, the areas of Karawang District where ISPA is spreading are grouped by clustering into three clusters: low, medium, and high. The distance measures are compared to find the best model according to the Davies-Bouldin Index (DBI). Euclidean distance produces a DBI of 0.088, whereas Chebyshev distance produces a DBI of 0.116. Because a lower DBI (closer to 0) indicates more compact, better-separated clusters, K-Medoids with Euclidean distance is considered to perform better than K-Medoids with Chebyshev distance.
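The comparison described above can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not say which tool, initialization, normalization, or DBI variant was used, so several choices below are assumptions. Euclidean distance is the square root of the summed squared attribute differences, and Chebyshev distance is the largest absolute attribute difference. The function names (euclidean, chebyshev, init_farthest_first, assign, k_medoids, davies_bouldin) and the farthest-first seeding are hypothetical. The update step is the simple alternating (Voronoi-iteration) variant of K-Medoids rather than the PAM swap procedure. The DBI uses arithmetic-mean cluster centres measured with the same distance as the clustering. The toy points are illustrative only and are not the Karawang ISPA data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Square root of the summed squared attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a: Point, b: Point) -> float:
    """Largest absolute attribute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def init_farthest_first(data: List[Point], k: int, dist: Distance) -> List[int]:
    """Hypothetical deterministic seeding (not from the paper): start from the
    most central point, then add the point farthest from all chosen medoids."""
    n = len(data)
    medoids = [min(range(n), key=lambda i: sum(dist(data[i], q) for q in data))]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist(data[i], data[m]) for m in medoids)))
    return medoids


def assign(data: List[Point], medoids: List[int], dist: Distance) -> List[int]:
    """Label every point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(p, data[medoids[c]]))
            for p in data]


def k_medoids(data: List[Point], k: int, dist: Distance,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) K-Medoids; returns (medoids, labels)."""
    medoids = init_farthest_first(data, k, dist)
    for _ in range(max_iter):
        labels = assign(data, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to the
            # rest of its cluster (keep the old medoid if the cluster is empty).
            new_medoids.append(
                min(members, key=lambda i: sum(dist(data[i], data[j]) for j in members))
                if members else medoids[c])
        if new_medoids == medoids:  # converged: no medoid changed
            break
        medoids = new_medoids
    return medoids, assign(data, medoids, dist)


def davies_bouldin(data: List[Point], labels: List[int], k: int,
                   dist: Distance) -> float:
    """DBI = mean over clusters i of max_{j != i} (S_i + S_j) / d(c_i, c_j).
    Lower values mean compact, well-separated clusters. Requires k >= 2."""
    clusters = [[data[i] for i, lab in enumerate(labels) if lab == c]
                for c in range(k)]
    if any(not m for m in clusters):
        raise ValueError("every cluster needs at least one member")
    # Assumption: centre c_i is the arithmetic mean of the cluster's members.
    centres = [[sum(col) / len(m) for col in zip(*m)] for m in clusters]
    # S_i: average distance from the members of cluster i to its centre.
    scatter = [sum(dist(p, c) for p in m) / len(m)
               for m, c in zip(clusters, centres)]
    worst = [max((scatter[i] + scatter[j]) / dist(centres[i], centres[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


if __name__ == "__main__":
    # Toy 2-D points in three obvious groups (illustrative, not the study's data).
    toy = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
           (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]
    for name, metric in (("Euclidean", euclidean), ("Chebyshev", chebyshev)):
        _, labels = k_medoids(toy, 3, metric)
        print(name, labels, round(davies_bouldin(toy, labels, 3, metric), 3))
```

Running the script prints the cluster labels and DBI for each distance measure on the toy points. Because each run computes DBI with its own distance measure, the two scores are on slightly different scales. Computing DBI with one fixed measure for every model is an alternative design choice.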
