Comparison of Arabica Coffee Quality Prediction Using the SGD, Naive Bayes, and Random Forest Algorithms

Veronica Retno Sari; Feranandah Firdausi; Yufis Azhar

doi:10.29408/edumatic.v4i2.2202

Authors

Veronica Retno Sari, Informatics Study Program, Universitas Muhammadiyah Malang, http://orcid.org/0000-0002-6817-263X
Feranandah Firdausi, Informatics Study Program, Universitas Muhammadiyah Malang, http://orcid.org/0000-0001-5749-2438
Yufis Azhar, Informatics Study Program, Universitas Muhammadiyah Malang, http://orcid.org/0000-0002-8108-7085

Keywords:

Classifier, Coffee Arabica, Comparison, Prediction

Abstract

Classification is a data mining technique that assigns data to groups according to how closely it matches labeled sample data. The dataset used in this study is the coffee dataset from the Coffee Quality Institute, obtained from the GitHub platform. Its attributes are Aroma, Aftertaste, Flavor, Acidity, Balance, Body, Uniformity, Sweetness, Clean Cup, and Cupper Points. Three classification methods are used in this study: Stochastic Gradient Descent, Random Forest, and Naive Bayes. The aim of the study is to find out which algorithm is most effective at predicting coffee quality in the dataset. The prediction results are then evaluated using K-Fold Cross Validation and the Area Under the Curve (AUC) method. The results show that Stochastic Gradient Descent achieved the best accuracy of the three methods, at 98%, rising to 99% when evaluated with K-Fold Cross Validation and the AUC method.
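The abstract names K-Fold Cross Validation and AUC as the evaluation methods but does not show how they are computed. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The function names `k_fold_indices` and `auc` are hypothetical, the folds are contiguous and unshuffled for simplicity, and the AUC shown is the binary form (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one).

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Assumption: no shuffling or stratification, to keep the sketch short.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUC via pairwise comparison; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only; these values are not taken from the coffee dataset.
    for train, test in k_fold_indices(7, 3):
        print("train:", train, "test:", test)
    print("AUC:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a full comparison, each classifier would be fitted on the training indices of every fold and scored on the matching test indices, with the per-fold scores averaged. The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch but slower than a rank-based computation on large data.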
