Optimizing Sentiment Classification of Indonesian-Language Game Reviews: IndoBERT and SMOTE for Handling Class Imbalance

DOI:

https://doi.org/10.29408/edumatic.v9i1.29666

Keywords:

sentiment analysis, IndoBERT, user reviews, SMOTE, imbalanced dataset

Abstract

The growing use of gaming apps on platforms such as the Google Play Store has made user reviews an important source for evaluating app quality. However, sentiment analysis of Indonesian-language reviews is challenging because of the language's sentence structure, its emotional expressions, and the slang and specialized terms common in game reviews. This study classifies reviews into three sentiment classes (positive, negative, and neutral) using the IndoBERT-base-uncased model. The research is experimental: it compares the model's performance on the original dataset with its performance on a dataset augmented with synthetic samples generated by SMOTE. The original dataset contains 998 reviews. SMOTE was applied with k_neighbors set to 5. IndoBERT-base-uncased was trained for 10 epochs with a per-device training batch size and an evaluation batch size of 16, warmup_steps of 500, and L2 regularization through a weight_decay of 0.01. Before SMOTE, the model performed unevenly across classes, with an accuracy of 69.5%. Data imbalance was the main challenge, especially for minority classes such as class 1 (neutral), which the model could not predict. SMOTE improved the balance of the data and raised accuracy to 72.5%. Precision increased from 0.44 to 0.45 and F1-score from 0.46 to 0.47, while recall did not increase.
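The oversampling step can be illustrated with a minimal sketch of the SMOTE idea (Chawla et al., 2002): each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbors. This is not the study's code. The function name `smote_oversample`, the toy two-dimensional vectors, and the fixed seed are assumptions made for illustration; only the neighbor count of 5 comes from the abstract. The abstract does not say which numeric representation of the reviews SMOTE was applied to, so the vectors here simply stand in for any such representation.

```python
import math
import random


def smote_oversample(minority, n_new, k_neighbors=5, seed=0):
    """Return n_new synthetic points built from minority-class feature vectors.

    Hypothetical helper for illustration: it follows the basic SMOTE
    interpolation rule, new = x + lam * (neighbor - x), with lam in [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    # A sample cannot have more neighbors than there are other samples.
    k = min(k_neighbors, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy 2-D vectors standing in for vectorized "neutral" reviews (assumed data).
neutral = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
new_points = smote_oversample(neutral, n_new=4, k_neighbors=5)
print(len(new_points))
```

Because every synthetic point is an interpolation between two existing minority samples, SMOTE fills in the region the minority class already occupies rather than adding new information, which may help explain why the reported gains in precision and F1-score are small.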

Published

2025-04-17

How to Cite

A’la, F. Y. (2025). Optimasi Klasifikasi Sentimen Ulasan Game Berbahasa Indonesia: IndoBERT dan SMOTE untuk Menangani Ketidakseimbangan Kelas. Edumatic: Jurnal Pendidikan Informatika, 9(1), 256–265. https://doi.org/10.29408/edumatic.v9i1.29666