Comparative Analysis of Naive Bayes and SVM for Improved Emotion Classification on Social Media
DOI:
https://doi.org/10.29408/edumatic.v9i1.29087
Keywords:
emotion classification, feature extraction, naïve bayes, preprocessing, svm
Abstract
identifying emotions such as happy, angry, sad, and fear. However, processing Indonesian text is challenging because of the language's complexity and its widespread slang. This research compares Naive Bayes and SVM models, evaluating how preprocessing, feature extraction, and parameter optimization affect emotion classification. The dataset was collected by crawling the X API and was manually annotated by six annotators. The models were trained on fully and half preprocessed versions of the dataset using TF-IDF, BoW, and Word2Vec feature extraction, and were evaluated using accuracy, precision, recall, and F1 score. The results show that full preprocessing improves accuracy: with TF-IDF + BoW features, SVM reached 78.01%, outperforming Naive Bayes at 75.53%. The models classify emotions into four classes: happy, sad, angry, and fear. This study demonstrates the value of preprocessing and feature selection for handling slang and complexity in Indonesian text. The findings offer guidance for developing emotion classification models and have applications in sentiment analysis, social media monitoring, and mental health detection.
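To make the bag-of-words and Naive Bayes side of this comparison concrete, here is a minimal, standard-library-only sketch. It is not the authors' implementation: the multinomial variant, the Laplace smoothing constant, the whitespace tokenizer, the macro averaging of the metrics, and the toy Indonesian sentences are all assumptions introduced for illustration. SVM, TF-IDF, and Word2Vec are left out to keep the example short.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Stand-in for the paper's preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class BowNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (assumed variant)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_label / n_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy data, not taken from the paper's dataset.
train = [
    ("aku senang sekali hari ini", "happy"),
    ("senang banget dapat hadiah", "happy"),
    ("aku marah sama dia", "angry"),
    ("marah banget kesal", "angry"),
    ("aku sedih dan kecewa", "sad"),
    ("sedih banget menangis", "sad"),
    ("aku takut gelap", "fear"),
    ("takut banget ngeri", "fear"),
]
test = [
    ("senang banget", "happy"),
    ("marah kesal", "angry"),
    ("sedih menangis", "sad"),
    ("aku takut", "fear"),
]

model = BowNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
predictions = [model.predict(text) for text, _ in test]
print(macro_scores([y for _, y in test], predictions))
```

On data this small the scores say nothing about real performance; the point is only to show how word counts feed the classifier and how the four reported metrics are derived from per-class true positives, false positives, and false negatives.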
License
Copyright (c) 2025 Rio Ferdinand Putra Pratama, Warih Maharani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All articles in this journal are the sole responsibility of their authors. Edumatic: Jurnal Pendidikan Informatika can be accessed free of charge, in accordance with the Creative Commons license used.
