Logistic Regression and Naïve Bayes Comparison in Classifying Emotions on Indonesian X Social Media
DOI:
https://doi.org/10.29408/edumatic.v9i1.29120
Keywords:
bow, emotion classification, logistic regression, naive bayes, tf-idf
Abstract
Emotions are integral to human interaction and decision-making and are often expressed on social media platforms such as X, which makes these platforms a valuable data source for sentiment analysis. However, analyzing text from X is challenging because of informal language, slang, and platform-specific textual features. This study compares Logistic Regression and Naive Bayes for classifying emotions in Indonesian tweets, addressing gaps in prior research by examining feature extraction methods, data split ratios, and hyperparameter tuning. Data were collected from 100 Telkom University students, yielding 8,978 tweets labeled with four emotions: Happy, Sad, Angry, and Fear. After preprocessing, two feature extraction methods were applied: TF-IDF and Bag of Words (BoW). Models were trained and tested using 10%, 20%, and 30% data splits, and performance was evaluated using accuracy, precision, recall, and F1-score. Hyperparameter tuning was conducted for Logistic Regression using GridSearch. Logistic Regression outperformed Naive Bayes, achieving 73.49% accuracy compared to 70.27%, and BoW yielded better results than TF-IDF. The 20% data split provided the best balance between training and testing. This research demonstrates the effectiveness of Logistic Regression and highlights the importance of tailored feature extraction and parameter optimization for emotion classification in informal text datasets, particularly Indonesian tweets.
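As a rough illustration of the two feature extraction methods and the four evaluation metrics named in the abstract, the following is a minimal standard-library sketch. It is not the authors' implementation: the toy tweets, the function names, the whitespace tokenization, the smoothed IDF variant, and the use of macro averaging are all assumptions made for the example, since the abstract does not specify them.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Bag of Words: raw term counts per document over a shared vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    return vocab, [[c[t] for t in vocab] for c in counts]


def tfidf_vectors(docs):
    """TF-IDF: term count scaled by a smoothed inverse document frequency."""
    vocab, bow = bow_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF, one common variant (assumed; not stated in the abstract).
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[tf * w for tf, w in zip(row, idf)] for row in bow]


def macro_scores(y_true, y_pred, labels):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical, already-preprocessed toy tweets (not taken from the dataset).
docs = ["aku senang senang sekali", "aku sedih sekali", "aku marah"]
vocab, bow = bow_vectors(docs)
_, tfidf = tfidf_vectors(docs)

# Hypothetical predictions over the four emotion labels used in the study.
labels = ["Happy", "Sad", "Angry", "Fear"]
y_true = ["Happy", "Sad", "Angry", "Fear"]
y_pred = ["Happy", "Sad", "Angry", "Happy"]
accuracy, precision, recall, f1 = macro_scores(y_true, y_pred, labels)
```

In this sketch a word such as "aku", which appears in every toy tweet, keeps the same weight under both schemes, while a rarer word such as "senang" is weighted up by TF-IDF. Either feature matrix could then be passed to a Logistic Regression or Naive Bayes classifier.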
License
Copyright (c) 2025 Gerald Shabran Rasyad, Warih Maharani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All articles in this journal are the full responsibility of their authors. Edumatic: Jurnal Pendidikan Informatika can be accessed free of charge, in accordance with the Creative Commons license used.
