Comparison of Naïve Bayes Algorithm and XGBoost on Local Product Review Text Classification

Ivan Rifky Hendrawan, Ema Utami, Anggit Dwi Hartanto


Online reviews play a critical role in purchasing decisions, yet as e-commerce has grown, so has the number of fake reviews, and consumers increasingly worry about being deceived when shopping online. Sentiment analysis can be applied to marketplace product reviews. This study compares two classifiers, Naïve Bayes and XGBoost, each combined with two vector representations, Word2vec and TF-IDF. The research stages are data collection, data cleaning, data labelling, data pre-processing, classification, and evaluation. Scraping produced 25,581 records, which were split into 80% training data and 20% test data and labelled with one of two classes, good sentiment or bad sentiment. Word2vec + XGBoost achieved the highest F1-score, 0.941, followed by TF-IDF + XGBoost at 0.940. Naïve Bayes achieved an F1-score of 0.915 with TF-IDF and 0.900 with Word2vec. In this study, XGBoost classified the unbalanced data better than Naïve Bayes.
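The abstract reports F1-scores on unbalanced data. The sketch below is an illustration only, not the authors' code: it shows why accuracy alone can be misleading when one class dominates. The toy labels, the 90/10 class ratio, and the helper names `accuracy` and `f1_score` are assumptions made for this example, and the abstract does not state which class or averaging scheme the reported F1-scores refer to.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical unbalanced test set: 90 good (1) and 10 bad (0) reviews.
y_true = [1] * 90 + [0] * 10

# A trivial classifier that always predicts the majority class.
majority = [1] * 100

acc = accuracy(y_true, majority)
f1_good = f1_score(y_true, majority, positive=1)
f1_bad = f1_score(y_true, majority, positive=0)
macro_f1 = (f1_good + f1_bad) / 2

print(f"accuracy={acc:.3f} f1_good={f1_good:.3f} "
      f"f1_bad={f1_bad:.3f} macro_f1={macro_f1:.3f}")
```

In this toy case the always-majority classifier looks strong on accuracy and on F1 for the majority class, but its F1 for the minority class is zero. Per-class or macro-averaged F1 exposes that gap, which is the usual reason to prefer it on unbalanced data.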


marketplace; naïve bayes; sentiment analysis; TF-IDF; XGBoost








Copyright (c) 2022 Ivan Rifky Hendrawan, Ema Utami, Anggit Dwi Hartanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Edumatic: Jurnal Pendidikan Informatika is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.