Comparison of Naïve Bayes Algorithm and XGBoost on Local Product Review Text Classification

Authors

DOI:

https://doi.org/10.29408/edumatic.v6i1.5613

Keywords:

marketplace, naïve bayes, sentiment analysis, TF-IDF, XGBoost

Abstract

Online reviews play a critical role in purchasing decisions. As e-commerce grows, however, fake reviews are becoming more common, and more consumers worry about being deceived when shopping online. Sentiment analysis can be applied to marketplace product reviews to address this. This study compares two classifiers, Naïve Bayes and XGBoost, each combined with two text representations: Word2Vec and TF-IDF. The research stages were data collection, data cleaning, data labelling, data pre-processing, classification, and evaluation. Scraping yielded 25,581 reviews, which were split into 80% training data and 20% test data. Each review belongs to one of two classes: good sentiment or bad sentiment. The Word2Vec + XGBoost combination achieved the highest F1-score (0.941), followed by TF-IDF + XGBoost (0.940). Naïve Bayes reached F1-scores of 0.915 with TF-IDF and 0.900 with Word2Vec. These results indicate that XGBoost classifies the imbalanced data better than Naïve Bayes.
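The TF-IDF + Naïve Bayes branch of the pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the Naïve Bayes variant, the IDF formula, or how the 80/20 split was drawn, so the sketch assumes multinomial Naïve Bayes with Laplace smoothing that treats TF-IDF weights as fractional counts, the smoothed IDF ln((1 + n) / (1 + df)) + 1, and an unshuffled split. The function names (`tokenize`, `split_train_test`, `fit_idf`, `tfidf`, `train_nb`, `predict_nb`, `f1_score`) and the toy reviews are hypothetical. The Word2Vec and XGBoost branches are omitted because they would normally rely on external libraries.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the paper's
    # cleaning and pre-processing steps, which are not reproduced here.
    return text.lower().split()

def split_train_test(docs, labels, test_ratio=0.2):
    # Assumption: deterministic split, the last 20% of rows become the test set
    # (no shuffling or stratification, to keep the sketch reproducible).
    n_test = round(len(docs) * test_ratio)
    cut = len(docs) - n_test
    return docs[:cut], labels[:cut], docs[cut:], labels[cut:]

def fit_idf(token_docs):
    # Smoothed IDF computed on the training documents: ln((1 + n) / (1 + df)) + 1
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf(tokens, idf):
    # Raw term count times IDF, without length normalisation;
    # terms unseen during fitting are dropped.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def train_nb(vectors, labels, alpha=1.0):
    # Multinomial Naive Bayes with Laplace smoothing (alpha), using the
    # TF-IDF weights as fractional term counts.
    classes = sorted(set(labels))
    vocab = {t for v in vectors for t in v}
    log_prior, log_lik = {}, {}
    for c in classes:
        class_vecs = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(class_vecs) / len(vectors))
        totals = Counter()
        for v in class_vecs:
            totals.update(v)  # adds the weights term by term
        denom = sum(totals.values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return classes, log_prior, log_lik

def predict_nb(model, vector):
    # Pick the class with the highest log prior + weighted log likelihood.
    classes, log_prior, log_lik = model
    def score(c):
        return log_prior[c] + sum(
            w * log_lik[c][t] for t, w in vector.items() if t in log_lik[c]
        )
    return max(classes, key=score)

def f1_score(y_true, y_pred, positive):
    # F1 for one class: harmonic mean of precision and recall.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy reviews (not the paper's data): 8 for training, 2 for testing.
reviews = [
    "good quality fast delivery",
    "product is good and cheap",
    "very good seller recommended",
    "bad packaging item broken",
    "slow delivery bad seller",
    "good product fast response",
    "item broken and bad quality",
    "not recommended bad product",
    "very good fast response",
    "broken item slow packaging",
]
labels = ["good", "good", "good", "bad", "bad", "good", "bad", "bad", "good", "bad"]

train_x, train_y, test_x, test_y = split_train_test(reviews, labels)
idf = fit_idf([tokenize(d) for d in train_x])
model = train_nb([tfidf(tokenize(d), idf) for d in train_x], train_y)
predictions = [predict_nb(model, tfidf(tokenize(d), idf)) for d in test_x]
print(predictions, f1_score(test_y, predictions, positive="good"))
# → ['good', 'bad'] 1.0
```

On this toy set both held-out reviews are classified correctly, so the F1-score for the good class is 1.0; that figure illustrates the mechanics only and is unrelated to the scores reported in the study. Fitting the IDF on the training split alone, as done here, keeps test-set statistics out of the features.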

Published

2022-06-19