Comparison of Naïve Bayes Algorithm and XGBoost on Local Product Review Text Classification
DOI:
https://doi.org/10.29408/edumatic.v6i1.5613
Keywords:
marketplace, naïve bayes, sentiment analysis, TF-IDF, XGBoost
Abstract
Online reviews are critical in supporting purchasing decisions, but as e-commerce has grown, fake reviews have become more common and consumers increasingly worry about being deceived when shopping online. Sentiment analysis can be applied to marketplace product reviews. This study compares two classifiers, Naïve Bayes and XGBoost, using two text representations, Word2vec and TF-IDF. The research stages are data collection, data cleaning, data labelling, data pre-processing, classification and evaluation. The data scraping process produced 25,581 reviews, which were divided into 80% training data and 20% test data. The reviews are labelled with two classes, namely good sentiment and bad sentiment. In the evaluation, the combination of Word2vec + XGBoost achieved the highest F1-score, 0.941, followed by TF-IDF + XGBoost at 0.940. Naïve Bayes achieved an F1-score of 0.915 with TF-IDF and 0.900 with Word2vec. XGBoost therefore classified the unbalanced data better than Naïve Bayes.
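The abstract reports F1-scores and notes that the data is unbalanced. The following minimal Python sketch illustrates why F1 is more informative than accuracy in that setting. The label encoding (1 for good, 0 for bad), the 90/10 class ratio and the trivial "always good" model are assumptions made for illustration only; they are not taken from the study, and the abstract does not state which F1 averaging the authors used.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical unbalanced test set: 90 good (1) and 10 bad (0) reviews.
y_true = [1] * 90 + [0] * 10
# A trivial model that labels every review as good.
y_all_good = [1] * 100

acc = accuracy(y_true, y_all_good)                  # looks strong
f1_good = f1_score(y_true, y_all_good, positive=1)  # still high
f1_bad = f1_score(y_true, y_all_good, positive=0)   # exposes the failure
macro_f1 = (f1_good + f1_bad) / 2                   # unweighted class mean

print(f"accuracy={acc:.3f} f1_good={f1_good:.3f} "
      f"f1_bad={f1_bad:.3f} macro_f1={macro_f1:.3f}")
```

In this toy case the accuracy is high even though no bad review is ever detected, whereas the per-class and macro-averaged F1 values make the failure on the minority class visible.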
License
All articles in this journal are the full responsibility of their authors. Edumatic: Jurnal Pendidikan Informatika can be accessed free of charge, in accordance with the Creative Commons license used.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.