Pengaruh Algoritma ADASYN dan SMOTE terhadap Performa Support Vector Machine pada Ketidakseimbangan Dataset Airbnb

Wahyu Hidayat; Mursyid Ardiansyah; Arief Setyanto

doi:10.29408/edumatic.v5i1.3125

Authors

Wahyu Hidayat Program Studi Teknik Informatika, Universitas Jenderal Soedirman http://orcid.org/0000-0001-8692-8579
Mursyid Ardiansyah Program Studi Teknik Informatika, Universitas Jenderal Soedirman http://orcid.org/0000-0002-5121-4450
Arief Setyanto Program Studi Teknik Informatika, Universitas Jenderal Soedirman http://orcid.org/0000-0003-0721-3941

Keywords:

ADASYN, Classification, SMOTE, SVM, Traveling

Abstract

People around the world are traveling more and more. Some tourist attractions lie far from the city center, which makes hotels hard to reach from them; Airbnb is a platform that provides home- or apartment-based rentals. Its lodging offers come from two types of hosts: non-super hosts and super hosts. A host earns the super host badge by having a good reputation and meeting the platform's requirements. Being a super host brings advantages such as greater visibility, increased earning potential, and exclusive rewards. This study uses the Support Vector Machine (SVM) algorithm to classify hosts based on data for these criteria. The data set is imbalanced: the super host population is smaller than the non-super host population. To overcome the imbalance, oversampling is carried out using ADASYN and SMOTE. The goal of the research was to determine how the ADASYN and SMOTE oversampling techniques affect the performance of the SVM algorithm. The analysis used oversampling to handle the imbalanced data set and a confusion matrix to evaluate precision, recall, F1-score, and accuracy. The results show that SMOTE with SVM raises accuracy by one percentage point, from 80% to 81%, which reflects an increase in the test results for the True (minority) label and a decrease in the test results for the False (majority) label. SMOTE with SVM therefore performs better than both ADASYN with SVM and SVM without oversampling.
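
The two ideas the abstract relies on, synthetic oversampling of the minority class and evaluation from a confusion matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy points, and the confusion-matrix counts are hypothetical, and a real study would more likely use a library such as imbalanced-learn together with scikit-learn. The first function follows the SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours. ADASYN differs mainly in generating more synthetic samples for minority points that have many majority-class neighbours, which is not shown here.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (minority) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data and counts, not taken from the paper.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
metrics = confusion_metrics(tp=30, fp=10, fn=20, tn=140)
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies on the segment between them. Oversampling should be applied to the training split only, so that the confusion matrix is computed on untouched test data.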
