The Effect of the ADASYN and SMOTE Algorithms on Support Vector Machine Performance on the Imbalanced Airbnb Dataset

Wahyu Hidayat, Mursyid Ardiansyah, Arief Setyanto

Abstract


Travel is increasingly common around the world. Some tourist attractions are far from city centers, which makes hotels hard to reach from them; Airbnb is a platform that provides home- or apartment-based rentals as an alternative. Lodging offers come from two types of hosts: non-super hosts and super hosts. A host earns the super-host badge by having a good reputation and meeting the platform's requirements. Being a super host brings advantages such as more visibility, increased earning potential, and exclusive rewards. This study uses the Support Vector Machine (SVM) algorithm to classify hosts based on data for these criteria. The dataset is imbalanced: the super host population is smaller than the non-super host population. To overcome the imbalance, oversampling is carried out using ADASYN and SMOTE. The goal of the research is to determine how the ADASYN and SMOTE sampling techniques affect the performance of the SVM algorithm. The analysis applies oversampling to handle the imbalanced dataset and uses a confusion matrix to evaluate precision, recall, F1-score, and accuracy. The results show that SMOTE with SVM raises accuracy by one percentage point, from 80% to 81%, driven by an increase in the test results for the True (minority) label and a decrease in the test results for the False (majority) label. SMOTE with SVM performs better than both ADASYN with SVM and SVM without oversampling.
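The two technical steps named in the abstract, oversampling the minority class and scoring predictions with a confusion matrix, can be illustrated with a minimal sketch. The abstract does not state the tools or parameters the authors used, so everything below is an assumption: the function names `smote_like_oversample` and `confusion_metrics` are hypothetical, the data is made up, and the SVM training step is omitted. The first function shows only the interpolation idea behind SMOTE; ADASYN builds on the same idea but generates more synthetic points for minority samples that are harder to learn.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def confusion_metrics(y_true, y_pred, positive=True):
    """Return accuracy, precision, recall and F1 for the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up toy data: three minority ("super host") points in a 2-D feature space.
minority = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4)]
new_points = smote_like_oversample(minority, n_new=4, k=2)

# Made-up labels and predictions, used only to exercise the metric formulas.
y_true = [True, True, False, False, False]
y_pred = [True, False, False, False, True]
metrics = confusion_metrics(y_true, y_pred)
```

In a setup like the one the abstract describes, the synthetic points would be added to the training split only, and the metrics would be computed on an untouched test split; that split is likewise an assumption here, not something the abstract specifies.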


Keywords


ADASYN; Classification; SMOTE; SVM; Traveling






DOI: https://doi.org/10.29408/edumatic.v5i1.3125



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

 


Edumatic: Jurnal Pendidikan Informatika is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.