Mitigating Class Imbalance in Indonesian Sarcasm Detection: A Cross-Platform Transformer Study

A. Salky Maulana; I Made Artha Agastya

doi:10.29408/edumatic.v10i1.33724

Authors

A. Salky Maulana Universitas Amikom Yogyakarta https://orcid.org/0009-0002-9977-055X
I Made Artha Agastya Universitas Amikom Yogyakarta https://orcid.org/0000-0002-8739-5767

Keywords:

class imbalance, cross-domain robustness, generative data augmentation, sarcasm detection, transformer models

Abstract

Sarcasm detection in Indonesian social media remains challenging because of implicit pragmatic expressions, severe class imbalance, and strong domain variation across platforms. Unlike prior Indonesian sarcasm studies, which predominantly focus on in-domain accuracy using conventional balancing methods, this study provides the first systematic cross-platform analysis of generative data balancing under domain shift. We empirically examine whether GPT-4o-based generative balancing improves cross-domain robustness, rather than only in-domain accuracy, in Transformer-based sarcasm detection. Models trained on Twitter data are evaluated on Twitter, Reddit, and TikTok, the last serving as an unseen domain. The results show that generative balancing yields limited gains in in-domain evaluation but consistently improves cross-domain robustness by increasing sarcasm recall, particularly for Base models. Notably, XLM-R Base achieves an absolute F1-score improvement of +10.8 points on TikTok, while IndoBERT-Large attains the highest in-domain F1-score of 0.7444. These findings indicate that generative augmentation partially mitigates class imbalance by enhancing robustness under domain shift. They reposition sarcasm detection as a robustness-oriented problem and highlight generative balancing as a complementary strategy, rather than a substitute for larger Transformer models, in cross-platform NLP settings.
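The metrics named in the abstract (sarcasm recall, F1-score, and an absolute F1 improvement expressed in points) can be illustrated with a minimal sketch. It assumes binary labels where 1 marks the sarcastic class and computes the gain as the F1 of the balanced model minus the F1 of the baseline, multiplied by 100. The function names and the toy labels are hypothetical and are not taken from the study's code or data.

```python
from typing import List, Tuple


def sarcasm_prf(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class (1 = sarcastic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f1_gain_points(y_true: List[int],
                   pred_baseline: List[int],
                   pred_balanced: List[int]) -> float:
    """Absolute F1 change in percentage points (balanced minus baseline)."""
    _, _, f1_base = sarcasm_prf(y_true, pred_baseline)
    _, _, f1_bal = sarcasm_prf(y_true, pred_balanced)
    return 100.0 * (f1_bal - f1_base)


# Toy, made-up labels for one imbalanced target platform (4 sarcastic of 10).
# These are NOT results from the paper; they only exercise the functions.
y_true        = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_baseline = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses most sarcastic posts
pred_balanced = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # higher recall, one false alarm

base_p, base_r, base_f1 = sarcasm_prf(y_true, pred_baseline)
bal_p, bal_r, bal_f1 = sarcasm_prf(y_true, pred_balanced)
gain = f1_gain_points(y_true, pred_baseline, pred_balanced)

print(f"baseline  recall={base_r:.2f}  F1={base_f1:.2f}")
print(f"balanced  recall={bal_r:.2f}  F1={bal_f1:.2f}")
print(f"absolute F1 gain: {gain:+.1f} points")
```

In this toy case the baseline has perfect precision but low recall, so a recall increase raises F1 even though one false positive is introduced. This is the kind of trade-off the abstract describes for cross-domain evaluation.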
