Feature Interaction and Performance Analysis of RankSum-Based Extractive Summarization in Indonesian Scientific Articles
DOI:
https://doi.org/10.29408/edumatic.v10i1.33443
Keywords:
extractive summarization, Indonesian scientific articles, multi-feature integration, rank fusion, ROUGE-BERTScore evaluation
Abstract
The extractive summarization of Indonesian scientific articles is hindered by a domain mismatch: established methods rely on news-corpus assumptions, whereas Indonesian scientific discourse follows rigid, IMRaD-driven structural and lexical patterns. This study systematically analyzes feature interaction effects and saturation behaviour in RankSum-based extractive summarization of Indonesian scientific articles. Designed as a controlled comparative experiment, it evaluates a RankSum framework that integrates graph-based features, semantic-thematic vectors, and structural heuristics. The dataset comprises 2,897 Indonesian journal articles (2021-2025) collected via web scraping from open-access university repositories. Analysis across 31 scenarios demonstrates that, for Indonesian scientific articles, the assumption that increasing feature density improves performance is flawed; instead, a feature saturation effect occurs. Results show that a 4-feature combination maximizes unigram lexical precision (ROUGE-1 0.3564), whereas the full 5-feature fusion is necessary to preserve global semantic integrity and structural flow with stable performance (ROUGE-L 0.2018; BERTScore 0.6977). This study establishes a generalizable principle for domain-aware automatic text summarization (ATS): overcoming domain mismatch depends on navigating feature saturation through feature selection aligned with the document's inherent logic rather than on raw feature quantity.
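The 31 scenarios are consistent with every non-empty subset of five features (2^5 - 1 = 31), although the abstract does not state how the scenarios were constructed. Under that assumption, the following minimal sketch enumerates the subsets and applies a simplified, unweighted rank-sum fusion to pick summary sentences. The feature names, toy scores, and helper functions are hypothetical placeholders for illustration, not the authors' implementation, and the original RankSum method may weight the individual rankings differently.

```python
from itertools import combinations

# Hypothetical feature names: the abstract names feature families only,
# not the five individual features.
FEATURES = ["graph", "embedding", "topic", "keyword", "position"]


def feature_subsets(features):
    """Yield every non-empty combination of the given features."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def ranks_from_scores(scores):
    """Convert per-sentence scores to ranks (1 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def fuse_and_select(scores_by_feature, subset, k):
    """Sum per-feature ranks over `subset` and return the indices of the
    k sentences with the lowest fused rank, in document order."""
    n = len(next(iter(scores_by_feature.values())))
    fused = [0] * n
    for name in subset:
        for i, rank in enumerate(ranks_from_scores(scores_by_feature[name])):
            fused[i] += rank
    best = sorted(range(n), key=lambda i: fused[i])[:k]
    return sorted(best)


# Toy scores for a four-sentence document (illustrative values only).
toy_scores = {
    "graph":     [0.9, 0.2, 0.5, 0.1],
    "embedding": [0.8, 0.3, 0.6, 0.2],
    "topic":     [0.1, 0.9, 0.7, 0.3],
    "keyword":   [0.4, 0.5, 0.9, 0.1],
    "position":  [1.0, 0.75, 0.5, 0.25],
}

scenarios = list(feature_subsets(FEATURES))
print(len(scenarios))                               # number of scenarios
print(fuse_and_select(toy_scores, FEATURES, k=2))   # full 5-feature fusion
```

Running each of the enumerated subsets through `fuse_and_select` and scoring the resulting summaries against reference summaries would give one way to observe whether adding features keeps improving the metrics or saturates.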
License
Copyright (c) 2026 Verrino Adityya, Yohannes Yohannes

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All articles in this journal are the sole responsibility of the authors. Edumatic: Jurnal Pendidikan Informatika can be accessed free of charge, in accordance with the Creative Commons license used.



