Feature Interaction and Performance Analysis of RankSum-Based Extractive Summarization in Indonesian Scientific Articles

Verrino Adityya; Yohannes Yohannes

doi:10.29408/edumatic.v10i1.33443

Authors

Verrino Adityya Universitas Multi Data Palembang https://orcid.org/0009-0006-2690-3250
Yohannes Yohannes Universitas Multi Data Palembang https://orcid.org/0009-0009-3261-2611

Keywords:

extractive summarization, Indonesian scientific articles, multi-feature integration, rank fusion, ROUGE-BERTScore evaluation

Abstract

The extractive summarization of Indonesian scientific articles is hindered by a domain mismatch: established methods rely on news-corpus assumptions, whereas Indonesian scientific discourse follows rigid, IMRaD-driven structural and lexical patterns. This study systematically analyzes feature interaction effects and saturation behaviour in RankSum-based extractive summarization of Indonesian scientific articles. Designed as a controlled comparative experiment, it evaluates a RankSum framework that integrates graph-based features, semantic-thematic vectors, and structural heuristics. The dataset comprises 2,897 Indonesian journal articles (2021-2025) collected via web scraping from open-access university repositories. Analysis across 31 scenarios shows that, for Indonesian scientific articles, the assumption that adding more features improves performance does not hold; instead, a feature saturation effect occurs. A 4-feature combination maximizes unigram lexical precision (ROUGE-1 0.3564), whereas the full 5-feature fusion is necessary to preserve global semantic integrity, structural flow, and stability (ROUGE-L 0.2018; BERTScore 0.6977). The study establishes a generalizable principle for domain-aware automatic text summarization (ATS): overcoming domain mismatch depends on navigating feature saturation through feature selection aligned with the document’s inherent logic rather than on raw feature quantity.
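As a rough illustration of the experimental design described in the abstract, the sketch below enumerates feature combinations and fuses per-feature sentence rankings. It rests on several assumptions that the abstract does not confirm: the 31 scenarios are taken to be all non-empty subsets of five features (2^5 - 1 = 31), the feature names are hypothetical placeholders, and fusion is an unweighted sum of ranks rather than the exact RankSum weighting.

```python
from itertools import combinations

# Hypothetical placeholder names; the abstract does not list the five features.
FEATURES = ("graph", "semantic", "topic", "position", "length")


def feature_subsets(features):
    """Return every non-empty subset of the features, smallest subsets first."""
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


def to_ranks(scores):
    """Convert scores to ranks (1 = best); ties are broken by sentence order."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def fuse_and_select(scores_by_feature, subset, k):
    """Sum the per-feature ranks over `subset` and return the k best
    sentence indices in document order.

    The unweighted rank sum is an assumption made for illustration only.
    """
    n = len(next(iter(scores_by_feature.values())))
    fused = [0] * n
    for name in subset:
        for i, rank in enumerate(to_ranks(scores_by_feature[name])):
            fused[i] += rank
    # A lower fused rank means the sentence is preferred by more features.
    best = sorted(range(n), key=lambda i: (fused[i], i))[:k]
    return sorted(best)
```

Running `fuse_and_select` once per entry of `feature_subsets(FEATURES)` and scoring each resulting summary would give one evaluation row per scenario, which is the kind of comparison in which a saturation effect could be observed.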
