Optimizing XGBoost Performance through Recursive Feature Elimination for Methanol Conversion Prediction
DOI:
https://doi.org/10.29408/edumatic.v10i1.33509
Keywords:
feature selection, methanol, recursive feature elimination (RFE), XGBoost
Abstract
Strong nonlinear interactions between catalytic properties and operating conditions complicate accurate modeling of space-time yield in thermocatalytic carbon dioxide hydrogenation, especially when redundant descriptors are included. Although XGBoost is widely used for predictive tasks, the influence of feature redundancy on generalization and interpretability in carbon dioxide-to-methanol systems remains insufficiently examined. This study investigates the integration of Recursive Feature Elimination (RFE) with XGBoost using 639 experimental observations derived from copper-based catalysts. Reducing the feature set from fifteen to eight variables improves generalization performance, as indicated by lower prediction error and higher explained variance. The retained variables correspond to key catalytic and operational parameters, including reaction temperature, pressure, and copper content, in line with established kinetic and mechanistic principles. These results show that eliminating redundant descriptors stabilizes cross-validated performance and reduces training complexity without sacrificing predictive accuracy. The reduced model concentrates predictive weight on kinetically relevant variables, providing a clearer quantitative representation of the parameters that govern space-time yield in carbon dioxide hydrogenation.
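The workflow summarized in the abstract (recursive elimination from fifteen features down to eight, followed by a cross-validated comparison) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' pipeline: the data are synthetic (only the counts 639, 15, and 8 come from the abstract), the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so that the sketch depends on scikit-learn alone. `xgboost.XGBRegressor` follows the same scikit-learn estimator interface and could be substituted in the hypothetical `make_model` helper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in data. Only the shape (639 rows, 15 features) mirrors the
# abstract; the values are NOT the catalyst dataset.
X, y = make_regression(
    n_samples=639, n_features=15, n_informative=8, noise=10.0, random_state=0
)


def make_model():
    """Hypothetical helper: a gradient-boosted tree regressor standing in for XGBoost."""
    return GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0)


# RFE fits the model, drops the least important feature (step=1), and repeats
# until eight features remain.
selector = RFE(estimator=make_model(), n_features_to_select=8, step=1)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)  # column indices of retained features
X_reduced = selector.transform(X)

# Compare cross-validated R^2 with and without elimination. Placing RFE inside
# the pipeline re-runs the selection within each training fold, so held-out
# data never influences which features are kept.
reduced_pipeline = Pipeline([
    ("rfe", RFE(estimator=make_model(), n_features_to_select=8, step=1)),
    ("model", make_model()),
])
r2_full = cross_val_score(make_model(), X, y, cv=5, scoring="r2").mean()
r2_reduced = cross_val_score(reduced_pipeline, X, y, cv=5, scoring="r2").mean()

print("retained feature indices:", selected.tolist())
print(f"mean CV R^2, all 15 features: {r2_full:.3f}")
print(f"mean CV R^2, 8 RFE features:  {r2_reduced:.3f}")
```

On synthetic data the two scores may be close or may differ in either direction; the sketch shows the mechanics of the comparison, not the improvement reported in the study.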
License
Copyright (c) 2026 Ibnu Richo Kurniawan, Muhamad Febrian Akrom, Novianto Nur Hidayat, Muhammad Naufal

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All articles in this journal are the sole responsibility of the authors. Edumatic: Jurnal Pendidikan Informatika can be accessed free of charge, in accordance with the Creative Commons license used.



