Optimizing XGBoost Performance through Recursive Feature Elimination for Methanol Conversion Prediction

Authors

Kurniawan, I. R., Akrom, M. F., Hidayat, N. N., & Naufal, M.

DOI:

https://doi.org/10.29408/edumatic.v10i1.33509

Keywords:

feature selection, methanol, recursive feature elimination (RFE), XGBoost

Abstract

Strong nonlinear interactions between catalytic properties and operating conditions complicate accurate modeling of space-time yield in thermocatalytic carbon dioxide hydrogenation, especially when redundant descriptors are included. Although XGBoost is widely used for predictive tasks, the influence of feature redundancy on generalization and interpretability in carbon dioxide-to-methanol systems remains insufficiently examined. This study investigates the integration of Recursive Feature Elimination (RFE) with XGBoost using 639 experimental observations derived from copper-based catalysts. Reducing the feature set from fifteen to eight variables improves generalization performance, as indicated by lower prediction error and higher explained variance. The retained variables correspond to key catalytic and operational parameters, including reaction temperature, pressure, and copper content, in line with established kinetic and mechanistic principles. These results show that eliminating redundant descriptors stabilizes cross-validated performance and reduces training complexity without sacrificing predictive accuracy. The reduced model concentrates predictive weight on kinetically relevant variables, providing a clearer quantitative representation of the parameters that govern space-time yield in carbon dioxide hydrogenation.
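
The workflow summarized in the abstract (recursive elimination from fifteen to eight features, followed by a comparison of the full and reduced models) can be illustrated with a minimal Python sketch. This is not the authors' code: the data are synthetic placeholders generated with scikit-learn rather than the 639 catalyst observations, the hyperparameters are arbitrary, the train/test split is an assumption, and scikit-learn's GradientBoostingRegressor is substituted if the xgboost package is not installed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBRegressor as Booster
except ImportError:
    # Fallback (assumption): a different gradient-boosting implementation.
    from sklearn.ensemble import GradientBoostingRegressor as Booster


def make_model():
    # Illustrative hyperparameters, not those used in the study.
    return Booster(n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0)


# Synthetic stand-in with the same shape as the study's data:
# 639 rows, 15 candidate features, 8 of them informative.
X, y = make_regression(
    n_samples=639, n_features=15, n_informative=8, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# RFE refits the model repeatedly, dropping the least important
# feature each round (step=1) until eight remain.
selector = RFE(estimator=make_model(), n_features_to_select=8, step=1)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)


def evaluate(cols):
    """Train on the given feature columns and return (RMSE, R^2) on the test split."""
    model = make_model().fit(X_train[:, cols], y_train)
    pred = model.predict(X_test[:, cols])
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return rmse, float(r2_score(y_test, pred))


full_rmse, full_r2 = evaluate(np.arange(X.shape[1]))
rfe_rmse, rfe_r2 = evaluate(selected)

print("selected feature indices:", selected.tolist())
print(f"all 15 features: RMSE={full_rmse:.2f}, R2={full_r2:.3f}")
print(f"8 RFE features:  RMSE={rfe_rmse:.2f}, R2={rfe_r2:.3f}")
```

Because the selector is fitted on the training split only, the held-out rows play no part in choosing the features. Whether the reduced model actually scores better than the full one depends on the data, so the printed metrics from this synthetic example say nothing about the study's reported results.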

Published

2026-03-15

How to Cite

Kurniawan, I. R., Akrom, M. F., Hidayat, N. N., & Naufal, M. (2026). Optimizing XGBoost Performance through Recursive Feature Elimination for Methanol Conversion Prediction. Edumatic: Jurnal Pendidikan Informatika, 10(1), 90–99. https://doi.org/10.29408/edumatic.v10i1.33509