The Implementation of Automated Speech Recognition (ASR) in ELT Classroom: A Systematic Literature Review from 2012-2023




Speaking assessment, automated speech recognition (ASR), speech technology, speaking proficiency, oral communication


Automated Speech Recognition (ASR) converts speech audio streams into text. Computer-based voice recognition is beneficial for teaching pronunciation. It can also be used to evaluate a learner's speech in a broader context, and it opens the possibility of aural interaction between the learner and the computer. This study aims to investigate the use of ASR in speaking assessment through a systematic literature review. Automated scores should be weighed in light of their validity and of any potential issues or errors that arise when ASR technology is used to evaluate speaking. Although some research has been carried out on the use of technology in speaking assessment, there have been few empirical studies using systematic literature reviews to explore ASR in education. This study therefore aims to provide an updated and comprehensive review of the implementation of ASR in education. Ten studies from journals in the Taylor & Francis, Wiley, and Springer databases were selected. The results show that the benefits of ASR in education include students' progress, interaction, and pedagogical contributions. The pedagogical contributions involve collaboration between human and automated scoring. Teachers can use this study to improve speaking assessment with ASR in the classroom.

Author Biography

Ngadiso Sutomo, Sebelas Maret University

English Education Department




How to Cite

Sutomo, N. (2024). The Implementation of Automated Speech Recognition (ASR) in ELT Classroom: A Systematic Literature Review from 2012-2023. Voices of English Language Education Society, 7(3), 816–828.