The Implementation of Automated Speech Recognition (ASR) in ELT Classroom: A Systematic Literature Review from 2012-2023

Authors

DOI:

https://doi.org/10.29408/veles.v7i3.23978

Keywords:

Speaking Assessment, Automated Speech Recognition (ASR), speech technology, speaking proficiency, oral communication

Abstract

Automated Speech Recognition (ASR) converts spoken audio into text. Computer-based speech recognition is useful for teaching pronunciation. It can also be used to evaluate learners' speech more broadly, and it makes spoken interaction between learner and computer possible. This study aims to investigate the use of ASR in speaking assessment through a systematic literature review. Automated scores should be judged on their validity and on the problems and errors that can arise when ASR is used to evaluate speaking. Although some research has examined the use of technology in speaking assessment, few studies have used a systematic literature review to examine ASR in education. This study therefore aims to provide an updated and comprehensive review of how ASR has been implemented in education. Ten studies published in journals indexed in the Taylor & Francis, Wiley, and Springer databases were selected. The results show that ASR benefits education in three areas: student progress, interaction, and pedagogical contributions. Among the pedagogical contributions is collaboration between human and automated scoring. Teachers can use these findings to improve ASR-based speaking assessment in the classroom.

Author Biography

Ngadiso Sutomo, Sebelas Maret University

English Education Department

Published

2024-01-08