The Implementation of Automated Speech Recognition (ASR) in ELT Classroom: A Systematic Literature Review from 2012-2023
DOI: https://doi.org/10.29408/veles.v7i3.23978

Keywords: Speaking Assessment, Automated Speech Recognition (ASR), speech technology, speaking proficiency, oral communication

Abstract
Automated Speech Recognition (ASR) converts speech audio streams into text. Computer-based voice recognition is useful for teaching pronunciation; it can also evaluate a learner's speech in a broader context and opens up the possibility of oral interaction between the learner and the computer. This study investigates the use of ASR in speaking assessment through a systematic literature review. Automated scores should be judged on their validity and on the potential issues or errors involved in using technology (ASR) to evaluate speaking. Although some research has examined the use of technology in speaking assessment, few empirical studies have used systematic literature reviews to explore ASR in education. This study therefore provides an updated and comprehensive review of the implementation of Automated Speech Recognition (ASR) in education. Ten studies from journals indexed in the Taylor & Francis, Wiley, and Springer databases were selected. The results show the benefits of ASR in education, including students' progress, interaction, and pedagogical contributions; the pedagogical contributions involve collaboration between human and automated scores. Teachers can draw on this study to improve speaking assessment with ASR in the classroom.
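To make the "audio stream to text, then assessment" idea concrete, the sketch below shows one possible ASR pipeline in Python: a learner's recording is transcribed and compared word-by-word with a prompt sentence. This is a minimal illustration only; the SpeechRecognition package, the Google Web Speech backend, the file name learner_response.wav, and the prompt sentence are assumptions for the example and are not tools or materials used in the reviewed studies.

```python
# Minimal sketch of an ASR-based speaking check (illustrative only).
# Assumes the SpeechRecognition package is installed and that a WAV file
# named "learner_response.wav" exists; neither comes from the reviewed studies.
import difflib
import speech_recognition as sr

PROMPT = "she sells seashells by the seashore"  # hypothetical target sentence

recognizer = sr.Recognizer()
with sr.AudioFile("learner_response.wav") as source:
    audio = recognizer.record(source)  # read the entire recording

try:
    # ASR step: audio -> text (uses the free Google Web Speech API backend)
    transcript = recognizer.recognize_google(audio, language="en-US")
except sr.UnknownValueError:
    transcript = ""  # speech could not be recognized

# Crude word-level overlap between prompt and transcript as a rough score;
# real assessment systems use much richer features (fluency, pronunciation, etc.).
similarity = difflib.SequenceMatcher(
    None, PROMPT.lower().split(), transcript.lower().split()
).ratio()

print(f"Transcript: {transcript!r}")
print(f"Word-overlap score: {similarity:.2f}")
```

A score like this reflects only word overlap with the prompt; in line with the review's finding on pedagogical contributions, such automated output is best combined with human judgment rather than treated as a standalone measure.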
References
Ahn, T. Y., & Lee, S. M. (2015). User experience of a mobile speaking application with automatic speech recognition for EFL learning. British Journal of Educational Technology, 47(4), 778–786. https://doi.org/10.1111/bjet.12354
Bashori, M., van Hout, R., Strik, H., & Cucchiarini, C. (2022). “Look, I can speak correctly”: learning vocabulary and pronunciation through websites equipped with automatic speech recognition technology. Computer Assisted Language Learning, 1–29. https://doi.org/10.1080/09588221.2022.2080230
Carrier, M. (2017). Automated Speech Recognition in language learning: potential models, benefits and impact. Training, Language and Culture, 1(1), 46–61. https://doi.org/10.29366/2017tlc.1.1.3
Chan, D. M., & Ghosh, S. (2022). Content-context factorized representations for automated speech recognition. Interspeech 2022. https://doi.org/10.48550/arXiv.2205.09872
Coniam, D. (1999). Voice recognition software accuracy with second language speakers of English. System, 27(1), 49–64. https://doi.org/10.1016/S0346-251X(98)00049-9
Cucchiarini, C., Strik, H., & Boves, L. (2002). Quantitative assessment of second language learners’ fluency: Comparisons between read and spontaneous speech. Journal of the Acoustical Society of America, 111(6), 2862–2873. https://doi.org/10.1121/1.1471894
Cucchiarini, C., van Doremalen, J., & Strik, H. (2010). Fluency in non-native read and spontaneous speech. Proceedings of the DiSS-LPSS Joint Workshop 2010, 15–18. Tokyo, Japan: University of Tokyo.
Derwing, T. M., Munro, M. J. & Carbonaro, M. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34(3), 592–603. https://doi.org/10.2307/3587748
Gu, L., Davis, L., Tao, J., & Zechner, K. (2020). Using spoken language technology for generating feedback to prepare for the TOEFL iBT® test: A user perception study. Assessment in Education: Principles, Policy & Practice, 28(1), 58–76. https://doi.org/10.1080/0969594x.2020.1735995
Ivanov, A. V., Ramanarayanan, V., Suendermann-Oeft, D., Lopez, M., Evanini, K., & Tao, J. (2015). Automated Speech Recognition Technology for dialogue interaction with Non-Native interlocutors. Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. https://doi.org/10.18653/v1/w15-4617
Kanabur, V., & Harakannanavar, S. S. (2019). An extensive review of feature extraction techniques, challenges and trends in automatic speech recognition. International Journal of Image, Graphics and Signal Processing, 5, 1–12. https://doi.org/10.5815/ijigsp.2019.05.01
Kholis, A. (2021). ELSA Speak app: Automatic speech recognition (ASR) for supplementing English pronunciation skills. Pedagogy: Journal of English Language Teaching, 9(1). https://doi.org/10.32332/pedagogy.v8i1
Koizumi, R. (2022). L2 speaking assessment in secondary school classrooms in Japan. Language Assessment Quarterly, 19(2), 142–161. https://doi.org/10.1080/15434303.2021.2023542
Timpe-Laughlin, V., Sydorenko, T., & Daurio, P. (2020). Using spoken dialogue technology for L2 speaking practice: what do teachers think? Computer Assisted Language Learning. https://doi.org/10.1080/09588221.2020.1774904
Litman, D., Strik, H., & Lim, G. S. (2018). Speech technologies and the assessment of second language speaking: approaches, challenges, and opportunities. Language Assessment Quarterly, 15(3), 294–309. https://doi.org/10.1080/15434303.2018.1472265
Liu, Y., Li, Y., Deng, G., Juefei-Xu, F., Du, Y., Zhang, C., Liu, C., Li, Y., Ma, L., & Liu, Y. (2023). ASTER: Automatic speech recognition system accessibility testing for stutterers. https://doi.org/10.48550/arXiv.2308.15742
Papi, S., Gretter, R., Matassoni, M., & Falavigna, D. (2021). Mixtures of deep neural experts for automated speech scoring. Proceedings of Interspeech 2020. https://doi.org/10.21437/Interspeech.2020-1055
Paul, D., & Parekh, R. (2011). Automated speech recognition of isolated words using neural networks. International Journal of Engineering Science and Technology (IJEST), 3(6), 4993-5000.
Strik, H., Neri, A., & Cucchiarini, C. (2008). Speech technology for language tutoring. Proceedings of LangTech 2008, 73–76.
Tao, J., Evanini, K., & Wang, X. (2014). The influence of automatic speech recognition accuracy on the performance of an automated speech assessment system. IEEE Spoken Language Technology Workshop (SLT), 294–299. https://doi.org/10.1109/SLT.2014.7078590
Tranfield, D., Denyer, D., & Smart, P. (2003). Towards a methodology for developing evidence-informed management knowledge by means of systematic review. British Journal of Management, 14(3), 207–222. https://doi.org/10.1111/1467-8551.00375
Van Compernolle, D. (2001). Recognizing speech of goats, wolves, sheep, and non-natives. Speech Communication, 35(1-2), 71–79. https://doi.org/10.1016/S0167-6393(00)00096-0
van Doremalen, J., Cucchiarini, C., & Strik, H. (2010). Optimizing automatic speech recognition for low-proficient non-native speakers. EURASIP Journal on Audio, Speech, and Music Processing. https://doi.org/10.1155/2010/973954
Zechner, K., Higgins, D., Xi, X., & Williamson, D. M. (2009). Automatic scoring of non-native spontaneous speech in tests of spoken English. Speech Communication, 51(10), 883–895. https://doi.org/10.1016/j.specom.2009.04.009