Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/26993
Title: | Applying Automatic Speech to Text in Academic Settings for the Deaf and Hard of Hearing |
Authors: | Weigel, Carla |
Advisor: | Stroinska, Magda; Pape, Daniel |
Department: | Cognitive Science of Language |
Keywords: | Deaf;Hard of Hearing;Speech;recognition;captions;note taking;analysis;error;rate;accessibility;computational;phonetics;acoustics;speech-to-text;Otter;Ava;speech signal;speech perception;linguistics;language;lecture;transcription;intelligibility;academia;accuracy |
Publication Date: | 2021 |
Abstract: | In hopes of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, speech-to-text has been suggested to address note-taking issues. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers in academic contexts. Observations regarding functionality and error analysis are detailed in this thesis. This project has several objectives, including: 1) to outline how the DHH students’ experience differs from other note-taking needs; 2) to use linguistic analysis to understand how transcript accuracy converts to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning a captioning service to DHH students. Results from a focus group showed that current note-taking services are problematic and that automatic captioning may solve some issues, but some errors are detrimental, as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs lacked accurate utterance breaks and contained poor punctuation. The captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers. An analysis of errors showed that some errors are less severe than others; in response, we offer an alternative way to view errors: as insignificant, obvious, or critical errors. Errors are caused either by the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or by a blend of various speaker factors, including assimilation, vowel approximation, epenthesis, phoneme reduction, and overall intelligibility. Both programs worked best with intelligible speech, as measured by human perception. 
Speech rate trends were surprising: Otter seemed to prefer fast speech from native English speakers and Ava preferred, as expected, slow speech, but results differed between scripted and spontaneous speech. Correlations of accuracy and fundamental frequencies showed conflicting results. Some reasons for errors could not be determined without knowing more about how the systems were programmed. |
Description: | This project discusses the importance of accurate note-taking for D/deaf and hard of hearing students who have accommodation requirements and offers innovative opportunities to improve the student experience in order to encourage more D/deaf and hard of hearing individuals to pursue academia. It also includes a linguistic analysis of speech signals that correspond to transcription output errors produced by speech-to-text programs, which can be utilized to advance and improve speech recognition systems. |
URI: | http://hdl.handle.net/11375/26993 |
Appears in Collections: | Open Access Dissertations and Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
weigel_carla_e_finalsubmission202109_msc.pdf | | 1.23 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.