Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/26993
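The identifier above is a Handle URI; as a minimal sketch (assuming network access and the Handle System's standard HTTP redirect behaviour, not anything specific to MacSphere), it can be resolved programmatically to the repository's landing page:

    # Minimal sketch: resolve the item's Handle URI by following the HTTP
    # redirect issued by hdl.handle.net (network access assumed).
    import urllib.request

    HANDLE_URI = "http://hdl.handle.net/11375/26993"

    with urllib.request.urlopen(HANDLE_URI) as response:
        print("Resolved to:", response.geturl())  # final landing-page URL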
Full metadata record (DC field: value [language])

dc.contributor.advisor: Stroinska, Magda
dc.contributor.advisor: Pape, Daniel
dc.contributor.author: Weigel, Carla
dc.date.accessioned: 2021-10-06T18:01:14Z
dc.date.available: 2021-10-06T18:01:14Z
dc.date.issued: 2021
dc.identifier.uri: http://hdl.handle.net/11375/26993
dc.description [en_US]: This project discusses the importance of accurate note-taking for D/deaf and hard of hearing students who have accommodation requirements, and offers innovative opportunities to improve the student experience in order to encourage more D/deaf and hard of hearing individuals to pursue academia. It also includes a linguistic analysis of speech signals that correspond to transcription output errors produced by speech-to-text programs, which can be used to advance and improve speech recognition systems.
dc.description.abstract [en_US]: In the hope of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, speech-to-text has been suggested to address note-taking issues. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers in academic contexts. Observations regarding functionality and error analysis are detailed in this thesis. This project has several objectives: 1) to outline how DHH students’ note-taking needs differ from those of other students; 2) to use linguistic analysis to understand how transcript accuracy translates to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning a captioning service to DHH students. Results from a focus group showed that current note-taking services are problematic and that automatic captioning may solve some issues, but some errors are detrimental, as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs lacked accurate utterance breaks and contained poor punctuation. The captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers. An analysis of errors showed that some errors are less severe than others; in response, we offer an alternative way to view errors: as insignificant, obvious, or critical. Errors are caused either by the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or by a blend of speaker factors including assimilation, vowel approximation, epenthesis, phoneme reduction, and overall intelligibility. Both programs worked best with intelligible speech, as measured by human perception. Speech rate trends were surprising: Otter seemed to prefer fast speech from native English speakers, while Ava preferred, as expected, slow speech, though results differed between scripted and spontaneous speech. Correlations between accuracy and fundamental frequency showed conflicting results. Some reasons for errors could not be determined without knowing more about how the systems were programmed.
dc.language.iso [en_US]: en
dc.subject [en_US]: Deaf
dc.subject [en_US]: Hard of Hearing
dc.subject [en_US]: Speech
dc.subject [en_US]: recognition
dc.subject [en_US]: captions
dc.subject [en_US]: note taking
dc.subject [en_US]: analysis
dc.subject [en_US]: error
dc.subject [en_US]: rate
dc.subject [en_US]: accessibility
dc.subject [en_US]: computational
dc.subject [en_US]: phonetics
dc.subject [en_US]: acoustics
dc.subject [en_US]: speech-to-text
dc.subject [en_US]: Otter
dc.subject [en_US]: Ava
dc.subject [en_US]: speech signal
dc.subject [en_US]: speech perception
dc.subject [en_US]: linguistics
dc.subject [en_US]: language
dc.subject [en_US]: lecture
dc.subject [en_US]: transcription
dc.subject [en_US]: intelligibility
dc.subject [en_US]: academia
dc.subject [en_US]: accuracy
dc.title [en_US]: Applying Automatic Speech to Text in Academic Settings for the Deaf and Hard of Hearing
dc.type [en_US]: Thesis
dc.contributor.department [en_US]: Cognitive Science of Language
dc.description.degreetype [en_US]: Thesis
dc.description.degree [en_US]: Master of Science (MSc)
dc.description.layabstract [en_US]: In the hope of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, automatic captioning has been suggested to address note-taking issues. Captioning programs use speech recognition (SR) technology to caption lectures in real time and produce a transcript afterwards. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers. Observations regarding functionality and error analysis are detailed in this thesis. The project has several objectives: 1) to outline how DHH students’ note-taking needs differ from those of other students; 2) to use linguistic analysis to understand how transcript accuracy translates to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning a captioning service to DHH students. Results from a focus group showed that current note-taking services are problematic and that automatic captioning may solve some issues, but some types of errors are detrimental, as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs contained poor punctuation and lacked breaks between thoughts. Captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers, and an analysis of errors showed that some errors are less severe than others. In response, we offer an alternative way to view errors: as insignificant, obvious, or critical. Errors are caused either by the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or by a blend of speaker factors. Both programs worked best with intelligible speech; one seemed to prefer fast speech from native English speakers and the other preferred slow speech, and results on whether the programs preferred male or female voices were conflicting. Some reasons for errors could not be determined, as one would have to observe how the systems were programmed.
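The abstracts above propose grading transcription errors as insignificant, obvious, or critical rather than reporting only a raw accuracy figure. As a rough illustration of that idea only (not the thesis's actual procedure, which relied on human linguistic analysis; the function name, severity heuristic, and example sentences below are invented for this sketch), the following Python snippet uses the standard-library difflib to align a reference transcript with a speech-to-text hypothesis, computes an approximate word error rate, and attaches a placeholder severity label to each error:

    # Illustrative sketch only (not the thesis's actual method): align a
    # reference transcript against an ASR hypothesis at the word level and
    # attach a placeholder severity label to each error, echoing the
    # insignificant / obvious / critical grouping described in the abstract.
    from difflib import SequenceMatcher


    def align_errors(reference: str, hypothesis: str):
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)

        errors = []
        edits = 0
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                continue
            edits += max(i2 - i1, j2 - j1)  # substitutions + insertions + deletions
            ref_span = " ".join(ref[i1:i2])
            hyp_span = " ".join(hyp[j1:j2])
            # Toy heuristic standing in for the human judgement used in the
            # thesis: tiny function-word slips -> "insignificant"; garbled but
            # recognisable words -> "obvious"; everything else -> "critical".
            if op == "replace" and len(ref_span) <= 3 and len(hyp_span) <= 3:
                severity = "insignificant"
            elif op == "replace" and ref_span[:4] == hyp_span[:4]:
                severity = "obvious"
            else:
                severity = "critical"
            errors.append((op, ref_span, hyp_span, severity))

        wer = edits / max(len(ref), 1)  # approximate word error rate
        return wer, errors


    if __name__ == "__main__":
        wer, errors = align_errors(
            "the fundamental frequency of the vowel was measured",
            "the fundamentally frequency of a vowel was measure",
        )
        print(f"approximate WER: {wer:.2f}")
        for op, ref_span, hyp_span, severity in errors:
            print(f"{op:7s} '{ref_span}' -> '{hyp_span}'  [{severity}]")

In practice, severity depends on whether a DHH reader can notice an error and recover the intended meaning from context, which no simple string heuristic can capture; the sketch only shows where such a classification would sit in an error-analysis pipeline.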
Appears in Collections: Open Access Dissertations and Theses

Files in This Item:
File: weigel_carla_e_finalsubmission202109_msc.pdf
Access is allowed from: 2022-01-06
Size: 1.23 MB
Format: Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
