
Applying Automatic Speech to Text in Academic Settings for the Deaf and Hard of Hearing

dc.contributor.advisor: Stroinska, Magda
dc.contributor.advisor: Pape, Daniel
dc.contributor.author: Weigel, Carla
dc.contributor.department: Cognitive Science of Language [en_US]
dc.date.accessioned: 2021-10-06T18:01:14Z
dc.date.available: 2021-10-06T18:01:14Z
dc.date.issued: 2021
dc.description: This project discusses the importance of accurate note-taking for D/deaf and hard of hearing students who have accommodation requirements and offers innovative opportunities to improve the student experience in order to encourage more D/deaf and hard of hearing individuals to pursue academia. It also includes a linguistic analysis of speech signals that correspond to transcription output errors produced by speech-to-text programs, which can be utilized to advance and improve speech recognition systems. [en_US]
dc.description.abstract: In the hope of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, speech-to-text has been suggested to address note-taking issues. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers in academic contexts. Observations regarding functionality and error analysis are detailed in this thesis. This project has several objectives, including: 1) to outline how the DHH students’ experience differs from other note-taking needs; 2) to use linguistic analysis to understand how transcript accuracy converts to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning DHH students a captioning service. Results from a focus group showed that current note-taking services are problematic, and that automatic captioning may solve some issues, but some errors are detrimental as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs lacked accurate utterance breaks and contained poor punctuation. The captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers. An analysis of errors showed that some errors are less severe than others; in response, we offer an alternative way to view errors: as insignificant, obvious, or critical errors. Errors are caused by either the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or a blend of various speaker factors including assimilation, vowel approximation, epenthesis, phoneme reduction, and overall intelligibility. Both programs worked best with intelligible speech, as measured by human perception. Speech rate trends were surprising: Otter seemed to prefer fast speech from native English speakers and Ava preferred, as expected, slow speech, but results differed between scripted and spontaneous speech. Correlations between accuracy and fundamental frequency showed conflicting results. Some reasons for errors could not be determined without knowing more about how the systems were programmed. [en_US]
dc.description.degree: Master of Science (MSc) [en_US]
dc.description.degreetype: Thesis [en_US]
dc.description.layabstract: In the hope of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, automatic captioning has been suggested to address note-taking issues. Captioning programs use speech recognition (SR) technology to caption lectures in real-time and produce a transcript afterwards. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers. Observations regarding functionality and error analysis are detailed in this thesis. The project has several objectives: 1) to outline how the DHH students’ experience differs from other note-taking needs; 2) to use linguistic analysis to understand how transcript accuracy converts to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning DHH students a captioning service. Results from a focus group showed that current note-taking services are problematic, and that automatic captioning may solve some issues, but some types of errors are detrimental as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs contained poor punctuation and lacked breaks between thoughts. Captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers, and an analysis of errors showed that some errors are less severe than others. In response, we offer an alternative way to view errors: as insignificant, obvious, or critical errors. Errors are caused by either the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or a blend of various speaker factors. Both programs worked best with intelligible speech; one seemed to prefer fast speech from native English speakers and the other preferred slow speech, and a preference for male or female voices showed conflicting results. Some reasons for errors could not be determined, as one would have to observe how the systems were programmed. [en_US]
dc.identifier.uri: http://hdl.handle.net/11375/26993
dc.language.iso: en [en_US]
dc.subject: Deaf [en_US]
dc.subject: Hard of Hearing [en_US]
dc.subject: Speech [en_US]
dc.subject: recognition [en_US]
dc.subject: captions [en_US]
dc.subject: note taking [en_US]
dc.subject: analysis [en_US]
dc.subject: error [en_US]
dc.subject: rate [en_US]
dc.subject: accessibility [en_US]
dc.subject: computational [en_US]
dc.subject: phonetics [en_US]
dc.subject: acoustics [en_US]
dc.subject: speech-to-text [en_US]
dc.subject: Otter [en_US]
dc.subject: Ava [en_US]
dc.subject: speech signal [en_US]
dc.subject: speech perception [en_US]
dc.subject: linguistics [en_US]
dc.subject: language [en_US]
dc.subject: lecture [en_US]
dc.subject: transcription [en_US]
dc.subject: intelligibility [en_US]
dc.subject: academia [en_US]
dc.subject: accuracy [en_US]
dc.title: Applying Automatic Speech to Text in Academic Settings for the Deaf and Hard of Hearing [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: weigel_carla_e_finalsubmission202109_msc.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.68 KB
Description: Item-specific license agreed upon to submission