Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/26993
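The identifier above is a Handle URI; as a minimal sketch (assuming network access and the Handle System's standard HTTP redirect behaviour, not anything specific to MacSphere), it can be resolved programmatically to the repository's landing page:

    # Minimal sketch: resolve the item's Handle URI by following the HTTP
    # redirect issued by hdl.handle.net (network access assumed).
    import urllib.request

    HANDLE_URI = "http://hdl.handle.net/11375/26993"

    with urllib.request.urlopen(HANDLE_URI) as response:
        print("Resolved to:", response.geturl())  # final landing-page URL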
Full metadata record (DC field: value [language])

dc.contributor.advisor: Stroinska, Magda
dc.contributor.advisor: Pape, Daniel
dc.contributor.author: Weigel, Carla
dc.date.accessioned: 2021-10-06T18:01:14Z
dc.date.available: 2021-10-06T18:01:14Z
dc.date.issued: 2021
dc.identifier.uri: http://hdl.handle.net/11375/26993
dc.description [en_US]: This project discusses the importance of accurate note-taking for D/deaf and hard of hearing students who have accommodation requirements, and offers innovative opportunities to improve the student experience in order to encourage more D/deaf and hard of hearing individuals to pursue academia. It also includes a linguistic analysis of speech signals that correspond to transcription output errors produced by speech-to-text programs, which can be used to advance and improve speech recognition systems.
dc.description.abstract [en_US]: In the hope of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, speech-to-text has been suggested to address note-taking issues. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers in academic contexts. Observations regarding functionality and error analysis are detailed in this thesis. This project has several objectives: 1) to outline how DHH students’ note-taking needs differ from those of other students; 2) to use linguistic analysis to understand how transcript accuracy translates to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning a captioning service to DHH students. Results from a focus group showed that current note-taking services are problematic and that automatic captioning may solve some issues, but some errors are detrimental, as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs lacked accurate utterance breaks and contained poor punctuation. The captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers. An analysis of errors showed that some errors are less severe than others; in response, we offer an alternative way to view errors: as insignificant, obvious, or critical. Errors are caused either by the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or by a blend of speaker factors including assimilation, vowel approximation, epenthesis, phoneme reduction, and overall intelligibility. Both programs worked best with intelligible speech, as measured by human perception. Speech rate trends were surprising: Otter seemed to prefer fast speech from native English speakers, while Ava preferred, as expected, slow speech, though results differed between scripted and spontaneous speech. Correlations between accuracy and fundamental frequency showed conflicting results. Some reasons for errors could not be determined without knowing more about how the systems were programmed.
dc.language.iso [en_US]: en
dc.subject [en_US]: Deaf
dc.subject [en_US]: Hard of Hearing
dc.subject [en_US]: Speech
dc.subject [en_US]: recognition
dc.subject [en_US]: captions
dc.subject [en_US]: note taking
dc.subject [en_US]: analysis
dc.subject [en_US]: error
dc.subject [en_US]: rate
dc.subject [en_US]: accessibility
dc.subject [en_US]: computational
dc.subject [en_US]: phonetics
dc.subject [en_US]: acoustics
dc.subject [en_US]: speech-to-text
dc.subject [en_US]: Otter
dc.subject [en_US]: Ava
dc.subject [en_US]: speech signal
dc.subject [en_US]: speech perception
dc.subject [en_US]: linguistics
dc.subject [en_US]: language
dc.subject [en_US]: lecture
dc.subject [en_US]: transcription
dc.subject [en_US]: intelligibility
dc.subject [en_US]: academia
dc.subject [en_US]: accuracy
dc.title [en_US]: Applying Automatic Speech to Text in Academic Settings for the Deaf and Hard of Hearing
dc.type [en_US]: Thesis
dc.contributor.department [en_US]: Cognitive Science of Language
dc.description.degreetype [en_US]: Thesis
dc.description.degree [en_US]: Master of Science (MSc)
dc.description.layabstract [en_US]: In the hope of encouraging more D/deaf and hard of hearing (DHH) students to pursue academia, automatic captioning has been suggested to address note-taking issues. Captioning programs use speech recognition (SR) technology to caption lectures in real time and produce a transcript afterwards. This research examined several transcripts created by two untrained speech-to-text programs, Ava and Otter, using 11 different speakers. Observations regarding functionality and error analysis are detailed in this thesis. The project has several objectives: 1) to outline how DHH students’ note-taking needs differ from those of other students; 2) to use linguistic analysis to understand how transcript accuracy translates to real-world use and to investigate why errors occur; and 3) to describe what needs to be addressed before assigning a captioning service to DHH students. Results from a focus group showed that current note-taking services are problematic and that automatic captioning may solve some issues, but some types of errors are detrimental, as it is particularly difficult for DHH students to identify and fix errors within transcripts. Transcripts produced by the programs were difficult to read, as outputs contained poor punctuation and lacked breaks between thoughts. Captioning of scripted speech was more accurate than that of spontaneous speech for native and most non-native English speakers, and an analysis of errors showed that some errors are less severe than others. In response, we offer an alternative way to view errors: as insignificant, obvious, or critical. Errors are caused either by the program’s inability to identify various items, such as word breaks, abbreviations, and numbers, or by a blend of speaker factors. Both programs worked best with intelligible speech; one seemed to prefer fast speech from native English speakers and the other preferred slow speech, and results on whether the programs preferred male or female voices were conflicting. Some reasons for errors could not be determined, as one would have to observe how the systems were programmed.
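The abstracts above propose grading transcription errors as insignificant, obvious, or critical rather than reporting only a raw accuracy figure. As a rough illustration of that idea only (not the thesis's actual procedure, which relied on human linguistic analysis; the function name, severity heuristic, and example sentences below are invented for this sketch), the following Python snippet uses the standard-library difflib to align a reference transcript with a speech-to-text hypothesis, computes an approximate word error rate, and attaches a placeholder severity label to each error:

    # Illustrative sketch only (not the thesis's actual method): align a
    # reference transcript against an ASR hypothesis at the word level and
    # attach a placeholder severity label to each error, echoing the
    # insignificant / obvious / critical grouping described in the abstract.
    from difflib import SequenceMatcher


    def align_errors(reference: str, hypothesis: str):
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)

        errors = []
        edits = 0
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                continue
            edits += max(i2 - i1, j2 - j1)  # substitutions + insertions + deletions
            ref_span = " ".join(ref[i1:i2])
            hyp_span = " ".join(hyp[j1:j2])
            # Toy heuristic standing in for the human judgement used in the
            # thesis: tiny function-word slips -> "insignificant"; garbled but
            # recognisable words -> "obvious"; everything else -> "critical".
            if op == "replace" and len(ref_span) <= 3 and len(hyp_span) <= 3:
                severity = "insignificant"
            elif op == "replace" and ref_span[:4] == hyp_span[:4]:
                severity = "obvious"
            else:
                severity = "critical"
            errors.append((op, ref_span, hyp_span, severity))

        wer = edits / max(len(ref), 1)  # approximate word error rate
        return wer, errors


    if __name__ == "__main__":
        wer, errors = align_errors(
            "the fundamental frequency of the vowel was measured",
            "the fundamentally frequency of a vowel was measure",
        )
        print(f"approximate WER: {wer:.2f}")
        for op, ref_span, hyp_span, severity in errors:
            print(f"{op:7s} '{ref_span}' -> '{hyp_span}'  [{severity}]")

In practice, severity depends on whether a DHH reader can notice an error and recover the intended meaning from context, which no simple string heuristic can capture; the sketch only shows where such a classification would sit in an error-analysis pipeline.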
Appears in Collections: Open Access Dissertations and Theses

Files in This Item:
File: weigel_carla_e_finalsubmission202109_msc.pdf
Access is allowed from: 2022-01-06
Size: 1.23 MB
Format: Adobe PDF


Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.
