Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/29764
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Bruce, Ian | - |
dc.contributor.author | Li, Yujie | - |
dc.date.accessioned | 2024-05-07T19:37:13Z | - |
dc.date.available | 2024-05-07T19:37:13Z | - |
dc.date.issued | 2024 | - |
dc.identifier.uri | http://hdl.handle.net/11375/29764 | - |
dc.description | Speech intelligibility is a measure of how much speech information the listener perceives. Speech quality and intelligibility are not synonymous: in some cases, good intelligibility can be achieved with degraded speech quality. If speech quality is about the "how", intelligibility is about the "what". In other words, speech quality is concerned with how the speech sounds (whether it is clear, lossless, noiseless, etc.), while speech intelligibility is concerned with the comprehensibility of speech information, that is, whether the listener can accurately understand what the speech signal conveys, i.e., the messages or vocabulary. After introducing the concept of speech intelligibility, this thesis investigates various methods for predicting it. | en_US |
dc.description.abstract | Speech intelligibility is a measure of the human ability to understand speech signals. A speech signal can be decomposed into its envelope and temporal fine structure: the envelope is the contour of the signal amplitude over time, revealing the rhythm and intensity of the speech signal, while the temporal fine structure is the rapidly oscillating portion of the signal that carries pitch and timbre information. Studies on speech intelligibility have shown that the acoustic envelope and the temporal fine structure both contribute to speech intelligibility, playing an important role in quiet and in background noise, respectively. In this thesis, two speech signals are selected; the envelope is retained from one and the temporal fine structure from the other, and the envelope of one signal is then combined with the temporal fine structure of the other to generate different speech chimaera signals. Three methods are applied to evaluate speech intelligibility, namely the Spectro-Temporal Modulation Index (STMI), the Neurogram Similarity Index Measure (NSIM), and Cross-Correlation Coefficients (CCC). This thesis describes these three methods in detail, in particular the creation of physiologically based assessment matrices, and then analyzes and compares the results by fitting regression models that relate the values predicted by the different algorithms to experimentally measured subjective perception. This thesis shows that combining the STMI with either the fine-timing NSIM or the temporal fine structure CCC provides the best-performing prediction model for speech chimaera signals, and it offers some implications for speech intelligibility research. | en_US |
dc.language.iso | en | en_US |
dc.subject | speech intelligibility | en_US |
dc.subject | envelope | en_US |
dc.subject | temporal fine structure | en_US |
dc.subject | mean-rate | en_US |
dc.subject | fine-timing | en_US |
dc.subject | speech chimaeras | en_US |
dc.title | EVALUATING THE INTELLIGIBILITY OF SPEECH CHIMAERAS BASED ON ENVELOPE AND TEMPORAL FINE STRUCTURE CUES | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Electrical and Computer Engineering | en_US |
dc.description.degreetype | Thesis | en_US |
dc.description.degree | Master of Applied Science (MASc) | en_US |
dc.description.layabstract | In daily life, being able to hear and understand the speech of the people around us is the basis of normal communication. To investigate how people use different information in the speech signal to understand speech, this thesis decomposes the speech signal into envelope and temporal fine structure. The envelope is the contour of the signal amplitude over time, which provides information such as syllable rhythms and intonation patterns, while the temporal fine structure is the fast oscillating part of the signal, which carries pitch and timbre information. The envelope and temporal fine structure of different signals are combined to produce speech chimaeras as test data, and speech intelligibility is evaluated using different algorithms to select the best-performing model, in an effort to contribute to the study of speech intelligibility. | en_US |
Appears in Collections: | Open Access Dissertations and Theses |
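
The abstract describes building a speech chimaera by pairing the envelope of one signal with the temporal fine structure of another. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes a single frequency band and a Hilbert-transform decomposition, whereas the thesis may work per band after a filterbank. The function names (`analytic_signal`, `envelope_and_tfs`, `single_band_chimaera`) and the synthetic test tones are hypothetical and chosen only for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array (FFT-based Hilbert method)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                   # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0          # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0        # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0  # double the positive frequencies
    # Negative-frequency bins keep weight 0, which removes them.
    return np.fft.ifft(spectrum * weights)


def envelope_and_tfs(x):
    """Split a signal into its Hilbert envelope and unit-amplitude fine structure."""
    z = analytic_signal(x)
    envelope = np.abs(z)           # slowly varying amplitude contour
    tfs = np.cos(np.angle(z))      # rapid oscillation with amplitude removed
    return envelope, tfs


def single_band_chimaera(env_source, tfs_source):
    """Combine the envelope of one signal with the fine structure of another.

    Assumption: one band only; a multi-band version would filter both inputs
    into matching bands, apply this per band, and sum the results.
    """
    envelope, _ = envelope_and_tfs(env_source)
    _, tfs = envelope_and_tfs(tfs_source)
    return envelope * tfs


# Hypothetical test signals standing in for speech: a 500 Hz tone with a slow
# 4 Hz amplitude modulation, and a steady 1000 Hz tone.
fs = 8000
t = np.arange(fs) / fs  # one second of samples
slow_am = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 500 * t)
steady_tone = np.cos(2 * np.pi * 1000 * t)

chimaera = single_band_chimaera(slow_am, steady_tone)
```

Here `chimaera` carries the 4 Hz amplitude contour of the first signal on the 1000 Hz oscillation of the second. With real speech, both inputs would normally be the same length and sampled at the same rate before combining.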
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Li_Yujie_2024April_MASc.pdf | | 6.29 MB | Adobe PDF | View/Open |
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.