Please use this identifier to cite or link to this item:
http://hdl.handle.net/11375/29764
Title: | EVALUATING THE INTELLIGIBILITY OF SPEECH CHIMAERAS BASED ON ENVELOPE AND TEMPORAL FINE STRUCTURE CUES |
Authors: | Li, Yujie |
Advisor: | Bruce, Ian |
Department: | Electrical and Computer Engineering |
Keywords: | speech intelligibility;envelope;temporal fine structure;mean-rate;fine-timing;speech chimaeras |
Publication Date: | 2024 |
Abstract: | Speech intelligibility is a measure of the human ability to understand speech signals. A speech signal can be decomposed into its envelope and its temporal fine structure. The envelope is the contour of the signal amplitude over time and reveals the rhythm and intensity of the speech; the temporal fine structure is the rapidly oscillating portion of the signal and carries pitch and timbre information. Studies have shown that both the acoustic envelope and the temporal fine structure contribute to speech intelligibility, with the envelope playing an important role in quiet and the temporal fine structure playing an important role in background noise. In this thesis, two speech signals are selected, one supplying the envelope and the other supplying the temporal fine structure, and the envelope of one signal is combined with the temporal fine structure of the other to generate different speech chimaera signals. Three methods are applied to evaluate speech intelligibility: the Spectro-Temporal Modulation Index (STMI), the Neurogram Similarity Index Measure (NSIM), and Cross-Correlation Coefficients (CCC). This thesis describes these three methods in detail, in particular the creation of physiologically based assessment matrices, and then analyzes and compares the results by fitting regression models that relate the values predicted by the different algorithms to experimentally measured subjective perception. The results show that combining the STMI with either the fine-timing NSIM or the temporal fine structure CCC provides the optimal prediction model for speech chimaera signals, and they offer some implications for speech intelligibility research. |
Description: | Speech intelligibility is a measure of how much speech information the listener perceives. Speech quality and intelligibility are not synonymous: in some cases good intelligibility can be achieved with degraded speech quality. If speech quality is about the "how", intelligibility is about the "what". In other words, speech quality is concerned with how the speech sounds (whether it is clear, lossless, noiseless, etc.), while speech intelligibility is concerned with the comprehensibility of the speech information, that is, whether the listener can accurately understand what is in the speech signal, i.e., the conveyed message or vocabulary. After introducing the concept of speech intelligibility, this thesis investigates various methods for predicting it. |
URI: | http://hdl.handle.net/11375/29764 |
Appears in Collections: | Open Access Dissertations and Theses |
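The chimaera construction summarized in the abstract (envelope of one signal combined with the temporal fine structure of another) can be illustrated with a minimal single-band sketch. This is an assumption-laden illustration, not the thesis's implementation: it uses the Hilbert transform on the full-band signal, whereas chimaera studies typically split the signals into several frequency bands first. The function names `envelope_and_tfs` and `make_chimaera` and the synthetic test tones are hypothetical and stand in for real speech recordings.

```python
import numpy as np
from scipy.signal import hilbert


def envelope_and_tfs(x):
    """Split a real signal into its Hilbert envelope and temporal fine structure."""
    analytic = hilbert(x)                # analytic signal x + j*H{x}
    envelope = np.abs(analytic)          # slowly varying amplitude contour
    tfs = np.cos(np.angle(analytic))     # unit-amplitude rapid oscillation
    return envelope, tfs


def make_chimaera(env_source, tfs_source):
    """Combine the envelope of one signal with the fine structure of another."""
    n = min(len(env_source), len(tfs_source))   # truncate to a common length
    envelope, _ = envelope_and_tfs(env_source[:n])
    _, tfs = envelope_and_tfs(tfs_source[:n])
    return envelope * tfs


# Synthetic stand-ins for two speech signals (assumed values, one second each).
fs = 16000
t = np.arange(fs) / fs
signal_a = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
signal_b = np.sin(2 * np.pi * 1200 * t)

# Envelope of signal_a (4 Hz modulation) carried on the fine structure of signal_b.
chimaera = make_chimaera(signal_a, signal_b)
```

In this toy case the result is a 1200 Hz carrier with the 4 Hz amplitude modulation of `signal_a`; with real speech, the same two functions would be applied per band after a bandpass filterbank and the bands summed.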
Files in This Item:
File | Description | Size | Format
---|---|---|---
Li_Yujie_2024April_MASc.pdf | | 6.29 MB | Adobe PDF
Items in MacSphere are protected by copyright, with all rights reserved, unless otherwise indicated.