EVALUATING THE INTELLIGIBILITY OF SPEECH CHIMAERAS BASED ON ENVELOPE AND TEMPORAL FINE STRUCTURE CUES

Li, Yujie

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/29764

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Bruce, Ian	-
dc.contributor.author	Li, Yujie	-
dc.date.accessioned	2024-05-07T19:37:13Z	-
dc.date.available	2024-05-07T19:37:13Z	-
dc.date.issued	2024	-
dc.identifier.uri	http://hdl.handle.net/11375/29764	-
dc.description	Speech intelligibility is a measure of how much speech information a listener perceives. Speech quality and intelligibility are not synonymous: in some cases, good intelligibility can be achieved with degraded speech quality. If speech quality is about the "how", intelligibility is about the "what". Speech quality concerns how the speech sounds, for example whether it is clear, lossless, and noiseless, whereas speech intelligibility concerns whether the listener can accurately understand what the speech signal conveys, i.e., its message or vocabulary. After introducing the concept of speech intelligibility, this thesis investigates various methods for predicting it.	en_US
dc.description.abstract	Speech intelligibility is a measure of the human ability to understand speech signals. A speech signal can be decomposed into its envelope and its temporal fine structure. The envelope is the contour of the signal amplitude over time and reveals the rhythm and intensity of the speech signal; the temporal fine structure is the rapidly oscillating portion of the signal and carries pitch and timbre information. Studies on speech intelligibility have shown that both the acoustic envelope and the temporal fine structure contribute to speech intelligibility, playing an important role in quiet and in background noise, respectively. In this thesis, two speech signals are selected; the envelope is retained from one and the temporal fine structure from the other, and the two are combined to generate different speech chimaera signals. Three methods are applied to evaluate speech intelligibility: the Spectro-Temporal Modulation Index (STMI), the Neurogram Similarity Index Measure (NSIM), and Cross-Correlation Coefficients (CCC). This thesis describes these three methods in detail, in particular the creation of physiologically based assessment matrices, and then analyzes and compares the results by fitting regression models that relate the values predicted by the different algorithms to experimentally measured subjective perception. The results show that combining the STMI with either the fine-timing NSIM or the temporal fine structure CCC gives the best prediction model for speech chimaera signals, and they provide some implications for speech intelligibility research.	en_US
dc.language.iso	en	en_US
dc.subject	speech intelligibility	en_US
dc.subject	envelope	en_US
dc.subject	temporal fine structure	en_US
dc.subject	mean-rate	en_US
dc.subject	fine-timing	en_US
dc.subject	speech chimaeras	en_US
dc.title	EVALUATING THE INTELLIGIBILITY OF SPEECH CHIMAERAS BASED ON ENVELOPE AND TEMPORAL FINE STRUCTURE CUES	en_US
dc.type	Thesis	en_US
dc.contributor.department	Electrical and Computer Engineering	en_US
dc.description.degreetype	Thesis	en_US
dc.description.degree	Master of Applied Science (MASc)	en_US
dc.description.layabstract	In daily life, being able to hear and understand the speech of people around us is the basis for normal communication. To investigate how people use different information in the speech signal to understand speech, this thesis decomposes the speech signal into its envelope and its temporal fine structure. The envelope is the contour of the signal amplitude over time and provides information such as syllable rhythms and intonation patterns, while the temporal fine structure is the fast oscillating part of the signal and carries pitch and timbre information. In this thesis, the envelope and temporal fine structure of different signals are combined to produce speech chimaeras as test data, and speech intelligibility is evaluated using different algorithms to select the best-performing model, in an effort to contribute to the study of speech intelligibility.	en_US
Appears in Collections:	Open Access Dissertations and Theses
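
The abstract above describes combining the envelope of one signal with the temporal fine structure of another. The following is a minimal single-band sketch of that idea, assuming a Hilbert-transform decomposition; it is not taken from the thesis. Chimaera studies commonly apply this step separately within each band of a filterbank, which is omitted here for brevity. The function names (`analytic_signal`, `envelope_and_tfs`, `make_chimaera`) and the toy amplitude-modulated tones are hypothetical and stand in for real speech recordings.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array, computed via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_and_tfs(x):
    """Split a signal into its Hilbert envelope and temporal fine structure."""
    z = analytic_signal(x)
    envelope = np.abs(z)          # slowly varying amplitude contour
    tfs = np.cos(np.angle(z))     # unit-amplitude carrier (fine structure)
    return envelope, tfs


def make_chimaera(env_source, tfs_source):
    """Combine the envelope of one signal with the fine structure of another."""
    envelope, _ = envelope_and_tfs(env_source)
    _, tfs = envelope_and_tfs(tfs_source)
    return envelope * tfs


# Toy inputs (assumed, for illustration only): two amplitude-modulated tones.
fs = 8000                          # sampling rate in Hz
t = np.arange(fs) / fs             # one second of samples
signal_a = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
signal_b = (1 + 0.5 * np.sin(2 * np.pi * 9 * t)) * np.sin(2 * np.pi * 1200 * t)

# Envelope from signal_a, temporal fine structure from signal_b.
chimaera = make_chimaera(signal_a, signal_b)
```

In this toy case the chimaera carries the 4 Hz amplitude modulation of `signal_a` on the 1200 Hz carrier of `signal_b`. With real speech, both inputs would first be trimmed or padded to the same length and, in a multi-band design, band-pass filtered before the decomposition.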

Files in This Item:

File	Description	Size	Format
Li_Yujie_2024April_MASc.pdf Open Access		6.29 MB	Adobe PDF
