Predicting Speech Intelligibility and Quality from Model Auditory Nerve Fiber Mean-rate and Spike-timing Activity
Abstract
This dissertation examines the prediction of speech intelligibility and quality using simulated auditory nerve fiber activity. The relationship of neural mean-rate and spike-timing activity to the perceptual salience of the envelope (ENV) and temporal fine-structure (TFS) of speech is unclear. TFS affects neural temporal coding in two ways: it produces phase-locked spike-timing responses, and narrowband cochlear filtering of TFS generates recovered ENV. These processes, together with the direct encoding of ENV into mean-rate responses, are the established transduction processes. We postulate that models based on mean-rate cues (over a time window of approximately 6 to 16 ms) and spike-timing cues should produce accurate predictions of subjectively graded speech. Two studies are presented.
The first study examined the contribution of mean-rate and spike-timing cues to predicting intelligibility. The relative levels of mean-rate and spike-timing cues were manipulated using chimaerically vocoded speech. The Spectro-Temporal Modulation Index (STMI) and Neurogram SIMilarity (NSIM) were used to quantify the mean-rate and spike-timing activity. Linear regression models were developed using the STMI and NSIM. An interpretable model combining the STMI and the fine-timing NSIM produced the most accurate predictions of the graded speech.
The second study examined the contribution of mean-rate and spike-timing cues to predicting the quality of enhanced wideband speech. The mean-rate and fine-timing NSIM were used to quantify the mean-rate and spike-timing activity. Linear regression models were developed using the NSIM measures, and optimization of the NSIM was investigated. A quality-optimized model with intermediate temporal resolution had the best predictive performance.
The modelling approach used here allows for the study of normal and impaired hearing. It supports the design of hearing-aid processing algorithms and furthers the understanding of how TFS cues might be applied in cochlear implant stimulation schemes.