Background noise and nonlinear distortions encountered in real-world situations often reduce the intelligibility of speech signals. Several objective measurements and prediction procedures have been developed to assess speech intelligibility in noise. Most existing measures, however, are suitable for only a subset of specified forms of distortion. This study developed a reliable, reference-free speech intelligibility metric that uses the properties of an acoustic signal to predict the effects of a wide range of distortions on speech intelligibility in quiet and noisy conditions. The bispectral speech intelligibility metric (BSIM) was developed by extracting features from the spectrogram of speech signals using third-order statistics, specifically the bispectrum. Speech intelligibility scores predicted by the BSIM were compared with behavioral speech intelligibility scores in quiet and in noise. The performance of the BSIM was also compared with that of several widely used speech intelligibility metrics. Results showed that the BSIM can successfully predict the effects of nonlinear distortions, such as peak-clipping and center-clipping, as well as time-domain distortions, such as phase jitter and reverberation. Unlike existing metrics, such as the articulation index and the speech transmission index, the BSIM successfully captured the effect of fluctuating noise on speech intelligibility and predicted the effects of degradation in noisy speech processed by the ideal time-frequency segregation method. The BSIM is a reliable, reference-free, objective measure of speech intelligibility that can provide real-time predictions of the effects of signal processing and acoustic distortion on speech intelligibility in quiet and in noise. In addition, the BSIM could be used to analyze algorithms that process noisy speech.
- Higher order statistics
- Speech intelligibility
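
For readers unfamiliar with the bispectrum named in the abstract, the following is a minimal sketch of a generic direct (FFT-based) bispectrum estimate, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over windowed segments of a one-dimensional signal. It is not the BSIM feature-extraction procedure, which the abstract describes as operating on the spectrogram. The function name `bispectrum_direct` and the parameters `nfft` and `hop` are hypothetical and chosen only for illustration.

```python
import numpy as np

def bispectrum_direct(x, nfft=128, hop=64):
    """Generic direct bispectrum estimate (illustrative, not the BSIM itself).

    Averages X(f1) * X(f2) * conj(X(f1 + f2)) over Hann-windowed segments.
    Returns an (nfft, nfft) complex array indexed by DFT bins (f1, f2).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(nfft)
    # Lookup table for the sum-frequency bin f1 + f2, wrapped modulo nfft.
    k = np.arange(nfft)
    sum_idx = (k[:, None] + k[None, :]) % nfft
    acc = np.zeros((nfft, nfft), dtype=complex)
    n_seg = 0
    for start in range(0, len(x) - nfft + 1, hop):
        seg = x[start:start + nfft]
        seg = (seg - seg.mean()) * window  # remove DC, then taper
        X = np.fft.fft(seg)
        # Triple product for every (f1, f2) pair in one vectorized step.
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx])
        n_seg += 1
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    return acc / n_seg

# Example: three sinusoids whose third frequency and phase are the sums of
# the first two (quadratic phase coupling), which the bispectrum highlights.
nfft = 128
n = np.arange(16 * nfft)
k1, k2 = 10, 20
coupled = (np.cos(2 * np.pi * k1 * n / nfft + 0.3)
           + np.cos(2 * np.pi * k2 * n / nfft + 1.1)
           + np.cos(2 * np.pi * (k1 + k2) * n / nfft + 1.4))
B = bispectrum_direct(coupled, nfft=nfft)
```

In this example, the magnitude of `B[k1, k2]` stands well above the rest of the array because the phase relation among the three components is preserved from segment to segment. A second-order measure such as the power spectrum discards that phase information, which is one reason third-order statistics are sensitive to nonlinear distortions.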