Speech Processing - Nonlinear Models and Tools
The linear approach to speech processing has dominated speech technology ever since its inception [1]. More recently, however, it has been recognised that there are "hidden dynamics" in speech that cannot be detected by linear signal processing tools. These dynamics appear to take the form of nonlinear and long-scale correlations, and manifest themselves as extremely subtle changes in, for example, speech pitch and amplitude [2].
We have developed a very sensitive statistical test against the null hypothesis of linearity.
This test allows us to identify hidden dynamics that arise from an underlying nonlinear
process [3]. By measuring the linear correlations in the speech signal, we can generate randomised
replica signals that are constrained to contain only linear correlations. Then, using a calibrated
nonlinear statistic, we measure the correlations in both the original and the replica signals.
Any significant difference between the two can be attributed to nonlinear dynamics, and we can
also estimate the confidence level of that conclusion. The thick line in the figure to the left shows the
nonlinear measure applied to the original signal, and the filled grey area shows the range of values
that would be expected, at the 95% confidence level, if the signal contained only linear correlations.
The results are plotted against the time delay. The two differ by a large margin, implying that we can
reject the null hypothesis and conclude that the signal was most likely generated by a nonlinear process
whose dynamics are undetectable using linear methods.
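The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the test published in [3]: the replicas here are phase-randomised Fourier surrogates (which keep the power spectrum, and hence the linear correlations, of the original), and the nonlinear statistic is a simple time-reversal asymmetry measure chosen as a stand-in for the calibrated statistic. The function names, the number of replicas, and the logistic-map test signal are all hypothetical choices made for the example.

```python
import numpy as np

def phase_randomised_surrogate(x, rng):
    """Replica of x with the same power spectrum but random Fourier phases."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    replica = np.abs(spectrum) * np.exp(1j * phases)
    replica[0] = spectrum[0]          # keep the mean (DC term) unchanged
    if n % 2 == 0:
        replica[-1] = spectrum[-1]    # Nyquist term must stay real for even n
    return np.fft.irfft(replica, n=n)

def time_reversal_asymmetry(x, delay):
    """Stand-in nonlinear statistic: normalised third moment of delayed differences."""
    d = x[delay:] - x[:-delay]
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def surrogate_test(x, delays, n_surrogates=99, seed=0):
    """Compare the statistic on x with its central 95% range over the replicas."""
    rng = np.random.default_rng(seed)
    original = np.array([time_reversal_asymmetry(x, d) for d in delays])
    replicas = np.array([
        [time_reversal_asymmetry(s, d) for d in delays]
        for s in (phase_randomised_surrogate(x, rng) for _ in range(n_surrogates))
    ])
    lower = np.percentile(replicas, 2.5, axis=0)
    upper = np.percentile(replicas, 97.5, axis=0)
    outside = (original < lower) | (original > upper)
    return original, lower, upper, outside

# Toy demonstration on a synthetic nonlinear signal (a logistic map, not speech).
signal = np.empty(2048)
signal[0] = 0.4
for t in range(1, len(signal)):
    signal[t] = 3.9 * signal[t - 1] * (1.0 - signal[t - 1])

delays = np.arange(1, 11)
original, lower, upper, outside = surrogate_test(signal, delays)
print("delays where the original lies outside the 95% band:", delays[outside])
```

Delays at which the original statistic falls outside the band play the role of the thick line leaving the grey area in the figure. Note that phase randomisation implicitly assumes a Gaussian amplitude distribution, so a real analysis would need a more carefully constructed replica and statistic.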
Synthesised speech that does not take account of these dynamics sounds very unnatural
[4]. Because the acoustic effects are very subtle, it is usually difficult to detect the specifics of the
dynamics using the unaided human ear. Yet it is these specifics that probably help differentiate one
individual from another. This test could therefore provide the basis for a biometric marker. It could
also be of use in biomedical applications such as speech therapy.
We have produced new nonlinear speech signal processing methods based on discrete variational integrators. Variational integrators are a class of numerical integration techniques that respect the energy properties of particular models [5]. We have produced a model of nonlinear vocal fold oscillation and, using the variational integration method, constructed a nonlinear predictor for speech signals [6]. This allows effective estimation of the nonlinear model parameters, which could be used in speech analysis tasks such as biometrics and speech compression.
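To illustrate the idea, the sketch below applies a simple discrete variational integrator to a toy nonlinear oscillator. It is not the vocal fold model of [6]: the potential V(q) = stiffness*q^2/2 + cubic*q^4/4, the parameter values, and all function names are assumptions made for the example. Taking the discrete Lagrangian L_d(q_k, q_{k+1}) = h[(m/2)((q_{k+1} - q_k)/h)^2 - V(q_k)], the discrete Euler-Lagrange equations give the update m(q_{k+1} - 2q_k + q_{k-1})/h^2 = -V'(q_k), which is the Störmer-Verlet scheme. Because this update predicts the next sample from the previous two and is linear in the model parameters, those parameters can be fitted by ordinary least squares, which is one plausible way such a predictor might support parameter estimation.

```python
import numpy as np

def force(q, stiffness, cubic):
    """-dV/dq for the toy potential V(q) = stiffness*q**2/2 + cubic*q**4/4."""
    return -(stiffness * q + cubic * q**3)

def variational_step(q_prev, q_curr, h, mass, stiffness, cubic):
    """Solve the discrete Euler-Lagrange equation for the next position."""
    return 2.0 * q_curr - q_prev + (h**2 / mass) * force(q_curr, stiffness, cubic)

def simulate(q0, v0, h, n_steps, mass=1.0, stiffness=1.0, cubic=1.0):
    """Integrate the toy oscillator from position q0 and velocity v0."""
    q = np.empty(n_steps + 1)
    q[0] = q0
    # Second-order Taylor start to obtain the second sample.
    q[1] = q0 + h * v0 + 0.5 * (h**2 / mass) * force(q0, stiffness, cubic)
    for k in range(1, n_steps):
        q[k + 1] = variational_step(q[k - 1], q[k], h, mass, stiffness, cubic)
    return q

def energy(q, h, mass=1.0, stiffness=1.0, cubic=1.0):
    """Total energy at interior samples, using central-difference velocities."""
    v = (q[2:] - q[:-2]) / (2.0 * h)
    qi = q[1:-1]
    return 0.5 * mass * v**2 + 0.5 * stiffness * qi**2 + 0.25 * cubic * qi**4

def estimate_parameters(q, h):
    """Least-squares fit of (stiffness/mass, cubic/mass) from the one-step predictor."""
    qi = q[1:-1]
    second_diff = q[2:] - 2.0 * qi + q[:-2]
    design = -(h**2) * np.column_stack([qi, qi**3])
    params, *_ = np.linalg.lstsq(design, second_diff, rcond=None)
    return params

# Toy demonstration with hypothetical parameter values.
h = 0.01
q = simulate(q0=1.0, v0=0.0, h=h, n_steps=5000, stiffness=2.0, cubic=0.5)
e = energy(q, h, stiffness=2.0, cubic=0.5)
print("largest relative energy deviation:", np.max(np.abs(e - e[0])) / e[0])
print("estimated (stiffness/mass, cubic/mass):", estimate_parameters(q, h))
```

The energy deviation stays small and bounded over the run rather than growing steadily, which is the practical benefit of a structure-preserving scheme. Fitting real speech would additionally require a noise model and a model with dissipation and forcing, neither of which is attempted here.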
For more details, please contact Max Little.
[1] P. Kroon, W. Kleijn (1995), Linear-prediction based analysis-by-synthesis coding in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Editors, Elsevier: Amsterdam; New York. pp. 79-119.
[2] G. Kubin (1995), Nonlinear processing of speech in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Editors, Elsevier: Amsterdam; New York. pp. 557-610.
[3] M.A. Little, P.E. McSharry, I.M. Moroz, S.J. Roberts (2006), Testing the assumptions of linear prediction analysis in normal vowels. Journal of the Acoustical Society of America, 119(1): pp. 549-558.
[4] J. Hanquinet, G. Francis, J. Schoentgen (2005), Synthesis of disordered voices in Proceedings of 3rd International Conference on Non-Linear Speech Processing, NOLISP'05: Barcelona, Spain. pp. 168-173.
[5] E. Hairer, C. Lubich, G. Wanner (2002), Geometric numerical integration: structure-preserving algorithms for ordinary differential equations. Springer Series in Computational Mathematics, 31, Berlin; New York: Springer. xiii, 515 p.
[6] M. Little, P. McSharry, I. Moroz, S. Roberts (2005), A simple nonlinear model of vocal fold dynamics in Proceedings of 3rd International Conference on Non-Linear Speech Processing, NOLISP'05: Barcelona, Spain. pp. 188-203.