Objectives and outcomes
Students gain an understanding of the principles and algorithms of automatic speech recognition and learn to implement speech recognition solutions on different platforms. At the end of the course, students will be able to identify and extract different features of a speech signal, use the obtained features to train a suitable model, and use the trained model for speech recognition. They will be able to implement speech recognition software in different system environments (personal computers, dedicated computer systems, etc.).
Lectures
Speech modelling. Acoustic signal processing. Sampling, A/D conversion and framing. Filtering and window functions. Fourier transform and power spectrum of the input signal. Warping of the frequency axis and Mel-scale filtering. Transition to the logarithmic domain. The inverse (discrete) cosine transform, cepstral coefficients and their derivatives. Feature vectors of an acoustic signal. Markov models. Hidden Markov Models (HMM). State and transition probabilities. Emission probabilities and Gaussian mixtures. Acoustic models. Phonetic modelling. Robustness in noisy environments. Semi-continuous HMM, tied states and clustering. HMM training. Baum-Welch and Forward-Backward algorithms. Speech normalisation. Language models. N-gram averaging. Basic search algorithms. Time-synchronous Viterbi beam search. Stack and A* search. Lexical tree search for large-vocabulary speech recognition. Grammar-based search. Multipass search strategies. Use of neural networks and deep neural networks. Hybrid Deep Neural Network and HMM systems.
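The front-end steps listed above (framing, windowing, power spectrum, Mel-scale filtering, logarithm, cosine transform) can be illustrated with a minimal sketch. It is not the course's reference implementation: the function names, the frame sizes, the 20-filter bank, the commonly used 2595 * log10(1 + f / 700) Mel formula and the use of a DCT-II for the cosine-transform step are all assumptions chosen for illustration, and the naive O(N^2) DFT stands in for a real FFT.

```python
import cmath
import math

def hz_to_mel(f):
    # One widely used Mel-scale formula (an assumption; others exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Apply a Hamming window, then return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising slope
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank

def cepstral_coefficients(frame, sample_rate, n_filters=20, n_ceps=13):
    """Log Mel-filterbank energies followed by a DCT-II (c0 included)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_e = [math.log(max(sum(w * p for w, p in zip(f, spec)), 1e-10))
             for f in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_ceps)]

# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz, 25 ms frames, 10 ms hop.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(800)]
vectors = [cepstral_coefficients(f, sample_rate)
           for f in split_frames(signal, frame_len=200, hop=80)]
```

Each entry of vectors is one feature vector per frame. The time derivatives of the cepstral coefficients mentioned above would be computed from differences between neighbouring vectors and appended to them.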
Practical classes
Spectrogram analysis using audio signal software. Implementation of software that can record speech. Sound signal compression algorithms. Signal processing, clipping, filtering, etc. Implementation and application of the (fast) Fourier transform. Implementation of a speaker-dependent / speaker-independent speech recognition system and design of tests that demonstrate the system operates correctly. Implementation and testing of HMM systems as well as search algorithms. Working with deep neural networks and combining them with classical speech recognition systems.
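As an illustration of the HMM and search exercises, the following is a minimal sketch of time-synchronous Viterbi decoding for a discrete-emission HMM, computed in the log domain. It assumes a toy two-state model with made-up probabilities, omits beam pruning, and uses hypothetical helper names (log_matrix, viterbi); a practical system would use Gaussian-mixture or neural-network emission scores instead of a lookup table.

```python
import math

def log_matrix(rows):
    """Convert a table of non-zero probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]

def viterbi(obs, log_init, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    n_states = len(log_init)
    # delta[s]: best log-probability of any path ending in state s so far.
    delta = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    backptr = []
    for o in obs[1:]:
        prev = delta
        step, delta = [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best = max(range(n_states),
                       key=lambda r: prev[r] + log_trans[r][s])
            step.append(best)
            delta.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        backptr.append(step)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    best_logp = delta[state]
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    path.reverse()
    return path, best_logp

# Toy model (illustrative numbers only): two states, two observation symbols.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = log_matrix([[0.7, 0.3],
                        [0.3, 0.7]])
log_emit = log_matrix([[0.9, 0.1],    # state 0 mostly emits symbol 0
                       [0.1, 0.9]])   # state 1 mostly emits symbol 1

path, logp = viterbi([0, 0, 1, 1], log_init, log_trans, log_emit)
```

Working with log-probabilities avoids numerical underflow on long utterances. A beam search would additionally discard, at each time step, states whose score falls too far below the current best.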