ICT COST Action 277
Nonlinear Speech Processing. Chair: M. Faundez, Spain
Descriptions are provided by the Actions directly via e-COST.
As stated in the Memorandum of Understanding, the main objective of the Action was to improve the voice services in telecommunication systems, through the development of new nonlinear speech processing techniques.
To achieve its main goal, the Action initially split the work into four directions, namely:
• Speech Coding;
• Speech Synthesis;
• Speaker Identification and Verification;
• Speech Recognition.
Furthermore, Voice Analysis and Conversion as well as Speech Enhancement also received partial coverage during the Action.
The main results are as follows. In speech synthesis, two key approaches were described, both based on nonlinear oscillators. In addition, discussions took place to decide which kind of phonemic database would be most appropriate. In the analysis area, a new analysis model was proposed that is likely to have an impact on synthesis. In coding, work began on techniques suitable for packet-based systems and on an approach that applies speaker recognition techniques to speech coding. In speaker recognition, the relevance of bandwidth extension algorithms, blind inversion of nonlinear distortions, new nonlinear parameterizations based on neural networks, and the impact of watermarking on speaker recognition systems were all tested and reported in mainstream journals and conferences.
Areas where the methods studied by the Action can benefit international science and technology include:
• Security, Crime investigation (Speaker identification/verification);
• Interactive multimedia services on packet-switched networks such as the evolving mobile radio networks or the Internet, Voice over IP (Speech coding);
• Human-Computer Interfaces (Speech Synthesis/Speech Recognition);
• Applications for the Blind (Speech Synthesis/Speech Recognition);
• Educational Applications (Speech Synthesis/Speech Recognition);
• Clinical Phonetics Applications (Voice Analysis and Conversion);
• Mobile Telephony, Voice transmission over noisy channels (Speech enhancement).