Real-Time AI-Assisted Voice Reconstruction
University of Kent academics develop real-time AI-assisted voice reconstruction for the voice impaired
Speech is the most natural, efficient and convenient mechanism for people to interact with others. We tend to take it for granted, yet 20% of the UK population will experience speech communication difficulties at some point in their lives. While there are numerous potential medical causes, the effect on speech is confined to impairment of one or more vocal articulators, namely the mouth, tongue, vocal tract, glottis or lungs. These effects are often very well understood.
This EIRA-funded Proof of Concept project aimed to develop a wearable computer system that uses AI and Machine Learning techniques to reconstruct the voice. A miniature “system on a chip” connected to a microphone picks up the user’s impaired speech, analyses the signal, identifies the impaired components and repairs them computationally, then plays the reconstructed speech through a loudspeaker.
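The processing chain described above can be pictured as a frame-by-frame loop. What follows is a minimal Python sketch, not the project's implementation: the frame length, the energy-only `analyse` step, and the `is_impaired` and `repair` placeholders are all hypothetical stand-ins for the real analysis and reconstruction algorithms, which the article does not detail.

```python
import math

FRAME_LEN = 160  # assumed: 10 ms frames at an assumed 16 kHz sample rate


def split_frames(samples, frame_len=FRAME_LEN):
    """Cut the signal into consecutive frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def analyse(frame):
    """Hypothetical analysis step: here only the frame's RMS energy."""
    return {"rms": math.sqrt(sum(x * x for x in frame) / len(frame))}


def is_impaired(features, threshold=0.05):
    """Placeholder detector: treat very quiet frames as needing repair."""
    return features["rms"] < threshold


def repair(frame, features, target_rms=0.1):
    """Placeholder repair: scale the frame up to a target RMS level."""
    if features["rms"] == 0.0:
        return list(frame)  # silence stays silence
    gain = target_rms / features["rms"]
    return [x * gain for x in frame]


def reconstruct(samples):
    """Run analyse -> detect -> repair on every frame and rejoin the output."""
    output = []
    for frame in split_frames(samples):
        features = analyse(frame)
        if is_impaired(features):
            frame = repair(frame, features)
        output.extend(frame)
    return output  # on a real device this would be sent to the loudspeaker
```

The point of the structure is that each stage sees only one short frame at a time, which is what allows output to be produced while the person is still speaking.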
Reconstructing the missing or damaged portion of the vocalisation is complex because the impaired aspects of speech vary with the condition. For example, someone with pitch loss due to glottal damage will have different speech characteristics from someone with a different vocal impairment, such as vocal tract resonance loss, nasality or difficulty with plosive formation.
The speech analysis and speech synthesis involved in this process needed to be virtually instantaneous – as near real-time as the system could support, and so optimisation of the algorithms and voice synthesis was of high importance. Research conducted up to now has resulted in a collection of well-characterised reconstruction algorithms, which would be deployed as part of the solution.
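As a rough illustration of what "near real-time" implies, each block of audio has to be processed in less time than it takes to capture. The sketch below uses assumed figures (16 kHz sampling, 20 ms frames) that are not taken from the project, and `frame_samples`, `real_time_factor` and `dummy_process` are hypothetical names.

```python
import time

SAMPLE_RATE_HZ = 16_000  # assumed sample rate
FRAME_MS = 20            # assumed frame duration


def frame_samples(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of samples captured in one frame."""
    return sample_rate_hz * frame_ms // 1000


def real_time_factor(process, frame, frame_ms=FRAME_MS, repeats=50):
    """Average processing time per frame divided by the frame duration.

    A value below 1.0 means processing keeps up with the incoming audio.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        process(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000 / repeats
    return elapsed_ms / frame_ms


def dummy_process(frame):
    """Stand-in for the analysis and synthesis stages."""
    return [0.5 * x for x in frame]


frame = [0.0] * frame_samples()
rtf = real_time_factor(dummy_process, frame)
print(f"{len(frame)} samples per frame, real-time factor {rtf:.4f}")
```

The frame duration also sets a floor on delay, since a frame cannot be processed until it has been fully captured. Shorter frames reduce that delay but leave less time to process each one.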
The lead researcher from the University of Kent was Prof. Ian Mcloughlin, who has an extensive research background in this field, with numerous papers on the topic published internationally. He applied a great deal of prior knowledge in developing the solution utilising recent developments and research. The initial hurdle was finding a hardware platform with sufficient processing power to support the analysis and reconstruction algorithms.
Once the hardware platform was selected, developing and testing the machine learning and reconstruction processing happened relatively quickly. Following this stage of development, comprehensive testing, refinement and calibration were carried out. The testing was conducted with two groups: first participants simulating voice loss, then patients at a clinic specialising in the treatment of voice-loss conditions.
Prof. Ian Mcloughlin said:
“Technology has touched our lives in so many ways in recent years, including amazing advances in speech processing algorithms to enable anywhere-anytime telephony and conversational personal assistants. However, the technology has been very slow to impact speech impairments – and so in this project we pushed the cutting edge forwards a little to demonstrate how those same kinds of algorithms can be used to assist those who have one common class of speech impairments.”
The team achieved their main objective of developing a prototype wearable device containing the basic features of a real-time system: it analyses an impaired voice, reconstructs it and relays it via loudspeaker while the person is speaking. The prototype is easily wearable but needs refinement, as it is currently large and power-hungry. The resulting speech is quite clear, but artificial sounding.
During development it was discovered that different microphones produced different output levels and characteristics, so the system had to be reconfigured for each input device at start-up. The system works more or less as expected and is undergoing initial trials both in the UK (volunteer simulated trials) and in New Zealand (clinical conditions).
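One simple way to account for differing microphone levels is to measure a short calibration recording at start-up and derive a gain that brings it to a common reference level. The sketch below is illustrative only: the article does not describe how the reconfiguration was actually done, and `TARGET_RMS`, `calibration_gain` and `apply_gain` are hypothetical.

```python
import math

TARGET_RMS = 0.1  # assumed reference level on a -1.0..1.0 sample scale


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def calibration_gain(calibration_samples, target_rms=TARGET_RMS, max_gain=20.0):
    """Gain that would bring this microphone's level to the target.

    The gain is capped so that a near-silent recording does not
    produce an extreme amplification.
    """
    level = rms(calibration_samples)
    if level == 0.0:
        return max_gain
    return min(target_rms / level, max_gain)


def apply_gain(samples, gain):
    """Scale samples and clip to the valid -1.0..1.0 range."""
    return [max(-1.0, min(1.0, x * gain)) for x in samples]


# Two simulated microphones picking up the same tone at different levels.
tone = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
mic_a = [0.02 * x for x in tone]
mic_b = [0.30 * x for x in tone]

for name, mic in (("mic_a", mic_a), ("mic_b", mic_b)):
    gain = calibration_gain(mic)
    print(name, round(gain, 2), round(rms(apply_gain(mic, gain)), 3))
```

A level correction like this only addresses loudness. Differences in a microphone's frequency response would need a separate equalisation step, which this sketch does not attempt.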
Further work is needed to tune the output to the individual. In terms of operating quality, the prototype does not yet perform as anticipated, but further refinement is considered feasible. The system has been demonstrated at conferences in France, Singapore and China, and to a large audience at Interspeech 2019 in Graz, Austria, in September 2019. A paper, “Glottal Flow Synthesis for Whisper-to-Speech Conversion”, has been published in the IEEE Transactions on Audio, Speech and Language Processing journal.
The Proof of Concept prototype has achieved its main objective: proving that a miniaturised AI / Machine Learning system can reconstruct an impaired voice. The next stage would be to seek funding and a business partner to support further development of a potential commercial system that could be easily deployed in a clinical environment.
Photo by Jason Leung on Unsplash