UCSF Team Reports Progress with Speech Prosthesis
by James Cavuoto, editor
April 2019 issue
One of the most tantalizing challenges that has confronted neuroprosthetics researchers has been developing a speech prosthesis that can restore a user’s voice using a brain-computer interface. Such a system would offer significant value to people who have lost vocal function as a result of ALS, stroke, TBI, or other neurological disorders. One early researcher, Philip Kennedy of Emory University, was sufficiently motivated to perfect a phoneme-recognition algorithm using an implanted microelectrode that he chose to have the device implanted in his own brain. Kennedy has reported findings from his own brain data at the Society for Neuroscience meetings.
More recently, a team of investigators at UC San Francisco reported significant progress developing a BCI that controls a virtual vocal tract to generate natural-sounding synthetic speech. Their approach uses an anatomically detailed computer simulation including the lips, jaw, tongue and larynx. Reporting in Nature, Edward Chang and collaborators demonstrated that neural activity in the brain’s speech centers could be used to drive a speech prosthesis.
“This study demonstrates that we can generate entire spoken sentences based on an individual’s brain activity,” said Chang, a professor of neurological surgery and member of the UCSF Weill Institute for Neuroscience. “This is an exhilarating proof of principle that with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss.”
The research was led by Gopala Anumanchipalli, a speech scientist, and Josh Chartier, a bioengineering graduate student in the Chang lab. It builds on a recent study in which the pair described for the first time how the human brain’s speech centers choreograph the movements of the lips, jaw, tongue, and other vocal tract components to produce fluent speech. From that work, Anumanchipalli and Chartier realized that previous attempts to directly decode speech from brain activity might have met with limited success because these brain regions do not directly represent the acoustic properties of speech sounds, but rather the instructions needed to coordinate the movements of the mouth and throat during speech.
“The relationship between the movements of the vocal tract and the speech sounds that are produced is a complicated one,” Anumanchipalli said. “We reasoned that if these speech centers in the brain are encoding movements rather than sounds, we should try to do the same in decoding those signals.”
In their new study, Anumanchipalli and Chartier asked five epilepsy patients with intact speech to read several hundred sentences aloud while the researchers recorded brain activity using ECoG electrodes. Based on the audio recordings of participants’ voices, the researchers used linguistic principles to reverse engineer the vocal tract movements needed to produce those sounds: pressing the lips together here, tightening vocal cords there, shifting the tip of the tongue to the roof of the mouth, then relaxing it, and so on.
This detailed mapping of sound to anatomy allowed the team to create a realistic virtual vocal tract for each participant that could be controlled by their brain activity. The system comprised two “neural network” machine learning algorithms: a decoder that transforms brain activity patterns produced during speech into movements of the virtual vocal tract, and a synthesizer that converts these vocal tract movements into a synthetic approximation of the participant’s voice.
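The two-stage arrangement described above can be illustrated with a minimal Python sketch. It is not the UCSF team's code: the fixed random linear maps stand in for the trained neural networks, and every name and dimension (`N_ELECTRODES`, `N_ARTICULATORS`, `N_ACOUSTIC`, `make_linear_stage`, `brain_to_speech`) is a hypothetical chosen for illustration. The only point it shows is the data flow, in which brain activity is first mapped to vocal tract movements and only then to sound.

```python
import random

# Hypothetical sizes, chosen only for illustration (not taken from the study).
N_ELECTRODES = 8      # neural features per time step
N_ARTICULATORS = 4    # e.g. lips, jaw, tongue, larynx
N_ACOUSTIC = 6        # acoustic features per time step


def make_linear_stage(n_in, n_out, seed):
    """Return a function that applies a fixed random linear map to each frame.

    This is a stand-in for a trained neural network; the weights are random.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

    def stage(frames):
        # One output frame per input frame; each output value is a weighted sum.
        return [
            [sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in frames
        ]

    return stage


# Stage 1 (decoder): brain activity -> vocal tract movements.
decoder = make_linear_stage(N_ELECTRODES, N_ARTICULATORS, seed=0)
# Stage 2 (synthesizer): vocal tract movements -> acoustic features.
synthesizer = make_linear_stage(N_ARTICULATORS, N_ACOUSTIC, seed=1)


def brain_to_speech(neural_frames):
    """Chain the two stages; the intermediate representation is articulatory."""
    movements = decoder(neural_frames)
    return synthesizer(movements)


# Five time steps of simulated neural activity (random numbers, not real data).
rng = random.Random(42)
neural_frames = [[rng.gauss(0.0, 1.0) for _ in range(N_ELECTRODES)] for _ in range(5)]
acoustic_frames = brain_to_speech(neural_frames)
print(len(acoustic_frames), len(acoustic_frames[0]))  # prints: 5 6
```

A direct decoder, by contrast, would map the neural frames straight to acoustic features with no articulatory stage in between.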
The synthetic speech produced by these algorithms was significantly better than synthetic speech directly decoded from participants’ brain activity without the inclusion of simulations of the speakers’ vocal tracts. The algorithms produced sentences that were understandable to hundreds of human listeners in crowdsourced transcription tests conducted on the Amazon Mechanical Turk platform.
The transcribers were more successful when they were given shorter lists of words to choose from, as would be the case with caregivers who are primed to the kinds of phrases or requests patients might utter.