NeuroSpeech

AI Gives Voice to the Voiceless: New System Turns Brain Signals into Speech

đź”· Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

In the annals of "I just can't believe it": you can now read someone's brain signals, attach a synthetic voicebox, and output speech. This research used electrode brain implants in epilepsy patients, so Elon's Neuralink will surely be eyeing something similar one day.

Who: A team of researchers from New York University, spanning the electrical engineering, biomedical engineering, neurology, and neurosurgery departments. This collaboration brings together expertise in machine learning, speech processing, and clinical neuroscience.

Why:

  • To develop a brain-computer interface (BCI) that can restore communication for people who have lost the ability to speak due to neurological conditions.

  • Current speech decoding systems face challenges with limited training data, complex speech variations, and unnatural-sounding synthetic voices.

  • This research aims to address these limitations and create a more accurate and natural-sounding speech prosthesis.

How:

  • The researchers developed a new framework with two main parts:

    • ECoG decoder: This component uses deep learning models (like ResNet, Swin Transformer, or LSTM) to translate brain signals (electrocorticography or ECoG) into a set of interpretable speech parameters such as pitch, loudness, and formant frequencies.

    • Speech synthesizer: This component takes the speech parameters and generates a spectrogram (a visual representation of sound frequencies), which is then converted into a natural-sounding voice.

The ECoG decoder generates time-varying speech parameters from ECoG signals. The speech synthesizer generates spectrograms from those speech parameters. A separate spectrogram inversion algorithm then converts the spectrograms into speech waveforms.
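To make that two-component pipeline concrete, here is a minimal PyTorch sketch of the data flow. Everything in it is illustrative: the module shapes, the eight-dimensional parameter set, and the Griffin-Lim inversion step are stand-ins, not the authors' released code (their decoders are ResNet/Swin/LSTM variants and their synthesizer is considerably more sophisticated).

```python
import torch
import torch.nn as nn

# Hypothetical speech-parameter layout: pitch, loudness, voicing, a few formant
# frequencies/bandwidths. The paper uses a richer, interpretable parameter set.
N_PARAMS = 8          # assumed
N_ELECTRODES = 64     # assumed number of ECoG channels
N_FREQ_BINS = 257     # magnitude-spectrogram bins (n_fft = 512)

class ECoGDecoder(nn.Module):
    """Maps ECoG frames (batch, time, channels) to per-frame speech parameters."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_ELECTRODES, 128, batch_first=True)
        self.head = nn.Linear(128, N_PARAMS)

    def forward(self, ecog):
        h, _ = self.rnn(ecog)       # an LSTM is causal by construction: past frames only
        return self.head(h)         # (batch, time, N_PARAMS)

class SpeechSynthesizer(nn.Module):
    """Maps per-frame speech parameters to a magnitude spectrogram."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_PARAMS, 256), nn.ReLU(),
            nn.Linear(256, N_FREQ_BINS), nn.Softplus(),  # non-negative magnitudes
        )

    def forward(self, params):
        return self.net(params)     # (batch, time, N_FREQ_BINS)

def spectrogram_to_waveform(mag_spec):
    """Spectrogram inversion; Griffin-Lim is a generic stand-in for this step."""
    import librosa
    return librosa.griffinlim(mag_spec.numpy().T, n_iter=32, hop_length=128)

if __name__ == "__main__":
    decoder, synth = ECoGDecoder(), SpeechSynthesizer()
    ecog = torch.randn(1, 100, N_ELECTRODES)   # 100 frames of fake ECoG
    with torch.no_grad():
        params = decoder(ecog)
        spec = synth(params)
    waveform = spectrogram_to_waveform(spec[0])
    print(waveform.shape)
```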

  • The system is trained in two steps:

    • Speech-to-speech auto-encoder: This step uses only speech data to pre-train a speech encoder and the speech synthesizer. The encoder learns to extract speech parameters from spectrograms, and the synthesizer learns to recreate spectrograms from these parameters.

    • ECoG decoder training: The ECoG decoder is then trained using both brain signals and the pre-trained speech encoder's output to predict the speech parameters that best match the intended speech.
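Below is a hedged sketch of those two training stages, compatible with the modules sketched above. The helper names, losses, and optimizer settings are assumptions made for illustration; the published framework also supervises at the spectrogram level, which is omitted here for brevity.

```python
import torch
import torch.nn as nn

# Stage 1: speech-to-speech auto-encoder. A speech encoder maps spectrograms into
# the same parameter space; the synthesizer learns to reconstruct spectrograms.
class SpeechEncoder(nn.Module):
    def __init__(self, n_freq=257, n_params=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, 256), nn.ReLU(),
            nn.Linear(256, n_params),
        )

    def forward(self, spec):
        return self.net(spec)                       # (batch, time, n_params)

def pretrain_autoencoder(encoder, synthesizer, speech_specs, epochs=10, lr=1e-3):
    """Stage 1: train encoder + synthesizer on speech data alone."""
    params = list(encoder.parameters()) + list(synthesizer.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for spec in speech_specs:                   # (batch, time, n_freq)
            recon = synthesizer(encoder(spec))
            loss = loss_fn(recon, spec)
            opt.zero_grad(); loss.backward(); opt.step()

def train_ecog_decoder(decoder, encoder, paired_data, epochs=10, lr=1e-3):
    """Stage 2: train the ECoG decoder to match the frozen encoder's parameters."""
    encoder.eval()                                  # pre-trained, kept fixed
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for ecog, spec in paired_data:              # time-aligned ECoG + spectrogram
            with torch.no_grad():
                target_params = encoder(spec)       # reference speech parameters
            pred_params = decoder(ecog)
            loss = loss_fn(pred_params, target_params)
            opt.zero_grad(); loss.backward(); opt.step()
```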

What did they find:

  • The system successfully decoded speech from 48 participants with high accuracy, even when limited to using only past information (causal decoding), which is crucial for real-time applications; a small sketch of what causality means here follows after this list.

  • Convolutional (ResNet) and transformer (Swin) models achieved the best performance, outperforming recurrent (LSTM) models.

  • The system worked well with both high- and low-density electrode grids, suggesting it could be implemented with less invasive implants in the future.

  • Excitingly, the system also decoded speech from the right hemisphere, offering hope for patients with left-hemisphere damage (the typical speech center) who have lost speech ability.

Figure: comparison between left- and right-hemisphere participants using causal models; no statistically significant differences were found.
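To illustrate what "causal" means in this setting, here is a toy example of a causal 1-D convolution: each output frame depends only on the current and past frames, so a decoder built from such layers can run in real time without waiting for future neural activity. This is illustrative only; the paper's causal models are causal variants of ResNet, Swin Transformer, and LSTM, not this layer.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that only sees current and past time steps."""
    def __init__(self, channels, kernel_size=5):
        super().__init__()
        self.pad = kernel_size - 1                   # left-padding only
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                            # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))      # pad the past, never the future
        return self.conv(x)

# A non-causal layer would pad symmetrically and peek at future frames; the causal
# variant trades a little accuracy for real-time usability.
layer = CausalConv1d(channels=64)
out = layer(torch.randn(1, 64, 100))
print(out.shape)  # torch.Size([1, 64, 100])
```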

What are the limitations and what's next:

  • The current system requires training data with paired brain signals and speech recordings, which may not be available for some patients. Future research will explore ways to train the system using imagined speech or older speech recordings.

  • The research focused on single-word decoding. The team plans to extend the system to decode continuous speech and sentences.

  • Further investigation is needed to confirm the viability of right-hemisphere decoding in patients with left-hemisphere damage.

Why it matters:

  • This research represents a significant step forward in developing speech prosthetics for people who have lost the ability to speak.

  • The high accuracy, natural-sounding voice, and potential for less invasive implants offer hope for improved communication and quality of life for these individuals.

  • The ability to decode speech from the right hemisphere opens new possibilities for treating patients with damage to the left hemisphere.

Additional notes:

  • The research was published in Nature Machine Intelligence, a prestigious scientific journal.

  • The research team has made their decoding framework open-source to encourage further development and collaboration within the scientific community.

1  Chen, X., Wang, R., Khalilian-Gourtani, A. et al. A neural speech decoding framework leveraging deep learning and speech synthesis. Nat Mach Intell (2024). https://doi.org/10.1038/s42256-024-00824-8
