Jessie Levine smiles and shakes her head when she hears the outgoing voicemail message on her iPhone.
“I sound young! And fast!” she marvels. “That person never, ever expected to talk like this.”
The message was recorded before Levine was diagnosed with Lou Gehrig’s disease, or ALS, in early 2015, and before the progressive motor neuron disease caused her speech to become slow and slurred. But as her ability to talk deteriorates, she’s exploring a new way to restore her voice via speech synthesis, or the artificial production of human speech.
The technology has been around for decades, but as devices shrink, efforts to customize synthetic voices are expanding. Multiple companies and research groups are using speech synthesis engines to create voices from spoken samples, usually thousands of recorded sentences.
For example, CereProc, based in Edinburgh, Scotland, created a voice for the late film critic Roger Ebert several years before his death in 2013 by mining commentary tracks he’d recorded for movies.
But VocaliD, a Belmont, Massachusetts, company, is taking a different approach by creating custom voices using just a small sample from the recipient, even if they can’t speak.
Starting with a snippet of someone’s voice, just a few seconds of saying “Ahhhh,” the company matches the recipient with a “donor voice” (in Levine’s case, perhaps a relative) and then blends the two together. The result is a sound file that can be plugged into any text-to-speech device.
“I have two sisters, one of whom has a lisp like I have, which I had before I had ALS. The other one, we all have this stuffiness to our speech,” said Levine, 45, the manager of Sullivan County, New Hampshire. “It never occurred to me that I could use their voices, adapt it to me, and then be able to use that.”
Company founder and CEO Rupal Patel is a speech technology professor on leave from Northeastern University. Her research found that people with severe communication disorders preserve the ability to control aspects of their voices, such as pitch and loudness. Those characteristics—what Patel calls the “melody of speech”—are also important for speaker identity, she said.
“There is a level of empowerment that comes with having the freedom to be able to communicate in your own voice, and that’s such an important thing, which I think has been overlooked,” Patel said.
The full article was published on Phys.org.