The new Google Cloud Platform service highlights how the company is leveraging models and technology from the search giant's DeepMind subsidiary.
Google Cloud outlined Cloud Text-to-Speech, a machine learning service that uses a model from Google's DeepMind subsidiary to generate raw audio.
With the move, developers get broader access to the text-to-speech technology behind the natural-sounding voices in Google Assistant, Search, Maps and other Google products.
According to Google, Cloud Text-to-Speech can be used to power call center voice response systems, enable Internet of Things devices to talk, and convert text-based media into spoken formats.
Google Cloud Text-to-Speech allows customers to choose from 32 different voices in 12 languages. Customers can also adjust pitch, speaking rate, volume gain, and audio format.
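As a rough illustration of how those options map to a request, the sketch below uses the google-cloud-texttospeech Python client library; the voice name, parameter values and exact client API shown are illustrative, assume a configured Google Cloud project with credentials, and may differ across library versions.

```python
from google.cloud import texttospeech

# Create a client (uses application default credentials).
client = texttospeech.TextToSpeechClient()

# The text to synthesize.
synthesis_input = texttospeech.SynthesisInput(text="Hello from Cloud Text-to-Speech.")

# Pick a language and a specific voice (illustrative WaveNet voice name).
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-D",
)

# Customize output format, speaking rate, pitch and volume gain.
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0,      # 1.0 is normal speed
    pitch=0.0,              # semitone offset from the default pitch
    volume_gain_db=0.0,     # gain applied to the synthesized audio
)

response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response contains the encoded audio bytes.
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
```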
The primary competition for Google Cloud Text-to-Speech will be Amazon Web Services' Polly, which offers 47 voices and similarly targets call center and application use cases.
The rollout of the service also highlights how Google is leveraging DeepMind technology for Google Cloud Platform. The DeepMind technology used in Cloud Text-to-Speech is called WaveNet. When it was introduced a year ago, WaveNet created raw audio waveforms from scratch using a neural network trained on speech samples.
Given text, WaveNet generated speech from scratch one sample at a time, which made for accurate but slow synthesis.
With an update, WaveNet now runs on Google Cloud's TPU infrastructure and can generate raw waveforms 1,000 times faster than before. That combination of fidelity and speed allows WaveNet to produce more human-sounding audio.
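WaveNet itself is a deep convolutional network, but the "one sample at a time" idea can be sketched with a toy autoregressive loop like the one below; the trivial two-sample predictor stands in for the trained model and is purely illustrative, not DeepMind's code.

```python
import numpy as np

# Toy stand-in for a trained network: it predicts the next audio sample
# from the previous two. The recurrence x[n] = 2*cos(w)*x[n-1] - x[n-2]
# happens to continue a pure sine tone, which keeps the example runnable.
SAMPLE_RATE = 16000
FREQ_HZ = 440
OMEGA = 2 * np.pi * FREQ_HZ / SAMPLE_RATE

def predict_next_sample(context: np.ndarray) -> float:
    return 2 * np.cos(OMEGA) * context[-1] - context[-2]

def generate_waveform(num_samples: int) -> np.ndarray:
    # Seed with two samples, then extend one sample at a time,
    # feeding each prediction back in as context for the next step.
    samples = [0.0, float(np.sin(OMEGA))]
    while len(samples) < num_samples:
        context = np.asarray(samples)
        samples.append(predict_next_sample(context))
    return np.asarray(samples, dtype=np.float32)

waveform = generate_waveform(SAMPLE_RATE)  # one second of audio at 16 kHz
print(waveform.shape, waveform.min(), waveform.max())
```

The point of the sketch is the loop structure: every sample depends on the ones before it, which is why the original model was slow and why the TPU-backed speedup matters.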