The Search Engine AI That Reads Your Books
Sogou is a Chinese search engine. The company, which is now public, raised $585 million in its IPO, an offering meant to accelerate the use of artificial intelligence in its business. As part of this initiative, the search engine is building an artificial intelligence model that reads novels aloud in the voices of their authors.
Additionally, the company said the model will generate virtual avatars that appear to speak in sync with the audio produced by the text-to-speech model. This resembles deepfake video technology, which replaces video and audio simultaneously.
At the China Online Conference, the search engine firm announced that it will create "lifelike" avatars of the Chinese authors Yue Guan and Bu Xin Tian Shang Diao Xian Bing. The avatars are created from video and audio recordings of the authors.
In the West, demand for audiobooks has surged because they are more convenient than printed books in many situations (e.g., you can listen to them while working). Demand in China has likewise risen in recent years. However, authors might not have the necessary equipment, or might be unable to read long passages aloud clearly. Sogou's model aims to fix that by automatically generating "professional" audio of the authors speaking.
Text-to-speech has been a feature of many applications for a long time; Google, for example, offers an API that does exactly this. However, an audiobook generated this way would sound harsh and synthetic, because the audio these programs produce tends to be robotic. Customers prefer a more professional, human-sounding voice, potentially a famous celebrity's or, ideally, the author's own.
Improvements in artificial intelligence techniques and increased data gathering mean that the potential of these AI models keeps growing, and they may one day match humans. Advances in machine learning and text-to-speech technologies are already making digitised voices more lifelike.
However, these models have limitations: it is very hard for AI to express emotion in its voice. For example, when reading a passage, the AI would have to determine the emotion of the passage, decide on a tone, and then generate audio in that tone. Currently, artificial intelligence struggles to detect intent in text. These technologies are still very young, and as they mature, scientists will hopefully tackle each issue they face.
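The three steps described above (determine the emotion, decide on a tone, generate audio in that tone) can be sketched roughly in Python. This is only a toy illustration and not Sogou's method: the keyword lexicon, the tone table, and the function names detect_emotion and to_ssml are all hypothetical, and a real system would presumably use a trained classifier rather than keyword matching. The output is SSML markup using the standard prosody element, which many text-to-speech engines accept as a hint for speaking rate and pitch.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical toy lexicon; a real system would likely use a trained classifier.
EMOTION_KEYWORDS = {
    "sad": {"cried", "tears", "grief", "alone", "lost"},
    "excited": {"shouted", "raced", "victory", "laughed", "amazing"},
}

# Hypothetical tone table, expressed as SSML prosody attribute values.
PROSODY = {
    "sad": {"rate": "slow", "pitch": "low"},
    "excited": {"rate": "fast", "pitch": "high"},
    "neutral": {"rate": "medium", "pitch": "medium"},
}


def detect_emotion(passage: str) -> str:
    """Step 1: guess the emotion by counting keyword matches."""
    words = set(re.findall(r"[a-z']+", passage.lower()))
    scores = {
        emotion: len(words & keywords)
        for emotion, keywords in EMOTION_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a neutral reading when no keyword matched at all.
    return best if scores[best] > 0 else "neutral"


def to_ssml(passage: str) -> str:
    """Steps 2 and 3: pick a tone, then wrap the text in SSML for a TTS engine."""
    tone = PROSODY[detect_emotion(passage)]
    return (
        '<speak><prosody rate="{rate}" pitch="{pitch}">{text}</prosody></speak>'
    ).format(text=escape(passage), **tone)


if __name__ == "__main__":
    print(to_ssml("She cried alone, in tears."))
    print(to_ssml("The train left at noon."))
```

Even this toy version shows why the problem is hard: keyword counts miss sarcasm, context, and shifts of mood within a passage, which is the kind of intent detection that current models struggle with.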