Google’s Duplex AI Can Independently Make Phone Calls

Google Duplex is an AI technology developed for conducting natural conversations and completing ‘real-world’ tasks autonomously over the phone. It aims to make the conversations your Google Assistant has with other people as natural and comfortable as possible, so that those people don’t have to adapt to the machine. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments.

It is often frustrating to talk to stilted computerised voices that don’t follow a natural conversation flow, forcing the person to adjust to the machine. However, developing a system that adjusts to the person instead comes with several challenges: natural language is difficult to understand, natural behaviour is tricky to model, latency expectations require fast processing, and generating natural-sounding speech with appropriate intonation is difficult. In natural spontaneous speech, people talk faster and less clearly than they do when they speak to a machine, which makes their speech harder to recognise. To sound natural itself, the system also incorporates speech disfluencies such as “umm”s and “uh”s.

When people talk to each other, they use more complex sentences than when talking to computers. In normal conversations, the same sentence can have different meanings depending on the context. During phone calls, background noise and sound quality issues make understanding even harder.

Google has been working on this problem for many years. To sound natural, Duplex combines a concatenative text-to-speech (TTS) engine with a synthesis TTS engine (using Tacotron and WaveNet) to control intonation depending on the circumstance. At the core of Duplex is a recurrent neural network (RNN) designed to cope with these challenges, built using TensorFlow Extended (TFX). Google trained Duplex’s RNN on a corpus of anonymized phone conversation data.

Incoming sound is processed through an automatic speech recognition (ASR) system. This produces text that is analysed with context data and other inputs to produce a response text that is read aloud through the text-to-speech (TTS) system.
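The flow described here (speech in, text, response text, speech out) can be illustrated with a minimal Python sketch. This is not Google's implementation: the names `CallContext`, `run_turn`, and the three `fake_*` functions are hypothetical stand-ins invented for illustration, and the real ASR, response, and TTS stages would be trained models rather than simple functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CallContext:
    """Hypothetical container for the non-audio inputs to the response step."""
    task: str                                          # e.g. "book a haircut"
    history: List[str] = field(default_factory=list)   # earlier turns, as text


def run_turn(
    audio: bytes,
    context: CallContext,
    asr: Callable[[bytes], str],
    respond: Callable[[str, CallContext], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Handle one conversational turn: incoming audio in, spoken reply out."""
    heard = asr(audio)                       # speech -> text
    reply = respond(heard, context)          # text + context -> response text
    context.history.extend([heard, reply])   # remember both sides of the turn
    return tts(reply)                        # response text -> speech


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(heard: str, context: CallContext) -> str:
    if "what time" in heard.lower():
        return "Around noon, if possible."
    return f"Hi, I'm calling to {context.task}."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Calling `run_turn(b"Hello, how can I help?", CallContext(task="book a haircut"), fake_asr, fake_respond, fake_tts)` passes the audio through each stage in order. Passing the stages in as functions keeps the pipeline shape visible without tying it to any particular speech library.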

Use cases of Duplex include booking an appointment or getting a reservation through your Google Assistant; reducing the number of no-shows (as a business) by having Duplex call customers to remind them of their scheduled appointments; and allowing hearing-impaired users, or users who don’t speak the local language, to carry out more tasks over the phone.

A user asks the Google Assistant for an appointment, which the Assistant then schedules by having Duplex call the business.

Enterprise, Google | Gavin Aren