From algorithmic trading systems running on Wall Street to computer-generated sports reports and even apps that can recognise a song playing in a restaurant, there’s no question that artificial intelligence (AI) is fundamentally changing our world. However, machine learning systems bring their own host of issues – and big tech is attempting to solve these problems as it develops new applications. “As we develop this technology, which has the power to influence and impact positively thousands or millions of people, we have to be mindful that there may be risk associated with its misuse,” says Olivier Bousquet, Lead at Google AI Europe.
At Google’s AI in Action conference, held last month at its Zurich office, we spoke to some experts about a range of opportunities and challenges raised by the technology, from its potential to give a voice to those who can’t speak to the spectre of reinforcing societal stereotypes.
AI: A bridge for communication
One of the positive applications of AI is the way it can help people who are deaf or have speech impediments to communicate. Dimitri Kanevsky is a Research Scientist with Google. Deaf and with a severe speech impediment, Kanevsky was nevertheless able to give a keynote presentation and answer media questions with the help of Project Euphonia. Kanevsky spent about 25 hours recording 15,000 phrases for Euphonia. Now, he can speak into a smartphone and people in the room can read what he is saying as it appears on a screen in real time. “Recordings from people who have MS, are deaf, or have had a stroke are used to train speech recognition models. If you have enough different voices recorded, we start to have clusters for every person. If a new person comes, maybe they can immediately use a cluster that fits them, or maybe they only need to add a few hours of input.”
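The clustering idea Kanevsky describes can be sketched in a few lines. This toy version assumes each speaker’s recordings have already been reduced to a fixed-length embedding vector (the embedding model itself is not shown, and all numbers are invented for illustration):

```python
import numpy as np

def assign_to_cluster(new_voice, cluster_centroids):
    """Return the index of the closest existing voice cluster.

    new_voice: 1-D embedding of a new speaker's recordings.
    cluster_centroids: 2-D array, one row per existing cluster of voices.
    """
    distances = np.linalg.norm(cluster_centroids - new_voice, axis=1)
    return int(np.argmin(distances))

# Toy example: three existing clusters in a 2-D embedding space.
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
new_speaker = np.array([4.2, 4.8])
print(assign_to_cluster(new_speaker, centroids))  # → 1
```

If the closest cluster fits well, the new speaker can reuse its model immediately; if not, a few hours of their own recordings would fine-tune it – the scenario Kanevsky outlines.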
The technology relies on a neural network, which draws on converging branches of maths and science: probability theory, physics, algebra and geometry.
Euphonia is one of many accessibility initiatives in the works at Google built on AI and neural networks. Another is Parrotron, which helps people with non-standard speech interact with voice assistants such as Home. “Parrotron trains and helps to directly map one audio set to another.” There’s also a tactile device that translates audio into a series of taps on the user’s arm.
But it’s the combination of Live Transcribe and Euphonia that really excites Kanevsky. “This is an absolutely revolutionary system for parents who have deaf babies and for speech therapists who now have new opportunities. It’s very difficult to train such a person to speak well enough that another person understands them, but it’s much easier to train a person who understands. Speech therapists now understand that instead of spending many hours and years trying to teach a deaf person to speak, they can spend much less time teaching them to speak… Parents, for the first time, now have an opportunity for their babies. They can be fully integrated into school and university, because they now have a system to communicate. They’re also much more motivated to start speaking.”
Breaking language barriers
There are parallels between Kanevsky’s work and that of MacDuff Hughes, Engineering Director for Google Translate. Instead of taking manual input from speakers of various languages, Google Translate parses the web to find freely available translations and parallel data between languages. “Massive multilingual machine translation is an attempt to put an order of magnitude more languages into a single neural network than we’ve done before. Our big hope is that we’ll develop models that can generalise what it means to understand a language, what it means to learn a language, and hopefully move from the hundreds eventually into the thousands of languages.”
One of the problems with a translation model based on parsing the internet for examples is its potential to reinforce certain societal stereotypes. Hughes offers the example of translating the words doctor and secretary from English to German. “If I’m going to translate ‘the doctor’, it would be valid to say der Arzt [male doctor] or die Ärztin [female doctor]. The basic design of machine learning systems like ours is to find the most likely answer. And when it looks at doctor, it will say the most likely answer is masculine, because that’s the scenario represented by the majority of online examples. If you say ‘the secretary’, it might say the most probable answer, based on all the text we’ve seen, is Sekretärin, a feminine noun. And that’s fine once, but when we do this millions of times, we’re reinforcing some societal stereotypes, and we prefer not to do that.”
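The behaviour Hughes describes – always picking the single most frequent translation – can be made concrete with a toy frequency table. The counts below, and the idea of surfacing both gendered forms when the source word is ambiguous, are illustrative assumptions, not Translate’s actual implementation:

```python
# Toy corpus counts for English -> German translations (invented numbers).
counts = {
    "doctor": {"der Arzt": 900, "die Ärztin": 100},
    "secretary": {"der Sekretär": 150, "die Sekretärin": 850},
}

def most_likely(word):
    """What a plain maximum-likelihood system does: always pick the majority form."""
    options = counts[word]
    return max(options, key=options.get)

def both_forms(word):
    """A fairer alternative: surface every gendered form for an ambiguous word."""
    return sorted(counts[word])

print(most_likely("doctor"))  # → "der Arzt": the majority (stereotyped) form wins
print(both_forms("doctor"))   # → both forms, letting the user choose
```

Applied once, the maximum-likelihood pick is harmless; applied millions of times it systematically amplifies whichever gender dominates the training text – exactly the feedback loop Hughes wants to avoid.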
While Translate used to require an active internet connection to work, the ability to download language packs takes the process offline – something that speeds up translations and, arguably more importantly, doesn’t require any communication between your phone and Google’s servers.
Françoise Beaufays is a Principal Scientist at Google, where she leads a team of engineers and researchers working on speech recognition and mobile keyboard input. One of the quandaries with Google’s prediction algorithm, which suggests the next word in a sentence based on a user’s language patterns, is swear words. While the onscreen keyboard needs to offer users the freedom to say what they like, many would be offended if it suggested expletives. “What we ended up doing is making the keyboard very conservative in what it can present as words by default, but if you go to your settings, you can say ‘I’m okay with swear words’ and then you’ll be able to type those words without being autocorrected away from them, but we still kept in our logic something that prevents swear words from being predicted.”
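The policy Beaufays describes – let users type anything, but never volunteer an expletive – amounts to a filter over the language model’s candidate list. A minimal sketch, with invented stand-in words and function names:

```python
# Stand-ins for a real expletive list.
BLOCKLIST = {"darn", "heck"}

def suggest(candidates):
    """Return next-word suggestions, never volunteering blocklisted words.

    candidates: list of (word, probability) pairs from the language model.
    Per Beaufays, the user setting governs autocorrect (typed swear words
    are left alone), but the *prediction* filter stays on either way,
    so this function takes no opt-out flag.
    """
    ranked = sorted(candidates, key=lambda wp: wp[1], reverse=True)
    return [word for word, _ in ranked if word not in BLOCKLIST]

preds = [("hello", 0.4), ("heck", 0.35), ("world", 0.25)]
print(suggest(preds))  # → ['hello', 'world']: 'heck' is never volunteered
```

The asymmetry is the point: filtering suggestions constrains only what the keyboard says unprompted, not what the user is allowed to type.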
As with Google Translate and Project Euphonia, the Google Keyboard (GBoard) relies on Federated Learning – your device handles the processing of requests without connecting to a Google server, and can suggest words to you even if your phone is in airplane mode. “By accessing the data right on people’s device without exposing it to the Google servers, we could train a model that matched their use cases better,” explains Beaufays.
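Federated Learning itself can be sketched in a few lines: each device improves a shared model on its own data, and only the resulting weights – never the raw input – reach the server, which averages them. This toy version uses least-squares regression in place of a real language model; the setup and numbers are purely illustrative:

```python
import numpy as np

def local_update(global_weights, device_data, lr=0.1):
    """One gradient step on a device's private data (least squares).

    device_data: (X, y) arrays that never leave the device.
    """
    X, y = device_data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def federated_round(global_weights, devices):
    """Server step: average the locally updated weights. No raw data is sent."""
    updates = [local_update(global_weights, d) for d in devices]
    return np.mean(updates, axis=0)

# Simulate five devices whose private data share one underlying pattern.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(5):
    X = rng.normal(size=(20, 2))
    devices.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):
    w = federated_round(w, devices)
print(np.round(w, 2))  # converges close to [2.0, -1.0]
```

The shared model ends up matching the pattern common to all devices, even though the server never saw a single data point – the property that lets GBoard personalise predictions in airplane mode.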
Predicting the words and phrases you’ll type next is one thing, but suggesting emojis may be another matter. “After launching the next-word prediction model, we did something similar for emoji prediction.
“The main complication here is that, if you think about it, each of us tends to use a very small number of emojis. There are thousands of them, but they’re hard to search for. And so having a good emoji prediction model that can bring in the right context and give you a little bit of variety is something very pleasant to use.”
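The two ingredients Beaufays mentions – context and personal habit – can be combined in a toy ranker. Everything here (the trigger-word table, the usage counts) is invented for illustration and bears no relation to GBoard’s real model:

```python
# Illustrative mapping from trigger words to candidate emojis, plus
# per-user usage counts to personalise the ranking (all data invented).
CONTEXT = {"pizza": ["🍕", "😋"], "happy": ["😀", "🎉"], "love": ["❤️", "😍"]}
USER_COUNTS = {"🍕": 12, "😋": 3, "😀": 30, "🎉": 5, "❤️": 50, "😍": 9}

def predict_emoji(text, k=2):
    """Collect emojis triggered by words in the text, ranked by personal usage."""
    candidates = []
    for word in text.lower().split():
        candidates += CONTEXT.get(word, [])
    return sorted(set(candidates), key=lambda e: -USER_COUNTS.get(e, 0))[:k]

print(predict_emoji("so happy about pizza"))  # → ['😀', '🍕']
```

Ranking by the user’s own history keeps suggestions inside the small set each person actually uses, while the context table supplies the “little bit of variety”.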
AI: Battling bias
Bousquet offers a metaphor by way of example. “Imagine you’re trying to train a machine learning system to recognize pictures of fruits. You might go to your local supermarket and take pictures of oranges and apples. However, your AI system may not be so good at recognizing tropical fruits because your supermarket doesn’t have them. Similarly, if your pictures of apples show them neatly racked up on shelves, the software may not recognize one hanging from a tree.”
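Bousquet’s fruit metaphor is, at bottom, a training-distribution problem, and a toy classifier makes it concrete. Here a nearest-centroid model trained only on “supermarket” feature vectors is forced to mislabel a point from a region it never saw – the features and numbers are invented for illustration:

```python
import numpy as np

# Toy 2-D features (say, colour and shape) for supermarket training fruit.
train = {
    "apple":  np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]),
    "orange": np.array([[3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]),
}
centroids = {label: pts.mean(axis=0) for label, pts in train.items()}

def classify(x):
    """Nearest-centroid classifier: it can only ever answer apple or orange."""
    return min(centroids, key=lambda label: np.linalg.norm(centroids[label] - x))

mango = np.array([6.0, 1.0])  # a fruit the supermarket never stocked
print(classify(mango))        # → 'orange': forced into a known class
```

The model has no notion of “something I wasn’t trained on”; whatever wasn’t in the supermarket gets squeezed into the nearest familiar category, which is exactly how gaps in training data become biased behaviour.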
His example highlights AI’s problems with recognising people with different skin tones, accents or clothing. “There is a lot of work [to be done] on defining precisely what is fairness, defining or understanding how an algorithm can come to be biased and how to fight that, developing algorithms that are not biased.”