Innovation Heroes

TRANSCRIPT - The ABCs of Teaching AI to Talk

June 8, 2021

Peter 

This episode is brought to you by Cisco Webex. Visit shi.com/cisco to get started.

 

Matthew 

Evolution prefers simplicity. Evolution prefers elegance. And that-- the kind of underlying mechanisms of language have this kind of deep simplicity to them.

 

Peter 

Welcome to SHI's Innovation Heroes, a podcast exploring the people and businesses giving us hope in our drastically disrupted world. I'm your host, Peter Bean.

[music plays]

 

Peter 

We're more interconnected than ever before. But one thing we continue to struggle with is making the way we interact with technology easier to do-- and more human. This human element is so crucial when it comes to living with technology. Just ask the crew in "2001: A Space Odyssey".

 

Dr. Bowman 

Open the pod bay doors, HAL.

 

HAL 9000 

I'm sorry, Dave, I'm afraid I can't do that.

 

Peter 

The idea that language can sometimes be a barrier to worldwide connection isn't new. But letting computers take the lead on helping us bridge that gap is. And there's a whole world of exciting innovation heroes out there to prove it.

[music plays]

 

Peter 

In today's episode, we're bringing you not one, not two, but three guests. First, we'll be speaking to Sonny Hudson, collaboration specialist for partners at Cisco. He's going to tell us about the amazing things Cisco is doing for the future of language, and with Webex. Then we'll be speaking to two linguistics professors from Rutgers University to nerd out on how the tech actually works, and how it's just getting started. So, without further ado, I'd like to welcome in our first guest, Cisco's Sonny Hudson. Sonny, welcome to the show.

 

Sonny 

Thanks, Peter. Thank you for having me.

 

Peter 

You know that I've been deeply invested in video conferencing for years, right? We talk about it all the time, way before the pandemic caused it to rapidly accelerate as our primary mode of communication. Can you talk about some of the exciting new capabilities that Cisco is bringing to market, and how the pandemic accelerated them?

 

Sonny 

For years, as you said, we've been trying to push video as our main method of communicating with people, because people get so much more from it than just audio only. But we found, since the pandemic, that we can actually use it to make places safer, more secure, better-- from a pandemic standpoint, and a returning to work standpoint-- by the technologies we can integrate into that video. For example, we can do things like making sure that people can see for themselves if they have too many people in a room. They can set it in advance, so that a conference room might be good for, say, four people or five people, but then if they exceed that number of people they can actually see something on the screen to tell them, "You've really got too many people to be safe, in view of the pandemic, and cleanliness." They can also see information on things like when was the last time the room was sanitized for their safety. A lot of these things just have led us to let video be the main way to communicate with people about how they can stay safe, communicate those messages, make sure that they're looking at it in a way that this is more than just a one-on-one device, or even a one-to-many device. It's something that's gonna really help them as we go into our return to work in our hybrid workplace.

 

Peter 

I've also noticed that Cisco's been making investments in the last couple of years into voice and voice AI. You already have a leadership position when it comes to voice-controlled room systems. Is there anything exciting coming down the road from Cisco in that respect, as well, that might help customers with their return to office?

 

Sonny 

I think what you speak to is very important. Anything we can do to help people minimize the times they have to touch something, touch the doorway, touch the control panel you'll find in an empty room, all these kinds of things, we really like to do that to protect them during the times of the pandemic and the return to work to make sure that we don't backtrack on all the progress we've made to opening offices up. So, to that end, we do a lot of things-- like, we can actually start and stop meetings with our voice. We don't even have to touch the control panel, touch a PC, et cetera. We can just say something as simple as, "Webex, start my meeting," or "Webex, stop my meeting." So, we're going to continue with that, we're going to continue to expand on those kinds of capabilities to make sure that everything we do will make it so that it can be intuitive for the person to be able to control their meetings, control their interactions, start the meetings, stop the meetings, expand the meetings, schedule the meetings. Everything we can do to help make it voice activated and less reliant on touch.

 

Peter 

Yeah, it's definitely a huge plus. I want to ask you about a topic that I'm very passionate about, something that really matters to me. One of the silver linings of the pandemic is how we've all been able to come together in new ways, right? The conversations around accessibility and inclusivity have jumped lightyears ahead. Do you think the pandemic has changed the way tech companies like Cisco look at accessibility, and making the world an easier place to work in for everyone?

 

Sonny 

I think there's actually been many things that have helped drive the need, and the recognition of the need for inclusivity, and making sure everyone's available to get the most of the business communications from their technology. For example, when you look at things like Webex meetings and the video conferencing associated with it, making sure that people can stretch the picture, if you will, to decide what the layout should be. How many people do they need to see? Do they need to make sure that they've pinned someone who's maybe an interpreter for someone that is hearing impaired, so that the American Sign Language interpreter is always in view so that they can be seen and shared? Considerably, we've seen the need that-- now that we're not all sitting in a room together as often in meetings, if we're going to be working remotely, or as will now be coming true, more of a hybrid workplace where some people are in offices, some continue to be remote, making sure that people can hear, and see, and learn in their chosen and favorite language. When everyone's in a room together, they kind of make do, and maybe they can see hand gestures or, you know, ask the person next to them, "What did they say? Because it wasn't in a language I understand as well." At Cisco, we've taken it to the point where-- like, right now when I'm talking, if someone is more of a French speaker as a native language, or Russian speaker as a native language, whatever it may be, we've added over 100 other languages that people can hear what I'm saying in English, they can hear their chosen language. And very soon we'll be flipping the script a little bit, not just as English going out to other languages, many other languages coming into English. It's all about inclusivity, making sure everyone's able to participate, everyone's able to better learn and understand in the language that they're most comfortable with. 
And again, pandemic has driven a lot of awareness around that, especially when it's come to things like the education market, our kids learning in remote ways instead of in the classroom every day. It's really changed everyone's view on inclusivity, and how to make sure that we're doing everything we can to make sure that they can participate and learn the best way they can.

 

Peter 

What do you see as some of the biggest barriers that still exist in the field of computer language, and what would you change today if you had a magic wand?

 

Sonny 

You know, I think some of the biggest barriers-- and I think in the technology field that we're in, really points it out. It's all the acronyms we have, just as one shining example. We use so many acronyms-- and all businesses and industries do. Natural language doesn't always know what to make of them. Is that a word, or is that an acronym? For it to figure that out, I think, takes some learning. So, I'm not sure how the smart people would make it work. But I would envision something like, let's help that machine learning by allowing a person to actually go in and maybe define what some acronyms are. Let's talk about what those acronyms are, what the acronym stands for, what it actually means, and maybe even how it sounds so that it would recognize-- the next time someone says that acronym out loud, it would recognize it as a sort of sub word, if you will, so it would know what to make of it and record it correctly. I think that's important. I think the other thing that's important to understand is almost setting aside how many different languages there are in the world, and then how many people want to hear different languages, even within English here in the United States we have so many different regional dialects, city, and regional accents. I have a little bit of a Southern accent. What are we going to do to make-- to take that into account and hope that the machine learning gets a little bit better at that hearing those mistake things-- not our little idioms that are from the South or from the North, but actually just when saying the same words, capture them when it's not all a Midwest newscaster type of cadence, if you will, in an accent. We need to learn from that so we can get the most from it. 
I think once we're doing that, we'll see more accuracy in let's just say simple English-to-English closed captioning, if you will, but more importantly-- today if you end up with a bad capture of what that English word is, you can imagine how it's now exacerbated when it goes to translate it into 100 and some other languages. It looks like it's inaccurate, but maybe the translation is very accurate, it's just accurate for the wrong word, because it captured it wrong in English. So, I think we've got still work to do there. I think artificial intelligence and machine learning with some human guidance and direction, I think can help it accelerate that learning.
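
The user-defined acronym list Sonny describes could be sketched as a small post-processing step on captions. This is only an illustration under stated assumptions, not how Webex works: the `AcronymEntry` type, the `apply_lexicon` function, and the example entries are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AcronymEntry:
    """One user-defined acronym (hypothetical structure for illustration)."""
    written: str                   # how the acronym should appear in captions
    expansion: str                 # what the acronym stands for
    spoken_forms: Tuple[str, ...]  # how a recognizer might transcribe it


def apply_lexicon(transcript: str, lexicon: List[AcronymEntry]) -> str:
    """Replace each known spoken form with the written acronym."""
    for entry in lexicon:
        for spoken in entry.spoken_forms:
            # Match whole words only, ignoring case.
            pattern = r"\b" + re.escape(spoken) + r"\b"
            transcript = re.sub(pattern, entry.written, transcript,
                                flags=re.IGNORECASE)
    return transcript


# Example entries are assumptions chosen for illustration.
LEXICON = [
    AcronymEntry("SaaS", "software as a service", ("sass",)),
    AcronymEntry("API", "application programming interface", ("a p i",)),
]

raw_caption = "our sass product exposes an a p i"
fixed_caption = apply_lexicon(raw_caption, LEXICON)
print(fixed_caption)
```

Correcting the English caption before translation matters for the reason given above: a translation of the wrong English word can be perfectly accurate and still be wrong.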

 

Peter 

I want to play futurist with you for a second. What do you see Webex looking like-- and in general, the way we talk to computers and interact with computers in, say, 20 years?

 

Sonny 

Well, one, I think the situation we just described in some of the accuracies of the learning, I think we will far have exceeded that and corrected for those-- for those challenges that we see today. And I think that will spread across all languages. I expect it will even come out that, as I'm speaking English right now, no one-- someone, who say, wants to hear this in French won't have to necessarily read a closed caption in French for the translation. I think they'll be able to hear my voice coming out in French on their end.

 

Peter 

That would be spectacular.

 

Sonny 

Exactly. They may also still want to see it in a print screen and be able to capture the statement, but I think they'd want to hear it in that language. I think by this-- by the same token that we'll see more and more things that even as we use our keyboards today for inputting much of the information, I think we'll see a continued rise of the ability of just talking and letting the computer, and then similarly Webex, take that information and broadcast it without the need for the keyboard input and everything. So, I think that will help accelerate this person-to-person, voice-to-voice type of world, where it's less touch, less finger input, if you will, and more just voice. Not just starting the meetings, conducting the entire meeting to voice doing any commands that you would now do for things such as muting the microphone, or changing speakers, or sharing documents, whatever it might be, probably become as simple as, "Webex, share my PowerPoint document with Peter." So, I think the future is very bright in voice capabilities.

 

Peter 

This episode is brought to you by Cisco Webex.

[music plays]

I think we can all agree that the last year and a half has come with a lot of changes to the way we work. But the good news is, there's been some major developments to the tools we use to get work done. And they're not only making this shift feasible, they're giving us something so much better than what we had before. Take Cisco's Webex. With everything changing around us so quickly, it's amazing to see the ways Cisco continually ups the game at every single step. The new Cisco Webex is designed to keep you working and in the moment with all of its amazingly simple and powerful video and live collaboration features. And trust me, I use all the collaboration tools you can think of-- all of them. I particularly like Webex because they've shown me time and time again that they're always thinking about not only what I need to keep working, but also some things I haven't even thought of yet. Things like built-in artificial intelligence to help you with those repetitive tasks like taking notes or following up on actions. I also love the new live translation and transcription feature for over 100 languages, which if you listen to all of today's episode, you know will be a game changer across the globe. There's also the little things Webex just does great, like smart presence and fully integrated file sharing for all your content and workflows. Perhaps most importantly, Cisco Webex is optimized to help customers make the transition to the hybrid workplace that's quickly approaching as our new normal. And to do it with employee safety, security, and productivity top of mind. Simply put, Webex brings everyone together to do exceptional work. With one easy to use and secure app, it lets you call, message, and meet with the utmost quality for planned and impromptu meetings. And it's not just information workers like me that Cisco helps. 
Webex is making it easier for healthcare workers, students, and businesses of all sizes to work together safer, smarter, and more intelligently than ever before. That's something we can all celebrate. If you're looking to learn more about Webex and where the world of work is heading, get in touch with SHI today to get started. Visit shi.com/cisco for more information.

[music plays]

 

Peter 

So that's a great example of what's really happening out in the world today. But I think it's time I dusted off my nerd cap for a deep dive into the theories, philosophies, and research making all of it possible. I'm joined by two of the brightest brains at Rutgers for a virtual fireside about AI learning, language, and the human brain.

 

Adam 

My name is Adam Jardine, I'm an Assistant Professor in the Department of Linguistics at Rutgers University and I study the computational principles that underlie sound patterns in natural language.

 

Matthew 

My name is Matthew Stone. I'm the Professor and Chair of the Computer Science Department at Rutgers, New Brunswick. My research is interfaces that communicate with you like you're talking to a person.

 

Peter 

So, you're definitely the right people to help us understand this technology and where it's going. So that's where I want to start. I was wondering if you could give our listeners a high-level explanation of how real time translation to text works, and how is it the same or different from teaching computers to actually talk?

 

Matthew 

So, for a real time translation system, you need to first recognize the words that the speaker is saying, find a corresponding sentence in the target language, and then realize the sounds of that language as waveform. And with today's technology, that involves large-scale statistical computation that's based on patterns analyzed from massive amounts of data. In many cases, it's data of pairs of texts in the source language and the target language together with vast databases of speech in the two languages, all analyzed using neural network methods to make as accurate predictions as possible.
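
The three stages Matthew lists (recognize the words, find a sentence in the target language, realize it as sound) can be sketched as a simple pipeline. This is a minimal sketch: the stage functions below are toy stand-ins invented for illustration, whereas a real system would use trained statistical models at each step.

```python
from typing import Callable

# Each stage is just a function; real systems plug in trained models here.
Recognizer = Callable[[bytes], str]   # audio -> source-language text
Translator = Callable[[str], str]     # source text -> target text
Synthesizer = Callable[[str], bytes]  # target text -> audio waveform


def speech_to_speech(audio: bytes,
                     recognize: Recognizer,
                     translate: Translator,
                     synthesize: Synthesizer) -> bytes:
    """Run the three stages in order."""
    source_text = recognize(audio)
    target_text = translate(source_text)
    return synthesize(target_text)


# Toy stand-ins (assumptions): "audio" is just UTF-8 text in disguise.
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


TOY_PHRASES = {"hello": "bonjour", "thank you": "merci"}


def toy_translate(text: str) -> str:
    # Fall back to the input when the phrase is unknown.
    return TOY_PHRASES.get(text.lower(), text)


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = speech_to_speech(b"Hello", toy_recognize, toy_translate, toy_synthesize)
print(result)
```

Because each stage feeds the next, a mistake in recognition carries through to translation and synthesis.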

 

Adam 

One of the projects that my advisor was working on at the University of Delaware before he left was robots that could interact with children on the autism spectrum, giving children who have difficulties with social interaction and language partners that can always be there and help them practice and learn those skills. But of course, you know, that does require agents that can use language naturally like a human would. And we're making great strides in that kind of research, but it's still very much ongoing.

 

Peter 

Digging into how people are using this, I think there are some pretty obvious ones, right? You know, that most people are aware of. And I'm curious if you might be able to share some of the exciting applications that you see coming for this technology in the coming years?

 

Matthew 

Let me talk about tutoring. This is something that's probably more than 10 years away, but it's something that would be really powerful. I work in computer science, and of course our undergraduate courses are famous for being really hard. And yet, many of the ideas would be really attractive and empowering to people in high school, in middle school, even. And it's really difficult to learn those ideas well because it takes one-on-one interaction, and practice, and correction, and interaction to really learn that effectively. Some people are lucky enough to learn it from friends, from older siblings, or their parents, but the people that don't have that are really left out of the technology conversation. What if you could just learn from your computer? And that would mean understanding your concepts, understanding the mistakes that you're making, understanding the problems that will challenge you, correcting your solutions, but doing it fluidly, gently, and encouragingly in the way that an expert human tutor does. That's the best way to learn.

 

Peter 

Anything that would open up access to learning and education in that type of facet would be...I mean, it would be massive. People could learn anywhere, anytime, and like you said, without judgment. That would be huge. I'm curious if you can talk our listeners through some of the next key barriers that we have to overcome to bring us to a human level of virtual speech?

 

Adam 

You know, one of the big barriers is that for a language like English, we have tons and tons of data to work with. So, Google has amassed this corpus of all of the books written in English from, you know, 1800 onwards. It has, you know, massive stores of speech data. And so, we have that for standard varieties of English. But we don't have that kind of data for what are called "low resource languages", which are languages for which, you know, we just don't have that kind of data. These are languages with tons of speakers. So, these are languages like Arabic, Hindi, you know, you name any-- basically any language that's not English or Mandarin Chinese doesn't have the kind of data that current state-of-the-art models are working on, are using to learn language. And so, the next big barrier is well, how do we learn with less data?

 

Peter 

So, how do we learn with less data?

 

Adam 

It requires a learner who is more attuned to the specifics of the learning problem that it is being presented with. So right now, the state-of-the-art in practically any field in which machine learning and artificial intelligence is being applied is neural networks, which are these extremely general models that you can tailor, somewhat, to the learning problem at hand. But it's very hard to hard code specifics about the learning problem that you're attacking into neural networks. So, to talk about language, you know, language has a very specific structure. You know, that's-- as linguists, that's what we study. And so, if we are able to build learners that are aware of that structure ahead of time, then they're able to generalize much more quickly.

[music plays]

 

Peter 

Okay, HAL-ey, what do I have on my calendar?

 

HAL-ey 

[computerized voice] Hi, Peter. You have two podcast recordings, one dentist appointment, and a mani-pedi at 3pm.

 

Peter 

Awesome! Hmm, I'm going to need something to listen to at the salon. Hey, HAL-ey, subscribe and download the latest episodes of Innovation Heroes.

 

HAL-ey 

[computerized voice] Hang on. All right, Peter. I've added that to your grocery list.

 

Peter 

No, I said, "Subscribe me to the podcast Innovation Heroes by SHI."

 

HAL-ey 

[computerized voice] All right, Peter. Playing "We Can Be Heroes" by David Bowie. [music plays]

 

Peter 

No, I said subscribe me to--

 

HAL-ey 

You are now subscribed to "cat facts."

 

Peter 

[enunciating] In-no-vation--

 

HAL-ey 

[computerized voice] "Nation", a noun meaning--

 

Peter 

No, no! Stop!

[music cuts out]

 

HAL-ey 

[computerized voice] Alright. No need to get pushy.

 

Peter 

Oh, forget it. I'll do it myself. [notification pings] Done.

[music plays]

Avoid the pain and subscribe now to SHI's Innovation Heroes everywhere you get your podcasts. [music fades out]

 

Peter 

I did a little bit of research and reading into how computers learn, right? How humans learn. And one of the things I stumbled on is, you know, the way that kids learn language is really, really different than adults, than computers. It's a lot more efficient, a lot more effective. Is there something in that, in how children's minds work, that we can learn from to help us to accelerate this path?

 

Matthew 

So, lots of phenomena in nature actually arise from simple rules. And this has been known since D’Arcy Thompson. The form of seashells, or the pattern of petals on a flower, these are things that you can describe very simply. All indications are that language is the same kind of thing, in that there are these very elegant, formal grammars that are those kinds of simplest devices that have the kinds of properties, that anything like language could have, that turn out to describe natural languages like English unbelievably well. And that can't be an accident. It has to be because evolution prefers simplicity, evolution prefers elegance, and that the kind of underlying mechanisms of language have this kind of deep simplicity to them. And we actually see this in the sound patterns of language, and the syntactic structures of language, and even in the vocabulary, sort of semantic structure of language. So just because we don't know it yet doesn't mean that it's a deep, unsolvable problem, or a kind of ineffable magic any more than a seashell or a flower petal is. It's the same part of nature.

 

Peter 

Is there a better way for us to teach computers than just these huge data sets? You kind of mentioned that earlier, right, that we need a better way to learn. Is there one today, and are we close to one?

 

Adam 

So, there's been a lot of research on the linguistic side, at least in terms of sound patterns, on learning mechanisms that learn the very simple grammar formalisms. There are a lot of results recently that show that, you know, if the learner is only considering things that these grammars can produce, you know, if it's just trying to put together these simple pieces instead of trying to learn whatever, then it is possible to learn sound patterns, and it is possible to learn syntactic patterns, and, you know, the patterns of how we put words together into sentences. I think this is more on the scientific side and less on the engineering side in industry, but that may change soon as, you know, technology companies, you know, run into this wall of how much they can do with big data, or, you know, as they try to work more with low resource languages, et cetera. If you ask somebody at Google, I'm sure they would say that "Well, yeah, you know, we can do a lot with the current models," or you know, some sort of incremental improvement of the current models that we have. But if you ask me, then no, I don't think we're going to have substantial improvements until we start thinking about, you know, models that are more sensitive to the particular learning problems that we are working on. And so, with language, with models that are more sensitive to the particulars of linguistic structure.
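
One way to picture a learner that "is only considering things that these grammars can produce" is a strictly 2-local learner, which assumes a sound pattern is fully described by which adjacent pairs of sounds are allowed. Adam does not name a specific formalism here, so the choice of strictly local grammars, the function names, and the toy words below are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Bigram = Tuple[str, str]
BOUNDARY = "#"  # marks the start and end of a word


def bigrams(word: str) -> Set[Bigram]:
    """All adjacent symbol pairs in a word, including word boundaries."""
    padded = [BOUNDARY] + list(word) + [BOUNDARY]
    return set(zip(padded, padded[1:]))


def learn(words: Iterable[str]) -> Set[Bigram]:
    """The grammar is simply every adjacent pair seen in the data."""
    permitted: Set[Bigram] = set()
    for word in words:
        permitted |= bigrams(word)
    return permitted


def accepts(grammar: Set[Bigram], word: str) -> bool:
    """A word is well-formed if all of its adjacent pairs are permitted."""
    return bigrams(word) <= grammar


# Three made-up words are the entire training set.
grammar = learn(["pata", "tapa", "pat"])

print(accepts(grammar, "tapat"))  # never seen, but built from permitted pairs
print(accepts(grammar, "ptaa"))   # contains pairs that were never observed
```

Because the learner only ever considers grammars of this one simple shape, three words are enough for it to generalize to a word it has never heard, which is the sense in which a structure-aware learner needs less data.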

 

Matthew 

I'm a big fan of incremental progress. I think there's nothing wrong with it. And Google's been excellent at making our lives gradually better and better through these kinds of incremental processes. You know, starting out with web search that-- where you have to match words, and then being able to use synonyms, being able to rely on other people's search results and browsing history. Technology becomes more and more useful because we learn a lot about people's use cases, about how to apply the machine learning techniques that we know and that work well, and in sort of having good design and a useful product that generates positive feedback loops by learning from user behavior. We shouldn't underestimate these things, and the potential to greatly improve the experience of using technology. Even if we know that we're not going to have Commander Data in our living rooms in 10 years.

 

Adam 

It should be pointed out that, I mean, Google, and Microsoft, and Apple have gotten-- and Facebook, made a lot of progress without understanding the particulars of language. It's incredible to me, actually.

 

Peter 

I'm curious about that. So how did they get so far without understanding how language works? Based on everything I've heard on this call, that's the key to getting this right.

 

Matthew 

Well, one of the most important reasons is just the sheer vastness of the data that they're working with. People learn language with tens of millions of words of experience. That's how much a child hears. Tens of millions of words sounds like a lot. And it does, it's like six years of listening to language. But contrast that tens of millions with the size of the corpora that Google is working with, which are in at least hundreds of billions of words. This is 100,000 years of listening to language. And because it's so big, essentially, lots of things happen and you've seen all of the frequent phenomena. And you can kind of memorize what happens. With some good statistics, you can do more than memorize, you can actually distill the essence of those vast numbers of patterns in ways that generalize productively enough so that you really are prepared for just about anything that is likely to happen.
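
The comparison between a child's experience and a web-scale corpus is simple arithmetic. The sketch below picks concrete figures within the ranges Matthew gives (30 million words heard over six years, and a 500-billion-word corpus); those exact numbers are assumptions for illustration.

```python
def years_of_listening(corpus_words: float,
                       child_words: float = 30_000_000,
                       child_years: float = 6) -> float:
    """How long a child would need to listen to hear `corpus_words` words."""
    words_per_year = child_words / child_years  # about 5 million words a year
    return corpus_words / words_per_year


# A corpus in the "hundreds of billions of words" range (assumed: 500 billion).
years = years_of_listening(500_000_000_000)
print(years)
```

With these assumed figures, the corpus corresponds to about 100,000 years of listening, in line with the estimate above.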

 

Peter 

In our lifetimes, do you believe, based on everything you know today, and the challenges facing us ahead in the future, that we will achieve that level of-- and I hate to reference Star Trek-- actually, I love to reference Star Trek, but will we get to a Universal Translator in our lifetimes?

 

Adam 

Based on what I've seen so far in my life, I wouldn't be surprised if we got that far. You know, even human translators, trained interpreters, trained simultaneous interpreters, right, they're not perfect. And even native speakers talking to each other misunderstand each other, right? And so, there's no such thing as, like, a perfect language device.

 

Peter 

Right.

 

Adam 

I'm reluctant to say that we won't be able to do it, just because of the incredible developments that I've seen in the past, you know, 20, 30 years.

 

Matthew 

You know, one thing that might be reaching its limit is this idea that we're going to collect all the data that we need by essentially selling ads and selling products. It might be that for-- to really have this kind of Universal Translator, we need to have a kind of different model of creating the technology, which is much more participatory, where people are collaborating to contribute resources to analyze the data with the annotations that are necessary for machine learning models, and to kind of collaborate on the progress and the code infrastructure. Because the kind of economics of creating a Universal Translator are very different from economics of creating a web browser or a social network that will sell ads to you. To some extent, the reason why these languages with many millions of speakers are under resourced is an economic problem, rather than a technological problem.

[music plays]

 

Peter 

Like most things, the steps forward in AI language learning and development are more small increments than giant leaps. But the conversations happening in academia and industry around language tech are helping push each other in philosophical and practical ways. The voice tech of the present isn't perfect. But what we're learning about language from these growing pains is helping us to make a smarter, more human future.

[music plays]

 

Peter 

Innovation Heroes is an SHI podcast, with new episodes streaming every second Thursday on Apple, Spotify, Google, and everywhere else. If you like this episode and you want to be our hero, leave us a 5-star review on your podcast listening app of choice. On the next Innovation Heroes, Michael Wilcox, Chief Information Security Officer, Field, at Stratascale joins me to talk about what's just beyond the horizon in the world of security and what being safe in the world of remote and hybrid work looks like. Be our hero, listen and subscribe to Innovation Heroes today.

[music plays]

 

Peter 

This episode is brought to you by the new Cisco Webex. Visit shi.com/cisco to learn more.
