Robots & AI: A Story of Expectation vs Reality

Disclosure

If you're reading this because you had a really complex, high-quality conversation with one of our robots, you were most likely not talking to an 'AI' but to another person remotely operating that robot. That doesn't mean our robots don't contain AI components or capabilities; we just recognise that currently nothing beats NI, the natural intelligence of the human mind. Learn about TinMan, our telepresence software.

AI, Artificial Intelligence

There's a lot of excitement surrounding AI at the moment, and a lot of hype that can lead to disappointment. So what is it? Some people call robots 'AIs'. Let's be clear: robots are not AI, and AI is not a robot. AI is most commonly some lines of computer code. Unfortunately, code doesn't make a great image for the media, so there is a tendency to add a picture of a humanoid robot to articles about AI – argh!


For useful background, read the Wikipedia definition of artificial intelligence.

The definition of AI is loose and always changing. Broadly, it's about taking some data in (for example, sounds that form speech, or camera images that might contain faces or objects to be identified), then processing that data to extract useful information, like the text of the words being spoken or the location of a face in an image.


What Next?

The next step is to make some decisions based on that data and generate some appropriate output: words to say, facial expressions or movements. That's where it gets very tricky. Whilst speech recognition and object recognition are getting pretty good, we do not yet have a 'generalised' strong AI – one that could be used to make inferences, reach decisions and extract intent in the way that humans can. Nothing even close.


As of June 2018 it does not exist. A fully conversational machine like HAL in 2001, or the OS in the movie 'Her', just hasn't been achieved... yet.

Engineered Arts & Artificial Intelligence

Let's look at AI in the context of the Engineered Arts robot range. Hopefully we can clear the haze and help you decide which AI functions (if any!) might be useful for your application. All our robots run on the same Tritium software framework. It's a modular design, which makes it easy to add new programs to improve interaction. These can be cloud-based services like the Google Cloud Speech-to-Text API, or programs running on the robot's built-in PC, like face recognition.
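The modular idea above can be sketched in a few lines: independent processing nodes plugged into a shared pipeline, each taking data in and passing enriched data on. This is a minimal illustration only; all class and function names here are invented for the example and are not the real Tritium API.

```python
# Hypothetical sketch of a modular robot pipeline (names invented,
# not the actual Tritium framework): each node takes an event dict,
# adds information to it, and passes it to the next node.

from typing import Callable, Dict, List

class Pipeline:
    """Passes an event through an ordered list of pluggable nodes."""

    def __init__(self) -> None:
        self.nodes: List[Callable[[Dict], Dict]] = []

    def add_node(self, node: Callable[[Dict], Dict]) -> None:
        self.nodes.append(node)

    def run(self, event: Dict) -> Dict:
        for node in self.nodes:
            event = node(event)
        return event

# Example nodes: a stubbed speech recogniser and a reply generator.
def fake_asr(event: Dict) -> Dict:
    event["text"] = "hello robot"  # a real node would call a cloud ASR service
    return event

def fake_reply(event: Dict) -> Dict:
    event["reply"] = f"You said: {event['text']}"
    return event

pipeline = Pipeline()
pipeline.add_node(fake_asr)
pipeline.add_node(fake_reply)
result = pipeline.run({"audio": b"..."})
print(result["reply"])  # You said: hello robot
```

Because each node only depends on the event it receives, a cloud service and a local program can be swapped in and out without changing the rest of the chain.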


Speech Recognition


Automated Speech Recognition (ASR) converts sound into strings of text. There are many software solutions for this; our current favourite is the Google Cloud Speech-to-Text API, a pay-by-the-minute cloud-based service with very good accuracy and multi-language support. There's an old saying in computing: 'garbage in, garbage out'. If the sound quality is poor, there is noise in the background, or multiple people are speaking at once, it will not work well at all.


There are ways to capture quality audio, like the beam-forming microphones we use on our robots. Humans tend to lip-read as well, even if you don't realise you're doing it: try having a conversation in a noisy bar with your eyes shut and see how much harder it gets. A system that combines both images and sound is called multi-modal ASR; to our knowledge, no commercially available solution currently exists.


ASR does not actually tell us what a string of words means, just what the words are and what order they are in. Tone of voice, sarcasm and context can completely change the meaning of a sentence, and ASR currently gives us no clues about any of those things.

Computer Vision

Computer Vision (CV) encompasses many different tasks, like recognising a face or other objects in a picture. Again, there are many third-party solutions. One of our current favourites is SHORE, developed by Fraunhofer in Germany: it provides the position of faces in a picture and an estimation of age, gender and facial expression, though it does not recognise individuals. SHORE is a paid, licensed option for our robots. We also use Visage from Visage Technologies AB in Sweden, which provides more detailed facial motion data than SHORE.
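To make the kind of output described above concrete, here is a small sketch of how a robot behaviour might consume face metadata of the sort a SHORE-style detector returns (position, estimated age, gender, expression). The field names and greeting logic are invented for illustration; they are not the real SHORE API.

```python
# Hypothetical face-detection result and a behaviour that reacts to it.
# Field names are invented for this example, not the actual SHORE output.

from dataclasses import dataclass

@dataclass
class Face:
    x: int            # bounding-box top-left corner, pixels
    y: int
    width: int        # bounding-box size, pixels
    height: int
    age: float        # estimated age in years
    gender: str       # "male" / "female" estimate
    happiness: float  # expression score, 0.0 (neutral) to 1.0 (very happy)

def pick_greeting(face: Face) -> str:
    """Choose a robot greeting based on the estimated expression."""
    if face.happiness > 0.6:
        return "You look happy today!"
    return "Hello there."

face = Face(x=120, y=80, width=64, height=64,
            age=31.5, gender="female", happiness=0.8)
print(pick_greeting(face))  # You look happy today!
```

Note that nothing here identifies *who* the person is; like SHORE, it only estimates properties of a detected face.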


If you have a particular CV application you want to use, then it can be added to the Tritium framework.

Conversation

This is where the trouble begins!


There are also plenty of 'chatbot' solutions around. They mostly work in the same way: looking for key words, like nouns, in a sentence and then modifying preset phrases to form a reply. It's also possible to parse information from web-based sources like Wikipedia. We can provide a pre-installed version of Bruce Wilcox's ChatScript, which is MIT-licensed and free to use. Don't expect any chatbot to come close to interacting like a real human; that goal is nowhere near being reached. The video below shows interaction with SociBot, almost all of which is operated through TinMan (human operated).
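The keyword-spotting pattern described above can be shown in a few lines. This is a toy illustration of the general technique, not ChatScript itself; the rules and replies are invented for the example.

```python
# Minimal sketch of keyword-spotting reply generation, the basic
# pattern most chatbots use. Illustration only, not ChatScript.

import re

# Each rule pairs a keyword pattern with a canned reply.
RULES = [
    (r"\brobots?\b", "I could talk about robots all day!"),
    (r"\bname\b", "They call me SociBot."),
]

FALLBACK = "Tell me more."  # used when no keyword matches

def reply(sentence: str) -> str:
    """Return the reply for the first matching keyword rule."""
    for pattern, canned_reply in RULES:
        if re.search(pattern, sentence.lower()):
            return canned_reply
    return FALLBACK

print(reply("Do you like robots?"))   # I could talk about robots all day!
print(reply("It rained yesterday."))  # Tell me more.
```

The example also shows why chatbots disappoint: the second sentence gets a generic deflection because no rule matches, and no amount of extra rules gives the system any real understanding of what was said.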


Everything Else

There are so many other things that make interaction with a robot fun and enjoyable:

Voice quality: recorded human voices generally sound far better than computer-synthesised ones, but they lack flexibility. Our robots support both options, which can be mixed as needed. Also remember that a lot of our communication is non-verbal: a raised eyebrow can be worth a thousand words.


Putting this all together to make a natural interaction is an extreme challenge. If you have one of our robots and want to try your own ideas, the API is open for you to experiment – or tell us about your interaction ideas and we will try to bring them to life.

What RoboThespian “thinks” about it…


CogX Debate, Featuring Will Jackson