Engineered Arts & Artificial Intelligence
Let’s look at AI in the context of the Engineered Arts robot range. Hopefully we can clear the haze and help you decide which AI functions (if any!) might be useful for your application. All our robots run on the same Tritium software framework. It’s a modular design, which makes it easy to add new programs that improve interaction. These can be cloud-based services, like the Google Cloud Speech-to-Text API, or programs running on the robot’s built-in PC, like face recognition.
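To make the idea of a modular framework more concrete, here is a minimal sketch of a plug-in registry in which independent modules subscribe to event types and pass results along. This is NOT the Tritium API. The class `ModuleRegistry`, the event dictionaries, and the `fake_asr` and `greeter` stand-ins are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical sketch only: this is NOT the Tritium API.
# A "module" is any callable that receives an event dict and may
# return a follow-up event (or None if it has nothing to add).
Module = Callable[[dict], Optional[dict]]


class ModuleRegistry:
    """Minimal plug-in registry: modules subscribe to named event types."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Module]] = {}

    def register(self, event_type: str, module: Module) -> None:
        # Adding a new capability is just registering another handler.
        self._subscribers.setdefault(event_type, []).append(module)

    def publish(self, event: dict) -> List[dict]:
        # Pass the event to every module subscribed to its type and
        # collect whatever follow-up events they return.
        outputs: List[dict] = []
        for module in self._subscribers.get(event["type"], []):
            result = module(event)
            if result is not None:
                outputs.append(result)
        return outputs


# Usage: hypothetical stand-ins for a cloud ASR service and an on-board dialogue module.
def fake_asr(event: dict) -> Optional[dict]:
    return {"type": "text", "text": "hello robot"}


def greeter(event: dict) -> Optional[dict]:
    if "hello" in event["text"]:
        return {"type": "say", "text": "Hello!"}
    return None


registry = ModuleRegistry()
registry.register("audio", fake_asr)
registry.register("text", greeter)

transcripts = registry.publish({"type": "audio", "samples": []})
replies = registry.publish(transcripts[0])
print(replies)  # [{'type': 'say', 'text': 'Hello!'}]
```

The design choice here is that modules never call each other directly, so a cloud service and an on-board program can be swapped or added without touching the rest of the pipeline.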
Automated Speech Recognition (ASR) converts sound into strings of text. There are many software solutions for this; our current favourite is the Google Cloud Speech-to-Text API, a pay-by-the-minute cloud-based service with very good accuracy and multi-language support. There’s an old saying in computing, ‘garbage in, garbage out’: if the sound quality is poor, if there is background noise, or if multiple people are speaking at once, ASR will not work well at all.
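One way to act on ‘garbage in, garbage out’ is to check audio quality before paying to transcribe it. The sketch below estimates the signal-to-noise ratio (SNR) in decibels from a speech segment and a background-noise segment, then decides whether the audio is worth sending to an ASR service. It does not call the Google API. The 15 dB threshold and the function names are hypothetical assumptions that would need tuning against real microphones and a real ASR service.

```python
import math
from typing import Sequence

# Hypothetical threshold: tune for your own microphones and ASR service.
MIN_SNR_DB = 15.0


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Estimate SNR in decibels as 20 * log10(rms(speech) / rms(noise))."""
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return math.inf  # perfectly silent background
    speech_rms = rms(speech)
    if speech_rms == 0.0:
        return -math.inf  # nothing but silence in the speech segment
    return 20.0 * math.log10(speech_rms / noise_rms)


def worth_sending_to_asr(speech: Sequence[float], noise: Sequence[float],
                         min_snr_db: float = MIN_SNR_DB) -> bool:
    """Only pass audio on to (paid) ASR when it is clean enough to help."""
    return snr_db(speech, noise) >= min_snr_db


# Usage: speech ten times louder (in RMS) than the background gives about 20 dB.
speech = [0.5, -0.5] * 100
quiet_room = [0.05, -0.05] * 100
noisy_bar = [0.25, -0.25] * 100
print(worth_sending_to_asr(speech, quiet_room))
print(worth_sending_to_asr(speech, noisy_bar))
```

Measuring RMS over a speech segment that also contains noise slightly overestimates the true SNR, which is acceptable for a rough go/no-go gate like this.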
There are ways to capture better-quality audio, such as the beam-forming microphones we use on our robots. Humans also tend to lip-read (even if you don’t realise you’re doing it): try having a conversation in a noisy bar with your eyes shut and see how much harder it gets. ASR that uses both images and sound is called multi-modal ASR, and to our knowledge there is currently no commercially available solution that does this.
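Beam forming works by combining several microphones so that sound from one chosen direction adds up in phase while sound from other directions partly cancels. The simplest version is delay-and-sum, sketched below for a uniform linear array under far-field assumptions with delays rounded to whole samples. This is a textbook illustration, not the method used in our robots; the array spacing, sample rate, and function names are hypothetical, and practical systems typically use fractional delays or adaptive beamformers.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 °C


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float,
                    sample_rate: int) -> List[int]:
    """Integer sample delays for a uniform linear array steered to angle_deg.

    0 degrees means straight ahead (broadside). Mic m is assumed to hear the
    wavefront m * spacing * sin(angle) / c seconds after mic 0.
    """
    theta = math.radians(angle_deg)
    raw = [round(m * spacing_m * math.sin(theta) / SPEED_OF_SOUND * sample_rate)
           for m in range(num_mics)]
    base = min(raw)  # shift so every delay is non-negative
    return [k - base for k in raw]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its delay, then average the aligned samples."""
    n_out = min(len(ch) - k for ch, k in zip(channels, delays))
    m = len(channels)
    return [sum(ch[n + k] for ch, k in zip(channels, delays)) / m
            for n in range(n_out)]


# Usage: the same source signal reaches mic m exactly m samples late.
source = [0, 1, 0, -1] * 5
mics = [[0] * m + source[:len(source) - m] for m in range(3)]
aligned = delay_and_sum(mics, [0, 1, 2])
print(aligned[:4])  # [0.0, 1.0, 0.0, -1.0]
print(steering_delays(4, 0.05, 90.0, 16000))  # [0, 2, 5, 7]
```

Once the channels are aligned, the wanted source adds coherently while uncorrelated noise averages down, which is why the output is cleaner than any single microphone.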
ASR does not actually tell us what a string of words means, only what the words are and what order they are in. Tone of voice, sarcasm, and context can completely change the meaning of a sentence, and ASR currently gives us no clues about any of those things.
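To show why a transcript alone is not enough, here is a deliberately naive sketch that guesses the sentiment of an ASR transcript by counting keywords. The word lists and the `keyword_sentiment` function are hypothetical and are not something we use; the point is that because the function sees only the words and their order, a sarcastic remark and a sincere one produce the same answer.

```python
import re

# Hypothetical word lists for illustration only.
POSITIVE = {"great", "love", "brilliant", "good"}
NEGATIVE = {"terrible", "hate", "awful", "bad"}


def keyword_sentiment(transcript: str) -> str:
    """Label a transcript by counting positive and negative keywords.

    It only sees the words and their order, which is all ASR returns,
    so tone of voice and sarcasm are invisible to it.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Said sincerely or with heavy sarcasm, the transcript is identical.
print(keyword_sentiment("Oh great, the battery's dead again."))  # positive
```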