Blog: How to program a speaking Social Robot
– shrinking the gap between expectations and reality.
When I was offered an internship as a “Robot personality designer” with Furhat Robotics, a small Swedish company that produces one of the world's most sophisticated interactive Social Robots, I did not think twice. I was super happy to try the role, as part of a Swedish campaign on future job titles. I got to take some baby steps in robot programming, and a starstruck selfie with one of the creators, Samer Al Moubayed.
This is the story of how I got connected with the Furhat.
The Furhat I have been assigned to connect with looks at me with big eyes and a friendly, open expression. It is not really possible to mistake it for a human. But the facial expressions, the natural posture of the head, the soft, 3D-animated image projected from the inside onto its silicone mask, and the fact that it can speak to me, create a surprisingly likeable and realistic persona.
It is easy to like the Furhat. But how easy will it be to program it? I will get back to that in a moment. First a description of what we are dealing with.
Below the movable head are a wide-angle camera, stereo microphones, and double loudspeakers, giving the current 3.5 kg torso great abilities to talk, register, attend and react to what is going on around it. It looks really sophisticated.
Making the Furhat move and speak in a way that makes sense, to me and you as social beings, is just one part. Behind the seemingly simple and human-like robotic features of this basic Furhat model lie hours and hours of complex development, start-up struggles, and expert knowledge of human social interaction at a level beyond the PhD.
Crowdsourcing and the development of AI resources online are some of the elements that the team believes will help in creating great applications for the Furhat platform.
In fact, that is also one of the reasons why I was able to sign up on the Furhat Robotics developer pages, download and use the latest developer kit and try some robot interaction programming, not only as an intern but also at home.
I personally find robotics super interesting, have attended talks about ethics, algorithms, and legal implications in AI decision making, and read massive amounts of reviews on topics like “When do algorithms become digital persons, and should these persons be legal subjects?” But this was my first interaction with the real deal.
During one intense day I was introduced to the Furhat unit I was going to work with and to the SDK (developer kit) I was supposed to try out. I got a go-to person, and cheers from the crew like “come on, you can do it!” (Conversations here are mostly in English, since the skilled team is an international bunch of people.)
Then I just had to throw myself in the water and try to swim, while the team did their best to transfer detailed knowledge of some of the small secrets and bits that make the robot speak and behave in a natural way. The result of my first day as a robot programmer was not very impressive. And my first demo was, to put it kindly, very educational. But the Furhatters kept on cheering me on.
I thought I was ready for this. But I did not expect the interaction to feel so real. Or the interaction programming to be so demanding.
Maybe you've heard the news about unbiased recruiting using robotics, machine learning and AI-based processes? Or the possibility of interviewing refugees at EU borders with robots?
It has been all over the media, at least in Sweden, and as a journalist and tech geek turned UX writer, I already knew (before visiting Furhat Robotics) that voice interaction and interactive social robots have taken the stage as the next big hype. But the Furhat is clearly not just a show robot or a hype. Its manifestations are rather models defined just well enough to show us how social robot interaction can, and most likely will, become useful in daily urban lives during the next five to ten years.
OK, we are impressed. But can this really work? Well, TNG Recruiting and Furhat Robotics are not just crazy pioneers who dare stick their necks out and risk being mocked for their visions. They are extremely knowledgeable researchers who should be taken very seriously. So are the researchers at a Swedish university who programmed a Furhat to sound depressed and mimic depression, and then used it to train their students with very good results.
In Frankfurt, Germany, Furhats were introduced in 2018 as service robots at the international airport, and at a central railway station, where they provide travelers with information. During fall 2019 Furhats will also be introduced in schools in Stockholm. But they are not yet ready to be sold to consumers.
The robots in the promotion videos are of course a bit extra cheeky, but they are still not so far from reality. It all depends on the programming.
The next big question is whether the social robot can live up to the expectations. There is a multitude of creative ideas: people would like to use it in teaching, research, health care, service and many other ways. Possibilities span from roles as a test leader and information officer to storyteller and interpreter. With a robot connected to the internet, the options and visions of what can be done are almost unlimited for the developers. But how soon can we expect the robots to be fully interactive? Or on the market for consumers?
I managed to ask the very busy CEO Samer Al Moubayed for a selfie in front of one of the robot models, Petra. I look crazy in the photo, but that’s OK. After all, it’s not every day you get to meet a famous robot creator.
I learned that since 2011, when their first robot model was exhibited at the London Science Museum, the researchers and developer teams at Furhat Robotics have managed to create, and scale, their first commercial interactive platform, which has been bought and rented by teaching institutions and business organisations worldwide. And the interest in B2B cooperation projects is soaring. Among the interested companies mentioned are Honda, Intel, Merck, Toyota, KPMG and Disney. But consumers will have to wait, maybe five more years.
Samer Al Moubayed, researcher, specialist in computer science and speech technology (his doctoral thesis had the title “Bringing the avatar to life”), and former intern at Disney Research, founded Furhat Robotics in 2014 together with research colleagues Gabriel Skantze, Jonas Beskow and Preben Wik. They were at that point all experts in different academic areas at the KTH Royal Institute of Technology, in Stockholm. Through EIT Digital Accelerator they could start growing with the help of investments and business angels.
By now the founders have travelled around the world a couple of times to talk about their robot and their research. In the video below (from 2016) Samer Al Moubayed, together with founding colleagues from the KTH Royal Institute of Technology, demonstrates some early models to a reporter from Bloomberg.
Do you see the fur hat in the video? Just before a demo was about to take place in the lab, the researchers were looking for something to cover the head with, to make it look less mechanical and more human (this was at an earlier stage of the development, and the back of the robot’s head did not have a plastic skull, just a mish-mash of cables and wires). It was winter and cold outside, and so what they finally found and put on the head was a fur hat. And that’s how the robot got its name.
An extra touch of magic (and wit) will probably always be needed when programming a social robot. Making the Furhat speak in ways that make sense, to me and you as social beings, gets easier when the tonality doesn’t have to be dead serious. Making people laugh a little can help bridge the glitches in interactive conversations when resources and voice recognition are still not good enough. Taking breathing pauses when speaking also makes the robot sound more human.
Socially weird answers to users’ questions are what developers do not want, since they can make or break the established connection with the user.
But through machine learning and artificial intelligence the systems keep on learning what works in live conversations with humans, and what does not. Large amounts of data are continuously gathered and analysed. There is a multitude of resources that can be shared and used online, to create interactivity and adapt the functions of robots to the environment where they are supposed to be placed. Google is far ahead in creating sharable online resources for developers, for example right now 40 different languages for interaction, and a multitude of voices. In this way, reality keeps closing in on expectations for spoken interaction.
To help robots have conversations with humans, developers use different aspects of NLU (natural language understanding), machine learning, artificial intelligence, and advanced research on language and social interaction between people.
Lots of this knowledge has been built into the design of the first commercial Furhat model, and the definitions of its attention span, movements, facial expressions and voice. The platform even comes with a set of random jokes that are prepared to be built into conversations.
The Furhat, with its soft and toony facial appearance that makes it impossible to mistake the robot for a real person, is perceived as very “human” and non-threatening by most users. In one review, by a reporter from Forbes, Furhat is compared to other similar robots and ranked as the best one by far. The video below (from the article) shows early Furhat robot models in use as “test instructors” at a science museum in Sweden.
Social robotics, with truly realistic interacting units (not just “show robots” programmed to carry out a single specific conversation), is an extremely exciting but very complex area of science.
Predicting what a person could possibly want to say to a robot, or ask it about, is almost impossible. There are many solutions to this, but if, or when, the robot is unable to respond to the user in a “human” way, the magic spell could be broken very quickly. The way developers choose to tackle that challenge could be decisive. This is especially true in schools, where kids will probably keep scrutinizing the robots’ performance without mercy.
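One common way to soften the moment when the robot does not understand is a friendly fallback answer. The sketch below is only an illustration in plain Python, not code from the Furhat SDK: the names INTENTS, RESPONSES, FALLBACKS and reply are made up for this example, and simple keyword matching stands in for the real NLU a developer would use.

```python
import random
import re

# Hypothetical intents: each maps to keywords that trigger it.
INTENTS = {
    "greeting": {"hello", "hi", "hey"},
    "name": {"name", "called"},
}

# Hypothetical scripted answers, one per intent.
RESPONSES = {
    "greeting": "Hi there! Nice to meet you.",
    "name": "People call me Furhat.",
}

# Light-hearted fallbacks for everything the robot did not anticipate.
FALLBACKS = [
    "Hmm, nobody has asked me that before. Could you say it another way?",
    "I am still learning. Can you try that again?",
]


def reply(utterance, rng=random):
    """Return a scripted answer if a keyword matches, else a fallback."""
    # Split into lowercase words so that "hi" does not match inside "this".
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keywords in INTENTS.items():
        if words & keywords:
            return RESPONSES[intent]
    # Nothing matched: pick a fallback instead of a socially weird answer.
    return rng.choice(FALLBACKS)


print(reply("Hi there!"))
print(reply("What is the meaning of life?"))
```

The point of the design is that the unknown question never produces silence or nonsense; the user always gets something that sounds like a reasonable conversational turn.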
Another challenge is to “popularize” the software used with the Furhat. As an end user of a Furhat you would like to be able to put all the exciting functions of the robot to use, without having too difficult a time learning how to program it. If the learning curve is too steep, no one will be happy. Programming should be fun.
OK, enough said. I will now get back to playing with the Furhat SDK, and have the virtual robot do things my way (great feeling). Right now my goal is to create a conversation that could go on for at least 30 to 60 seconds without any noticeable glitches.
As an experienced radio producer I also know one thing that is extremely important when programming a talking robot. Because humans breathe, they will now and then stop talking for some milliseconds before they carry on. Mimicking this makes the robot's speech seem more natural, and less mechanical. To make this happen you have to create small pauses in the robot’s spoken sentences. Small details, big impact.
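Here is a minimal sketch of what adding such pauses could look like. It uses the SSML break tag, which is part of the W3C speech markup standard. Whether a particular robot voice accepts SSML is an assumption you would need to check, and the function name add_breathing_pauses and the pause lengths (200 and 450 milliseconds) are my own guesses for illustration, not values from Furhat.

```python
import re
from xml.sax.saxutils import escape


def add_breathing_pauses(text, clause_ms=200, sentence_ms=450):
    """Wrap text in SSML and add pauses after commas and between sentences.

    The default pause lengths are illustrative guesses, not tuned values.
    """
    # Escape &, < and > so the text stays valid inside the SSML markup.
    text = escape(text)
    # Longer pause after a sentence end that is followed by more text.
    text = re.sub(r'([.!?])\s+', rf'\1 <break time="{sentence_ms}ms"/> ', text)
    # Shorter pause after a comma, like a quick breath mid-sentence.
    text = re.sub(r',\s+', f', <break time="{clause_ms}ms"/> ', text)
    return f"<speak>{text}</speak>"


print(add_breathing_pauses("Hello, I am Furhat. Nice to meet you!"))
# prints: <speak>Hello, <break time="200ms"/> I am Furhat. <break time="450ms"/> Nice to meet you!</speak>
```

Numbers like 3.5 are left alone, because a pause is only added when the punctuation is followed by a space.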
Maybe you will soon be able to talk with a Furhat, in a nearby place.
Want to try programming a social robot? More reading for developers and designers: https://www.furhatrobotics.com/developers/
Sources: Furhat Robotics, Forbes, Bloomberg, Jusektidningen Karriär, Computer Sweden, the news site Entreprenör, Dagens Industri, Tidningen Innovation, KTH Royal Institute of Technology.
Photo © Aminata Merete Grut, except for the selfie, where Samer Al Moubayed took the photo (with my iPhone). Videos from Bloomberg, TNG and Furhat.