A virtual “talking head,” which can express a full range of human emotions and could be used as a digital personal assistant, has been developed by researchers at Toshiba’s Cambridge Research Lab and the University of Cambridge’s Department of Engineering.
Known as Zoe, the talking head is based on the face of actress Zoe Lister, who plays Zoe Carpenter in the Channel 4 series Hollyoaks. Zoe can display emotions such as happiness, anger, and fear, and changes its voice to suit the feeling that the user wants it to simulate.
Users can type in a message, specifying the requisite emotion or combination of emotions, and the face will recite the text. According to its designers, Zoe is the most expressive controllable avatar ever created, replicating human emotions with “unprecedented realism”.
To make the system as realistic as possible, the research team collected a dataset of thousands of sentences, which they used to train a speech model. They also used computer vision software to track Lister’s face while she was speaking.
This data was then used to build voice and face models: mathematical algorithms that gave the researchers the voice and image information they needed to recreate expressions on a digital face directly from text.
The program used to run Zoe is just tens of megabytes in size, so it can easily be incorporated into even the smallest computing devices. Zoe could therefore be used as a personal assistant on smartphones and tablets, or to “face message” friends.
The framework behind Zoe is also a template that could soon enable people to upload their own faces and voices, allowing users to customise and personalise their own emotionally realistic digital assistants.
“This technology could be the start of a whole new generation of interfaces which make interacting with a computer much more like talking to another human being,” said Professor Roberto Cipolla from the Department of Engineering, University of Cambridge.
“It took us days to create Zoe, because we had to start from scratch and teach the system to understand language and expression. Now that it already understands those things, it shouldn’t be too hard to transfer the same blueprint to a different voice and face.”
The team who created Zoe are currently looking for applications, and are also working with a school for autistic and deaf children, where the technology could be used to help pupils to “read” emotions and lip-read.
Ultimately, the system could have multiple uses – including in gaming, in audio-visual books, as a means of delivering online lectures, and in other user interfaces, according to the researchers.