Towards the end of E3 this year, I enthusiastically tweeted that I “just saw the most amazing thing I think I've ever seen at an E3, and it wasn't a game. It's a tech that will be in a game.” Lots of you were interested to hear about it, so why has it taken me eight days to get around to actually telling you? Well, it’s because I wanted to be able to show you at least a small glimpse of what it does. I can breathlessly tell you how impressed I was until I’m blue in the face, but I wanted some kind of illustration to go along with it.
The technology is called MotionScan, and it’s from an Australian company called Depth Analysis. Their meeting room at E3 was tucked away in a darkened corner in the west hall of the LA Convention Center with just a little sign tacked to the door. When I was beckoned in, I ran into a gobsmacked-looking Jade Raymond on her way out, who was effusing about what she’d just seen as she said goodbye to the folks inside. Earlier that day I’d heard that representatives from studios across the games industry had seen the demo, and left looking similarly flabbergasted.
So what is it? Well, the bottom line is that it’s a groundbreaking 3D motion-capture system. No...wait...stay awake, come back. It’s not as dull as it sounds. Seriously. Unlike every other motion-capture system you’ve ever seen, this is a full performance-capture system. It doesn’t just track movement, or grab animation data from actors’ faces as they speak their lines; it captures everything about an actor’s performance, and generates a fully-textured 3D model based on what it sees and hears.
Depth Analysis' Oliver Bao (left) and Team Bondi's Brendan McNamara (right).
Unlike the systems we’ve all seen in countless boring magazine stories for the past 10 years, with the little white balls glued to spandex body suits, MotionScan is much more sophisticated. It uses 32 high-definition cameras (divided into 16 stereoscopic pairs) to capture every angle of an actor’s performance at 30 frames per second. From this data it generates a fully-textured 3D model (at the moment it’s just heads, but later it will be full bodies) that incorporates every nuance, mannerism, and emotional detail of the performance.
To demonstrate this, Depth Analysis head of research Oliver Bao was joined by Team Bondi founder and director Brendan McNamara, who is overseeing the first game that will make use of the technology, Rockstar's L.A. Noire. To illustrate the system, they showed a series of performances from actors being used in the game, with video of each actor’s actual performance alongside the data captured with MotionScan. The first demo was simple: an actor spoke some lines and smiled, and it was eerily realistic. As Bao and McNamara advanced through subsequent demos, the performances became more and more emotionally engaging until they eventually showed me a scene in which a character was distraught about the murder of his wife. As he sobbed through his lines, every line in his face broadcast the angst his character was feeling. His eyes welled up, and tears streamed down his face. As the scene played out, Bao demonstrated that it was a realtime 3D model by moving the actor’s disembodied head around the screen and applying different lighting effects.
The effect was startling, and the performance genuinely moving. This is far beyond the “eye contact” we were promised in Mass Effect, or the clumsiness of some of the scenes in Heavy Rain; this was a real actor pouring his heart and soul into an emotional performance that was then fully captured in a 3D model. It’s not just this kind of emoting that it’s good for, though. McNamara noted in our time together that it opens up a whole new way of approaching the way characters are presented in games, and subsequently how narratives are written and conveyed. “Traditional motion capture could never bring to life the subtle nuances of the chaotic criminal underworld of LA Noire in the same way as MotionScan,” he said. “MotionScan allows me to immerse audiences in the most minute details of LA Noire’s interactive experience, where the emotional performances of the actors allow the story to unfold in a brand new way.”
To elaborate, he discussed the way that a detective story is affected by this technology. Because much of L.A. Noire is a criminal investigation, a big part of what the player must do is judge whether characters are telling the truth. Previously, the devices available to game designers for conveying this have been quite clumsy: obvious dialogue cues or less-than-subtle visual hints. In L.A. Noire, though, it’ll all be in the performance. “You need to be able to tell when someone is lying,” he explained. “And if you look at these performances, they’re so realistic you’re going to be able to tell if the guy is trying to avoid you, or not look you in the eye.” With this he cued up a brief scene from an interrogation in the game, and it was possible to tell purely from the lines around the character’s eyes, and the way he was moving his eyebrows, that something might be amiss.
The impact that this technology is going to have on game development is potentially huge (and on the movie business too, since it’s being pitched to studios and effects houses as well; actors, meanwhile, are going to be pretty happy about another potential outlet for their craft). It doesn’t require markers or phosphorescent paint on the actors’ faces, and there’s no need for animators or artists to clean up details after scenes have been shot. Because the system captures a full performance, hair, makeup, and even prosthetics can be captured. During my time with Bao and McNamara, they showed me one character covered in cuts and bruises that had been applied by a professional makeup artist, and the effect was far more striking than anything that could have been added in post-production because the wounds moved realistically with the character’s skin.
Actor John Noble (you may know him as Walter Bishop in Fringe) during his capture session at Depth Analysis.
Here he is as a 3D model generated by MotionScan. Every line, crease, and muscle movement is captured in his performance.
For games with large casts of characters, like L.A. Noire, which features more than 200, the potential for streamlining the production process is tremendous. Actors will be able to provide animation and graphical data as they’re delivering their lines, which will help eliminate the huge amount of effort that currently goes into lip-syncing 3D models created by a studio’s art team. McNamara also noted that the system will soon be able to capture up to three actors at the same time, so full conversations can be filmed and the appropriate physical and emotional reactions recorded.
While I only got to spend 10 or 15 minutes with the technology, it was very clear that it marks the beginning of a new chapter in the way that realistic game characters will be presented. If used properly, it will push the boundaries of what we expect from game performances, and hopefully elevate the demand for good writing, good dialogue, and real emotional content. While it has obvious applications for cut-scenes, narrative elements, and in-game conversation, the real benefit will come in the seamless transition between action and static scenes, and the fact that there'll potentially be no break in realism.
Hopefully this description and the images I've posted here go some way toward conveying just how significant and impressive this system is. I think the real proof will come when we finally get to see L.A. Noire in motion.