Luca Fascione is a multifaceted visual effects artist and the Head of Technology and Research at Weta Digital. His trailblazing achievements were honored earlier this year with a Scientific and Engineering Academy Award. I recently had a chance to interview Luca to discuss apes, motion, and emotion.
Q: When did it first become clear that an unmet need existed and what was your development process like?
A: Weta Digital is very active in research and innovation around movie-making technology. Our Senior VFX Supervisor Joe Letteri likes to keep a rolling focus on areas where we can improve the quality of the movies we contribute to, especially in the space of creatures. We have many research disciplines at Weta: physical simulation (fluid simulation for things like explosions and water, or rigid body dynamics for destruction scenes), physically based rendering (light transport and material simulation, so that our pictures can closely match the footage they need to integrate into), and virtual cinematography (performance capture and virtual stage workflows). Every few years we identify a project that demands a larger scope, often inspired by the upcoming productions slated for the studio, and we put significant time and resources into making a true step advancement.
FACETS was one such project.
FACETS (the system we use to capture facial performance, as opposed to the body) was built as part of the Research and Development preparation ahead of Avatar, because we wanted to improve the process for capturing faces. The old process, used on films like King Kong, was closer to an ADR session:¹ Andy Serkis would do Kong's body one day, and then on a different day he would work through Kong's facial performance. At that time, the face capture process was a "normal" 3D capture session, the only difference being that the markers were much smaller and glued directly to the actor’s skin, instead of velcro-strapped to his capture suit as they are for body capture. Because the markers were much smaller, the volume in which a performance could be recorded was correspondingly smaller, which meant Andy effectively had to sit in a chair and keep his head relatively still while acting. This made it extremely difficult for the system to provide valuable data for Andy’s more extreme movements, and it introduced many practical problems in terms of timing and consistency. Further, once the capture sessions ended, the work to extract motion and animation curves from the data was extremely labor- and compute-intensive, requiring a very skilled operator and many iterations.
Although Kong had a substantial amount of visual effects work, especially for its time, there weren't that many facial-driven shots, and the process was focused on a single digital character. When Avatar came, it was immediately clear our existing workflow would never be practical at the scale required for dozens of Na’vi characters on screen at any given time. The “capture body, then capture face” idea was just too hard, and besides, a large portion of the shots in the movie required capturing multiple characters at once: doing it all in separate body/face sessions would have been a logistical nightmare. We also knew that a combined body/face session would be so much stronger, based on what we’d seen as well as the feedback gathered during face capture sessions and in discussion with the performers. Facial and body movements are synchronized in many unexpected ways that are not immediately apparent unless you study them. Additionally, post-processing of the data for handoff to animation and support for advanced motion editing were clear requirements. It quickly became apparent that the work in this segment would increase well over tenfold for Avatar, and scaling up our existing system to match was just not possible.
So as Kong wrapped, we sat down under the direction of Joe Letteri and Dejan Momcilovic (Head of Motion Capture) and rethought the whole facial capture process from the ground up. By the time we were done, we had developed a new system that was far more reliable than before and included a number of important new features. The new system was built to work wirelessly, which added enormous flexibility on set, and the whole post-processing phase had gone from a multiple-day process to running in real time on stage while the performance was being captured. We also replaced the glued-on reflective marker capture with a video feed capture in which face paint was used, organized so that it was possible to use fewer than half as many markers as in the previous process. This matters a great deal for having all the actors ready for capture on time in the morning.
© 2009 Twentieth Century Fox Corporation. All rights reserved.
Q: How do you approach nuanced elements such as muscle, skin and blood flow?
A: The output of a performance capture session is information about the motion of the actor on set. Once the capture session finishes, this motion is used to help drive our virtual character, which is often not human (consider Caesar from the Planet of the Apes trilogy, Gollum from The Hobbit, King Kong, Jake and Neytiri from Avatar, the BFG and so on). Animators use the motion data to help define how the digital character would move to express the same performance the actor gave on set. This also requires the animator to take into account the physical nature of the character itself. Our digital models contain an accurate description of the muscle structure in our characters, as well as the other layers of fat and connective tissue. Once the motion is joined with these internal structures, our Tissue simulation software helps shape the exterior of the skin. This, in turn, produces corresponding blood flow and strain data that can be used by the material system in charge of skin appearance during image generation.
Q: How do you navigate around the uncanny valley?
A: The uncanny valley is a phenomenon that affects human and heavily humanistic characters far more strongly than anything else, because so many of our brain’s processes have evolved to recognize and react to the most minute details of facial appearance, and indeed the effect is known to be stronger when observer and observed persons are of the same ethnicity. Our perception of subtle departures from sameness is very acute. Reversing this notion, though, you can see how you get a lot of help from the mere fact that your character might be “almost” human in one way or another. I’m thinking of Gollum or the giants in The BFG: the different proportions are enough to reduce the uncanny valley effect by a good amount.
Essentially, if you are doing a fully human digital character, every aspect of it needs to be nearly perfect – the further you get from human, the more margin for error or artistic license you have.
Our work on Furious 7 really reinforced that notion for us. Ultimately, it is not the capture system that directly affects this; it is the skill of the animator who takes the data derived from a specific human performance and turns it into the performance of a digital character.
Q: What are some of the issues you encountered when first using an actor-mounted camera rig and how did you solve them?
A: The most prominent problems with head-mounted rigs are slippage and lighting. Slippage is the situation in which the head rig moves as it sits on the head, because it has limited grip: the actor’s hair makes an excellent mechanical lubricant, and you can’t tighten the rig much without making it too uncomfortable to wear. Further, even if you could glue the head rig to the actor’s skin, the skin itself has plenty of slippage over the skull anyway. With respect to lighting, it’s desirable for the head camera footage to be as evenly lit as possible, to improve tracking quality. The issue is that the only way to do this is to have lights on the head rig itself, which inevitably end up being extremely distracting for the actors, potentially impacting the quality of their performance. This means the only alternative in this space is integrating higher quality cameras with nice even illumination all around (which is hard to achieve, because the camera needs to be very small and very light). On Avatar there were motion capture sessions that were separate from principal photography sessions, so we had more flexibility than we do now. For films like The Hobbit trilogy and the Planet of the Apes series, we decided that capturing the performance of the actor during principal photography was critical, and we had to further refine our system to enable this workflow.
Q: How does the system allow actors to collaborate on-set?
A: Head-mounted cameras enable actors to act normally, much as they would if they were wearing prosthetics or makeup. This greatly improves their ability to interact with one another and achieves a whole new level of emotional exchange in their performance. Being able to apply motion capture during principal photography expands the working space to incorporate the other aspects of filmmaking and enables directors to make creative decisions about a performance while they’re on set with the actors. FACETS is part of our performance capture system that, in the years since King Kong, has transitioned from supporting an “after the fact” process where you add a creature “on top” of your movie, to a new integrated process in which real-life and digital characters are together on set in front of the eyes of the director. The creative power of the new approach is such an improvement over the previous one that it would now be unthinkable to go back.
Q: How do you approach solving and retargeting human characters and non-human characters?
A: The purpose of the solve is not merely to replicate the motion of the skin’s surface (as tracked by the markers), but to use marker movements to infer the specific muscle activations in the actor’s face that triggered those movements and produced a specific expression.
Once the actor’s muscle activations are known, a mapping process generates new targets for the virtual character in order to achieve a corresponding emotional meaning. The key insight here is that the objective of this process is not to deform skin or activate corresponding muscles by the same amount, but instead to achieve a performance transfer between actor and virtual character that carries through the “emotional messaging” that is true to the intention of the performer at the time. Different face shapes and features between actors and virtual characters, as well as differences in the speed or intensity of expression changes, require subtly different treatment for the transfer to happen so that the audience receives the message as intended.
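To make the two stages described above concrete, here is a minimal editorial sketch in Python. It is not the FACETS implementation, which the interview does not detail. It assumes a simple linear model in which each muscle activation moves the markers by a fixed amount, solves for activations with projected gradient descent, and then retargets them with hypothetical per-muscle gains. The names `solve_activations`, `retarget`, `basis` and `gains`, and all numbers, are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def predict(basis: Matrix, activations: Vector) -> Vector:
    """Marker displacements under the assumed linear model d = B a."""
    return [sum(b * a for b, a in zip(row, activations)) for row in basis]


def solve_activations(basis: Matrix, observed: Vector,
                      steps: int = 2000, rate: float = 0.1) -> Vector:
    """Infer activations from marker displacements.

    Minimizes ||B a - d||^2 by gradient descent, clamping every
    activation to [0, 1] after each update (projected gradient descent).
    """
    n = len(basis[0])
    a = [0.0] * n
    for _ in range(steps):
        residual = [p - o for p, o in zip(predict(basis, a), observed)]
        grad = [2.0 * sum(basis[i][j] * residual[i] for i in range(len(basis)))
                for j in range(n)]
        a = [min(1.0, max(0.0, aj - rate * gj)) for aj, gj in zip(a, grad)]
    return a


def retarget(activations: Vector, gains: Vector) -> Vector:
    """Map actor activations to character activations.

    The per-muscle gains are a hypothetical stand-in for the mapping the
    interview describes: the character's muscles are deliberately NOT
    driven by the same amounts as the actor's.
    """
    return [min(1.0, max(0.0, g * a)) for g, a in zip(gains, activations)]


# Illustrative data: three marker coordinates, two muscles.
basis = [[1.0, 0.0],
         [0.0, 1.0],
         [0.5, 0.5]]
observed = [0.8, 0.2, 0.5]

actor = solve_activations(basis, observed)
character = retarget(actor, gains=[0.5, 1.5])
print(actor, character)
```

A real solve would involve many more markers and muscles and a nonlinear face model, and the retargeting step described in the answer depends on artistic judgment rather than fixed gains; the sketch only shows the separation between inferring activations and mapping them to a different face.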
Q: What were some of your thoughts and feelings the first time you saw Avatar in a theatre?
A: Well, Avatar was huge, of course. And the immediate reaction was indeed extremely powerful. But the thing that stuck with me the most, past the first screening, was how it instantly permeated our culture and how people keep watching it. I travel internationally a fair bit for work, and often walk airplane aisles to stretch my legs. As I do so, there are invariably a number of people watching Avatar, no matter where in the world I happen to be. Every time I see it I’m amazed.
Q: What’s next for you?
A: Well, after the FACETS project was delivered, I moved on to write a few other tools used on Avatar. One was used in the appearance modelling of skin on our movies around Avatar times. Another was a large-scale out-of-core ray tracer called PantaRay, which we also used on Avatar to build a lighting technology based on Spherical Harmonics. After that I ran a second ray tracing project, Manuka, which resulted in the production renderer in use at Weta Digital since The Hobbit: The Battle of the Five Armies. At present I am Head of Technology and Research for the studio at large.
¹ ADR stands for “Automated Dialogue Replacement.” This is the process in which actors record new lines of dialogue for movies to replace original ones, either to improve a performance, or to provide dialogue in a different language than the original. The process is also known as dubbing and is, in fact, far from automatic.