Performance-Driven Facial Animation
Performance-based Facial Animation ✦ Creating an animation of a realistic and expressive human face is one of the greatest challenges in computer graphics. ✦ The human face is an extremely complex biomechanical system that is very difficult to model. ✦ While some animators can produce realistic facial animation by hand, consistently producing large amounts of flawless animation this way is not practical. ✦ Simply mimicking the desired expression is far faster, easier, and more natural than adjusting dozens of sliders.
Applications in Entertainment ✦ In 2000, LifeFx released a ground-breaking short animation, “Young at Heart” – marking perhaps the first time that a CG actor actually fooled a few viewers.
Motion Capture + Keyframes ✦ Both Final Fantasy: The Spirits Within (2001) and an early test for Shrek used motion capture for the body animation but manual animation for the face. ✦ Although Gollum in the Lord of the Rings trilogy was animated with traditional keyframes, the animation was heavily guided by reference video of the actor Andy Serkis, who “played” Gollum.
Motion Capture in Special Effects ✦ The Matrix sequels used performance-driven virtual actors in a number of special-effects shots. ✦ They computed optical flow in each of five HD cameras and applied stereo reconstruction to recover the 3D motion of the face mesh. ✦ The Polar Express (2004) was the breakthrough movie for performance-driven facial animation. ✦ Avatar (2009) faithfully captured facial expressions using a new “head rig” technology.
The Uncanny Valley ✦ Current technology is suitable for animated films where the characters have a somewhat “cartoony” feel. ✦ Some CG humans have been described as “creepy” or as “the living dead”.
Face Tracking ✦ Stereo. ✦ Feature-based tracking. ✦ Appearance-based tracking. ✦ Model-based tracking.
Face Retargeting ✦ Often the digital character that needs to be animated is not a digital replica of the performer. ✦ The process of adapting the recorded performance to the target character is called motion retargeting or cross-mapping.
Parameterization ✦ The important issue for retargeting is the choice of facial model parameterization (or “rig”). ✦ A rig provides a parameterization of the facial expressions of a digital face. ✦ It describes facial expressions with a small number of parameters and limits the range of expressions to the allowed range of these parameters. ✦ There are many different approaches to parameterizing a digital face: blendshapes, PCA, or the raw mesh.
Blendshape Parameterization ✦ Blendshapes provide a linear parameterization of the face deformations. ✦ The space of potential faces is the linear space spanned by the blendshapes (or a portion of this space if the weights are bounded). ✦ Retargeting is reduced to estimating a set of blending weights for the target face at each frame of the source animation. ✦ Typical types of blendshapes include whole-face blendshapes, delta blendshapes, and local blendshapes.
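The linear model is compact enough to state in a few lines. Below is a minimal sketch of delta-blendshape evaluation in Python; the function name, array shapes, and weight bounds are illustrative assumptions, not taken from any particular rig.

```python
import numpy as np

# Minimal delta-blendshape evaluation (illustrative shapes):
#   neutral: (V, 3) rest-pose vertices
#   deltas:  (K, V, 3) per-vertex offsets of K blendshape targets
#   weights: (K,) blending weights, typically bounded to [0, 1]
def evaluate_blendshapes(neutral, deltas, weights):
    # v = neutral + sum_k w_k * delta_k  -- the linear blendshape model
    return neutral + np.tensordot(weights, deltas, axes=1)

# Tiny usage example with random data.
V, K = 1000, 4
neutral = np.random.rand(V, 3)
deltas = np.random.rand(K, V, 3) * 0.01
weights = np.array([0.5, 0.0, 1.0, 0.2])
mesh = evaluate_blendshapes(neutral, deltas, weights)  # (V, 3)
```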
PCA Parameterization ✦ Principal component analysis (PCA) of motion capture data automatically produces a blendshape model. ✦ The advantages of a PCA model are that it accurately represents the data and that it is obtained automatically. ✦ A disadvantage is that the model is poorly suited to artist manipulation, because the individual targets resulting from PCA tend to have widespread effects that are difficult to describe. ✦ Also, motion capture of the target character is often, by definition, unavailable (the target is not the performer).
What is PCA? ✦ Principal component analysis uses an orthogonal transformation to convert a set of data points into a set of values of linearly uncorrelated variables called principal components. ✦ The first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to the preceding components. ✦ It can be computed via the singular value decomposition of the (mean-centered) data matrix: M = U Σ Vᵀ.
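As a concrete illustration, here is a minimal PCA via SVD; treating each motion-capture frame as one flattened row of the data matrix is an assumption of this sketch.

```python
import numpy as np

# PCA of motion capture data via SVD of the mean-centered data matrix.
# Rows of M are data points (e.g. flattened mocap frames).
def pca(M, n_components):
    mean = M.mean(axis=0)
    U, S, Vt = np.linalg.svd(M - mean, full_matrices=False)
    components = Vt[:n_components]       # orthonormal principal directions
    scores = (M - mean) @ components.T   # per-frame PCA coefficients
    return mean, components, scores

# Example: 200 frames of a 500-vertex face, (x, y, z) flattened.
frames = np.random.rand(200, 1500)
mean, comps, scores = pca(frames, n_components=10)
# Reconstruct frame 0 from its 10 PCA coefficients:
frame0_approx = mean + scores[0] @ comps
```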
Mapping ✦ The retargeting problem can be posed as a function estimation problem, where the goal is to create a mapping that produces a target expression for each source expression. ✦ In the case where the target face is parameterized by a rig, the function maps source expressions to target parameters. ✦ An important issue on the source side is determining which components of the source animation affect the target face. ✦ Linear mapping is the simplest choice, but it can cause undesired artifacts.
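As a sketch of the simplest case, here is a linear map fit by least squares from example expression pairs; every name and dimension below is hypothetical. Over- and undershooting away from the training examples is exactly the kind of artifact noted above.

```python
import numpy as np

# n corresponding expression pairs: source parameters S (n, ds) and
# artist-specified target rig parameters T (n, dt). Fit T ≈ S A.
S = np.random.rand(20, 8)    # 20 example source expressions
T = np.random.rand(20, 12)   # matching target rig parameters
A, *_ = np.linalg.lstsq(S, T, rcond=None)

def retarget_linear(source_params):
    # May over/undershoot for expressions far from the examples.
    return source_params @ A

target_params = retarget_linear(S[0])  # (12,)
```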
Scattered Data Interpolation ✦ A function is estimated that maps the source parameter space onto the target parameter space. ✦ Kernel-based techniques such as radial basis function (RBF) interpolation are widely used for nonlinear mapping. ✦ Partitioning the target space via a Delaunay triangulation is one way to solve the scattered data interpolation problem.
What is a Radial Basis Function? ✦ A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin. ✦ For example, a spherical Gaussian function can serve as an RBF. ✦ Radial basis functions are typically used to build up function approximations of the form: y(x) = Σᵢ₌₁ᴺ wᵢ φ(‖x − xᵢ‖).
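A minimal sketch of Gaussian RBF interpolation following the formula above; the kernel width sigma and the array shapes are assumptions of the example.

```python
import numpy as np

def rbf_fit(centers, values, sigma=1.0):
    """Solve for weights w with y(x) = sum_i w_i * phi(||x - x_i||)."""
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    Phi = np.exp(-(d / sigma) ** 2)      # spherical Gaussian kernel
    return np.linalg.solve(Phi, values)  # enforce y(x_i) = values_i

def rbf_eval(x, centers, weights, sigma=1.0):
    d = np.linalg.norm(x - centers, axis=-1)
    return np.exp(-(d / sigma) ** 2) @ weights

# Example: 5 source expressions (3-D parameters) mapped to 2-D targets.
xs = np.random.rand(5, 3)
ys = np.random.rand(5, 2)
w = rbf_fit(xs, ys)
y0 = rbf_eval(xs[0], xs, w)  # reproduces ys[0] at a training center
```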
Art Direction ✦ The need for user input is clear when the source and the target character are very different. ✦ The scattered data interpolation framework supports user input quite naturally, since the correspondences between source and target expressions can be specified by a user. ✦ Such direction can be critical for animating a hero character in a feature film, but is unnecessary for animating chatroom avatars.
State of the Art: the Digital Ira Project
Depth Sensors ✦ Microsoft Kinect provides consumer-grade depth sensing for facial animation applications. ✦ A new research topic in facial animation focuses on real-time retargeting applications. ✦ Realtime Performance-Based Facial Animation, Weise et al., SIGGRAPH 2011.
Realtime Facial Retargeting ✦ A system that enables any user to control the facial expressions of a digital avatar in real time using a low-cost depth sensor. ✦ A face tracking algorithm that combines geometry and texture registration with pre-recorded animation priors in a single optimization. ✦ The technique emphasizes usability, performance, and robustness.
Overview ✦ Traditional facial animation solves the tracking and retargeting problems separately. ✦ The proposed technique combines the two into a single optimization.
Blendshape Representation ✦ A facial expression is represented as a weighted sum of blendshape meshes. ✦ A blendshape model provides a compact representation of the facial expression space, significantly reducing the dimensionality of the optimization problem. ✦ Existing blendshape animations, which are ubiquitous in movie and game production, can be reused. ✦ The output is a temporal sequence of blendshape weights, which can be directly imported into commercial animation tools.
Acquisition Hardware ✦ All input data is acquired using the Kinect system. ✦ The Kinect supports simultaneous capture of a 2D color image and a 3D depth map, based on invisible infrared projection. ✦ Data quality is much lower than that of state-of-the-art performance capture systems based on markers and/or active lighting.
Offline Model Building ✦ Given the user’s expressions captured offline, create a set of user-specific blendshapes by adapting the generic blendshapes. ✦ A pre-defined sequence of example expressions performed by the user is recorded with the Kinect sensor.
Online Tracking ✦ Decouple the rigid from the non-rigid motion. ✦ Directly estimate the rigid transform of the user’s face before performing the optimization of blendshape weights. ✦ For rigid tracking, align the reconstructed mesh of the previous frame with the acquired depth map of the current frame using ICP (iterative closest point). ✦ For non-rigid tracking, estimate the blendshape weights that capture the dynamics of the recorded user’s facial expression.
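For intuition, here is a simplified point-to-point ICP sketch built on a Kabsch rigid fit; the actual system registers the previous frame's reconstructed mesh to the depth map and would not use brute-force matching, so every detail below is illustrative.

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares R, t with R @ P_i + t ≈ Q_i (Kabsch algorithm)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cq - R @ cp

def icp(src, dst, iters=20):
    """Point-to-point ICP: alternate closest-point matching and Kabsch."""
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbours; a k-d tree would be used in practice.
        d = np.linalg.norm(cur[:, None] - dst[None, :], axis=-1)
        R, t = best_rigid_transform(cur, dst[d.argmin(axis=1)])
        cur = cur @ R.T + t
    return cur
```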
Statistical Model ✦ Let D = (G, I) be the input data at the current frame, consisting of a depth map G and a color image I. Infer from D the most probable blendshape weights x for the current frame, given the sequence Xₙ of the n previously reconstructed blendshape vectors. ✦ Formulate this inference as a maximum a posteriori (MAP) estimation: x* = arg maxₓ p(x | D, Xₙ). ✦ Using Bayes’ rule, p(x | D, Xₙ) ∝ p(D | x) p(x | Xₙ): the product of a likelihood term and a prior term.
Prior Distribution ✦ The prior term is modeled as a mixture of probabilistic principal component analyzers (MPPCA). ✦ It is a mixture of Gaussian models whose covariance matrices have a low-rank structure.
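In standard MPPCA notation (after Tipping and Bishop), the prior would take the form below; the mixture weights πₖ, means μₖ, factor matrices Wₖ, and noise variances σₖ² are fit to the pre-recorded animations. These symbols are the standard ones, not copied from the paper.

```latex
p(x \mid X_n) = \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left(x;\; \mu_k,\; W_k W_k^{\top} + \sigma_k^2 I\right)
```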
Likelihood Distribution ✦ By assuming conditional independence, the likelihood distribution is modeled as the product of two Gaussians: p(D | x) = p(G | x) p(I | x). ✦ Let B be the blendshape matrix. Each column of B defines a blendshape base mesh, so that Bx generates the blendshape representation of the current pose. ✦ Denote by vᵢ = (Bx)ᵢ the i-th vertex of the reconstructed mesh.
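One plausible instantiation of the two Gaussians, assuming simple point-to-point geometry residuals with noise level σ_G and feature-based texture residuals with noise level σ_I; the paper's actual energy terms differ in detail (e.g. point-to-plane distances for the depth term). Here gᵢ would be the depth-map point associated with vertex vᵢ, and f_j would project tracked texture features.

```latex
p(G \mid x) \propto \exp\!\Big(-\tfrac{1}{2\sigma_G^2} \sum_i \lVert v_i - g_i \rVert^2\Big),
\qquad
p(I \mid x) \propto \exp\!\Big(-\tfrac{1}{2\sigma_I^2} \sum_j \lVert f_j(Bx) - \hat{f}_j \rVert^2\Big)
```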
Optimization ✦ The MAP problem can be solved by minimizing the negative logarithm of the posterior. ✦ Since the gradients can be computed analytically, the problem can be solved efficiently with an iterative gradient-based solver.
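A toy version of this minimization, with a single Gaussian prior standing in for the MPPCA mixture and a plain geometric data term; all names and shapes are stand-ins.

```python
import numpy as np
from scipy.optimize import minimize

# Toy negative log-posterior: Gaussian data term + Gaussian prior.
# B: (3V, K) blendshape matrix, g: (3V,) stacked depth targets,
# mu, Lam: mean and precision of a stand-in Gaussian prior.
def neg_log_map(x, B, g, mu, Lam, sigma_g=1.0):
    r = B @ x - g
    return 0.5 * r @ r / sigma_g**2 + 0.5 * (x - mu) @ Lam @ (x - mu)

def neg_log_map_grad(x, B, g, mu, Lam, sigma_g=1.0):
    return B.T @ (B @ x - g) / sigma_g**2 + Lam @ (x - mu)

K, V = 6, 100
B, g = np.random.rand(3 * V, K), np.random.rand(3 * V)
mu, Lam = np.zeros(K), np.eye(K)
res = minimize(neg_log_map, x0=mu, jac=neg_log_map_grad,
               args=(B, g, mu, Lam), method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * K)  # blendshape weights in [0, 1]
weights = res.x  # per-frame blendshape weight vector
```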