Karcher means of positive definite matrices
Yongdo Lim
Sungkyunkwan University
January 14, 2014
Overview There has recently been considerable interest in defining a “mean” (averaging, barycenter, centroid) on manifolds and metric spaces. A natural and attractive candidate among averaging procedures is the “least squares mean”. This mean has appeared under a variety of other designations: center of gravity, Fréchet mean, Cartan mean, Riemannian center of mass, Riemannian geometric mean, or, frequently, Karcher mean, the terminology we adopt. It plays a central role in image processing (subdivision schemes), medical imaging (MRI), radar systems, and statistical biology (DNA/genome), to cite a few. Our purposes in this talk include the following: (1) the monotonicity conjecture; (2) deterministic approximations to the Karcher mean; (3) the strong law of large numbers and the no dice theorem.
The Geometric Mean √(ab) The subject of (binary) means for positive numbers or line segments has a rich mathematical lineage dating back to antiquity. The Greeks, motivated by their interest in proportions and musical ratios, defined at least eleven different means (depending on how one counts), the arithmetic, geometric, harmonic, and golden being the best known. A geometric construction for the geometric mean √(ab) of a, b > 0 is given by Euclid in Book II in the form of “squaring the rectangle,” i.e., constructing a square of the same area as a given rectangle of sides a and b. The study of various means and their properties on the positive reals has remained an active area of investigation up to the present day (e.g., SIAM Review 1970: Gauss and Carlson’s iterative means and elliptic integrals).
Positive Definite Matrices Positive definite matrices have become fundamental computational objects in many areas of engineering, computer science, physics, statistics, and applied mathematics. They appear in a diverse variety of settings: covariance matrices in statistics, elements of the search space in convex and semidefinite programming, kernels in machine learning, density matrices in quantum information, data points in radar imaging, and diffusion tensors in medical imaging, to cite only a few. A variety of computational algorithms have arisen for approximation, interpolation, filtering, estimation, and averaging. • A Hermitian matrix A is positive definite (semidefinite) if all eigenvalues of A are positive (nonnegative). The set of all k × k positive definite matrices is an open convex cone in the Euclidean space of Hermitian matrices equipped with the inner product ⟨X, Y⟩ = Tr(XY).
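As a quick illustration of these definitions, here is a minimal sketch (assuming NumPy; the helper names are mine, not from the talk) that tests positive definiteness via eigenvalues and evaluates the trace inner product ⟨X, Y⟩ = Tr(XY):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Return True if the Hermitian matrix A has all eigenvalues > tol."""
    eigvals = np.linalg.eigvalsh(A)      # eigenvalues of a Hermitian matrix
    return bool(np.all(eigvals > tol))

def trace_inner_product(X, Y):
    """The inner product <X, Y> = Tr(XY) on the space of Hermitian matrices."""
    return np.trace(X @ Y)

# Example: a random positive definite matrix A = B B^T + I.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)
print(is_positive_definite(A))             # True
print(trace_inner_product(A, np.eye(4)))   # equals Tr(A)
```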
The Riemannian Trace Metric In recent years, it has been increasingly recognized that the Euclidean distance is often not the most suitable for the space P = P_k of positive definite matrices and that working with the appropriate geometry does matter in computational problems. It is thus not surprising that there has been increasing interest in the trace metric δ, the distance metric arising from the natural Riemannian structure on P, making it a Riemannian manifold, indeed a symmetric space of negative curvature: δ(A, B) = ||log A^{-1/2} B A^{-1/2}||_2 = ( Σ_{i=1}^{k} log² λ_i(A^{-1}B) )^{1/2}, where λ_i(X) denotes the i-th eigenvalue of X in non-decreasing order. For positive reals (k = 1), δ(a, b) = |log a − log b|.
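For concreteness, a hedged sketch (assuming NumPy/SciPy; `trace_metric` is an illustrative name) of this distance, obtaining the eigenvalues of A^{-1}B from the generalized eigenvalue problem B v = λ A v:

```python
import numpy as np
from scipy.linalg import eigh

def trace_metric(A, B):
    """Riemannian trace-metric distance between positive definite A and B."""
    lam = eigh(B, A, eigvals_only=True)     # eigenvalues of A^{-1} B, all positive
    return np.sqrt(np.sum(np.log(lam) ** 2))

# For 1x1 matrices this reduces to |log a - log b|.
print(trace_metric(np.array([[2.0]]), np.array([[8.0]])))   # log 4 ~ 1.386
```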
Basic Geometric Properties We recall some basic properties of P endowed with the trace metric [S. Lang 1999, or the Lawson and L. 2000 Amer. Math. Monthly article].
(1) The matrix geometric mean A # B = A^{1/2}(A^{-1/2} B A^{-1/2})^{1/2} A^{1/2} is the unique metric midpoint between A and B.
(2) There is a unique metric geodesic line through any two distinct points A, B ∈ P, given by the weighted means γ(t) = A #_t B = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}.
(3) (Congruence invariance) Congruence transformations A ↦ CAC^* for C invertible are isometries of P.
(4) Inversion A ↦ A^{-1} is an isometry.
(5) (Monotonicity; Löwner-Heinz inequality) A ≤ C, B ≤ D ⇒ A #_t B ≤ C #_t D. Here, A ≤ B ⟺ B − A is positive semidefinite.
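The weighted mean A #_t B in (2) has a closed form, so it can be computed directly. A minimal sketch (assuming SciPy; `weighted_geometric_mean` is an illustrative name, not a library routine):

```python
from scipy.linalg import sqrtm, fractional_matrix_power, inv

def weighted_geometric_mean(A, B, t=0.5):
    """gamma(t) = A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    A_half = sqrtm(A)                    # positive square root of A
    A_half_inv = inv(A_half)
    M = A_half_inv @ B @ A_half_inv      # A^{-1/2} B A^{-1/2}, positive definite
    return A_half @ fractional_matrix_power(M, t) @ A_half

# t = 1/2 gives the metric midpoint A # B; t = 0 and t = 1 recover A and B.
```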
A Big Question The big question on averaging of positive definite matrices is: Given n positive definite matrices, what is the best way to average them (i.e., find their mean) in such a way that the answer is again positive definite? Once one realizes that the matrix geometric mean Λ_2(A, B) = A # B := A^{1/2}(A^{-1/2} B A^{-1/2})^{1/2} A^{1/2} is the metric midpoint of A and B for the trace metric δ, it is natural to use an averaging technique over this metric to extend this mean to n variables. First M. Moakher (2005) and then Bhatia and Holbrook (2006) suggested the least squares mean, taking the mean to be the unique minimizer of the sum of the squares of the distances: Λ_n(A_1, . . . , A_n) = arg min_{X ∈ P} Σ_{i=1}^{n} δ²(X, A_i).
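This minimizer has no closed form for n ≥ 3, but it can be approximated numerically. Below is a hedged sketch (assuming SciPy) of one simple fixed-point / Riemannian gradient iteration; it is illustrative only, with an ad hoc step size and stopping rule, and is not the deterministic approximation discussed later in the talk:

```python
import numpy as np
from scipy.linalg import sqrtm, inv, logm, expm

def karcher_mean(matrices, tol=1e-10, max_iter=100, step=1.0):
    """Approximate arg min_X sum_i delta^2(X, A_i) over positive definite X."""
    X = sum(matrices) / len(matrices)            # arithmetic mean as starting point
    for _ in range(max_iter):
        X_half = sqrtm(X)
        X_half_inv = inv(X_half)
        # Mean of the logs pulled back to the tangent space at X
        # (the negative Riemannian gradient, up to a factor of 2).
        S = sum(logm(X_half_inv @ A @ X_half_inv) for A in matrices) / len(matrices)
        X = X_half @ expm(step * S) @ X_half
        X = (X + X.conj().T) / 2                 # symmetrize to suppress round-off
        if np.linalg.norm(S) < tol:
            break
    return X
```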
Some Background This idea had been anticipated by Élie Cartan, who showed, among other things, that such a unique minimizer exists if the points all lie in a convex ball in a Riemannian manifold, which is enough to deduce the existence of the least squares mean globally for P. The mean is frequently called the Karcher mean in light of its appearance in Karcher's work on Riemannian manifolds (1977). Indeed, he considered general probabilistic means that included the weighted least squares mean: Λ_n(w; A_1, . . . , A_n) = arg min_{X ∈ P} Σ_{i=1}^{n} w_i δ²(X, A_i), where w = (w_1, . . . , w_n) is a probability vector.
Monotonicity Conjecture In a 2004 LAA article called “Geometric Means,” T. Ando, C.K. Li and R. Mathias gave a construction that extended the two-variable matrix geometric mean to n variables for each n ≥ 3 and identified a list of ten properties (the ALM axioms) that this extended mean satisfied. Both contributions, the construction and the axiomatic properties, were important and have been influential in subsequent developments. Question: Do the Ando-Li-Mathias properties extend to the least squares mean? In particular, Bhatia and Holbrook (2006) asked whether the least squares mean was monotonic in each of its arguments (a multivariable Löwner-Heinz inequality). Computer calculations indicated “Yes.” (Monotonicity) Λ_n(A_1, . . . , A_n) ≤ Λ_n(B_1, . . . , B_n) if A_i ≤ B_i for all i.
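The kind of computer experiment behind that “Yes” is easy to reproduce. A hedged sketch (assuming NumPy, and assuming the `karcher_mean` sketch from the earlier slide is in scope; `loewner_leq` and `random_pd` are illustrative names): generate pairs A_i ≤ B_i and compare the resulting means in the Loewner order.

```python
import numpy as np

def loewner_leq(A, B, tol=1e-8):
    """A <= B in the Loewner order iff B - A is positive semidefinite."""
    return float(np.min(np.linalg.eigvalsh(B - A))) >= -tol

rng = np.random.default_rng(1)

def random_pd(k=3):
    M = rng.standard_normal((k, k))
    return M @ M.T + np.eye(k)                   # random positive definite matrix

As = [random_pd() for _ in range(4)]
Bs = [Ai + random_pd() for Ai in As]             # guarantees A_i <= B_i for each i
print(loewner_leq(karcher_mean(As), karcher_mean(Bs)))   # expected: True
```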
ALM Axioms A geometric mean of n positive definite matrices is a function G : P^n → P satisfying
(P1) G(A_1, . . . , A_n) = (A_1 · · · A_n)^{1/n} for commuting A_i's.
(P2) G(a_1 A_1, . . . , a_n A_n) = (a_1 · · · a_n)^{1/n} G(A_1, . . . , A_n).
(P3) G(A_{σ(1)}, . . . , A_{σ(n)}) = G(A_1, . . . , A_n), ∀σ.
(P4) G(A_1, . . . , A_n) ≤ G(B_1, . . . , B_n) for A_i ≤ B_i, ∀i.
(P5) G is continuous.
(P6) G(M^* A_1 M, . . . , M^* A_n M) = M^* G(A_1, . . . , A_n) M.
(P7) G is jointly concave.
(P8) G(A_1^{-1}, . . . , A_n^{-1})^{-1} = G(A_1, . . . , A_n).
(P9) Det G(A_1, . . . , A_n) = (Π_{i=1}^{n} Det A_i)^{1/n}.
(P10) ((1/n) Σ_{i=1}^{n} A_i^{-1})^{-1} ≤ G(A_1, . . . , A_n) ≤ (1/n) Σ_{i=1}^{n} A_i.
• The ten properties are known as the Ando-Li-Mathias axioms for multivariable geometric means of positive definite matrices.
NPC Spaces The answer to the monotonicity conjecture is indeed “yes,” but showing it required new tools: the theory of nonpositively curved metric spaces, techniques from probability and random variable theory, and the fairly recent combination of the two, particularly by K.-T. Sturm (2003). The setting appropriate for our considerations is that of globally nonpositively curved metric spaces, or NPC spaces for short: these are complete metric spaces M with the property that for each x, y ∈ M there exists m ∈ M such that for all z ∈ M, d²(m, z) ≤ (1/2) d²(x, z) + (1/2) d²(y, z) − (1/4) d²(x, y). (NPC) Such spaces are also called (global) CAT(0) spaces, Hadamard spaces, or Bruhat-Tits spaces (e.g., Hilbert spaces, symmetric cones of finite rank, phylogenetic trees, booklets, products, Gromov-Hausdorff limits).
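As a sanity check (a short worked example, not from the slides), a Hilbert space satisfies (NPC) with equality when m = (x + y)/2: by the parallelogram law with u = x − z and v = y − z,

\[
\|m - z\|^2 \;=\; \tfrac{1}{4}\,\|u + v\|^2
\;=\; \tfrac{1}{4}\bigl(2\|u\|^2 + 2\|v\|^2 - \|u - v\|^2\bigr)
\;=\; \tfrac{1}{2}\,\|x - z\|^2 + \tfrac{1}{2}\,\|y - z\|^2 - \tfrac{1}{4}\,\|x - y\|^2,
\]

since u − v = x − y; so Hilbert spaces realize (NPC) with equality, and nonpositive curvature relaxes this identity to an inequality.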
Metric Geodesics The theory of such NPC spaces is quite extensive. In particular, the m appearing in d²(m, z) ≤ (1/2) d²(x, z) + (1/2) d²(y, z) − (1/4) d²(x, y) (NPC) is the unique metric midpoint between x and y. By inductively choosing midpoints for dyadic rationals and extending by continuity, one obtains for each x ≠ y a unique minimal metric geodesic γ : [0, 1] → M satisfying d(γ(t), γ(s)) = |t − s| d(x, y), γ(0) = x, γ(1) = y. • Many classical problems originally posed in Hilbert spaces make sense in NPC spaces: convex and stochastic analysis, probability measure theory, optimal transport, optimization, metric geometry, averaging (e.g., the Fermat-Weber problem).
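The dyadic construction is easy to mimic numerically. A small self-contained sketch (plain Python; the function names are illustrative) that approximates γ(t) from a midpoint map alone, shown in the k = 1 case P_1 = (0, ∞), where the midpoint is √(ab) and γ(t) = a^{1−t} b^t:

```python
import math

def dyadic_geodesic_point(midpoint, x, y, t, depth=40):
    """Approximate gamma(t) on the geodesic from x to y using only midpoints."""
    a, b = 0.0, 1.0                      # parameter interval currently containing t
    for _ in range(depth):
        m = midpoint(x, y)               # this is gamma((a + b) / 2)
        c = 0.5 * (a + b)
        if t <= c:
            y, b = m, c                  # keep the left half [a, c]
        else:
            x, a = m, c                  # keep the right half [c, b]
    return midpoint(x, y)

geometric_midpoint = lambda u, v: math.sqrt(u * v)
a0, b0, t = 2.0, 32.0, 0.3
print(dyadic_geodesic_point(geometric_midpoint, a0, b0, t))  # ~ 4.5948
print(a0 ** (1 - t) * b0 ** t)                               # closed form a^(1-t) b^t
```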
Weighted Means in NPC-Spaces For the minimal (metric) geodesic γ : [0, 1] → M with γ(0) = x and γ(1) = y, we denote γ(t) by x #_t y and call it the t-weighted mean of x and y. The midpoint x #_{1/2} y we denote simply by x # y. We remark that by uniqueness x #_t y = y #_{1−t} x; in particular, x # y = y # x. We note that x #_t y = (1 − t)x + ty for x, y ∈ R^n, so x #_t y can be thought of as a generalization of the convex combination. In P, the minimal geodesic from A to B for the trace metric extends to a geodesic line γ : R → P, and for each t, γ(t) = A #_t B, the t-weighted geometric mean.