7th Conference on Multivariate Distributions with Applications
Manifold Matching: Joint Optimization of Fidelity & Commensurability
Carey E. Priebe
Department of Applied Mathematics & Statistics, Johns Hopkins University
August 2010, Maresias, Brazil
Collaborators
David J. Marchette, Zhiliang Ma, Sancar Adali, &c.
Support: AFOSR, NSSEFF, ONR, HLTCOE, ASEE
Problem Formulation
Given $x_{i1} \sim \cdots \sim x_{ik} \sim \cdots \sim x_{iK}$, $i = 1, \ldots, n$
• $n$ objects are each measured under $K$ different conditions
• $x_{i1} \sim \cdots \sim x_{ik} \sim \cdots \sim x_{iK}$ denotes $K$ matched feature vectors representing a single object $O_i$
• $x_{ik} \in \Xi_k$
• $K$ new measurements $\{y_k\}_{k=1}^{K}$, $y_k \in \Xi_k$
Question: Are $\{y_k\}_{k=1}^{K}$ matched feature vectors representing a single object measured under $K$ conditions?
Hypotheses
$$\begin{array}{lccc} & \Xi_1 & \cdots & \Xi_K \\ \text{Object } O_1 & x_{11} & \sim \cdots \sim & x_{1K} \\ \vdots & \vdots & & \vdots \\ \text{Object } O_n & x_{n1} & \sim \cdots \sim & x_{nK} \end{array}$$
• Each space $\Xi_k$ comes with a dissimilarity $\delta_k$, yielding dissimilarity matrices $\Delta_1, \cdots, \Delta_K$
• Given new measurements $\{y_k\}_{k=1}^{K}$ we can obtain within-condition dissimilarities $\delta_k(y_k, x_{ik})$, $i = 1, \ldots, n$, $k = 1, \ldots, K$
• Goal ($K = 2$): determine whether $y_1$ and $y_2$ are a match
$$H_0 : y_1 \sim y_2 \quad \text{versus} \quad H_A : y_1 \nsim y_2$$
(we control the probability of missing a true match)
what are these “conditions” and what does it mean to be “matched”?
• let condition be language for a text document, and “matched” mean “on the same topic”
• let condition be modality for a photo, and “matched” mean “of the same person”
  – indoor lighting vs outdoor lighting
  – two cameras of different quality
  – passport photos and airport surveillance photos
• let condition 1 be wiki text document and condition 2 be wiki hyperlink structure
• let condition 1 be text document and condition 2 be photo
• . . . or just a single space with multiple dissimilarities
(not matched)
The English is clear enough to lorry drivers, but the Welsh reads “I am not in the office at the moment. Send any work to be translated.”
<http://news.bbc.co.uk/2/hi/uk_news/wales/7702913.stm>
Manifold Matching I
Conditional distributions are induced by maps $\pi_k$ from “object space” $\Xi$
Diagram: $\pi_k : \Xi \to \Xi_k$ for $k = 1, \ldots, K$; $\exists\, \varphi$?
Conditional spaces $\Xi_k$ are not commensurate
Dirichlet Setting
Let $S^p$ be the standard $p$-simplex in $\mathbb{R}^{p+1}$
Let $\Xi_1 = S^p$ and $\Xi_2 = S^p$ (but the fact that the two spaces are the same is unknown to the algorithms ...)
Let $\alpha_i \overset{iid}{\sim} \mathrm{Dirichlet}(\mathbf{1})$ represent $n$ “objects” or “topics”
Let $X_{ik} \overset{iid}{\sim} \mathrm{Dirichlet}(r\alpha_i + \mathbf{1})$ represent $K$ languages (WCHs)
• $r$ controls “what it means to be matched” (document variability & translation quality analogy)
[Figure: panels $\Xi_1$ and $\Xi_2$ showing $\alpha_i$ with $X_{i1}$ and $X_{i2}$]
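A minimal sketch of sampling from this setting with NumPy. The function name `sample_dirichlet_setting`, the seed, and the array layout are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sample_dirichlet_setting(n, p, r, K=2, seed=0):
    """Draw n topics alpha_i on the p-simplex and K matched points per topic."""
    rng = np.random.default_rng(seed)
    # alpha_i ~ Dirichlet(1, ..., 1): uniform on the standard p-simplex in R^(p+1)
    alpha = rng.dirichlet(np.ones(p + 1), size=n)        # shape (n, p+1)
    X = np.empty((K, n, p + 1))
    for i in range(n):
        # X_ik ~ Dirichlet(r * alpha_i + 1), drawn independently for each condition k
        X[:, i, :] = rng.dirichlet(r * alpha[i] + 1.0, size=K)
    return alpha, X

# Hypothetical usage; X[k, i] is object i measured under condition k
alpha, X = sample_dirichlet_setting(n=100, p=3, r=100)
```

Since the mean of $\mathrm{Dirichlet}(r\alpha_i + \mathbf{1})$ is $(r\alpha_i + \mathbf{1})/(r + p + 1)$, larger $r$ pulls the matched points $X_{i1}, X_{i2}$ closer to $\alpha_i$ and hence to each other.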
Manifold Matching II
Matched points are used to define maps $\rho_k$ to the same space $\mathcal{X}$ (with distance $d$)
Diagram: $\Xi \xrightarrow{\pi_k} \Xi_k \xrightarrow{\rho_k} \mathcal{X} = \mathbb{R}^d$ for $k = 1, \ldots, K$
Reject for $d(\tilde{y}_1, \tilde{y}_2)$ “large”
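One way to make “large” concrete, sketched under assumptions the slides do not state: suppose a set of embedded distances for known matched pairs is available (for example from held-out training pairs), and take the critical value to be their empirical $(1 - \alpha)$ quantile. The helper names and the sample distances below are hypothetical.

```python
import numpy as np

def critical_value(null_distances, alpha):
    """Empirical (1 - alpha) quantile of distances observed for known matches."""
    return float(np.quantile(np.asarray(null_distances, dtype=float), 1.0 - alpha))

def reject_match(dist, c):
    """Reject H0 (matched) when the embedded distance exceeds the critical value."""
    return dist > c

# Hypothetical distances d(y1~, y2~) for pairs known to be matched
null_distances = np.linspace(0.01, 1.0, 100)
c = critical_value(null_distances, alpha=0.05)
```

With this calibration, roughly a fraction $\alpha$ of true matches would be rejected, which is the error the test is meant to control.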
canonical correlation
• Multidimensional scaling yields high-dimensional embeddings: $\Delta_1 \mapsto X'_1$ and $\Delta_2 \mapsto X'_2$
• Canonical correlation finds $U_1 : X'_1 \mapsto X_1$ and $U_2 : X'_2 \mapsto X_2$ to maximize correlation
• Out-of-sample embedding: $y_1 \mapsto y'_1$, $y_2 \mapsto y'_2$
• Both $\tilde{y}_1 = U_1^T y'_1$ and $\tilde{y}_2 = U_2^T y'_2$ are in $\mathbb{R}^d$ with same coordinate system (i.e., they are commensurate)
• Reject for $d(\tilde{y}_1, \tilde{y}_2)$ “large”
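The canonical correlation step can be sketched as follows, assuming the high-dimensional embeddings $X'_1$ and $X'_2$ are already available as row-per-object arrays with full-rank covariance. This is the textbook whitening-plus-SVD construction, not necessarily the implementation behind the slides; the function names and the toy data are illustrative.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X1, X2, d):
    """Return projections U1, U2 and the top-d canonical correlations."""
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    S11 = X1c.T @ X1c / (n - 1)
    S22 = X2c.T @ X2c / (n - 1)
    S12 = X1c.T @ X2c / (n - 1)
    W1, W2 = inv_sqrt(S11), inv_sqrt(S22)
    # Singular values of the whitened cross-covariance are the canonical correlations
    A, s, Bt = np.linalg.svd(W1 @ S12 @ W2)
    return W1 @ A[:, :d], W2 @ Bt.T[:, :d], s[:d]

# Toy data: X2 is an invertible linear image of X1, so all correlations equal 1
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 3))
X2 = X1 @ np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
U1, U2, corr = cca(X1, X2, d=2)
```

With objects stored as rows, a centered out-of-sample point $y'_k$ would be mapped as `y_prime @ U_k`, the row-vector form of $U_k^T y'_k$.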
procrustes ∘ mds
• Multidimensional scaling yields low-dimensional embeddings: $\Delta_1 \mapsto X_1$ and $\Delta_2 \mapsto X_2$
• Procrustes$(X_1, X_2)$ yields $Q^* = \arg\min_{Q^T Q = I} \| X_1 - X_2 Q \|_F$
• Out-of-sample embedding: $y_1 \mapsto \tilde{y}_1$, $y_2 \mapsto \tilde{y}'_2$
• Both $\tilde{y}_1$ and $\tilde{y}_2 = Q^* \tilde{y}'_2$ are in $\mathbb{R}^d$ with same coordinate system (i.e., they are commensurate)
• Reject for $d(\tilde{y}_1, \tilde{y}_2)$ “large”
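A sketch of the in-sample part of this pipeline, under two assumptions: the MDS variant is classical (Torgerson) scaling, which the slides do not specify, and objects are stored as rows so the aligned configuration is `X2 @ Q`. The orthogonal Procrustes solution via SVD is standard; out-of-sample embedding is not shown. All function names are illustrative.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def classical_mds(Delta, d):
    """Classical MDS: embed an n x n dissimilarity matrix as n points in R^d."""
    n = Delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (Delta ** 2) @ J              # double-centered squared dissimilarities
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]                # indices of the d largest eigenvalues
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

def procrustes(X1, X2):
    """Orthogonal Q minimizing the Frobenius norm of X1 - X2 @ Q."""
    U, _, Vt = np.linalg.svd(X2.T @ X1)
    return U @ Vt

# Toy check: the second configuration is a rotated copy of the first
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 2))
R = np.array([[np.cos(0.7), -np.sin(0.7)], [np.sin(0.7), np.cos(0.7)]])
X1 = classical_mds(pairwise_dist(A), 2)
X2 = classical_mds(pairwise_dist(A @ R), 2)
Q = procrustes(X1, X2)
residual = np.linalg.norm(X1 - X2 @ Q)
```

In this toy case the two dissimilarity matrices coincide, so the residual is numerically zero; with genuinely different conditions the residual reflects how far the two embeddings are from being commensurate after alignment.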
fidelity & commensurability
Fidelity is how well the mapping preserves original dissimilarities; our within-condition fidelity error is given by
$$\epsilon_{f_k} = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left( d(\tilde{x}_{ik}, \tilde{x}_{jk}) - \delta_k(x_{ik}, x_{jk}) \right)^2 .$$
Commensurability is how well the mapping preserves matchedness; our between-condition commensurability error is given by
$$\epsilon_{c_{k_1 k_2}} = \frac{1}{n} \sum_{1 \le i \le n} \left( d(\tilde{x}_{ik_1}, \tilde{x}_{ik_2}) - \delta_{k_1 k_2}(x_{ik_1}, x_{ik_2}) \right)^2 .$$
Alas, $\delta_{k_1 k_2}$ does not exist; however, our story seems to suggest that it might be reasonable to let $\delta_{k_1 k_2}(x_{ik_1}, x_{ik_2}) = 0$ for all $i, k_1, k_2$.
NB: There is also between-condition separability error given by
$$\epsilon_{s_{k_1 k_2}} = \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \left( d(\tilde{x}_{ik_1}, \tilde{x}_{jk_2}) - \delta_{k_1 k_2}(x_{ik_1}, x_{jk_2}) \right)^2 .$$
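The two main error terms can be computed directly from embedded configurations. The sketch below assumes $d$ is Euclidean distance in the embedding space and adopts the suggestion $\delta_{k_1 k_2}(x_{ik_1}, x_{ik_2}) = 0$ for matched pairs; the function names are illustrative.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def fidelity_error(X_emb, Delta):
    """Mean squared gap between embedded distances and original dissimilarities, over i < j."""
    iu = np.triu_indices(Delta.shape[0], k=1)
    return float(np.mean((pairwise_dist(X_emb)[iu] - Delta[iu]) ** 2))

def commensurability_error(X_emb_1, X_emb_2):
    """Mean squared distance between matched embedded points (target dissimilarity 0)."""
    return float(np.mean(np.linalg.norm(X_emb_1 - X_emb_2, axis=1) ** 2))

# Tiny worked example: two points at distance 5, target dissimilarity 4
X_emb = np.array([[0.0, 0.0], [3.0, 4.0]])
Delta = np.array([[0.0, 4.0], [4.0, 0.0]])
eps_f = fidelity_error(X_emb, Delta)                                   # (5 - 4)^2 = 1
eps_c = commensurability_error(X_emb, X_emb + np.array([1.0, 0.0]))    # every pair is 1 apart
```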
Methodological Comparison
• canonical correlation optimizes commensurability without regard for fidelity
• procrustes ∘ mds optimizes fidelity without regard for commensurability
• compare: joint optimization of fidelity & commensurability . . .
Omnibus Embedding Approach
$$M = \begin{pmatrix} \Delta_1 & W \\ W^T & \Delta_2 \end{pmatrix} \in \mathbb{R}^{2n \times 2n}, \qquad \Delta_1, \Delta_2, W \in \mathbb{R}^{n \times n}$$
Out-of-sample rows: $y_1$ contributes $(u_1^T, v_1^T)$ and $y_2$ contributes $(u_2^T, v_2^T)$, with $u_1, v_1, u_2, v_2 \in \mathbb{R}^{n}$
• Under “matched” assumption, impute dissimilarities $\delta_{12}(x_{i_1 1}, x_{i_2 2})$ to obtain an omnibus dissimilarity matrix $M$
• Embed $M$ as $2n$ points in $\mathbb{R}^d$
• Let $u_{i1} = \delta_1(y_1, x_{i1})$ and $v_{i2} = \delta_2(y_2, x_{i2})$
• Under $H_0 : y_1 \sim y_2$, impute $v_{i1} = \delta_{12}(y_1, x_{i2})$ and $u_{i2} = \delta_{12}(y_2, x_{i1})$
• Out-of-sample embedding of $(u_1^T, v_1^T)^T$ and $(u_2^T, v_2^T)^T$ yields $\tilde{y}_1$ and $\tilde{y}_2$
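Assembling the omnibus matrix can be sketched as below. The slides do not say how the off-diagonal entries of $W$ are imputed; this sketch assumes the average of the two within-condition dissimilarities, with a zero diagonal reflecting $\delta_{12}(x_{i1}, x_{i2}) = 0$ for matched pairs. Both the function name and that imputation rule are illustrative assumptions.

```python
import numpy as np

def omnibus_matrix(Delta1, Delta2):
    """Build the 2n x 2n block matrix [[Delta1, W], [W^T, Delta2]]."""
    # ASSUMPTION: impute between-condition dissimilarities by averaging
    W = 0.5 * (Delta1 + Delta2)
    # Matched pairs (same object, different condition) get dissimilarity 0
    np.fill_diagonal(W, 0.0)
    return np.block([[Delta1, W], [W.T, Delta2]])

# Hypothetical 3-object example
Delta1 = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]])
Delta2 = np.array([[0.0, 2.0, 2.0], [2.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
M = omnibus_matrix(Delta1, Delta2)
```

The resulting `M` is a symmetric dissimilarity matrix on $2n$ points, so any MDS routine that accepts a dissimilarity matrix can embed it in $\mathbb{R}^d$.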
Simulation Results
$n = 100$, $p = 3$, $d = 2$, $r = 100$, $c = 0.1$, $q = 3$
[Figure: ROC curves, power ($\beta$) against $\alpha$, for pom, cca, and jofc]
Simulation results indicate that joint optimization of fidelity & commensurability via omnibus embedding approach is (for this case) superior to canonical correlation and procrustes ∘ mds