Consistent Kernel Mean Estimation for Functions of Random Variables
Ilya Tolstikhin, jointly with C.-J. Simon-Gabriel, A. Ścibior, and B. Schölkopf (NIPS 2016)
Dagstuhl, December 2016
Motivation

Given:
◮ independent random variables $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$;
◮ i.i.d. samples $\{X_i\}_{i=1}^N$ and $\{Y_j\}_{j=1}^N$;
◮ any function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$.

Goal: construct a flexible representation for the distribution of $Z = f(X, Y)$.

Let's represent distributions using their mean embeddings. The simplest estimator is

$$\hat\mu_Z^{(1)} := \frac{1}{N} \sum_{i=1}^N k_{\mathcal{Z}}\bigl(f(X_i, Y_i), \cdot\bigr), \qquad \sqrt{N}\text{-consistent}.$$

Experiments show that the U-statistic estimator performs better:

$$\hat\mu_Z^{(2)} := \frac{1}{N^2} \sum_{i,j=1}^N k_{\mathcal{Z}}\bigl(f(X_i, Y_j), \cdot\bigr), \qquad \sqrt{N}\text{-consistent}.$$
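As a minimal numerical sketch of the two estimators above (assuming, purely for illustration, a Gaussian kernel $k_{\mathcal{Z}}$, standard normal inputs, and $f(x, y) = x + y$; none of these choices come from the slides), both can be evaluated at a single point $z$ and compared to the closed-form embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

def k_gauss(a, b, sigma=1.0):
    # Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 sigma^2)); broadcasts over arrays.
    return np.exp(-(a - b) ** 2 / (2.0 * sigma**2))

def f(x, y):
    # Illustrative choice of f; any measurable f would fit the setup.
    return x + y

N = 500
X = rng.normal(size=N)  # i.i.d. samples of X
Y = rng.normal(size=N)  # i.i.d. samples of Y, independent of X

z = 0.7  # both estimators are functions; we evaluate them at this point

# Plug-in estimator mu_hat^(1): pairs samples "diagonally", N terms.
mu1 = np.mean(k_gauss(f(X, Y), z))

# U-statistic estimator mu_hat^(2): all N^2 cross-pairs (X_i, Y_j).
mu2 = np.mean(k_gauss(f(X[:, None], Y[None, :]), z))

# For this toy setup Z = X + Y ~ N(0, 2), so the true embedding value is
# available in closed form: mu_Z(z) = 3^{-1/2} * exp(-z^2 / 6).
mu_true = np.exp(-z**2 / 6.0) / np.sqrt(3.0)
print(mu1, mu2, mu_true)
```

Both estimates should land close to the true value; the U-statistic version averages over many more pairs at an $O(N^2)$ cost.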
Motivation

Experiments show that the U-statistic estimator performs better:

$$\hat\mu_Z^{(2)} := \frac{1}{N^2} \sum_{i,j=1}^N k_{\mathcal{Z}}\bigl(f(X_i, Y_j), \cdot\bigr), \qquad \sqrt{N}\text{-consistent}.$$

Unfortunately, the $N^2$ terms may be computationally prohibitive.

Schölkopf et al. (2015): take $n \ll N$ and use reduced set methods to

1. approximate $\frac{1}{N}\sum_{i=1}^N k(X_i, \cdot) \approx \sum_{i=1}^n w_i\, k(X'_i, \cdot)$;
2. approximate $\frac{1}{N}\sum_{j=1}^N k(Y_j, \cdot) \approx \sum_{j=1}^n v_j\, k(Y'_j, \cdot)$;
3. use the following estimator:

$$\hat\mu_Z := \sum_{i,j=1}^n w_i v_j\, k_{\mathcal{Z}}\bigl(f(X'_i, Y'_j), \cdot\bigr).$$

Question: is $\hat\mu_Z$ consistent?
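The three reduced-set steps can be sketched as follows. Here the expansion points $X'_i$, $Y'_j$ are a random subsample and the weights are fit by least squares in the RKHS, which is one common reduced-set variant; the Gaussian kernel, the choice $f(x, y) = x + y$, and all sizes are illustrative assumptions rather than details from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def gram(a, b, sigma=1.0):
    # Gaussian Gram matrix between two sets of scalar points.
    return np.exp(-(np.asarray(a)[:, None] - np.asarray(b)[None, :]) ** 2 / (2 * sigma**2))

def reduced_weights(samples, points):
    # Fix the n expansion points and choose weights w by least squares:
    # minimize || (1/N) sum_i k(X_i, .) - sum_a w_a k(X'_a, .) ||_H^2,
    # whose normal equations are K' w = m with
    # K'_{ab} = k(X'_a, X'_b) and m_a = (1/N) sum_i k(X_i, X'_a).
    K = gram(points, points)
    m = gram(points, samples).mean(axis=1)
    w, *_ = np.linalg.lstsq(K, m, rcond=1e-6)  # truncated solve for stability
    return w

def f(x, y):
    # Illustrative f; the construction works for any continuous f.
    return x + y

N, n = 2000, 25
X, Y = rng.normal(size=N), rng.normal(size=N)

Xp = rng.choice(X, size=n, replace=False)  # expansion points for X
Yp = rng.choice(Y, size=n, replace=False)  # expansion points for Y
w, v = reduced_weights(X, Xp), reduced_weights(Y, Yp)

# Step 3: n^2 terms instead of N^2, evaluated at a point z.
z = 0.0
K_Z = np.exp(-(f(Xp[:, None], Yp[None, :]) - z) ** 2 / 2.0)  # k_Z(f(X'_i, Y'_j), z)
mu_hat = w @ K_Z @ v

# Sanity check for step 1: squared RKHS distance between the full empirical
# embedding of X and its n-point reduced expansion (should be tiny).
err2 = (gram(X, X).mean()
        - 2 * w @ gram(Xp, X).mean(axis=1)
        + w @ gram(Xp, Xp) @ w)
print(mu_hat, err2)
```

For this toy setup $Z = X + Y \sim N(0, 2)$, so $\hat\mu_Z(0)$ should land near the closed-form value $3^{-1/2} \approx 0.577$ while touching only $n^2 = 625$ pairs instead of $N^2 = 4{,}000{,}000$.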
New results

Answer: yes, $\hat\mu_Z$ is indeed consistent. Proof based on [SS16]. Assume:
◮ $\mathcal{X}$ and $\mathcal{Z}$ are compact;
◮ $f : \mathcal{X} \to \mathcal{Z}$ is continuous;
◮ $k_{\mathcal{X}}$, $k_{\mathcal{Z}}$ are continuous p.d. kernels on $\mathcal{X}$ and $\mathcal{Z}$;
◮ $k_{\mathcal{X}}$ is $c_0$-universal;
◮ there exists $C$ such that $\sum_i |w_i| \le C$ independently of $N$.

Then:

$$\sum_{i=1}^N w_i\, k_{\mathcal{X}}(X_i, \cdot) \;\xrightarrow{H_{k_{\mathcal{X}}}}\; \mu_X \quad\Longrightarrow\quad \sum_{i=1}^N w_i\, k_{\mathcal{Z}}\bigl(f(X_i), \cdot\bigr) \;\xrightarrow{H_{k_{\mathcal{Z}}}}\; \mu_Z.$$

◮ Importantly, $w_1, \ldots, w_N$ and $X_1, \ldots, X_N$ can be interdependent.
◮ Finite sample guarantees for $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Z} = \mathbb{R}^{d'}$ and Matérn kernels.
◮ Applications: probabilistic programming, privacy-preserving ML, ...
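The theorem can be seen in action numerically with the simplest admissible weights, $w_i = 1/N$ (so $\sum_i |w_i| = 1 \le C$ for every $N$): the input expansion converges to $\mu_X$ by the law of large numbers, and the theorem then guarantees that the pushed-forward expansion converges to $\mu_Z$. A small sketch, with an assumed Gaussian kernel and $f(x) = 2x$ (illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # Illustrative continuous map; X ~ N(0, 1) gives Z = 2X ~ N(0, 4).
    return 2.0 * x

def embed_at(samples, z, weights):
    # sum_i w_i k_Z(f(X_i), z) for a Gaussian kernel k_Z with sigma = 1.
    return weights @ np.exp(-(f(samples) - z) ** 2 / 2.0)

z = 0.5
# Closed-form target: mu_Z(z) = E[exp(-(Z - z)^2 / 2)] = 5^{-1/2} exp(-z^2 / 10).
mu_true = np.exp(-z**2 / 10.0) / np.sqrt(5.0)

errors = []
for N in [100, 1000, 10000]:
    X = rng.normal(size=N)
    w = np.full(N, 1.0 / N)  # uniform weights: sum |w_i| = 1, independent of N
    errors.append(abs(embed_at(X, z, w) - mu_true))
print(errors)
```

The absolute error of the pushed-forward embedding shrinks as $N$ grows, in line with the $\sqrt{N}$-consistency discussed earlier.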
Related results ...

◮ Minimax Estimation of Kernel Mean Embeddings
  Tolstikhin, Sriperumbudur, Muandet, 2016, arXiv.
  Task: estimate $\int_{\mathcal{X}} k(x, \cdot)\, dP(x)$ based on the i.i.d. sample $\{X_i\}_{i=1}^N$.
  Result: for translation-invariant kernels you cannot do it faster than $N^{-1/2}$.

◮ Minimax Estimation of MMD with Radial Kernels
  Tolstikhin, Sriperumbudur, Schölkopf, 2016, NIPS.
  Task: estimate $\|\mu_P - \mu_Q\|_{H_k}$ based on i.i.d. samples $\{X_i\}_{i=1}^N$ and $\{Y_i\}_{i=1}^M$.
  Result: for radial kernels you cannot do it faster than $N^{-1/2} + M^{-1/2}$.