A survey on mixing coefficients: computation and estimation.
Vitaly Kuznetsov, Courant Institute of Mathematical Sciences, New York University.
October 29, 2013.
Introduction

Binary classification: receive a sample X_1, …, X_m with labels in {0, 1}, and choose a hypothesis h that has good expected performance on unseen data. X_1, …, X_m are typically assumed i.i.d.
Introduction (continued)

Much of learning theory operates under the assumption that the data comes from an i.i.d. source. In certain scenarios this assumption is not appropriate, e.g. time series analysis. To extend learning theory to these scenarios we need to find a suitable relaxation of the i.i.d. requirement. One common approach found in the literature is to impose various "mixing conditions". Under these mixing conditions the strength of dependence between random variables is measured using "mixing coefficients".
Outline

- Mixing conditions and coefficients: definitions and basic properties.
- Computational aspects.
- Estimating mixing coefficients.
- Discussion.
How can we measure dependence between random variables?

Common measures of dependence are the so-called "mixing" coefficients, originally introduced to prove laws of large numbers for sequences of dependent variables.
α-mixing coefficient between two σ-algebras

Given a probability space (Ω, F, P) and two sub-σ-algebras σ_1 and σ_2, define the α-mixing coefficient

α(σ_1, σ_2) = sup |P(A)P(B) − P(A ∩ B)|,

where the supremum is taken over all A ∈ σ_1 and B ∈ σ_2.
ϕ-mixing coefficient

Define the ϕ-mixing coefficient

ϕ(σ_1 | σ_2) = sup |P(A) − P(A | B)|,

where the supremum is taken over all A ∈ σ_1 and B ∈ σ_2 with P(B) > 0. Note that the ϕ coefficient is not symmetric.
β-mixing coefficient

Define the β-mixing coefficient between two σ-algebras σ_1 and σ_2:

β(σ_1, σ_2) = E[ sup_{A ∈ σ_1} |P(A) − P(A | σ_2)| ].

We can rewrite the β-mixing coefficient as follows:

β(σ_1, σ_2) = (1/2) sup Σ_{i=1}^{I} Σ_{j=1}^{J} |P(A_i)P(B_j) − P(A_i ∩ B_j)|,

where the supremum is taken over all finite partitions {A_1, …, A_I} and {B_1, …, B_J} of Ω such that A_i ∈ σ_1 and B_j ∈ σ_2.
Alternative definitions of the β-mixing coefficient

This leads to yet another characterization of the β-mixing coefficient:

β(σ_1, σ_2) = ‖P_{σ_1} ⊗ P_{σ_2} − P_{σ_1 ⊗ σ_2}‖,

where ‖·‖ denotes the total variation distance, i.e. ‖P − Q‖ = sup_A |P(A) − Q(A)|. Assuming the distributions P and Q have densities f and g respectively,

‖P − Q‖ = (1/2) ∫ |f − g|.
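In the discrete case the total variation distance reduces to half the ℓ1 distance between the probability vectors, which makes it trivial to compute. A minimal sketch (the function name is mine, not from the survey):

```python
import numpy as np

# Total variation distance between two discrete distributions p and q,
# using the identity ||P - Q|| = (1/2) * sum_i |p_i - q_i|,
# the discrete analogue of (1/2) * integral |f - g|.
def tv_distance(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

print(tv_distance([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))  # 0.1
```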
Relations between mixing coefficients

We have the following:

2 α(σ_1, σ_2) ≤ β(σ_1, σ_2) ≤ ϕ(σ_1, σ_2).

The second inequality is immediate from the definitions. Proof of the first inequality: for any A ∈ σ_1 and B ∈ σ_2, the four terms below are equal in absolute value, and applying the partition characterization of β to the partitions {A, A^c} and {B, B^c} gives

4 |P(A)P(B) − P(A ∩ B)| = |P(A)P(B) − P(A ∩ B)| + |P(A)P(B^c) − P(A ∩ B^c)| + |P(A^c)P(B) − P(A^c ∩ B)| + |P(A^c)P(B^c) − P(A^c ∩ B^c)| ≤ 2 β(σ_1, σ_2).

Taking the supremum over A and B yields 2 α(σ_1, σ_2) ≤ β(σ_1, σ_2).
From two variables to stochastic processes (i)

Let {X_t}_{t=−∞}^{∞} be a doubly infinite sequence of random variables. Notation:

- X_i^j = (X_i, X_{i+1}, …, X_j),
- P_i^j is the joint probability distribution of X_i^j,
- σ_i^j is the σ-algebra generated by X_i^j.
From two variables to stochastic processes (ii)

Define the following mixing coefficients:

α(a) = sup_t α(σ_{−∞}^t, σ_{t+a}^{∞}),
β(a) = sup_t β(σ_{−∞}^t, σ_{t+a}^{∞}),
ϕ(a) = sup_t ϕ(σ_{−∞}^t, σ_{t+a}^{∞}).

We say that a sequence of random variables X_{−∞}^{∞} is α-, β- or ϕ-mixing if the corresponding mixing coefficient → 0 as a → ∞. These coefficients measure dependence between the future and the past separated by a time units.
Stationary stochastic processes

A stochastic process X_{−∞}^{∞} is (strictly) stationary if for any t ∈ Z and k, n ∈ N the distribution of X_t^{t+n} is the same as the distribution of X_{t+k}^{t+k+n}. For stationary processes the mixing coefficients simplify to

α(a) = α(σ_{−∞}^0, σ_a^{∞}),
β(a) = β(σ_{−∞}^0, σ_a^{∞}),
ϕ(a) = ϕ(σ_{−∞}^0, σ_a^{∞}).
Connections to machine learning

Theorem (M. Mohri, A. Rostamizadeh, 2009): Let H = {X → Y} be a set of hypotheses and L an M-bounded loss function. Let S be a sample of size m = 2μa from a stationary β-mixing process on X × Y. Then for any δ > 4(μ − 1)β(a), with probability at least 1 − δ the following holds for all h ∈ H:

E[L(h(X), Y)] ≤ (1/m) Σ_{i=1}^{m} L(h(X_i), Y_i) + R̂_{S_μ}(L ∘ H) + 3M √( log(4/δ′) / (2μ) ),

where R̂_{S_μ} denotes the empirical Rademacher complexity and δ′ = δ − 4(μ − 1)β(a). There are other results of a similar nature by R. Meir, M. Mohri and A. Rostamizadeh, I. Steinwart et al., to name a few.
Can we compute mixing coefficients?

Theorem (M. Ahsen, M. Vidyasagar, 2013): Suppose X and Y are discrete random variables with known joint and marginal probability distributions. Then computing the α-mixing coefficient is NP-hard (equivalent to the "partition problem"). Ahsen and Vidyasagar also give efficiently computable upper and lower bounds.
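For intuition, α can still be computed by brute force for tiny alphabets, since events in σ(X) and σ(Y) correspond to subsets of the value sets; the exponential cost in the alphabet sizes is consistent with the hardness result. A sketch (function name is mine, not from the paper):

```python
import itertools
import numpy as np

# Brute-force alpha(sigma(X), sigma(Y)) for a discrete joint distribution
# theta, where theta[i, j] = P(X = i, Y = j).  Events in sigma(X) are
# subsets of X-values, so we enumerate all 2^n * 2^k event pairs (A, B).
def alpha_bruteforce(theta):
    theta = np.asarray(theta, dtype=float)
    n, k = theta.shape
    mu, nu = theta.sum(axis=1), theta.sum(axis=0)   # marginals of X and Y
    best = 0.0
    for A in itertools.product([False, True], repeat=n):
        for B in itertools.product([False, True], repeat=k):
            a, b = np.array(A), np.array(B)
            gap = abs(mu[a].sum() * nu[b].sum() - theta[np.ix_(a, b)].sum())
            best = max(best, gap)
    return best

# X = Y, uniform on {0, 1}: the optimal event pair gives alpha = 1/4.
print(alpha_bruteforce(np.eye(2) / 2))  # 0.25
```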
Can we compute mixing coefficients? (continued)

Theorem (M. Ahsen, M. Vidyasagar, 2013): Suppose X and Y are discrete random variables with known joint distribution θ_ij and marginal probability distributions μ_i and ν_j. Then

β(σ(X), σ(Y)) = (1/2) Σ_{i,j} |γ_ij|,
ϕ(σ(X), σ(Y)) = max_j (1/ν_j) Σ_i max(γ_ij, 0),

where γ_ij = θ_ij − μ_i ν_j. Thus β(σ(X), σ(Y)) and ϕ(σ(X), σ(Y)) are both computable in polynomial time.
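These closed forms are straightforward to implement. A sketch, assuming the joint distribution is given as a matrix theta with theta[i, j] = P(X = i, Y = j) (names are mine):

```python
import numpy as np

# Polynomial-time beta and phi for discrete (X, Y), following the formulas
# above: gamma_ij = theta_ij - mu_i * nu_j.
def beta_phi(theta):
    theta = np.asarray(theta, dtype=float)
    mu = theta.sum(axis=1)              # marginal of X
    nu = theta.sum(axis=0)              # marginal of Y
    gamma = theta - np.outer(mu, nu)
    beta = 0.5 * np.abs(gamma).sum()
    phi = np.max(np.maximum(gamma, 0.0).sum(axis=0) / nu)
    return beta, phi

# Perfectly dependent X = Y, uniform on {0, 1}: beta = phi = 1/2.
# For this distribution alpha = 1/4, so 2*alpha <= beta <= phi holds
# with equality on the left.
print(beta_phi(np.eye(2) / 2))  # (0.5, 0.5)
```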
Estimation of mixing coefficients: naive approach (i)

Question: given i.i.d. samples (X_1, Y_1), …, (X_m, Y_m) from a joint distribution of real-valued (X, Y), can we estimate any of the mixing coefficients? Define the following estimators of the joint and marginal distributions:

μ̂(x) = (1/m) Σ_{i=1}^{m} I(X_i ≤ x),
ν̂(y) = (1/m) Σ_{i=1}^{m} I(Y_i ≤ y),
θ̂(x, y) = (1/m) Σ_{i=1}^{m} I(X_i ≤ x, Y_i ≤ y).

Let β̂ and ϕ̂ be the estimators of β and ϕ based on these empirical c.d.f.'s.
Estimation of mixing coefficients: naive approach (ii)

Theorem (M. Ahsen, M. Vidyasagar, 2013):

ϕ̂ ≥ β̂ = (m − 1)/m → 1 as m → ∞.

Justification: under the empirical probability distributions each sample has mass 1/m. The marginals are also uniform, and hence the product distribution assigns mass 1/m² to each of the m² points in the grid (x_i, y_j). The conclusion now follows from the above formula for discrete β. In other words, the naive plug-in estimator is inconsistent: it converges to 1 regardless of the true dependence.
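This failure is easy to reproduce numerically: plugging the empirical joint and marginals into the discrete β formula always returns (m − 1)/m. A sketch, assuming the m sample points are distinct (so the empirical joint is an identity pattern on an m × m grid):

```python
import numpy as np

# With m distinct pairs (x_i, y_i), the empirical joint puts mass 1/m on
# the diagonal of an m x m grid, while the empirical product measure
# spreads mass 1/m^2 over the whole grid.  The discrete beta formula then
# gives (m - 1)/m no matter what the true dependence is.
def empirical_beta(m):
    theta = np.eye(m) / m
    gamma = theta - np.outer(theta.sum(axis=1), theta.sum(axis=0))
    return 0.5 * np.abs(gamma).sum()

for m in (2, 10, 1000):
    print(empirical_beta(m), (m - 1) / m)  # the two values agree
```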
Estimation of mixing coefficients: histograms (i)

A histogram estimator f̂ of a density f based on a sample X_1, …, X_m is

f̂(x) = Σ_{j=1}^{J} ( p̂_j / (m w_j) ) I_{B_j}(x),

where
- the B_j are bins partitioning the region with observations,
- p̂_j = Σ_{i=1}^{m} I_{B_j}(X_i) counts the number of samples in bin B_j,
- w_j is the width of the j-th bin.
Estimation of mixing coefficients: histograms (ii)

Given m samples, choose J_m intervals on R so that each bin contains ⌊m/J_m⌋ or ⌊m/J_m⌋ + 1 samples from both X and Y.

Theorem (M. Ahsen, M. Vidyasagar, 2013): Suppose (X, Y) ∼ θ, X ∼ μ and Y ∼ ν, with θ absolutely continuous with respect to μ ⊗ ν. Then β̂ converges to β provided that J_m/m → 0. If in addition the density f ∈ L^∞, then α̂ and ϕ̂ also converge to α and ϕ respectively. The measure-theoretic arguments used in the proof establish consistency of the estimators but do not yield error rates.
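A one-dimensional sketch of this equal-frequency binning: taking the bin edges to be empirical quantiles makes each bin hold roughly m/J_m samples (function and variable names are mine):

```python
import numpy as np

# Equal-frequency histogram density estimate: J bins whose edges are the
# empirical quantiles of the sample, with heights p_hat_j / (m * w_j) as
# in the histogram estimator above.
def histogram_density(samples, J):
    samples = np.asarray(samples, dtype=float)
    m = len(samples)
    edges = np.quantile(samples, np.linspace(0.0, 1.0, J + 1))
    counts, edges = np.histogram(samples, bins=edges)
    heights = counts / (m * np.diff(edges))
    return edges, heights

rng = np.random.default_rng(0)
edges, heights = histogram_density(rng.normal(size=1000), J=20)
print((heights * np.diff(edges)).sum())  # integrates to 1: a valid density
```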
Estimation of mixing coefficients: stochastic processes (i)

Two-step approximation:

|β̂_d(a) − β(a)| ≤ |β̂_d(a) − β_d(a)| + |β_d(a) − β(a)|,

where β_d(a) = sup_t β(σ_{t−d}^t, σ_{t+a}^{t+a+d}) and β̂_d(a) is an estimator based on

β̂_d(a) = (1/2) ∫ |f̂_d ⊗ f̂_d − f̂_{2d}|,

with f̂_d, f̂_{2d} being d- and 2d-dimensional histogram estimators.
Estimation of mixing coefficients: stochastic processes (ii)

Theorem (D. McDonald, C. Shalizi, M. Schervish, 2011): Let X_1^m be a sample from a stationary β-mixing process. For m = 2 μ_m b_m and d ≤ μ_m we have

P( |β̂_d(a) − β_d(a)| ≥ ε ) ≤ 2 exp( −μ_m ε_1² / 2 ) + 2 exp( −μ_m ε_2² / 2 ) + 4(μ_m − 1) β(b_m),

where ε_1 = ε/2 − E[|f̂_d − f_d|] and ε_2 = ε − E[|f̂_{2d} − f_{2d}|]. The proof is based on a blocking technique.
Estimation of mixing coefficients: stochastic processes (iii)

For the second term, |β_d(a) − β(a)|, a measure-theoretic argument can be used to show that it → 0 as d → ∞. Under the assumption that the densities f_d and f_{2d} are in the Sobolev space H², McDonald, Shalizi and Schervish argue that f̂_d and f̂_{2d} are consistent. Choosing d_m = O(exp(W(log m))) and w_m = O(m^{−k_m}), where

k_m = (W(log m) + 2 log m) / ( log m (2 exp(W(log m)) + 1) )

and W is the Lambert W function (the inverse of w ↦ w e^w), they show that the histogram-based estimator of β is consistent.