Stein’s method for diffusion approximation Thomas Bonis DataShape team, Inria
K-nearest-neighbor graph
We draw $n$ points in $\mathbb{R}^d$: $X_1, \dots, X_n \sim d\nu = f\,d\lambda$.
We add an edge between each point and its $K$ nearest neighbors.
When $K, n \to \infty$ and $K/n \to 0$ (plus another condition on $K$), can we recover $f$ using only the graph structure?
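The construction above can be sketched in a few lines of NumPy. This is only an illustration: the function name `knn_graph`, the Gaussian choice for $f$, the sample sizes, and the decision to make the graph undirected (an edge is kept if either endpoint selects the other) are assumptions, not details taken from the slides.

```python
import numpy as np

def knn_graph(points, k):
    """Symmetric boolean adjacency matrix of a K-nearest-neighbor graph."""
    n = len(points)
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)              # a point is not its own neighbor
    nearest = np.argsort(sq, axis=1)[:, :k]   # indices of the k closest points
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = True
    # Assumption: undirected graph, edge kept if either endpoint chose the other.
    return adj | adj.T

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))   # assumed example: f is the standard Gaussian density on R^2
adj = knn_graph(points, k=10)
degrees = adj.sum(axis=1)
```

Only `adj` (the graph structure) would be available to an estimator of $f$; the coordinates in `points` are used here solely to build the graph.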
Random walk on the K-nearest-neighbor graph
A random walk on the graph captures information on $\nu$.
The random walk approximates the diffusion process with generator $f^{-2/d}\left(\nabla(\log f)\cdot\nabla + \tfrac{1}{2}\Delta\right)$.
This diffusion has an invariant measure $\mu$ with density proportional to $f^{2+2/d}$.
Does $\pi$, the invariant measure of the random walk, converge to $\mu$?
Random walk on the $\epsilon$-graph
There is an edge between $X_i$ and $X_j$ if $\|X_i - X_j\| \le \epsilon$.
The random walk approximates the diffusion process with generator $\nabla(\log f)\cdot\nabla + \tfrac{1}{2}\Delta$.
This diffusion has an invariant measure $\mu$ with density proportional to $f^2$.
$\pi(X_i)$ is proportional to the degree of $X_i$ (the ball density estimator, which converges to $f$).
Since we also have more points where $f$ is large, $\pi$ converges to a measure with density proportional to $f^2$.
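The claim that $\pi(X_i)$ is proportional to the degree can be checked numerically. The sketch below is an assumed illustration: the helper name `epsilon_graph_walk`, the Gaussian sample, and the value of $\epsilon$ are ours, and self-loops are kept (each point lies within $\epsilon$ of itself) so that no vertex has degree zero.

```python
import numpy as np

def epsilon_graph_walk(points, eps):
    """Transition matrix of the simple random walk on the epsilon-graph,
    together with the degree-proportional candidate invariant measure."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: self-loops are kept, so every degree is at least 1.
    adj = (dist <= eps).astype(float)
    degrees = adj.sum(axis=1)
    transition = adj / degrees[:, None]   # row-stochastic matrix P
    pi = degrees / degrees.sum()          # pi(X_i) proportional to deg(X_i)
    return transition, pi

rng = np.random.default_rng(1)
points = rng.normal(size=(300, 2))
transition, pi = epsilon_graph_walk(points, eps=0.5)
residual = np.abs(pi @ transition - pi).max()   # size of pi P - pi
```

Because the adjacency matrix is symmetric, `pi @ transition` equals `pi` up to rounding, whether or not the graph is connected.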
Stein discrepancy
Let $\gamma$ be the Gaussian measure and let $Z$ be drawn from $\gamma$. Then, for all $\varphi$,
$$E[-Z\cdot\nabla\varphi(Z) + \Delta\varphi(Z)] = 0.$$
Let $X$ be drawn from $\nu$. We say that $\nu$ admits a Stein kernel $\tau_\nu$ with respect to $\gamma$ if, for all $\varphi$,
$$E[-X\cdot\nabla\varphi(X) + \langle \tau_\nu(X), \mathrm{Hess}(\varphi)(X)\rangle_{HS}] = 0.$$
Intuitively, if $\tau_\nu$ is close to $I_d$ then $\nu$ is close to $\gamma$. The distance between $\tau_\nu$ and $I_d$ is quantified by the Stein discrepancy
$$S(\nu)^2 = E[\|\tau_\nu(X) - I_d\|^2].$$
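A one-dimensional numerical check may make these definitions concrete. The sketch below uses Gaussian quadrature, which is exact for polynomial test functions. The test polynomial and the second example are assumptions introduced here, not taken from the slides: for the uniform measure on $[-\sqrt{3}, \sqrt{3}]$ (mean 0, variance 1), the standard one-dimensional formula $\tau(x) = \frac{1}{p(x)}\int_x^\infty y\,p(y)\,dy$ gives $\tau(x) = (3 - x^2)/2$.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite_e import hermegauss
from numpy.polynomial.legendre import leggauss

# Arbitrary polynomial test function (an assumption for illustration).
phi = Polynomial([0.0, 1.0, -2.0, 0.5, 3.0])
d1, d2 = phi.deriv(1), phi.deriv(2)

# Gaussian case: E[-Z phi'(Z) + phi''(Z)] should vanish.
z, wz = hermegauss(40)        # nodes/weights for the weight exp(-x^2 / 2)
wz = wz / wz.sum()            # normalize to a probability measure
stein_gaussian = np.sum(wz * (-z * d1(z) + d2(z)))

# Uniform case on [-sqrt(3), sqrt(3)] with Stein kernel tau(x) = (3 - x^2) / 2.
u, wu = leggauss(40)          # nodes/weights on [-1, 1]
x = np.sqrt(3.0) * u
wu = wu / wu.sum()
tau = (3.0 - x ** 2) / 2.0
stein_uniform = np.sum(wu * (-x * d1(x) + tau * d2(x)))

# Squared Stein discrepancy E[(tau(X) - 1)^2] of the uniform measure.
discrepancy_sq = np.sum(wu * (tau - 1.0) ** 2)
```

Both `stein_gaussian` and `stein_uniform` vanish up to rounding, and `discrepancy_sq` evaluates to $1/5$ for this uniform example.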
Bounding the Wasserstein distance with $S$
Theorem [Ledoux, Nourdin, Peccati 2015]. Let $\nu$ be a measure admitting a Stein kernel $\tau_\nu$ and let $S(\nu)$ be the associated Stein discrepancy. Then
$$W_2(\nu, \gamma) \le S(\nu).$$
Problem: in general, discrete measures do not admit a Stein kernel.
Example: if the Rademacher measure admitted a Stein kernel, there would exist $\tau$ such that, for any smooth function $\varphi$,
$$\varphi'(-1) - \varphi'(1) + \tau(1)\varphi''(1) + \tau(-1)\varphi''(-1) = 0.$$
This can be dealt with using a smoothing procedure (relying on the zero-bias distribution), but it is not practical in high dimensions.
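To see why no such $\tau$ can exist, it is enough to test the Rademacher identity on one function whose second derivative vanishes at $\pm 1$. The particular $\varphi$ below is our choice for illustration, not one given in the slides.

```latex
\[
\varphi(x) = \frac{x^4}{12} - \frac{x^2}{2}, \qquad
\varphi'(x) = \frac{x^3}{3} - x, \qquad
\varphi''(x) = x^2 - 1 .
\]
% Since \varphi''(\pm 1) = 0, both \tau terms vanish whatever \tau is.
\[
\varphi'(-1) - \varphi'(1) + \tau(1)\varphi''(1) + \tau(-1)\varphi''(-1)
  = \frac{2}{3} - \left(-\frac{2}{3}\right) = \frac{4}{3} \neq 0 .
\]
```

The identity fails for every choice of $\tau(1)$ and $\tau(-1)$, so the Rademacher measure has no Stein kernel.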
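Generalizing the Stein kernel
Let $X \sim \nu$. There exists an operator $\mathcal{L}$ such that, for all $\varphi$, $E[\mathcal{L}\varphi(X)] = 0$; compare $\mathcal{L}$ with $-x\cdot\nabla + \Delta$.
Let $X$ and $X'$ be drawn from $\nu$. Then, for all $\varphi$, $E[\varphi(X') - \varphi(X)] = 0$, and by a Taylor expansion around $X$,
$$E\left[\sum_{k=1}^{\infty} \frac{E[(X'-X)^k \mid X]}{k!}\,\varphi^{(k)}(X)\right] = 0.$$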
Another bound on $W_2$
Theorem [Dimension 1]. Let $\nu$ be a measure on $\mathbb{R}$ and let $X$ and $(X_t)_{t \ge 0}$ be random variables drawn from $\nu$. Let $Y_t = X_t - X$. Then, for any $h > 0$,
$$
\begin{aligned}
W_2(\nu, \gamma) \le{} & \int_0^\infty e^{-t}\, E\left[\left(\tfrac{1}{h} E[Y_t \mid X] + X\right)^2\right]^{1/2} dt \\
& + \int_0^\infty \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\, E\left[\left(\tfrac{1}{2h} E[Y_t^2 \mid X] - 1\right)^2\right]^{1/2} dt \\
& + \sum_{k>2} \int_0^\infty \frac{e^{-kt}}{h\sqrt{k!}\,(1-e^{-2t})^{(k-1)/2}}\, E\left[E[Y_t^k \mid X]^2\right]^{1/2} dt .
\end{aligned}
$$
• $h$: rescaling.
• First term: the rescaled first moment is close to $-X$.
• Second term: the rescaled second moment is close to $1$.
• Third term: the higher moments are small; they start at $0$ and grow as $t$ increases. Roughly, $Y_t$ has to be bounded by $\sqrt{t}$.
A similar result holds (in dimension 1 only!) for $W_p$, $p \ge 1$.
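The role of the time weights in the theorem can be made explicit with a short computation. The first two weights, $e^{-t}$ and $e^{-2t}/\sqrt{1-e^{-2t}}$, are integrable, while the weight of the $k$-th moment term is singular at $t = 0$ for $k > 2$. This is why the higher moments must vanish as $t \to 0$. The constant $C$ below is a hypothetical bound introduced only for this illustration.

```latex
\[
\int_0^\infty e^{-t}\,dt = 1, \qquad
\int_0^\infty \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\,dt
  = \frac{1}{2}\int_0^1 u^{-1/2}\,du = 1
  \quad (u = 1 - e^{-2t}).
\]
% Near t = 0 we have 1 - e^{-2t} \sim 2t, so for k \ge 3 the weight behaves like
\[
\frac{e^{-kt}}{(1-e^{-2t})^{(k-1)/2}} \sim (2t)^{-(k-1)/2},
\qquad \frac{k-1}{2} \ge 1,
\]
% which is not integrable at 0 on its own. If |Y_t| \le C\sqrt{t}, then
\[
E\left[E[Y_t^k \mid X]^2\right]^{1/2} \le C^k t^{k/2}
\quad\Longrightarrow\quad
(2t)^{-(k-1)/2}\, C^k t^{k/2} = 2^{-(k-1)/2}\, C^k\, t^{1/2},
\]
% which is integrable near t = 0.
```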
Another bound on $W_2$
Let $\mu$ be the invariant measure of an operator $\mathcal{L} = b\cdot\nabla + \langle a, \nabla^2 \rangle$. A similar result holds under technical conditions on $\mathcal{L}$ (for example, under a curvature-dimension inequality). If
• $E[Y_t \mid X]$ is close to $b(X)$,
• $E[Y_t^2 \mid X]$ is close to $a(X)$,
• $E[Y_t^k]$ is small for $k > 2$,
then $W_2(\nu, \mu)$ is small.
Convergence rates in the Central Limit Theorem
Consider i.i.d. random variables $X_1, \dots, X_n$ with measure $\nu$, $E[X_1] = 0$ and $E[X_1^2] = 1$. The Central Limit Theorem gives
$$S_n = n^{-1/2}\sum_{i=1}^{n} X_i \to N(0, 1).$$
How fast does it converge?
Let $\tilde X_1, \dots, \tilde X_n$ be independent copies of $X_1, \dots, X_n$ and let $I$ be a uniform random variable on $\{1, \dots, n\}$. We set
$$(S_n)_t = S_n + n^{-1/2}(\tilde X_I - X_I)\,\mathbf{1}_{X_I, \tilde X_I \in [-\sqrt{tn}, \sqrt{tn}]}.$$
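The first two conditional moments of this replacement step can be computed exactly for a discrete $\nu$. The sketch below makes several assumptions that are not in the slides: $\nu$ is the Rademacher measure, the rescaling is $h = 1/n$, and $t$ is taken large enough ($tn \ge 1$) that the truncation is inactive. The helper name `conditional_moments` is ours.

```python
import numpy as np

def conditional_moments(x, support, probs):
    """Exact E[Y | X_1..X_n] and E[Y^2 | X_1..X_n] for
    Y = n^{-1/2} (Xtilde_I - X_I), with I uniform on the indices and
    Xtilde_I an independent draw from the discrete law (support, probs)."""
    n = len(x)
    # diff[i, j] = support[j] - x[i]: every possible value of Xtilde_I - X_I.
    diff = support[None, :] - x[:, None]
    m1 = (diff @ probs).mean() / np.sqrt(n)   # average over Xtilde, then over I
    m2 = ((diff ** 2) @ probs).mean() / n
    return m1, m2

rng = np.random.default_rng(0)
n = 1000
x = rng.choice([-1.0, 1.0], size=n)           # assumed example: Rademacher sample
s_n = x.sum() / np.sqrt(n)
h = 1.0 / n                                    # assumed rescaling

m1, m2 = conditional_moments(x, np.array([-1.0, 1.0]), np.array([0.5, 0.5]))
first_moment_gap = m1 / h + s_n                # (1/h) E[Y | X] + S_n
second_moment_gap = m2 / (2 * h) - 1.0         # (1/(2h)) E[Y^2 | X] - 1
```

Under these assumptions both gaps are zero up to rounding, since $E[Y \mid X_1, \dots, X_n] = -S_n/n$ and $E[Y^2 \mid X_1, \dots, X_n] = 2/n$. Here the moments are conditioned on the whole sample rather than on $S_n$ alone. The first two integrands in the one-dimensional bound then vanish, leaving only the higher-moment terms.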