Graphon Estimation: Minimax Rates and Posterior Contraction
Chao Gao, Yale University
Leiden, March 2015
Stochastic Block Model
$z : \{1, 2, \dots, n\} \to \{1, 2, \dots, k\}$
$A_{ij} \sim \mathrm{Bernoulli}(\theta_{ij})$, where $\theta_{ij} = Q_{z(i)z(j)}$.
Goal: recover $\theta_{ij}$.
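As a quick illustration, here is a minimal sketch of sampling from this model; the specific values of $n$, $k$, and $Q$ are arbitrary choices, not from the talk.

```python
import numpy as np

# Minimal SBM sampler: latent labels z, connectivity matrix Q,
# theta_ij = Q_{z(i) z(j)}, A_ij ~ Bernoulli(theta_ij).
rng = np.random.default_rng(0)
n, k = 100, 3
Q = rng.uniform(0, 1, size=(k, k))
Q = (Q + Q.T) / 2                      # symmetric connectivity matrix
z = rng.integers(0, k, size=n)         # latent label z(i) for each node
theta = Q[z[:, None], z[None, :]]      # theta_ij = Q_{z(i) z(j)}
A = rng.binomial(1, theta)             # A_ij ~ Bernoulli(theta_ij)
# For an undirected graph one would additionally symmetrize A;
# the slide's model specifies theta_ij for all (i, j), so we keep it simple.
```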
Biclustering (Hartigan, 1972)
$z_1 : \{1, 2, \dots, n\} \to \{1, 2, \dots, k\}$, $z_2 : \{1, 2, \dots, m\} \to \{1, 2, \dots, l\}$
$E(A_{ij}) = \theta_{ij} = Q_{z_1(i) z_2(j)}$.
Goal: recover $\theta_{ij}$.
Nonparametric Regression
$y_i = f(x_i) + \epsilon_i$, with $x_i \in D$ and $\epsilon_i \sim N(0, 1)$.
Common assumption: $f$ is smooth on $D$.
Goal: recover $f$ from both $x$ and $y$.
A More Challenging Problem
$y_i = f(x_i) + \epsilon_i$, with $x_i \in D$ and $\epsilon_i \sim N(0, 1)$.
Common assumption: $f$ is smooth on $D$.
Goal: recover $f$ from only $y$.
• 1D Problem
• 2D Problem
• Minimax Rate for Stochastic Block Model
• Minimax Rate for Graphon Estimation
• Adaptive Bayes Estimation
1D Problem
$x_i = i/n$, $y_i = f(x_i) + \epsilon_i$, $i = 1, 2, \dots, n$.
$\mathcal{F} = \big\{ f : f(x) = q_1 \text{ for } x \in (0, 1/2],\ f(x) = q_2 \text{ for } x \in (1/2, 1] \big\}$
$\inf_{\hat f} \sup_{f \in \mathcal{F}} E\Big[\frac{1}{n}\sum_{i=1}^{n} (\hat f(x_i) - f(x_i))^2\Big] \asymp \frac{1}{n}.$
1D Problem
Without observing $x$, the problem is equivalent to $y_i = \theta_i + \epsilon_i$ with
$\Theta = \{\theta : \text{half of the } \theta_i \text{ equal } q_1, \text{ the other half equal } q_2\}$.
$\inf_{\hat\theta} \sup_{\theta \in \Theta} E\Big[\frac{1}{n}\sum_{i=1}^{n} (\hat\theta_i - \theta_i)^2\Big] \asymp \frac{1}{n}.$
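A hedged sketch of this no-design problem: since half of the $\theta_i$ take each value, one can sort $y$ and average each half. This simple plug-in is illustrative only; the minimax rate $1/n$ is attained by exact least squares over $\Theta$, and with noise variance $1$ the sorting step misclassifies points, so the empirical risk of this sketch can be larger. The values of $n$, $q_1$, $q_2$ below are arbitrary.

```python
import numpy as np

# Simulate y_i = theta_i + eps_i where half of the theta_i equal q1
# and half equal q2 (q1, q2, n are illustrative choices).
rng = np.random.default_rng(1)
n, q1, q2 = 1000, 0.2, 0.8
theta = np.repeat([q1, q2], n // 2)
y = theta + rng.normal(size=n)

# Plug-in estimator: sort y, average the lower and upper halves.
# A heuristic stand-in for least squares over Theta, not rate-optimal
# when the gap |q2 - q1| is small relative to the noise.
order = np.argsort(y)
theta_hat = np.empty(n)
theta_hat[order[: n // 2]] = y[order[: n // 2]].mean()
theta_hat[order[n // 2 :]] = y[order[n // 2 :]].mean()
print(np.mean((theta_hat - theta) ** 2))
```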
2D Problem
$\xi_i = i/n$, $y_{ij} = f(\xi_i, \xi_j) + \epsilon_{ij}$, $i, j = 1, 2, \dots, n$.
$\mathcal{F}$ collects $f$ of the form
$f(x, y) = \begin{cases} q_1 & (x, y) \in [0, 1/2) \times [0, 1/2), \\ q_2 & (x, y) \in [0, 1/2) \times [1/2, 1], \\ q_3 & (x, y) \in [1/2, 1] \times [0, 1/2), \\ q_4 & (x, y) \in [1/2, 1] \times [1/2, 1]. \end{cases}$
2D Problem
With the design known:
$\inf_{\hat f} \sup_{f \in \mathcal{F}} E\Big[\frac{1}{n^2}\sum_{1 \le i,j \le n} (\hat f(\xi_i, \xi_j) - f(\xi_i, \xi_j))^2\Big] \asymp \frac{1}{n^2}.$
How about without knowing the design?
$\inf_{\hat f} \sup_{f \in \mathcal{F}} E\Big[\frac{1}{n^2}\sum_{1 \le i,j \le n} (\hat f(\xi_i, \xi_j) - f(\xi_i, \xi_j))^2\Big] \asymp \frac{1}{n}.$
2D Problem
Let $\theta_{ij} = f(\xi_i, \xi_j)$. Does $\theta_{ij}$ have any structure?
For each $i$, the entries $\{\theta_{i1}, \theta_{i2}, \dots, \theta_{in}\}$ come from the same row: they all involve the same $\xi_i$.
For each $j$, the entries $\{\theta_{1j}, \theta_{2j}, \dots, \theta_{nj}\}$ come from the same column: they all involve the same $\xi_j$.
This row/column structure survives even when the design is unknown.
2D Problem
Now give each entry its own design point: $\xi_{ij} \in [0, 1]^2$, $y_{ij} = f(\xi_{ij}) + \epsilon_{ij}$, $i, j = 1, 2, \dots, n$.
Without knowing the design:
$\inf_{\hat f} \sup_{f \in \mathcal{F}} E\Big[\frac{1}{n^2}\sum_{1 \le i,j \le n} (\hat f(\xi_{ij}) - f(\xi_{ij}))^2\Big] \asymp 1.$
Stochastic Block Model
$A_{ij} \sim \mathrm{Bernoulli}(\theta_{ij})$
$\Theta_2 = \big\{\theta : \theta_{ij} = Q_{z(i)z(j)}, \text{ with } z : [n] \to [2]\big\}$
$\inf_{\hat\theta} \sup_{\theta \in \Theta_2} E\Big[\frac{1}{n^2}\sum_{1 \le i,j \le n} (\hat\theta_{ij} - \theta_{ij})^2\Big] \asymp \frac{1}{n}.$
Stochastic Block Model
$A_{ij} \sim \mathrm{Bernoulli}(\theta_{ij})$
$\Theta_k = \big\{\theta : \theta_{ij} = Q_{z(i)z(j)}, \text{ with } z : [n] \to [k]\big\}$
Theorem 1.1. Under the stochastic block model, for any $1 \le k \le n$,
$\inf_{\hat\theta} \sup_{\theta \in \Theta_k} E\Big[\frac{1}{n^2}\sum_{i,j \in [n]} (\hat\theta_{ij} - \theta_{ij})^2\Big] \asymp \frac{k^2}{n^2} + \frac{\log k}{n}.$
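A parameter-counting heuristic, an informal reading rather than the proof, explains the two terms:
\[
\frac{k^2}{n^2} \;\leftrightarrow\; \text{the } k^2 \text{ free entries of } Q \text{ estimated from } n^2 \text{ observations}, \qquad
\frac{\log k}{n} \;\leftrightarrow\; \text{the label map } z, \text{ about } n \log k \text{ bits, again over } n^2 \text{ observations}.
\]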
Stochastic Block Model
Let $k \asymp n^{\delta}$ for $\delta \in [0, 1]$. Then
$\frac{k^2}{n^2} + \frac{\log k}{n} \asymp \begin{cases} n^{-2} & \delta = 0,\ k = 1, \\ n^{-1} & \delta = 0,\ k > 1, \\ n^{-1}\log n & \delta \in (0, 1/2], \\ n^{-2(1-\delta)} & \delta \in (1/2, 1]. \end{cases}$
Graphon Estimation
Theorem (Aldous–Hoover). A random array $\{A_{ij}\}$ is jointly exchangeable, in the sense that $\{A_{ij}\} \stackrel{d}{=} \{A_{\sigma(i)\sigma(j)}\}$ for every permutation $\sigma$, if and only if it can be represented as follows: there is a random function $F : [0,1]^3 \to \mathbb{R}$ such that
$A_{ij} \stackrel{d}{=} F(\xi_i, \xi_j, \xi_{ij}),$
where $\{\xi_i\}$ and $\{\xi_{ij}\}$ are i.i.d. $\mathrm{Unif}[0,1]$.
Graphon Estimation
When the graph is undirected and has no self-loops,
$A_{ij} \mid \xi_i, \xi_j \sim \mathrm{Bernoulli}(\theta_{ij}), \quad \theta_{ij} = f(\xi_i, \xi_j), \quad \xi_i \sim \mathrm{Unif}(0,1) \text{ i.i.d.}$
Goal: recover $f$.
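A minimal sketch of sampling from this model, with an arbitrary smooth graphon $f(x, y) = xy$ standing in for the unknown truth:

```python
import numpy as np

# Sample an undirected graph with no self-loops from a graphon f.
rng = np.random.default_rng(2)
n = 200
f = lambda x, y: x * y                         # illustrative graphon
xi = rng.uniform(0, 1, size=n)                 # latent positions xi_i
theta = f(xi[:, None], xi[None, :])            # theta_ij = f(xi_i, xi_j)
upper = np.triu(rng.binomial(1, theta), k=1)   # independent upper-triangle edges
A = upper + upper.T                            # symmetric, zero diagonal
```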
Graphon Estimation
$A_{ij} \mid \xi_i, \xi_j \sim \mathrm{Bernoulli}(\theta_{ij})$, $\theta_{ij} = f(\xi_i, \xi_j)$, $(\xi_1, \dots, \xi_n) \sim P_\xi$.
Assumption: $f \in \mathcal{F}_\alpha(M)$.
Theorem 1.2. Consider the Hölder class $\mathcal{F}_\alpha(M)$, defined in Section 2.3. We have
$\inf_{\hat\theta} \sup_{\xi \sim P_\xi} \sup_{f \in \mathcal{F}_\alpha(M)} E\Big[\frac{1}{n^2}\sum_{i,j \in [n]} (\hat\theta_{ij} - \theta_{ij})^2\Big] \asymp \begin{cases} n^{-\frac{2\alpha}{\alpha+1}} & 0 < \alpha < 1, \\ \frac{\log n}{n} & \alpha \ge 1. \end{cases}$
The expectation is jointly over $\{A_{ij}\}$ and $\{\xi_i\}$.
Graphon Estimation
Proof:
$\min_{1 \le k \le n}\Big\{\frac{1}{k^{2\alpha}} + \frac{k^2}{n^2} + \frac{\log k}{n}\Big\}$
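A back-of-the-envelope minimization, a sketch of the tradeoff rather than the proof, recovers the rates in Theorem 1.2:
\[
\min_{1 \le k \le n}\Big\{\frac{1}{k^{2\alpha}} + \frac{k^2}{n^2} + \frac{\log k}{n}\Big\}
\asymp
\begin{cases}
n^{-\frac{2\alpha}{\alpha+1}}, & 0 < \alpha < 1, \text{ balancing } k^{-2\alpha} \asymp k^2/n^2 \text{ at } k \asymp n^{\frac{1}{\alpha+1}}, \\
\frac{\log n}{n}, & \alpha \ge 1, \text{ taking } k \asymp \sqrt{n} \text{ so that the first two terms are } \le n^{-1}.
\end{cases}
\]
For $0 < \alpha < 1$ the term $\log k / n$ is of lower order, since $2\alpha/(\alpha+1) < 1$.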
Lower Bound Proof
When $1 < k \le O(1)$, the minimax rate is $\frac{1}{n}$. It suffices to prove this for $k = 2$.
Lower Bound Proof
Proposition (Fano). Let $(\Theta, \rho)$ be a metric space and $\{P_\theta : \theta \in \Theta\}$ a collection of probability measures. For any $T \subset \Theta$, denote by $M(\epsilon, T, \rho)$ the $\epsilon$-packing number of $T$ w.r.t. $\rho$. Define the KL diameter of $T$ by
$d_{KL}(T) = \sup_{\theta, \theta' \in T} D(P_\theta \| P_{\theta'}).$
Then
$\inf_{\hat\theta} \sup_{\theta \in \Theta} E_\theta\, \rho^2\big(\hat\theta(X), \theta\big) \ge \sup_{\epsilon > 0} \frac{\epsilon^2}{4}\Big(1 - \frac{d_{KL}(T) + \log 2}{\log M(\epsilon, T, \rho)}\Big).$
Lower Bound Proof
• Construct a subset
• Upper bound the KL diameter
• Lower bound the packing number
Lower Bound Proof
$T = \Big\{ \{\theta_{ij}\} \in [0,1]^{n \times n} : \theta_{ij} = \tfrac{1}{2} \text{ for } (i,j) \in (S \times S) \cup (S^c \times S^c),\ \theta_{ij} = \tfrac{1}{2} + \tfrac{c}{\sqrt{n}} \text{ for } (i,j) \in (S \times S^c) \cup (S^c \times S), \text{ with some } S \in \mathcal{S} \Big\}.$
[Figure: $\theta$ in $2 \times 2$ block form, with entries $\tfrac{1}{2}$ on the diagonal blocks $S \times S$ and $S^c \times S^c$, and $\tfrac{1}{2} + \tfrac{c}{\sqrt{n}}$ on the off-diagonal blocks.]
Lower Bound Proof
An entry differs between $\theta$ and $\theta'$ exactly when one of $i, j$ lies in the symmetric difference of $S$ and $S'$, each such entry contributing $(c/\sqrt{n})^2$. Writing $|I_S - I_{S'}|$ for the Hamming distance between the indicator vectors,
$\rho^2(\theta, \theta') = \frac{1}{n^2}\sum_{1 \le i,j \le n} (\theta_{ij} - \theta'_{ij})^2 = \frac{2c^2}{n} \cdot \frac{|I_S - I_{S'}|}{n} \cdot \frac{n - |I_S - I_{S'}|}{n}.$
Lower Bound Proof
Construct a subset: $T \subset \Theta_k$.
Upper bound the KL diameter:
$\sup_{\theta, \theta' \in T} D(P_\theta \| P_{\theta'}) \le 8 \sup_{\theta, \theta' \in T} \|\theta - \theta'\|^2 \le 8c^2 n.$
Lower bound the packing number.
Lower Bound Proof
Lower bound the packing number: pick $S_1, \dots, S_N$ such that $\tfrac{1}{4}n \le |I_{S_a} - I_{S_b}| \le \tfrac{3}{4}n$ for all $a \ne b$, with $|I_S - I_{S'}|$ the Hamming distance; this is possible with $N \ge \exp(c_1 n)$. Then
$\rho^2(\theta, \theta') = \frac{2c^2\,|I_S - I_{S'}|\,(n - |I_S - I_{S'}|)}{n^3} \ge \frac{c^2}{8n} =: \epsilon^2,$
so $M(\epsilon, T, \rho) \ge N \ge \exp(c_1 n).$
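A numerical sanity check of this step (a sketch; $n$ and the number of random subsets are illustrative). Random subsets of $[n]$ are pairwise well separated in Hamming distance with high probability, which is the probabilistic-method idea behind $N \ge \exp(c_1 n)$:

```python
import numpy as np

# Pairwise Hamming distances between random indicator vectors I_S
# concentrate near n/2, hence inside [n/4, 3n/4] with high probability.
rng = np.random.default_rng(3)
n, N = 200, 300
S = rng.integers(0, 2, size=(N, n))             # indicator vectors I_S
ham = (S[:, None, :] != S[None, :, :]).sum(-1)  # pairwise Hamming distances
off = ham[np.triu_indices(N, k=1)]
print(off.min(), off.max())  # typically well inside [n/4, 3n/4] = [50, 150]
```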
Lower Bound Proof
Combining the three steps with Fano's inequality,
$\inf_{\hat\theta} \sup_{\theta \in \Theta_2} E\Big[\frac{1}{n^2}\sum_{1 \le i,j \le n} (\hat\theta_{ij} - \theta_{ij})^2\Big] \ge \frac{c^2}{32n}\Big(1 - \frac{8c^2 n + \log 2}{c_1 n}\Big) \gtrsim \frac{1}{n}.$
Upper Bound
Oracle solution: when the clustering $z$ is known, an obvious estimator is the block average
$\hat\theta_{ij} = \frac{1}{|z^{-1}(a)||z^{-1}(b)|} \sum_{(i',j') \in z^{-1}(a) \times z^{-1}(b)} A_{i'j'} \quad \text{for } (i,j) \in z^{-1}(a) \times z^{-1}(b),$
which achieves the rate
$\|\hat\theta - \theta\|_F^2 \le O_P(k^2).$
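A self-contained sketch of the oracle estimator; the setup values ($n$, $k$, $Q$) are arbitrary choices:

```python
import numpy as np

# Oracle: with the true labels z known, estimate each block of Q by the
# corresponding block average of A, then read off theta_hat.
rng = np.random.default_rng(0)
n, k = 120, 3
Q = rng.uniform(0, 1, size=(k, k))
Q = (Q + Q.T) / 2
z = rng.integers(0, k, size=n)
theta = Q[z[:, None], z[None, :]]
A = rng.binomial(1, theta)

Q_hat = np.zeros((k, k))
for a in range(k):
    for b in range(k):
        # assumes every label appears at least once
        Q_hat[a, b] = A[np.ix_(z == a, z == b)].mean()
theta_hat = Q_hat[z[:, None], z[None, :]]
print(np.sum((theta_hat - theta) ** 2), "vs k^2 =", k * k)
```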
Upper Bound
An equivalent form (least squares): fixing the known $z$, solve
$\min_\theta \|A - \theta\|_F^2 \quad \text{s.t. } \theta_{ij} = Q_{z(i)z(j)} \text{ for some } Q = Q^T \in [0,1]^{k \times k}.$
A natural estimator: solve
$\min_\theta \|A - \theta\|_F^2 \quad \text{s.t. } \theta_{ij} = Q_{z(i)z(j)} \text{ for some } Q = Q^T \in [0,1]^{k \times k} \text{ and some } z : \{1, 2, \dots, n\} \to \{1, 2, \dots, k\}.$
This achieves
$\|\hat\theta - \theta\|_F^2 \le O_P(k^2 + n \log k).$
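The joint least squares over $(Q, z)$ is combinatorial. A common heuristic, not the talk's proposal, is Lloyd-style alternating minimization: given $z$, the optimal $Q$ is the block average; given $Q$, reassign each node to its best-fitting label. A sketch, assuming every label stays non-empty:

```python
import numpy as np

def alternating_ls(A, k, n_iter=20, seed=0):
    """Heuristic alternating least squares for the SBM fit (a sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    z = rng.integers(0, k, size=n)
    for _ in range(n_iter):
        # Q-step: given z, the optimal Q is the block average
        # (assumes every label is non-empty).
        Q = np.array([[A[np.ix_(z == a, z == b)].mean() for b in range(k)]
                      for a in range(k)])
        # z-step: given Q, reassign each node to the label whose row
        # profile Q[a, z] best fits that node's row of A.
        for i in range(n):
            costs = [np.sum((A[i] - Q[a, z]) ** 2) for a in range(k)]
            z[i] = int(np.argmin(costs))
    return Q[z[:, None], z[None, :]], z
```

This converges only to a local optimum, so the rate guarantee above applies to the exact minimizer, not to this heuristic.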
Bayes Estimation
Prior construction:
1. Sample $k \sim \pi$ with $\pi(k) \propto \exp\big(-D(k^2 + n\log k)\big)$.
2. Sample $z$ uniformly from $\{z : [n] \to [k]\}$.
3. Sample $Q \sim f$ with density $f(Q) = \frac{\Gamma(k^2/2)}{2\,\Gamma(k^2)}\Big(\frac{\lambda_k}{\sqrt{\pi}}\Big)^{k^2} e^{-\lambda_k \|Q\|}$.
4. Let $\theta_{ij} = Q_{z(i)z(j)}$.
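A sketch of drawing one $\theta$ from this prior. The radius/direction decomposition is exact for the spherically symmetric density $\propto e^{-\lambda\|Q\|}$ on $\mathbb{R}^{k^2}$; symmetrizing and clipping $Q$ to $[0,1]$ at the end is a simplification of mine, not the talk's construction, and $\lambda_k = \beta n / k$ uses the (reconstructed) choice from Theorem 1.3 below. All numeric values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, D, beta = 100, 1.0, 1.0

# 1. k ~ pi(k) proportional to exp(-D (k^2 + n log k));
#    this prior puts most of its mass on small k.
ks = np.arange(1, n + 1)
logw = -D * (ks**2 + n * np.log(ks))
w = np.exp(logw - logw.max())
w /= w.sum()
k = int(rng.choice(ks, p=w))

# 2. z uniform over maps [n] -> [k]
z = rng.integers(0, k, size=n)

# 3. Q from the e^{-lambda ||Q||} shape: uniform direction on the sphere
#    in k^2 dimensions times a Gamma(k^2, 1/lambda) radius.
lam = beta * n / k                       # reconstructed lambda_k (assumption)
g = rng.normal(size=(k, k))
g /= np.linalg.norm(g)
r = rng.gamma(shape=k * k, scale=1 / lam)
Q = np.clip((g + g.T) / 2 * r, 0, 1)     # simplification: symmetrize, clip

# 4. theta_ij = Q_{z(i) z(j)}
theta = Q[z[:, None], z[None, :]]
```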
Bayes Estimation
Theorem 1.3. Consider $\lambda_k = \beta n / k$ for some constant $\beta > 0$. Then
$E_{\theta^*}\, \Pi\Big( \frac{1}{n^2}\sum_{i,j} (\theta_{ij} - \theta^*_{ij})^2 > M\Big(\frac{k^2}{n^2} + \frac{\log k}{n}\Big) \,\Big|\, A \Big) \le \exp\big(-C'(k^2 + n\log k)\big),$
for some constants $M, C' > 0$.
Reference Gao, Chao, Yu Lu, and Harrison H. Zhou. "Rate-optimal Graphon Estimation." arXiv preprint arXiv:1410.5837 (2014).
Thank you