An Optimal Transport View on Generalization
Nemo Fournier
January 13, 2020
Outline
Framework
Main results
An application
Deep Neural Networks
Framework
learning algorithm: $A : \mathcal{Z}^n \to \mathcal{W}$
instance space: $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$
hypothesis space: $\mathcal{W}$
underlying distribution: $D$
training sample: $S_n \sim D^{\otimes n}$
loss function: $\ell : \mathcal{Z} \times \mathcal{W} \to \mathbb{R}_+$
risk: $R(w) = \mathbb{E}_{z \sim D}[\ell(z, w)]$
empirical risk: $R_{S_n}(w) = \mathbb{E}_{z \sim S_n}[\ell(z, w)] = \frac{1}{n} \sum_{i=1}^{n} \ell(z_i, w)$
generalization error: $G(D, P_{W|S_n}) = \mathbb{E}[R(W) - R_{S_n}(W)]$
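The definitions above can be illustrated by simulation. The sketch below is not part of the talk: the toy problem (standard Gaussian labels, a constant predictor, squared loss) and every function name are assumptions chosen for the example, and the risk $R(w)$ is approximated on a fresh sample rather than computed exactly.

```python
import random
from statistics import fmean


def empirical_risk(sample, w, loss):
    """R_{S_n}(w): average loss of hypothesis w over the sample."""
    return fmean(loss(z, w) for z in sample)


def estimate_generalization_error(algorithm, draw, loss, n,
                                  trials=2000, fresh_size=200, seed=0):
    """Monte Carlo estimate of E[R(W) - R_{S_n}(W)].

    R(W) is approximated by the average loss on a fresh sample
    that the algorithm has not seen.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        train = [draw(rng) for _ in range(n)]
        w = algorithm(train)
        fresh = [draw(rng) for _ in range(fresh_size)]
        gaps.append(empirical_risk(fresh, w, loss)
                    - empirical_risk(train, w, loss))
    return fmean(gaps)


# Hypothetical toy problem: z = (x, y), the label y is standard Gaussian
# and the hypothesis w is a constant prediction of y.
def draw(rng):
    return (rng.random(), rng.gauss(0.0, 1.0))


def squared_loss(z, w):
    _, y = z
    return (y - w) ** 2


def mean_label(sample):
    """Toy learning algorithm: predict the average training label."""
    return fmean(y for _, y in sample)


gap = estimate_generalization_error(mean_label, draw, squared_loss, n=5)
print(f"estimated generalization error: {gap:.3f}")
```

For this toy problem the exact generalization error is $2/n$, so with $n = 5$ the estimate should land near $0.4$.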
$\mu$ and $\nu$: two measures on $\mathcal{W}$
coupling: a measure $T$ on $\mathcal{W} \times \mathcal{W}$ such that $T(X, \mathcal{W}) = \mu(X)$ and $T(\mathcal{W}, X) = \nu(X)$
Wasserstein distance: $\mathbb{W}_1(\mu, \nu) = \inf_{T \in \Gamma(\mu, \nu)} \mathbb{E}_{(W, W') \sim T}[d_{\mathcal{W}}(W, W')]$
algorithmic transport cost of algorithm $A$ ($P_{W|S_n}$): $\mathrm{Opt}(D, P_{W|S_n}) = \mathbb{E}_{z \sim D}[\mathbb{W}_1(P_W, P_{W|z})]$
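To make the infimum over couplings concrete, here is a small sketch for the simplest case: two uniform empirical measures on the real line with the same number of atoms and $d(w, w') = |w - w'|$. In that case an optimal coupling matches sorted atoms, and it is enough to search over permutations. The function names and the sample values are illustrative assumptions.

```python
from itertools import permutations


def wasserstein1_sorted(xs, ys):
    """W_1 between two uniform empirical measures on the real line.

    Assumes both samples have the same size, so that matching the
    sorted atoms gives an optimal coupling.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def wasserstein1_bruteforce(xs, ys):
    """Same quantity, by trying every one-to-one matching of the atoms."""
    n = len(xs)
    return min(sum(abs(x - y) for x, y in zip(xs, perm)) / n
               for perm in permutations(ys))


xs = [0.0, 1.0, 3.0]
ys = [2.0, 0.5, 4.0]
print(wasserstein1_sorted(xs, ys), wasserstein1_bruteforce(xs, ys))
```

When the second measure is a Dirac mass $\delta_t$, every atom is matched to $t$ and the distance reduces to the mean of $|x_i - t|$.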
Main results
A.G.T.: $\mathrm{Opt}(D, P_{W|S_n}) = \mathbb{E}_{z \sim D}[\mathbb{W}_1(P_W, P_{W|z})]$
main theorem: $G(D, P_{W|S_n}) = \mathbb{E}[R(W) - R_{S_n}(W)] \le K \times \mathrm{Opt}(D, P_{W|S_n})$
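A sketch of why a bound of this shape can hold, under two assumptions that the slide does not state: $\ell(z, \cdot)$ is $K$-Lipschitz on $(\mathcal{W}, d_{\mathcal{W}})$ for every $z$, and $P_{W|z}$ is the average over $i$ of the law of $W$ conditioned on $z_i = z$. The last inequality is Kantorovich-Rubinstein duality applied to the $K$-Lipschitz function $\ell(z, \cdot)$.

```latex
\begin{align*}
\mathbb{E}[R(W)]
  &= \mathbb{E}_{z \sim D}\, \mathbb{E}_{W \sim P_W}[\ell(z, W)], \\
\mathbb{E}[R_{S_n}(W)]
  &= \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[\ell(z_i, W)]
   = \mathbb{E}_{z \sim D}\, \mathbb{E}_{W \sim P_{W|z}}[\ell(z, W)], \\
\left| G(D, P_{W|S_n}) \right|
  &\le \mathbb{E}_{z \sim D} \left| \mathbb{E}_{W \sim P_W}[\ell(z, W)]
       - \mathbb{E}_{W \sim P_{W|z}}[\ell(z, W)] \right|
   \le K\, \mathbb{E}_{z \sim D}\!\left[ \mathbb{W}_1(P_W, P_{W|z}) \right].
\end{align*}
```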
An application
[Figure: axis with marks at 0, $a$ and 1.]
$A : \mathcal{Z}^n \to \mathcal{W}$, $S_n = \{(x_1, y_1), \dots, (x_n, y_n)\} \mapsto \max_{1 \le i \le n \text{ s.t. } y_i = 0} x_i$ if $\{i \mid y_i = 0\} \neq \emptyset$, and $0$ otherwise
$P_W(w) = \frac{(1 - a)^n}{a} + n\,(w + 1 - a)^{n-1}$
$P_{W|z} = \delta_x$ if $x \le a$, $P_{W|z} = \delta_0$ otherwise
$\mathbb{W}_1(\mu, \delta_t) = \mathbb{E}_{X \sim \mu}[d(X, t)] \implies \mathbb{W}_1(P_W, \delta_x) = \int_0^a |x - w|\, P_W(w)\, dw$
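The learning rule on this slide can be written directly as code. This is a minimal sketch; the function name `learn_threshold` and the encoding of a sample as a list of `(x, y)` pairs are assumptions made for illustration.

```python
def learn_threshold(sample):
    """Algorithm A: return the largest x_i whose label y_i is 0.

    Returns 0.0 when no point in the sample carries the label 0.
    """
    zero_labelled = [x for x, y in sample if y == 0]
    return max(zero_labelled) if zero_labelled else 0.0


print(learn_threshold([(0.1, 0), (0.5, 1), (0.3, 0)]))  # 0.3
print(learn_threshold([(0.5, 1), (0.9, 1)]))            # 0.0
```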
$\mathbb{W}_1(P_W, \delta_x) = \frac{a^2 \left( (1-a)^n + 2 \right) n + a^2 \left( 3(1-a)^n + 2 \right) + 2 \left( (1-a)^n n + (1-a)^n \right) x^2 - 4\,(1 - a + x)^n \left( a^2 - ax - a \right) - 2a \left( (1-a)^n + 1 \right) - 2 \left( a \left( 2(1-a)^n + 1 \right) n + a \left( 2(1-a)^n + 1 \right) \right) x}{2\,(an + a)}$
Figure: Graphs of the Wasserstein distance between $P_W$ and $\delta_x$ for several values of $a$ and $n$ (three panels, each plotting $\mathbb{W}(P_W, \delta_x)$ against $x$). First row: $n = 10$ with $a = 0.2$ (left) and …
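The closed form can be checked numerically against the integral $\int_0^a |x - w|\, P_W(w)\, dw$. The sketch below assumes that $P_W$ is read as the density $(1-a)^n / a + n\,(w + 1 - a)^{n-1}$ on $[0, a]$, which is the reading under which the integral and the closed form agree; the function names and the midpoint rule are illustrative choices.

```python
def density(w, a, n):
    """Assumed reading of P_W on [0, a]."""
    return (1 - a) ** n / a + n * (w + 1 - a) ** (n - 1)


def w1_to_dirac_numeric(x, a, n, steps=20000):
    """Midpoint-rule approximation of the integral of |x - w| P_W(w) over [0, a]."""
    h = a / steps
    return h * sum(abs(x - (i + 0.5) * h) * density((i + 0.5) * h, a, n)
                   for i in range(steps))


def w1_to_dirac_closed(x, a, n):
    """Closed form of W_1(P_W, delta_x) as given on the slide."""
    c = (1 - a) ** n
    numerator = (a ** 2 * (c + 2) * n
                 + a ** 2 * (3 * c + 2)
                 + 2 * (c * n + c) * x ** 2
                 - 4 * (1 - a + x) ** n * (a ** 2 - a * x - a)
                 - 2 * a * (c + 1)
                 - 2 * (a * (2 * c + 1) * n + a * (2 * c + 1)) * x)
    return numerator / (2 * (a * n + a))


a, n = 0.2, 10
for x in (0.0, 0.05, 0.1, 0.2):
    print(x, w1_to_dirac_numeric(x, a, n), w1_to_dirac_closed(x, a, n))
```

With $n = 10$ and $a = 0.2$ both columns stay between roughly $0.05$ and $0.13$, the range shown in the first panel of the figure.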
$\mathrm{Opt}(D, P_W^n) = \int_0^a \mathbb{W}_1(P_W, \delta_x)\, dx + (1 - a)\, \mathbb{W}_1(P_W, \delta_0)$
$\mathrm{Opt}(D, P_W^n) = \frac{2a^2 \left( 2(1-a)^n - 3 \right) - \left( a^2 \left( 4(1-a)^n + 3 \right) - 3a \left( (1-a)^n + 2 \right) \right) n^2 - 3 \left( 3a^2 + 3a \left( (1-a)^n - 2 \right) - 2(1-a)^n + 2 \right) n - 6a \left( (1-a)^n - 2 \right)}{6\,(n^2 + 3n + 2)}$
$G(D, P_{W|S_n}) \le 1 \times \mathrm{Opt}(D, P_W^n)$
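The transport cost can also be evaluated numerically and compared with the closed form. As before, this is a sketch under an assumed reading of $P_W$ as the density $(1-a)^n / a + n\,(w + 1 - a)^{n-1}$ on $[0, a]$; the function names and step counts are illustrative.

```python
def density(w, a, n):
    """Assumed reading of P_W on [0, a]."""
    return (1 - a) ** n / a + n * (w + 1 - a) ** (n - 1)


def w1_to_dirac(x, a, n, steps=2000):
    """Midpoint-rule approximation of W_1(P_W, delta_x)."""
    h = a / steps
    return h * sum(abs(x - (i + 0.5) * h) * density((i + 0.5) * h, a, n)
                   for i in range(steps))


def opt_numeric(a, n, outer_steps=200):
    """Integral of W_1(P_W, delta_x) over [0, a], plus (1 - a) W_1(P_W, delta_0)."""
    h = a / outer_steps
    integral = h * sum(w1_to_dirac((j + 0.5) * h, a, n)
                       for j in range(outer_steps))
    return integral + (1 - a) * w1_to_dirac(0.0, a, n)


def opt_closed_form(a, n):
    """Closed form of the transport cost as given on the slide."""
    c = (1 - a) ** n
    numerator = (2 * a ** 2 * (2 * c - 3)
                 - (a ** 2 * (4 * c + 3) - 3 * a * (c + 2)) * n ** 2
                 - 3 * (3 * a ** 2 + 3 * a * (c - 2) - 2 * c + 2) * n
                 - 6 * a * (c - 2))
    return numerator / (6 * (n ** 2 + 3 * n + 2))


print(opt_numeric(0.2, 10), opt_closed_form(0.2, 10))
```

For $a = 0.2$ and $n = 10$ both values come out close to $0.116$, which is then the bound on the generalization error given by the last inequality on the slide.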
Deep Neural Networks
$\mathbb{E}[R(W) - R_{S_n}(W)] \le \exp\left( -\frac{H}{2} \log \frac{1}{\eta} \right) \sqrt{\frac{K^2 R^2\, I(S_n; W)}{2n}}$
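To see how the bound behaves, the sketch below evaluates it under an assumed reading of the slide, namely $\exp(-\frac{H}{2} \log \frac{1}{\eta}) \sqrt{K^2 R^2 I(S_n; W) / (2n)}$. The slide does not define $H$ or $\eta$; treating $H$ as a depth and $\eta < 1$ as a per-layer contraction coefficient is a guess made for illustration, as are the function names and the numeric values.

```python
import math


def depth_factor(H, eta):
    """exp(-(H / 2) * log(1 / eta)), which equals eta ** (H / 2)."""
    return math.exp(-(H / 2) * math.log(1 / eta))


def dnn_bound(K, R, mutual_info, n, H, eta):
    """Assumed reading of the bound: depth factor times sqrt(K^2 R^2 I / (2n))."""
    return depth_factor(H, eta) * math.sqrt(K ** 2 * R ** 2 * mutual_info / (2 * n))


# Hypothetical values, chosen only to show the trend in H.
for H in (0, 2, 4, 8):
    print(H, dnn_bound(K=1.0, R=1.0, mutual_info=2.0, n=100, H=H, eta=0.5))
```

Under this reading the prefactor is simply $\eta^{H/2}$, so for $\eta < 1$ the bound shrinks geometrically as $H$ grows.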
Conclusion
Powerful theoretical tool (average case, link with information theory)
Quickly becomes too convoluted to provide concrete bounds