Surfing: Iterative Optimization Over Incrementally Trained Deep Networks

Ganlin Song, Zhou Fan, John Lafferty
Department of Statistics and Data Science, Yale University
Background

We consider inverting a trained generative network $G$: given $y$, solve

$$\min_x f(x) = \min_x \|G(x) - y\|^2$$

[Figure: a generative model, and inverting a generator]
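In practice, this inversion can be attempted with plain gradient descent on $f$. A minimal PyTorch sketch (not the authors' code; the generator `G`, latent dimension `k`, step size, and iteration count are illustrative assumptions):

```python
import torch

def invert(G, y, k, steps=1000, lr=0.01):
    """Minimize f(x) = ||G(x) - y||^2 over the latent code x by gradient descent."""
    x = torch.zeros(k, requires_grad=True)   # initialize the latent code at the origin
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.sum((G(x) - y) ** 2)    # f(x) = ||G(x) - y||^2
        loss.backward()
        opt.step()
    return x.detach()
```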
Background

• Compressed sensing framework: observe $z = Ay + \epsilon$; recover $y$ by (Bora, Jalal, Price & Dimakis 2017)

$$\min_x f(x) = \min_x \|AG(x) - z\|^2$$

• $f(x)$ is non-convex; gradient descent is not guaranteed to reach a global optimum
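The same descent loop applies to this objective; only the loss changes. A sketch under the measurement model above, with a hypothetical Gaussian sensing matrix (the sizes `m`, `n` and the noise level are illustrative, and `y` appears only to simulate the observations):

```python
import torch

m, n = 100, 784                          # illustrative measurement/signal sizes
y = torch.randn(n)                       # stand-in for the unknown signal
A = torch.randn(m, n) / m ** 0.5         # random Gaussian sensing matrix
z = A @ y + 0.01 * torch.randn(m)        # observe z = Ay + eps

def cs_loss(G, x):
    """f(x) = ||A G(x) - z||^2; plug into the same gradient-descent loop as above."""
    return torch.sum((A @ G(x).reshape(-1) - z) ** 2)
```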
Motivation

[Figure: landscape of $x \mapsto -f_\theta(x) = -\|G_\theta(x) - y\|^2$ as the weights $\theta$ are trained]
Algorithm

Intuition
• The landscape of the initial random network is “nice”
• Initialize with the random network and track the optimum through the intermediate networks

Surfing Algorithm
• Obtain a sequence of parameters $\theta_0, \theta_1, \ldots, \theta_T$ during training
• Optimize the empirical risk functions $f_{\theta_0}, f_{\theta_1}, \ldots, f_{\theta_T}$ in turn using gradient descent
• For each $t \in \{1, \ldots, T\}$, initialize gradient descent at the solution from time $t - 1$, as sketched below
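A minimal sketch of the surfing loop, assuming a list `G_list` of generator checkpoints saved during training (function names and hyperparameters are placeholders, not the paper's implementation):

```python
import torch

def surfing(G_list, loss_fn, k, steps=200, lr=0.01):
    """Track the minimizer across incrementally trained networks.

    G_list  : generators G_{theta_0}, ..., G_{theta_T} saved during training
    loss_fn : maps (G, x) to the empirical risk f_theta(x)
    """
    x = torch.zeros(k, requires_grad=True)   # random initial net: start at the origin
    for G in G_list:                         # t = 0, 1, ..., T
        opt = torch.optim.SGD([x], lr=lr)
        for _ in range(steps):               # descend on f_{theta_t}
            opt.zero_grad()
            loss_fn(G, x).backward()
            opt.step()
        # x now warm-starts the descent on f_{theta_{t+1}}
    return x.detach()
```

For the compressed-sensing setting, `cs_loss` from the earlier sketch can be passed as `loss_fn`.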
Theory and Experiments

Theoretical Results
1. If $G_\theta$ has random parameters, then with high probability all critical points of $f_\theta(x)$ lie in a small neighborhood of $0$ (builds on Hand & Voroninski 2017)
2. Under certain conditions, modified surfing can track the minimizer

Experiments
For a DCGAN trained on Fashion-MNIST:

$$\min_x \|G_\theta(x) - G_\theta(x_0)\|^2 \qquad \text{and} \qquad \min_x \|AG_\theta(x) - Ay\|^2$$