Lecture 18: Black Box Variational Inference

Scribes: Niklas Smedemark-Margulies, Daniel Zeiberg
Recap: Stochastic Variational Inference

Objective: Evidence Lower Bound (ELBO)
$$ \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\lambda)}\!\left[ \log \frac{p(y, z)}{q(z; \lambda)} \right] $$

Natural gradient: a coordinate-invariant notion of steepest descent, computable in closed form for exponential-family models:
$$ \tilde{\nabla}_{\lambda} \mathcal{L}(\lambda) = \alpha + \sum_{n=1}^{N} \mathbb{E}_{q(z_n; \phi_n)}\!\left[ t(y_n, z_n) \right] - \lambda $$
Today: Black Box Variational Inference

Problem 1: Closed-form updates for VBEM and SVI need to be re-derived for every new model.

Problem 2: Closed-form updates in practice require conjugate priors, which limits the number of models we can define.
Black Box Variational Inference

Goal: Can we formulate general strategies for computing gradients of the ELBO?
$$ \nabla_{\lambda} \mathcal{L}(\lambda) = \nabla_{\lambda}\, \mathbb{E}_{q(z;\lambda)}\!\left[ \log \frac{p(y, z)}{q(z; \lambda)} \right] $$

Problem: How do we compute gradients of expectations?

Idea: Can we use Monte Carlo methods?
Gradients of Expectations

Problem: Compute the gradient, with respect to some set of parameters $\phi$, of the expectation of a function:
$$ \nabla_{\phi}\, \mathbb{E}_{q(z)}\!\left[ f(z; \phi) \right] $$

Easy case: the distribution $q(z)$ does not depend on $\phi$:
$$ \nabla_{\phi}\, \mathbb{E}_{q(z)}\!\left[ f(z; \phi) \right] = \mathbb{E}_{q(z)}\!\left[ \nabla_{\phi} f(z; \phi) \right] \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_{\phi} f(z^s; \phi), \qquad z^s \sim q(z) $$
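A minimal numpy sketch of the easy case, assuming a fixed $q(z) = \mathcal{N}(0, 1)$ and a hypothetical test function $f(z; \phi) = -(z - \phi)^2$ (both chosen only for illustration):

```python
import numpy as np

# Easy case: q(z) does not depend on phi, so the gradient moves inside the
# expectation and is estimated by averaging grad_phi f at samples from q.
# Here f(z; phi) = -(z - phi)^2, so grad_phi f(z; phi) = 2 * (z - phi).

rng = np.random.default_rng(0)

def grad_f(z, phi):
    return 2.0 * (z - phi)

def mc_gradient(phi, num_samples=1000):
    z = rng.standard_normal(num_samples)   # z^s ~ q(z) = N(0, 1), independent of phi
    return np.mean(grad_f(z, phi))         # (1/S) sum_s grad_phi f(z^s; phi)

print(mc_gradient(phi=0.5))                # close to E[2(z - 0.5)] = -1.0
```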
Gradients of Expectations

Harder case: the distribution $q(z; \phi)$ does depend on $\phi$:
$$ \nabla_{\phi}\, \mathbb{E}_{q(z;\phi)}\!\left[ f(z) \right] = \nabla_{\phi} \int q(z; \phi)\, f(z)\, dz = \int \left[ \nabla_{\phi}\, q(z; \phi) \right] f(z)\, dz $$

What do we do here? The integrand is no longer an expectation under $q(z;\phi)$, so we cannot directly apply Monte Carlo.
Gradients of Expectations (Williams, 1992 @ Northeastern)

REINFORCE trick: rewrite $\nabla_{\phi}\, q(z; \phi)$ using the log-derivative identity
$$ \nabla_{\phi}\, q(z; \phi) = q(z; \phi)\, \nabla_{\phi} \log q(z; \phi) $$

Likelihood-ratio estimator:
$$ \nabla_{\phi}\, \mathbb{E}_{q(z;\phi)}\!\left[ f(z) \right] = \int q(z; \phi)\, \nabla_{\phi} \log q(z; \phi)\, f(z)\, dz = \mathbb{E}_{q(z;\phi)}\!\left[ f(z)\, \nabla_{\phi} \log q(z; \phi) \right] $$

This is again an expectation under $q(z;\phi)$, so we can approximate it with Monte Carlo.
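A small sketch of the likelihood-ratio estimator, assuming $q(z; \phi) = \mathcal{N}(\phi, 1)$ and a hypothetical test function $f(z) = z^2$ (neither is specified in the lecture; they just make the true gradient easy to check):

```python
import numpy as np

# REINFORCE / likelihood-ratio estimator for q(z; phi) = Normal(phi, 1).
# Then grad_phi log q(z; phi) = (z - phi), E_q[z^2] = phi^2 + 1, and the
# true gradient 2 * phi should be recovered by the Monte Carlo estimate.

rng = np.random.default_rng(0)

def reinforce_gradient(phi, num_samples=100_000):
    z = rng.normal(loc=phi, scale=1.0, size=num_samples)  # z^s ~ q(z; phi)
    score = z - phi                                        # grad_phi log q(z^s; phi)
    f = z ** 2                                             # f(z^s)
    return np.mean(f * score)                              # MC estimate of the gradient

print(reinforce_gradient(phi=1.5))                         # close to 2 * 1.5 = 3.0
```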
Black Box Variational Inference

Idea: perform variational inference with the likelihood-ratio estimator, taking $f(z) = \log \frac{p(y,z)}{q(z;\lambda)}$:
$$ \nabla_{\lambda} \mathcal{L}(\lambda) = \nabla_{\lambda}\, \mathbb{E}_{q(z;\lambda)}\!\left[ \log \frac{p(y, z)}{q(z; \lambda)} \right] = \mathbb{E}_{q(z;\lambda)}\!\left[ \nabla_{\lambda} \log q(z; \lambda) \left( \log p(y, z) - \log q(z; \lambda) \right) \right] $$

(The extra term $-\mathbb{E}_{q}\!\left[ \nabla_{\lambda} \log q(z;\lambda) \right]$ that arises from differentiating $\log q(z;\lambda)$ inside the expectation vanishes, as shown on the next slide.)
Black Box Variational Inference

Observation: the expected value of $\nabla_{\lambda} \log q(z; \lambda)$ is zero:
$$ \mathbb{E}_{q(z;\lambda)}\!\left[ \nabla_{\lambda} \log q(z; \lambda) \right] = \int q(z; \lambda)\, \nabla_{\lambda} \log q(z; \lambda)\, dz = \int \nabla_{\lambda}\, q(z; \lambda)\, dz = \nabla_{\lambda} \int q(z; \lambda)\, dz = 0 $$

Implication: we can add any constant $a$ to the estimator without changing its expected value:
$$ \nabla_{\lambda} \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\lambda)}\!\left[ \nabla_{\lambda} \log q(z; \lambda) \left( \log \frac{p(y, z)}{q(z; \lambda)} - a \right) \right] $$
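A sketch illustrating this observation, assuming $q(z;\lambda) = \mathcal{N}(\lambda, 1)$ and a hypothetical stand-in $f(z) = z^2$ for the $\log p - \log q$ term: subtracting a constant $a$ leaves the mean of the estimator unchanged but changes its variance.

```python
import numpy as np

# Baseline demonstration: grad_lambda log q(z; lambda) = (z - lambda) for
# q = Normal(lambda, 1). The per-sample terms score * (f - a) have the same
# mean for any constant a, but a well-chosen a reduces the variance.

rng = np.random.default_rng(0)

def grad_estimates(lam, a, num_samples=100_000):
    z = rng.normal(loc=lam, scale=1.0, size=num_samples)
    score = z - lam                       # grad_lambda log q(z; lambda)
    f = z ** 2                            # hypothetical stand-in for log p(y,z) - log q(z)
    terms = score * (f - a)               # per-sample gradient terms
    return terms.mean(), terms.var()

for a in [0.0, 5.0]:
    mean, var = grad_estimates(lam=1.5, a=a)
    print(f"a = {a}: mean = {mean:.3f}, variance = {var:.3f}")  # means agree, variances differ
```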
Black Box Variational Inference

Problem: REINFORCE-style estimators have high variance.
$$ \nabla_{\lambda} \mathcal{L}(\lambda) = \mathbb{E}_{q(z;\lambda)}\!\left[ \nabla_{\lambda} \log q(z; \lambda) \left( \log p(y, z) - \log q(z; \lambda) \right) \right] $$

[Figure: sketch of $q(z;\lambda)$ against $\log \frac{p(y,z)}{q(z;\lambda)}$. Where $q$ is small we draw few samples but the terms are all large; where $q$ is large we draw many samples but the terms are all small.]
Exploiting Conditional Independence

Example (an LDA-style model with global topics $\beta_k$, per-document proportions $\theta_d$, and per-word assignments $z_{dn}$):
$$ \mathcal{L}(\lambda, \gamma, \phi) = \mathbb{E}_{q}\!\left[ \log p(y, z, \theta, \beta) - \log q(z, \theta, \beta) \right] $$
$$ \log q(z, \theta, \beta) = \sum_{k} \log q(\beta_k; \lambda_k) + \sum_{d} \Big( \log q(\theta_d; \gamma_d) + \sum_{n} \log q(z_{dn}; \phi_{dn}) \Big) $$

Idea: can we use conditional independence to simplify the expressions for gradients?

[Figure: graphical model with plates for $k = 1, \dots, K$ (topics $\beta_k$), $d = 1, \dots, D$ (documents, $\theta_d$), and $n = 1, \dots, N_d$ (assignments $z_{dn}$ and words $y_{dn}$).]
Exploiting Conditional Independence

Gradient with respect to a local parameter $\phi_{dn}$:
$$ \nabla_{\phi_{dn}} \mathcal{L} = \mathbb{E}_{q(z, \theta, \beta)}\!\left[ \nabla_{\phi_{dn}} \log q(z_{dn}; \phi_{dn}) \left( \log p(y, z, \theta, \beta) - \log q(z, \theta, \beta) \right) \right] $$

Terms in the log joint and in $\log q$ that do not involve $z_{dn}$ are constant with respect to $q(z_{dn}; \phi_{dn})$, so they multiply a zero-mean score and drop out. Only the Markov blanket of $z_{dn}$ remains:
$$ \nabla_{\phi_{dn}} \mathcal{L} = \mathbb{E}_{q(\theta_d; \gamma_d)\, q(\beta)\, q(z_{dn}; \phi_{dn})}\!\left[ \nabla_{\phi_{dn}} \log q(z_{dn}; \phi_{dn}) \left( \log p(y_{dn} \mid z_{dn}, \beta) + \log p(z_{dn} \mid \theta_d) - \log q(z_{dn}; \phi_{dn}) \right) \right] $$

This is a much lower-dimensional estimator.
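A toy numerical sketch of this idea under my own hypothetical factorized model (not the LDA model from the slide): with $q(z_1, z_2) = \mathcal{N}(z_1; \phi_1, 1)\,\mathcal{N}(z_2; \phi_2, 1)$ and a log joint that splits as $g_1(z_1) + g_2(z_2)$, the gradient with respect to $\phi_1$ estimated with the full difference and with only the $z_1$ terms agree in expectation, but the local estimator has lower variance.

```python
import numpy as np

# Hypothetical factorized example: the score for phi1 is independent of all
# z2-only terms, so those terms contribute zero mean but extra variance.

rng = np.random.default_rng(0)
phi1, phi2 = 0.5, -1.0

def estimates(num_samples=200_000):
    z1 = rng.normal(phi1, 1.0, num_samples)
    z2 = rng.normal(phi2, 1.0, num_samples)
    score1 = z1 - phi1                                   # grad_phi1 log q(z1; phi1)
    g1, g2 = -0.5 * (z1 - 2.0) ** 2, -0.5 * (z2 + 3.0) ** 2
    log_q1 = -0.5 * (z1 - phi1) ** 2
    log_q2 = -0.5 * (z2 - phi2) ** 2
    full = score1 * (g1 + g2 - log_q1 - log_q2)          # full-joint estimator
    local = score1 * (g1 - log_q1)                       # Markov-blanket estimator
    return (full.mean(), full.var()), (local.mean(), local.var())

print(estimates())   # means agree; the local estimator's variance is smaller
```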
Control Variates

General setup: zero-mean terms preserve expected values.
$$ \mathbb{E}\!\left[ \tilde{f}(z) \right] = \mathbb{E}\!\left[ f(z) - a \left( h(z) - \mathbb{E}[h(z)] \right) \right] = \mathbb{E}\!\left[ f(z) \right] $$

Goal: choose $a$ to minimize the variance.
$$ \mathrm{Var}\!\left[ \tilde{f}(z) \right] = \mathrm{Var}\!\left[ f(z) \right] + a^2\, \mathrm{Var}\!\left[ h(z) \right] - 2a\, \mathrm{Cov}\!\left[ f(z), h(z) \right] $$
$$ \frac{\partial}{\partial a}\, \mathrm{Var}\!\left[ \tilde{f}(z) \right] = 0 \quad \Rightarrow \quad a^* = \frac{\mathrm{Cov}\!\left[ f(z), h(z) \right]}{\mathrm{Var}\!\left[ h(z) \right]} $$
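A generic sketch of this recipe, using a hypothetical pair $f(z) = z^2$ and control $h(z) = z$ with $z \sim \mathcal{N}(1, 1)$ (chosen only because $\mathbb{E}[h]$ and the variances are easy to check):

```python
import numpy as np

# Generic control variate: estimate a* = Cov[f, h] / Var[h] from samples and
# apply the zero-mean correction a* * (h - E[h]) with E[h] = 1 known here.

rng = np.random.default_rng(0)
z = rng.normal(loc=1.0, scale=1.0, size=10_000)

f = z ** 2            # E[f] = 2
h = z                 # E[h] = 1 (known)
a_star = np.cov(f, h)[0, 1] / np.var(h)

plain = f.mean()
corrected = (f - a_star * (h - 1.0)).mean()
print(plain, corrected)                                   # both near 2
print(np.var(f), np.var(f - a_star * (h - 1.0)))          # corrected variance is smaller
```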
Control Variates: BBVI

For black box variational inference, take
$$ f(z) = \nabla_{\lambda} \log q(z; \lambda) \left( \log p(y, z) - \log q(z; \lambda) \right), \qquad h(z) = \nabla_{\lambda} \log q(z; \lambda), $$
$$ a^* = \frac{\mathrm{Cov}\!\left[ f(z), h(z) \right]}{\mathrm{Var}\!\left[ h(z) \right]}, $$
where $\mathbb{E}_{q}\!\left[ h(z) \right] = 0$. In practice, the covariance and variance are estimated from the same $S$ Monte Carlo samples $z^1, \dots, z^S \sim q(z; \lambda)$.
Black Box Variational Inference: Algorithm

1. Initialize $\lambda^{(1)}$ (randomly)
2. For $t$ in $1, \dots, T$:
   - For $s$ in $1, \dots, S$: sample $z^s \sim q(z; \lambda^{(t)})$
   - $\hat{g}^{(t)} = \frac{1}{S} \sum_{s=1}^{S} \nabla_{\lambda} \log q(z^s; \lambda^{(t)}) \left( \log p(y, z^s) - \log q(z^s; \lambda^{(t)}) \right)$
   - $\lambda^{(t+1)} = \lambda^{(t)} + \rho^{(t)} \hat{g}^{(t)}$

Can replace vanilla SGD with more recent algorithms.
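A minimal end-to-end sketch of this loop on a toy conjugate model of my own choosing (prior $p(z) = \mathcal{N}(0,1)$, likelihood $p(y_i \mid z) = \mathcal{N}(z, 1)$, variational family $q(z;\lambda) = \mathcal{N}(\mu, \sigma)$ with $\lambda = (\mu, \log\sigma)$), using plain SGD and no control variates, so the estimate stays noisy:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=20)          # synthetic observations

def log_joint(z):                          # log p(y, z) up to constants
    return -0.5 * z**2 - 0.5 * np.sum((y[None, :] - z[:, None])**2, axis=1)

def log_q(z, mu, log_sigma):               # log q(z; lambda) up to constants
    sigma = np.exp(log_sigma)              # (constants only shift the weights, which
    return -0.5 * ((z - mu) / sigma)**2 - log_sigma   # does not bias the score-function gradient)

def score(z, mu, log_sigma):               # grad of log q w.r.t. (mu, log sigma)
    sigma = np.exp(log_sigma)
    d_mu = (z - mu) / sigma**2
    d_log_sigma = ((z - mu) / sigma)**2 - 1.0
    return np.stack([d_mu, d_log_sigma], axis=1)

mu, log_sigma = 0.0, 0.0
rho, S = 0.01, 200
for t in range(2000):
    z = rng.normal(mu, np.exp(log_sigma), size=S)          # z^s ~ q(z; lambda^(t))
    weights = log_joint(z) - log_q(z, mu, log_sigma)       # log p(y, z^s) - log q(z^s)
    g = np.mean(score(z, mu, log_sigma) * weights[:, None], axis=0)
    mu, log_sigma = mu + rho * g[0], log_sigma + rho * g[1]

# Noisy estimates close to the true posterior: mean sum(y)/(N+1), std 1/sqrt(N+1).
print(mu, np.exp(log_sigma))
```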
Improvements on SGA

$$ \lambda^{(t+1)} = \lambda^{(t)} + \rho^{(t)} \hat{g}^{(t)}, \qquad \hat{g}^{(t)} \approx \nabla_{\lambda} \mathcal{L}(\lambda^{(t)}) $$

Problem 1: The estimator may have high variance (even with control variates):
$$ \mathbb{E}\!\left[ \hat{g}^{(t)} \right] = \nabla_{\lambda} \mathcal{L}(\lambda^{(t)}), \quad \text{but} \quad \mathrm{Var}\!\left[ \hat{g}^{(t)} \right] \gg \left\| \nabla_{\lambda} \mathcal{L}(\lambda^{(t)}) \right\|^2 $$

Problem 2: We have no way of computing the natural gradient
$$ \tilde{g}^{(t)} = F(\lambda^{(t)})^{-1} \hat{g}^{(t)}, \qquad F_{ij}(\lambda) = -\,\mathbb{E}_{q(z;\lambda)}\!\left[ \frac{\partial^2 \log q(z;\lambda)}{\partial \lambda_i\, \partial \lambda_j} \right], $$
when the number of parameters $D$ is large.
Improvements on SGA: ADAM

Parameters: $\beta_1, \beta_2, \epsilon, \rho^{(1)}, \dots, \rho^{(T)}$

1. Initialize $\lambda^{(1)}$ (randomly), $m^{(0)} = 0$, $v^{(0)} = 0$
2. For $t$ in $1, \dots, T$:
   - $g^{(t)} = \hat{\nabla}_{\lambda} \mathcal{L}(\lambda^{(t)})$
   - $m^{(t)} = \beta_1 m^{(t-1)} + (1 - \beta_1)\, g^{(t)}, \qquad \hat{m}^{(t)} = m^{(t)} / (1 - \beta_1^t)$
   - $v^{(t)} = \beta_2 v^{(t-1)} + (1 - \beta_2)\, \big(g^{(t)}\big)^2, \qquad \hat{v}^{(t)} = v^{(t)} / (1 - \beta_2^t)$
   - $\lambda^{(t+1)} = \lambda^{(t)} + \rho^{(t)}\, \hat{m}^{(t)} / \big( \sqrt{\hat{v}^{(t)}} + \epsilon \big)$
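A sketch of this update in ascent form, applied to a hypothetical noisy gradient oracle (`noisy_grad` is a stand-in for the BBVI estimate $g^{(t)}$, not part of the lecture):

```python
import numpy as np

def adam_ascent(noisy_grad, lam, num_steps, rho=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    m = np.zeros_like(lam)
    v = np.zeros_like(lam)
    for t in range(1, num_steps + 1):
        g = noisy_grad(lam)
        m = beta1 * m + (1 - beta1) * g            # first-moment running average
        v = beta2 * v + (1 - beta2) * g**2         # second-moment running average
        m_hat = m / (1 - beta1**t)                 # bias correction
        v_hat = v / (1 - beta2**t)
        lam = lam + rho * m_hat / (np.sqrt(v_hat) + eps)
    return lam

# Toy objective L(lam) = -||lam - 3||^2 with Gaussian gradient noise.
rng = np.random.default_rng(0)
noisy_grad = lambda lam: -2 * (lam - 3.0) + rng.normal(scale=5.0, size=lam.shape)
print(adam_ascent(noisy_grad, lam=np.zeros(2), num_steps=2000))   # approaches [3, 3]
```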
Improvements on SGA

$$ \lambda^{(t+1)} = \lambda^{(t)} + \rho^{(t)} \hat{g}^{(t)}, \qquad \hat{g}^{(t)} \approx \nabla_{\lambda} \mathcal{L}(\lambda^{(t)}) $$

Problem: $\mathbb{E}\!\left[ \hat{g}^{(t)} \right] = \nabla_{\lambda} \mathcal{L}(\lambda^{(t)})$, but $\mathrm{Var}\!\left[ \hat{g}^{(t)} \right] \gg \left\| \nabla_{\lambda} \mathcal{L}(\lambda^{(t)}) \right\|^2$

Improvement 2: Smooth over time steps. Solution: average the gradient estimates across iterations, as ADAM does with its running mean $m^{(t)}$.
Black Box Variational Inference: Algorithm with Control Variates

1. Initialize $\lambda^{(1)}$ (randomly)
2. For $t$ in $1, \dots, T$:
   - For $s$ in $1, \dots, S$:
     - sample $z^s \sim q(z; \lambda^{(t)})$
     - $f^s = \nabla_{\lambda} \log q(z^s; \lambda^{(t)}) \left( \log p(y, z^s) - \log q(z^s; \lambda^{(t)}) \right)$
     - $h^s = \nabla_{\lambda} \log q(z^s; \lambda^{(t)})$
   - $\hat{a} = \widehat{\mathrm{Cov}}\!\left[ f, h \right] / \widehat{\mathrm{Var}}\!\left[ h \right]$ (estimated from the $S$ samples)
   - $\hat{g}^{(t)} = \frac{1}{S} \sum_{s=1}^{S} \left( f^s - \hat{a}\, h^s \right)$
   - $\lambda^{(t+1)} = \lambda^{(t)} + \rho^{(t)} \hat{g}^{(t)}$
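A small sketch of the per-iteration gradient computation, assuming `f` and `h` are arrays of shape `(S, D)` holding $f^s$ and $h^s$ for $S$ samples and $D$ variational parameters (the example values are arbitrary stand-ins, not BBVI quantities):

```python
import numpy as np

def cv_gradient(f, h):
    # a_hat = Cov[f, h] / Var[h], estimated per parameter dimension from the samples.
    f_c = f - f.mean(axis=0, keepdims=True)
    h_c = h - h.mean(axis=0, keepdims=True)
    a_hat = (f_c * h_c).mean(axis=0) / h_c.var(axis=0)
    return (f - a_hat * h).mean(axis=0)      # (1/S) sum_s (f^s - a_hat * h^s)

# Example with random stand-in values for f^s and h^s:
rng = np.random.default_rng(0)
h = rng.standard_normal((200, 3))
f = h * (5.0 + rng.standard_normal((200, 3)))
print(cv_gradient(f, h))
```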
Improvements on SGA: Natural Gradient

The update $\lambda^{(t+1)} = \lambda^{(t)} + \rho^{(t)} \hat{g}^{(t)}$ uses the normal (not natural) gradient.

Improvement 1: Approximate the natural gradient
$$ \tilde{g}^{(t)} = H^{-1} \hat{g}^{(t)}, \qquad H_{ij} = \frac{\partial^2 \mathcal{L}(\lambda)}{\partial \lambda_i\, \partial \lambda_j} $$

Problem: this requires $O(D^3)$ time to invert $H \in \mathbb{R}^{D \times D}$.

Approximation: keep only the diagonal, $H \approx \mathrm{diag}(H)$, with
$$ \mathrm{diag}(H)_{d} \approx \mathbb{E}\!\left[ \left( \frac{\partial \mathcal{L}(\lambda)}{\partial \lambda_d} \right)^{\!2} \right], $$
which is what ADAM's second-moment term $v^{(t)}$ estimates, and which can be inverted in $O(D)$ time.