Variational Bayes
• Variational Bayes (VB)
  • Approximation $q^*(\theta)$ for the posterior $p(\theta \mid x)$
  • Minimize the Kullback-Leibler (KL) divergence $\mathrm{KL}(q \,\|\, p(\cdot \mid x))$
• VB practical success
  • point estimates and prediction
  • fast, streaming, distributed
[Broderick, Boyd, Wibisono, Wilson, Jordan 2013]
What about uncertainty?
• Variational Bayes
  $$\mathrm{KL}(q \,\|\, p(\cdot \mid x)) = \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta$$
• Mean-field variational Bayes (MFVB)
  $$q(\theta) = \prod_{j=1}^{J} q(\theta_j)$$
• Underestimates variance (sometimes severely)
• No covariance estimates
[Figure: the posterior $p(\theta \mid x)$ and the MFVB approximation $q^*(\theta)$ in the $(\theta_1, \theta_2)$ plane; Bishop 2006]
[MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
[Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015]
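The variance underestimation can be seen in the simplest case, a correlated bivariate Gaussian posterior. The sketch below is illustrative only: the correlation `rho` and the matrix `Sigma` are hypothetical values, and it relies on the standard result (Bishop 2006) that for a Gaussian target with precision matrix $\Lambda$ the optimal factorized $q$ has marginal variances $1/\Lambda_{jj}$.

```python
import numpy as np

# Hypothetical bivariate Gaussian "posterior" with strong correlation.
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])   # true posterior covariance
Lambda = np.linalg.inv(Sigma)    # posterior precision

# For a Gaussian target, the factorized minimizer of KL(q || p) has
# Gaussian factors with variance 1 / Lambda_jj.
mfvb_var = 1.0 / np.diag(Lambda)
true_var = np.diag(Sigma)

print("true marginal variances:", true_var)
print("MFVB variances:         ", mfvb_var)
```

In this example the MFVB variance equals $1 - \rho^2$ times the true variance, so the gap widens as the correlation grows, and the off-diagonal covariance is lost entirely.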
Linear response
• Cumulant-generating function
  $$C(t) := \log \mathbb{E}\, e^{t^T \theta}, \qquad \text{mean} = \left. \frac{d}{dt} C(t) \right|_{t=0}$$
• True posterior covariance vs MFVB covariance
  $$\Sigma := \left. \frac{d^2}{dt^T\, dt} C_{p(\cdot \mid x)}(t) \right|_{t=0}, \qquad V := \left. \frac{d^2}{dt^T\, dt} C_{q^*}(t) \right|_{t=0}$$
• "Linear response"
  $$\log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t), \quad \text{with MFVB approximation } q^*_t$$
• The LRVB approximation
  $$\Sigma = \left. \frac{d}{dt^T} \left[ \frac{d}{dt} C_{p(\cdot \mid x)}(t) \right] \right|_{t=0} = \left. \frac{d}{dt^T} \mathbb{E}_{p_t} \theta \right|_{t=0} \approx \left. \frac{d}{dt^T} \mathbb{E}_{q^*_t} \theta \right|_{t=0} =: \hat{\Sigma}$$
[Figure: the posterior $p(\theta \mid x)$ and the MFVB approximation $q^*(\theta)$; Bishop 2006]
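The step from the second derivative of the cumulant-generating function to the derivative of the tilted mean can be written out as follows. This is a sketch that assumes differentiation and integration can be exchanged, and it writes $C(t)$ for $C_{p(\cdot \mid x)}(t)$.

```latex
\begin{align*}
C(t) &= \log \int p(\theta \mid x)\, e^{t^T \theta}\, d\theta, \\
\frac{dC(t)}{dt}
  &= \frac{\int \theta\, p(\theta \mid x)\, e^{t^T \theta}\, d\theta}
          {\int p(\theta \mid x)\, e^{t^T \theta}\, d\theta}
   = \int \theta\, p_t(\theta)\, d\theta
   = \mathbb{E}_{p_t} \theta, \\
\Sigma
  &= \left. \frac{d^2 C(t)}{dt^T\, dt} \right|_{t=0}
   = \left. \frac{d}{dt^T}\, \mathbb{E}_{p_t} \theta \right|_{t=0}.
\end{align*}
```

The second line uses $p_t(\theta) = p(\theta \mid x)\, e^{t^T \theta - C(t)}$, which is the definition of the tilted posterior exponentiated.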
LRVB estimator
• LRVB covariance estimate
  $$\hat{\Sigma} := \left. \frac{d}{dt^T} \mathbb{E}_{q^*_t} \theta \right|_{t=0}$$
• Suppose $q_t$ is in an exponential family with mean parametrization $m_t$
  $$\hat{\Sigma} = \left( \left. \frac{\partial^2 \mathrm{KL}}{\partial m\, \partial m^T} \right|_{m = m^*} \right)^{-1} = (I - VH)^{-1} V$$
• Symmetric and positive definite at local min of KL
• The LRVB assumption: $\mathbb{E}_{p_t} \theta \approx \mathbb{E}_{q^*_t} \theta$
• LRVB estimate is exact when MFVB gives exact mean (e.g. multivariate normal)
[Figure: the posterior $p(\theta \mid x)$ and the MFVB approximation $q^*(\theta)$; Bishop 2006]
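The multivariate normal case can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The slide does not define $H$; here it is assumed to be the Hessian of $\mathbb{E}_q[\log p(\theta \mid x)]$ with respect to the means, and only the first-moment block of $V$ and $H$ is used, which is enough to recover the covariance of $\theta$ for a Gaussian target. The helper `cavi_means` and all numeric values are hypothetical.

```python
import numpy as np

def cavi_means(Lambda, eta, sweeps=500):
    """MFVB means for a Gaussian target, log p = -0.5 θ'Λθ + η'θ + const."""
    m = np.zeros(len(eta))
    for _ in range(sweeps):
        for j in range(len(eta)):
            # Coordinate update: m_j = (η_j - Σ_{i≠j} Λ_ji m_i) / Λ_jj
            off = Lambda[j] @ m - Lambda[j, j] * m[j]
            m[j] = (eta[j] - off) / Lambda[j, j]
    return m

Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])   # hypothetical true covariance
Lambda = np.linalg.inv(Sigma)
eta = Lambda @ np.array([0.5, -1.0])          # natural parameter; mean (0.5, -1)

# MFVB covariance V (first-moment block): diagonal and too small.
V = np.diag(1.0 / np.diag(Lambda))
# Assumed H: Hessian of E_q[log p] in the means = minus the off-diagonal of Λ.
H = -(Lambda - np.diag(np.diag(Lambda)))
Sigma_hat = np.linalg.solve(np.eye(2) - V @ H, V)   # (I - VH)^{-1} V

# Linear response by central finite differences: tilting by t adds t to η.
eps = 1e-4
Sigma_fd = np.column_stack([
    (cavi_means(Lambda, eta + eps * e) - cavi_means(Lambda, eta - eps * e)) / (2 * eps)
    for e in np.eye(2)
])

print("V (MFVB):\n", V)
print("LRVB formula:\n", Sigma_hat)
print("LRVB finite difference:\n", Sigma_fd)
```

Both routes should agree with `Sigma` up to numerical error, because the MFVB means of a Gaussian target are exact even though the MFVB variances are not.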
Microcredit Experiment
• Simplified from Meager (2015)
• $K$ microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
• $N_k$ businesses in the $k$th site (~900 to ~17K)
• Profit of the $n$th business at the $k$th site:
  $$y_{kn} \overset{\text{indep}}{\sim} \mathcal{N}(\mu_k + T_{kn} \tau_k,\ \sigma_k^2)$$
  where $y_{kn}$ is profit and $T_{kn} = 1$ if microcredit
• Priors and hyperpriors:
  $$\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{\text{iid}}{\sim} \mathcal{N}\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix},\ C \right), \qquad \begin{pmatrix} \mu \\ \tau \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix},\ \Lambda^{-1} \right)$$
  $$\sigma_k^{-2} \overset{\text{iid}}{\sim} \Gamma(a, b), \qquad C \sim \text{Sep\&LKJ}(\eta, c, d)$$
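To make the hierarchy concrete, the sketch below simulates synthetic data from the likelihood and the site-level priors. Every numeric value is a hypothetical placeholder rather than a quantity from the study. The covariance $C$ is fixed instead of drawn from its Sep&LKJ prior, $\Gamma(a, b)$ is assumed to use a shape and rate parametrization, and treatment is assigned with probability 1/2 purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All numeric values below are hypothetical placeholders, not from the study.
K = 7                                    # number of sites
N_k = rng.integers(900, 17000, size=K)   # businesses per site
mu_tau = np.array([1.0, 0.2])            # top-level (mu, tau)
C = np.array([[1.0, 0.1],
              [0.1, 0.5]])               # fixed here; the slide draws C from Sep&LKJ
a, b = 2.0, 2.0                          # assumed shape and rate of the Gamma prior

# Site-level effects: (mu_k, tau_k) ~ N((mu, tau), C)
site = rng.multivariate_normal(mu_tau, C, size=K)
mu_k, tau_k = site[:, 0], site[:, 1]

# Site-level precisions: sigma_k^{-2} ~ Gamma(a, b); NumPy takes scale = 1 / rate.
prec_k = rng.gamma(shape=a, scale=1.0 / b, size=K)
sigma_k = 1.0 / np.sqrt(prec_k)

# Profits: y_kn ~ N(mu_k + T_kn * tau_k, sigma_k^2), with T_kn = 1 if microcredit.
T = [rng.integers(0, 2, size=n) for n in N_k]
y = [rng.normal(mu_k[k] + T[k] * tau_k[k], sigma_k[k]) for k in range(K)]

for k in range(K):
    print(f"site {k}: N_k={N_k[k]}, mean profit={y[k].mean():.3f}")
```

The parameters of interest for uncertainty estimates are the site effects $\tau_k$ and the top-level $\tau$, whose posterior is correlated with $\mu$ through $C$.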