Fast Quantification of Uncertainty and Robustness with Variational Bayes
Tamara Broderick, ITT Career Development Assistant Professor, MIT
With: Ryan Giordano, Rachael Meager, Jonathan H. Huggins, Michael I. Jordan
Bayesian inference


  8. Variational Bayes
     • VB approximation: approximation $q^*(\theta)$ for the posterior $p(\theta \mid x)$
     • Minimize the Kullback-Leibler (KL) divergence $\mathrm{KL}(q \,\|\, p(\cdot \mid x))$ over $q$; the minimizer is $q^*(\theta)$
     • VB practical success
       • point estimates and prediction
       • fast, streaming, distributed
     [Broderick, Boyd, Wibisono, Wilson, Jordan 2013]

  18. What about uncertainty?
     • Variational Bayes: $\mathrm{KL}(q \,\|\, p(\cdot \mid x)) = \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta$
     • Mean-field variational Bayes (MFVB): $q(\theta) = \prod_{j=1}^{J} q(\theta_j)$
     • Underestimates variance (sometimes severely)
     • No covariance estimates
     [MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
     [Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015]
     [Figure: $p(\theta \mid x)$ and $q^*(\theta)$ over axes $\theta_1$, $\theta_2$]
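The variance underestimation on this slide can be checked in a small Gaussian case. The sketch below is an illustration only, not taken from the talk: it assumes a hypothetical two-dimensional zero-mean Gaussian posterior with correlation `rho`, and uses the standard closed-form result that the factorized Gaussian minimizing $\mathrm{KL}(q \,\|\, p)$ has variances $1/\Lambda_{jj}$, where $\Lambda$ is the posterior precision. The helper `kl_diag_gaussian` is a name introduced here.

```python
import numpy as np

# Hypothetical posterior: zero-mean bivariate Gaussian with correlation rho.
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Lambda = np.linalg.inv(Sigma)  # posterior precision matrix


def kl_diag_gaussian(v, Sigma):
    """KL(q || p) for q = N(0, diag(v)) and p = N(0, Sigma)."""
    precision = np.linalg.inv(Sigma)
    d = len(v)
    _, logdet_p = np.linalg.slogdet(Sigma)
    trace_term = np.sum(np.diag(precision) * v)
    return 0.5 * (trace_term - d + logdet_p - np.sum(np.log(v)))


# Mean-field optimum: setting d KL / d v_j = 0 gives v_j = 1 / Lambda_jj.
v_mfvb = 1.0 / np.diag(Lambda)
# Exact marginal variances of the posterior, for comparison.
v_true = np.diag(Sigma)

kl_mfvb = kl_diag_gaussian(v_mfvb, Sigma)
kl_true = kl_diag_gaussian(v_true, Sigma)

print("MFVB variances:    ", v_mfvb)
print("exact variances:   ", v_true)
print("KL at MFVB optimum:", kl_mfvb, " KL at exact marginals:", kl_true)
```

In this example the MFVB variances equal $1 - \rho^2$, well below the exact marginal variance of 1, and the factorized $q$ with the exact marginal variances has a larger KL divergence than the MFVB optimum. Because $q$ is factorized, it also reports no covariance between the two components.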

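  41. Linear response
     • Cumulant-generating function: $C(t) := \log \mathbb{E}\, e^{t^T \theta}$, with $\text{mean} = \frac{d}{dt} C(t) \big|_{t=0}$
     • Exact posterior covariance vs MFVB covariance: $\Sigma := \frac{d^2}{dt^T dt} C_{p(\cdot \mid x)}(t) \big|_{t=0}$ and $V := \frac{d^2}{dt^T dt} C_{q^*}(t) \big|_{t=0}$
     • “Linear response”: $\log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t)$, with MFVB approximation $q_t^*$
     • The LRVB approximation: $\Sigma = \frac{d}{dt^T} \left[ \frac{d}{dt} C_{p(\cdot \mid x)}(t) \right]_{t=0} = \frac{d}{dt^T} \mathbb{E}_{p_t} \theta \big|_{t=0} \approx \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \big|_{t=0} =: \hat{\Sigma}$
     [see also Opper, Winther 2003]
     [Figure: $p(\theta \mid x)$ and $q^*(\theta)$; Bishop 2006]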
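The identity $\Sigma = \frac{d}{dt^T} \mathbb{E}_{p_t} \theta \big|_{t=0}$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the talk's code: the "posterior" is a hypothetical discrete distribution on four points in $\mathbb{R}^2$, and central finite differences stand in for the analytic derivative. The names `theta`, `w`, and `tilted_mean` are introduced here.

```python
import numpy as np

# Hypothetical discrete "posterior": four support points in R^2 with weights.
theta = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, -1.0], [-1.0, 1.5]])
w = np.array([0.1, 0.4, 0.3, 0.2])


def tilted_mean(t):
    """E_{p_t}[theta], where p_t(theta) is proportional to p(theta) exp(t^T theta)."""
    logits = np.log(w) + theta @ t
    p_t = np.exp(logits - logits.max())  # subtract the max for numerical stability
    p_t /= p_t.sum()                     # normalizing plays the role of -C(t)
    return p_t @ theta


# Covariance computed directly under p (the t = 0 case).
mean = w @ theta
centered = theta - mean
cov_direct = (centered * w[:, None]).T @ centered

# Linear response: derivative of the tilted mean with respect to t at t = 0,
# approximated column by column with central finite differences.
eps = 1e-5
cov_response = np.zeros((2, 2))
for j in range(2):
    e = np.zeros(2)
    e[j] = eps
    cov_response[:, j] = (tilted_mean(e) - tilted_mean(-e)) / (2 * eps)

print(cov_direct)
print(cov_response)
```

The two matrices agree up to finite-difference error. LRVB applies the same derivative to the MFVB mean $\mathbb{E}_{q_t^*} \theta$ instead of the exact tilted mean.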

  51. LRVB estimator
     • LRVB covariance estimate: $\hat{\Sigma} := \frac{d}{dt^T} \mathbb{E}_{q_t^*} \theta \big|_{t=0}$
     • Suppose $q_t$ is in an exponential family with mean parametrization $m_t$: $\hat{\Sigma} = \left( \frac{\partial^2 \mathrm{KL}}{\partial m \, \partial m^T} \Big|_{m = m^*} \right)^{-1} = (I - VH)^{-1} V$
     • Symmetric and positive definite at a local min of KL
     • The LRVB assumption: $\mathbb{E}_{p_t} \theta \approx \mathbb{E}_{q_t^*} \theta$
     • LRVB estimate is exact when MFVB gives exact mean (e.g. multivariate normal)
     [Figure: $p(\theta \mid x)$ and $q^*(\theta)$; Bishop 2006]
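The multivariate normal case on this slide can be sketched in a few lines. The sketch makes several assumptions that are not spelled out in the slides: the posterior covariance `Sigma` is a hypothetical example, only the first-moment mean parameters $m = \mathbb{E}_q \theta$ are tracked, and $H$ is taken to be the Hessian of $\mathbb{E}_q[\log p(\theta \mid x)]$ with respect to $m$. For a Gaussian posterior with precision $\Lambda$, that Hessian has off-diagonal entries $-\Lambda_{ij}$ and a zero diagonal.

```python
import numpy as np

# Hypothetical Gaussian posterior covariance (not from the talk).
Sigma = np.array([[1.0, 0.9, 0.3],
                  [0.9, 2.0, 0.5],
                  [0.3, 0.5, 1.5]])
Lambda = np.linalg.inv(Sigma)  # posterior precision

# MFVB covariance of theta under q*: diagonal, with variances 1 / Lambda_jj.
V = np.diag(1.0 / np.diag(Lambda))

# Assumption: H is the Hessian of E_q[log p(theta | x)] in the mean
# parameters m = E_q[theta]; for a Gaussian only the cross terms survive.
H = np.diag(np.diag(Lambda)) - Lambda

identity = np.eye(len(Sigma))
Sigma_hat = np.linalg.solve(identity - V @ H, V)  # (I - V H)^{-1} V

# In this setting V^{-1} - H equals the precision Lambda, so inverting it
# gives the same matrix; it stands in for the KL Hessian form on the slide.
Sigma_hat_alt = np.linalg.inv(np.linalg.inv(V) - H)

print("MFVB variances:", np.diag(V))
print("LRVB estimate:\n", Sigma_hat)
print("exact covariance:\n", Sigma)
```

Here the MFVB variances on the diagonal of `V` are smaller than the exact marginal variances, while `Sigma_hat` recovers the full covariance, including the off-diagonal entries that MFVB does not report.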

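  68. Microcredit Experiment
     • Simplified from Meager (2016)
     • $K$ microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
     • $N_k$ businesses in the $k$th site (~900 to ~17K)
     • Profit of the $n$th business at the $k$th site: $y_{kn} \overset{indep}{\sim} \mathcal{N}(\mu_k + T_{kn} \tau_k, \sigma_k^2)$, where $y_{kn}$ is the profit and $T_{kn} = 1$ if microcredit
     • Priors and hyperpriors:
       • $\begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{iid}{\sim} \mathcal{N}\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix}, C \right)$
       • $\begin{pmatrix} \mu \\ \tau \end{pmatrix} \overset{iid}{\sim} \mathcal{N}\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix}, \Lambda^{-1} \right)$
       • $\sigma_k^{-2} \overset{iid}{\sim} \Gamma(a, b)$
       • $C \sim \mathrm{Sep\&LKJ}(\eta, c, d)$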
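To make the hierarchy concrete, the model can be simulated forward. This is a minimal sketch under assumptions that do not come from the talk: all hyperparameter values are hypothetical, site sizes are drawn uniformly from the stated range, treatment is assigned with probability 1/2, $\Gamma(a, b)$ is read as shape and rate, and $C$ is fixed to a hypothetical matrix instead of being drawn from the Sep&LKJ prior (NumPy has no LKJ sampler).

```python
import numpy as np

rng = np.random.default_rng(0)

K = 7                                  # number of sites, as listed on the slide
N = rng.integers(900, 17001, size=K)   # hypothetical site sizes in the stated range

# Hypothetical hyperparameters (the slides give no values).
mu_tau_0 = np.array([0.0, 0.0])        # (mu_0, tau_0)
Lambda = np.eye(2)                     # prior precision of (mu, tau)
C = np.array([[1.0, 0.3],
              [0.3, 0.5]])             # fixed here, not drawn from Sep&LKJ
a, b = 2.0, 2.0                        # Gamma(a, b), assumed shape and rate

# Top level: global mean effect (mu, tau).
mu_tau = rng.multivariate_normal(mu_tau_0, np.linalg.inv(Lambda))

# Site level: one row (mu_k, tau_k) per site, and precisions sigma_k^{-2}.
site_effects = rng.multivariate_normal(mu_tau, C, size=K)
precision = rng.gamma(shape=a, scale=1.0 / b, size=K)
sigma = precision ** -0.5

# Business level: y_kn ~ N(mu_k + T_kn * tau_k, sigma_k^2).
data = []
for k in range(K):
    T = rng.integers(0, 2, size=N[k])  # hypothetical 50/50 treatment assignment
    y = rng.normal(site_effects[k, 0] + T * site_effects[k, 1], sigma[k])
    data.append((T, y))

for k, (T, y) in enumerate(data):
    print(k, len(y), y[T == 1].mean() - y[T == 0].mean(), site_effects[k, 1])
```

The printed difference in means per site is a rough check against the simulated $\tau_k$; in the talk's setting the quantities of interest are the posterior for $(\mu, \tau)$ and its covariance.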
