Examples and Implementations [Bayesian approach to Latent Class - PDF document

BIOSTAT 830 GRAPHICAL MODELS
Problem Set 4 – Case Study: Latent Class Models

Note:
1. Due 11:59PM, December 21, 2016.
2. Electronic submission to your instructor's email.
3. You are VERY MUCH encouraged to form teams to discuss proofs and program algorithms. If so, please acknowledge your teammate(s)' contributions at the beginning of your submitted homework. You must independently write your homework based on your own understanding.
4. Choose any programming language you like: R, Python, Matlab, C/C++, Julia, etc.

Examples and Implementations [Bayesian approach to Latent Class Models: Definition, Simulation, Estimation and The Choice of Number of Classes]

This problem is a simulation study of latent class models, a widely useful and effective class of models for studying multivariate discrete data. Latent class models have a long history and wide applications in disease diagnosis, psychology, psychiatry, pattern recognition, data compression, etc. You will be asked to simulate data from latent class models given parameters, and then hide the true parameters and fit the latent class models.

To specify a latent class model with $M_0$ classes, we define $\boldsymbol{Y}_i$ to be a vector of length $K$ indicating individual $i$'s binary responses to $K$ items, $\eta_i \in \{1, \dots, M_0\}$ to be individual $i$'s unobserved latent class, and $\pi_j = P(\eta_i = j)$ to be the probability that individual $i$ is in class $j$, for $j = 1, \dots, M_0$. Here we assume there are $N$ subjects.

For example, in studies investigating major depressive disorder, investigators obtain information on the symptoms through the NIMH Diagnostic Interview Schedule. The data $\boldsymbol{Y}_i$ is a vector representing the presence or absence of $K$ symptoms of depression for individual $i$, $\eta_i$ is individual $i$'s true but unknown depression class, and $\pi_j$ is the proportion of individuals in depression class $j$ in the population of which our sample is representative.

Given $\eta_i$, the elements $Y_{ik}$ of $\boldsymbol{Y}_i$ are assumed to be mutually independent, so that the distribution of $\boldsymbol{Y}_i$ is

$$f(\boldsymbol{Y}_i; \boldsymbol{\pi}, \boldsymbol{p}) = \sum_{j=1}^{M_0} \pi_j \prod_{k=1}^{K} p_{jk}^{Y_{ik}} (1 - p_{jk})^{1 - Y_{ik}},$$

where $p_{jk} = P(Y_{ik} = 1 \mid \eta_i = j)$ is the probability that individual $i$, who is in class $j$, will have a positive response to item $k$.

1) Draw the directed acyclic graph (DAG), $G$, with nodes $Y_{ik}$, $p_{jk}$, $\pi_j$, $\{\eta_i\}$, so that the joint distribution with density $f(\boldsymbol{Y}_i; \boldsymbol{\pi}, \boldsymbol{p}, \eta_i)$ is Markov to $G$. (Note: if we condition on an individual's latent class $\eta_i$, her binary response vector $\boldsymbol{Y}_i$ is independent of $\boldsymbol{\pi}$. Also, use the minimal number of edges.)

2) In the DAG you drew, for a directed arrow from $\eta_i$ to $Y_{ik}$, write the mathematical condition on $f(\boldsymbol{Y}_i; \boldsymbol{\pi}, \boldsymbol{p}, \eta_i)$ that will make it disappear. State its interpretation.

3) Simulate a dataset, $D^*$, with $N = 300$ subjects, $M_0 = 3$ classes, $K = 5$ symptoms, with

$$\{p_{jk}\} = \begin{pmatrix} 0.1 & 0.9 & 0.1 & 0.15 & 0.1 \\ 0.4 & 0.4 & 0.45 & 0.5 & 0.4 \\ 0.95 & 0.1 & 0.9 & 0.9 & 0.9 \end{pmatrix}$$

(rows index classes $j$, columns index items $k$) and $\boldsymbol{\pi} = (0.5, 0.3, 0.2)'$. Calculate and tabulate the frequency of each $K$-dimensional binary pattern ($2^K$ in total) and the observed pairwise log odds ratios

$$\psi_{k,k'}^{obs,*} = \log \frac{\hat{P}(Y_{ik} = 1, Y_{ik'} = 1)\, \hat{P}(Y_{ik} = 0, Y_{ik'} = 0)}{\hat{P}(Y_{ik} = 0, Y_{ik'} = 1)\, \hat{P}(Y_{ik} = 1, Y_{ik'} = 0)}$$

for all pairs $(k, k')$ for which 0/0 does not occur. (Note: fix a seed if you'll need me to reproduce your results.)

4) For ease of estimation, we reparametrize the model with $\{g_{jk} = \log \frac{p_{jk}}{1 - p_{jk}}\}_{j=1,k=1}^{M_{fit},K}$ and $\{a_j = \log(\pi_j / \pi_{M_{fit}})\}_{j=1}^{M_{fit}-1}$, where $M_{fit}$ is the number of classes you specify when fitting the model, which may or may not equal $M_0$. Show the likelihood $f(\boldsymbol{Y} \mid \boldsymbol{a}, \boldsymbol{g})$, where $\boldsymbol{Y} = \{\boldsymbol{Y}_i\}_{i=1}^{N}$, $\boldsymbol{a} = \{a_j\}$, $\boldsymbol{g} = \{g_{jk}\}$.

5) Assuming a Bayesian model, we need to specify prior distributions for the parameters in our latent class model. For a model with $M_{fit}$ classes, let the priors be $g_{jk} \sim N(0, \text{variance} = 9/4)$ and $a_j \sim N(0, 9/4)$. Write out the full-conditional distributions (densities if continuous) for $f(g_{jk} \mid g_{-j,-k}, \boldsymbol{\eta}, \boldsymbol{Y})$, $f(a_j \mid \{a_{-j}\}, \boldsymbol{\eta})$, and $f(\eta_i \mid \boldsymbol{a}, \boldsymbol{g}, \boldsymbol{Y})$, up to proportionality constants.

6) Fit a Bayesian latent class model with three classes ($M_{fit} = M_0 = 3$), using your simulated data and the priors specified in 5). Obtain the sequence of values for each parameter drawn from the posterior, $\{p_{jk}^{(t)}\}_{t=t_0}^{t_1}$, $\{\pi_j^{(t)}\}_{t=t_0}^{t_1}$, $\{\eta_i^{(t)}\}_{t=t_0}^{t_1}$, for $j = 1, \dots, M_{fit}$, $k = 1, \dots, K$, $i = 1, \dots, N$, where $t_0$ and $t_1$ are the indices of the start and end of your sampling chain, respectively. (Note: you may use JAGS or WinBUGS and call them from R. You must submit your code as well.)

7) Visualize/plot your estimated posterior distributions: $f(p_{jk} \mid \boldsymbol{Y}, M_{fit} = 3)$, $f(\pi_j \mid \boldsymbol{Y}, M_{fit} = 3)$, $P(\eta_i = j \mid \boldsymbol{Y}, M_{fit} = 3)$, for $j = 1, \dots, M_{fit}$, $k = 1, \dots, K$, $i = 1, \dots, N$. (Hint: compare the estimated posteriors with the true parameter values that were used to simulate the data $D^*$. For the posteriors of the individual class indicators $\{\eta_i\}$, just randomly choose 4 individuals.)

8) At each iteration of the kept sampling chain, $t = t_0, \dots, t_1$, simulate one data set $D^{(t)}$ with 300 subjects following the latent class model with parameters $\{p_{jk}^{(t)}\}_{j=1,k=1}^{M_{fit},K}$, $\boldsymbol{\pi}^{(t)}$. Compute all the finite-sample-based pairwise log odds ratios from $D^{(t)}$ and denote them by $\{\psi_{k,k'}^{t,*}\}$. Compare the set of values $\{\psi_{k,k'}^{t,*}\}$ to $\psi_{k,k'}^{obs,*}$ for each pair $(k, k')$. What do you see? (Note: you may choose a few interesting pairs $(k, k')$ to demonstrate what you find.)

9) Repeat 5) to 8) for $M_{fit} = 2, 4$. Summarize your results. (Note: you may use the interesting pairs $(k, k')$ you chose in 8) to demonstrate what you find.)

10) Summarize your experience with this simulation study of latent class models, e.g., what is the statistical mechanism that gives rise to the dependence among symptoms (you can refer to the DAG), or whether we have evidence in the data about the true number of classes, etc.
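The simulation and tabulation asked for in 3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference solution: the function names (`simulate_lcm`, `pattern_counts`, `pairwise_log_odds_ratios`), the seed value 830, and the use of NumPy are all choices made for illustration, and the class-by-item probability matrix is read row by row from the problem statement. One further assumption: a pair is reported as NaN whenever any cell of its 2x2 table is empty, which is stricter than the "0/0 does not occur" condition in the text.

```python
import itertools
from collections import Counter

import numpy as np

# Class-by-item response probabilities (rows: classes, columns: items),
# read row by row from the problem statement.
ITEM_PROBS = np.array([
    [0.10, 0.90, 0.10, 0.15, 0.10],
    [0.40, 0.40, 0.45, 0.50, 0.40],
    [0.95, 0.10, 0.90, 0.90, 0.90],
])
CLASS_PROBS = np.array([0.5, 0.3, 0.2])


def simulate_lcm(n_subjects, class_probs, item_probs, rng):
    """Draw a latent class per subject, then independent binary items given the class."""
    classes = rng.choice(len(class_probs), size=n_subjects, p=class_probs)
    uniforms = rng.random((n_subjects, item_probs.shape[1]))
    responses = (uniforms < item_probs[classes]).astype(int)
    return classes, responses


def pattern_counts(responses):
    """Frequency of every binary response pattern, including unobserved ones."""
    seen = Counter(map(tuple, responses.tolist()))
    all_patterns = itertools.product((0, 1), repeat=responses.shape[1])
    return {pattern: seen.get(pattern, 0) for pattern in all_patterns}


def pairwise_log_odds_ratios(responses):
    """Observed log odds ratio for each item pair (0-based indices).

    Cell counts are used in place of proportions; the sample size cancels
    in the ratio. Assumption: NaN is returned if any of the four cells is empty.
    """
    result = {}
    for k, k2 in itertools.combinations(range(responses.shape[1]), 2):
        a, b = responses[:, k], responses[:, k2]
        n11 = int(np.sum((a == 1) & (b == 1)))
        n00 = int(np.sum((a == 0) & (b == 0)))
        n01 = int(np.sum((a == 0) & (b == 1)))
        n10 = int(np.sum((a == 1) & (b == 0)))
        if min(n11, n00, n01, n10) == 0:
            result[(k, k2)] = float("nan")
        else:
            result[(k, k2)] = float(np.log((n11 * n00) / (n01 * n10)))
    return result


rng = np.random.default_rng(830)  # fixed seed so the run is reproducible
classes, data = simulate_lcm(300, CLASS_PROBS, ITEM_PROBS, rng)
counts = pattern_counts(data)
log_odds = pairwise_log_odds_ratios(data)
```

The same `pairwise_log_odds_ratios` function could be reused on each replicated data set in 8), so that observed and posterior-predictive log odds ratios are computed by identical code.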
