Assessing Proximal and Lagged Moderated Effects in Mobile Health
Joint Statistical Meetings, Health Policy Statistics Section (Invited)
Chicago, IL, August 2, 2016
Audrey Boruvka (1), Daniel Almirall (1), Katie Witkiewitz (2), and Susan A. Murphy (1)
(1) University of Michigan; (2) University of New Mexico
Outline
1. Three examples: BASICS Mobile, HeartSteps and Sense2Stop
2. What does data from a micro-randomized trial look like?
3. Proximal and lagged moderated effects
4. Estimating the proximal and lagged moderated effects
5. Simulation experiments
6. A data example using BASICS Mobile
BASICS-Mobile Example (College Drinking)
PI: Katie Witkiewitz
Smartphone-based intervention to curb heavy drinking and smoking in college students
Data Collected: Ecological momentary assessment (EMA) up to 3x/day (morning, afternoon, evening)
Intervention Frequency: Up to 2x/day (afternoon, evening)
Intervention Content: Mindfulness-based message vs general health information (GHI); binary treatment
Intervention Availability: Based on answering an EMA
Typical Question: Is the effect of providing a mindfulness-based intervention (vs GHI) on subsequent smoking rate moderated by an increase in the need to self-regulate?
HeartSteps Example (Physical Activity)
PI: Pedja Klasnja
Wearable activity tracker + smartphone-based intervention to encourage physical activity
Data Collected: "Continuously" + EMA each evening
Intervention Frequency: Up to 5x/day (before work, lunch, 2pm, after work, evening)
Intervention Content: Delivers vs does not deliver a contextually relevant activity suggestion via the smartphone (binary treatment)
Intervention Availability: Not in a vehicle, not exercising, app not "snoozed", phone on
Typical Question: Does time of day or busyness influence the effect of suggesting an activity on step count?
Sense2Stop Example (Smoking Cessation)
PI: Bonnie Spring
Wearable chest strap + wristband + smartphone-based intervention to sense stress and reduce smoking
Data Collected: "Continuously" + EMA
Intervention Frequency: 3x/day on average, with a 50% chance of happening when stressed and a 50% chance of happening when not stressed
Intervention Content: Deliver or not deliver a prompt (binary treatment) via smartphone to use one of 3 stress-management apps
Intervention Availability: Not in a vehicle, ≥ 60 min since last intervention, ≥ 10 min since last EMA, no uncertain stress classification, phone on
Typical Question: Will delivering the message be more effective than not delivering the message in times of stress? In times of no stress? Or equally effective in either?
Data from a Micro-randomized Trial
$t$: treatment occasion
$X_t$: individual and contextual characteristics at $t$
$A_t$: binary treatment at $t$
$Y_{t+1}$: continuous response following $t$ and before $t+1$
$H_t$: history through $t$, $H_t = (\bar X_t, \bar Y_t, \bar A_{t-1})$
Data in temporal order look like this:
$$\underbrace{X_1, A_1, Y_2, \ldots, X_t}_{H_t}, A_t, Y_{t+1}, \ldots, X_T, A_T, Y_{T+1}$$
$\rho_t(1 \mid H_t)$ is the known randomization probability $P(A_t = 1 \mid H_t)$ that generates $A_t$.
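To make the notation concrete, here is a minimal sketch of what one long-format record per person and occasion could look like. Everything specific in it is an assumption for illustration only: the function name `simulate_mrt`, the outcome model, and the rule that sets the randomization probability from the previous response are hypothetical and do not come from any of the three studies.

```python
import random

def simulate_mrt(n_people=3, n_occasions=4, seed=0):
    """Return long-format rows, one per (person, occasion).

    Each row holds X_t, the known randomization probability rho_t(1 | H_t),
    the treatment A_t, and the proximal response Y_{t+1}.
    """
    rng = random.Random(seed)
    rows = []
    for person in range(n_people):
        prev_y = 0.0  # stands in for the part of H_t that drives randomization
        for t in range(1, n_occasions + 1):
            x = rng.gauss(0.0, 1.0)  # X_t: context at occasion t
            # Hypothetical design rule: rho_t depends on history through prev_y.
            rho = 0.6 if prev_y > 0 else 0.4
            a = 1 if rng.random() < rho else 0  # A_t ~ Bernoulli(rho_t)
            # Hypothetical outcome model with a treatment effect moderated by X_t.
            y_next = 0.5 * x + a * (1.0 + 0.5 * x) + rng.gauss(0.0, 1.0)
            rows.append({"id": person, "t": t, "X": x,
                         "rho": rho, "A": a, "Y_next": y_next})
            prev_y = y_next
    return rows

rows = simulate_mrt()
for row in rows[:4]:
    print(row)
```

Storing `rho` alongside each `A` matters later: the estimators discussed in this talk reuse the known randomization probability as the denominator of a weight.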
Example Data Structure: BASICS Mobile
[Figure: timeline across Morning, Afternoon, Evening, Morning, marking treatment occasions $t-1$ and $t$ with $X_{t-1}$, $A_{t-1}$, $X_t$, $A_t$ and the response $Y_{t+1}$.]
Proximal moderated effect of $A_t$ on $Y_{t+1}$
$Y_{t+1}(\bar a_t)$: response, had the treatments $\bar a_t$ been provided
$S_{1t}(\bar a_{t-1})$: vector of candidate moderators from the history through $t$, $H_t$, had the treatments $\bar a_{t-1}$ been provided
The proximal treatment effect is
$$E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \mid S_{1t}(\bar A_{t-1}) \right].$$
You can think of $S_{1t}(\bar a_{t-1})$ as a "state" of particular interest.
Proximal moderated effect of $A_t$ on $Y_{t+1}$
A proximal treatment effect is
$$E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \mid S_{1t}(\bar A_{t-1}) \right].$$
$S_{1t}(\bar a_{t-1})$ is low-dimensional and pre-selected by the scientist.
It can be the "empty set". It can include time trends.
The proximal effect is averaged over any variables in $H_t$ not represented in $S_{1t}$.
The definition depends on the distribution of (past) treatments in the data.
Lagged moderated effect of $A_t$ on $Y_{t+2}$
A lagged treatment effect is
$$E\left[ Y_{t+2}(\bar A_{t-1}, 1, A_{t+1}^{a_t=1}) - Y_{t+2}(\bar A_{t-1}, 0, A_{t+1}^{a_t=0}) \mid S_{2t}(\bar A_{t-1}) \right],$$
where $A_{t+1}^{a_t=a} = A_{t+1}(\bar A_{t-1}, a)$.
$S_{2t}(\bar a_{t-1})$ is again low-dimensional and pre-selected by the scientist.
The delayed effect is averaged over any variables in $H_t$ not represented in $S_{2t}$.
The delayed effect is also averaged over the future treatment $A_{t+1}^{a_t}$.
Here, lag = 2.
General case: Lag $k$ treatment effects
$$E\left[ Y_{t+k}(\bar A_{t-1}, 1, A_{t+1}^{a_t=1}, \ldots, A_{t+k-1}^{a_t=1}) - Y_{t+k}(\bar A_{t-1}, 0, A_{t+1}^{a_t=0}, \ldots, A_{t+k-1}^{a_t=0}) \mid S_{kt}(\bar A_{t-1}) \right],$$
where $A_{t+1}^{a_t=a}$ denotes $A_{t+1}(\bar A_{t-1}, a)$, $A_{t+2}^{a_t=a}$ denotes $A_{t+2}(\bar A_{t-1}, a, A_{t+1}^{a_t=a})$, and so on.
$S_{kt}(\bar a_{t-1})$ is again low-dimensional and pre-selected by the scientist for examining the lag $k$ effect.
Identification (Effects in terms of observed data)
Under sequential randomization, consistency and positivity assumptions, the proximal treatment effect is
$$E\left[ Y_{t+1}(\bar A_{t-1}, 1) - Y_{t+1}(\bar A_{t-1}, 0) \mid S_{1t}(\bar A_{t-1}) \right]$$
$$= E\left[ E[Y_{t+1} \mid A_t = 1, H_t] - E[Y_{t+1} \mid A_t = 0, H_t] \mid S_{1t} \right]$$
$$= E\left[ \frac{I(A_t = 1)\, Y_{t+1}}{\rho_t(1 \mid H_t)} - \frac{I(A_t = 0)\, Y_{t+1}}{1 - \rho_t(1 \mid H_t)} \;\middle|\; S_{1t} \right],$$
where $\rho_t(1 \mid H_t) = \Pr(A_t = 1 \mid H_t)$ is the probability used to randomize sequentially.
Lagged treatment effects can be identified similarly.
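The two observed-data expressions above can be checked numerically. The sketch below uses a single occasion, a binary history variable `x` standing in for $H_t$, and an empty $S_{1t}$, so both expressions reduce to a marginal effect. The data-generating model (effect $1 + x$, randomization probabilities 0.3 and 0.7) is an assumption made up for this illustration; under it the true marginal effect is 1.5.

```python
import random

def identification_demo(n=200_000, seed=1):
    """Estimate the same effect by stratifying on H_t and by inverse weighting."""
    rng = random.Random(seed)
    ipw_sum = 0.0
    # cells[(x, a)] = [count, sum of Y] for the stratified estimate
    cells = {(x, a): [0, 0.0] for x in (0, 1) for a in (0, 1)}
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0       # H_t (binary for simplicity)
        rho = 0.7 if x == 1 else 0.3             # known rho_t(1 | H_t)
        a = 1 if rng.random() < rho else 0       # A_t
        y = 2.0 * x + a * (1.0 + x) + rng.gauss(0.0, 1.0)  # hypothetical Y_{t+1}
        # Inverse-probability-weighted contrast for this observation.
        ipw_sum += a * y / rho - (1 - a) * y / (1.0 - rho)
        cells[(x, a)][0] += 1
        cells[(x, a)][1] += y
    ipw = ipw_sum / n
    # Average over H_t of E[Y | A=1, H_t] - E[Y | A=0, H_t].
    strat = 0.0
    for x in (0, 1):
        n_x = cells[(x, 0)][0] + cells[(x, 1)][0]
        mean1 = cells[(x, 1)][1] / cells[(x, 1)][0]
        mean0 = cells[(x, 0)][1] / cells[(x, 0)][0]
        strat += (n_x / n) * (mean1 - mean0)
    return strat, ipw

strat_est, ipw_est = identification_demo()
print(strat_est, ipw_est)
```

Both estimates should land near 1.5 up to Monte Carlo error. The weighted form is the one that carries over when $H_t$ is too high-dimensional to stratify on.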
The Notion of Availability
Not all individuals are available for treatment at all time points (e.g., Wang et al. 2012; Robins 2004).
For simplicity, we define this in terms of the observed data, with $I_t = 1$ indicating availability at $t$:
$$E\left[ E[Y_{t+k} \mid A_t = 1, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right] - E\left[ E[Y_{t+k} \mid A_t = 0, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right]$$
$$= E\left[ \frac{1(A_t = 1)\, Y_{t+k}}{\rho_t(1 \mid H_t)} - \frac{1(A_t = 0)\, Y_{t+k}}{1 - \rho_t(1 \mid H_t)} \;\middle|\; I_t = 1, S_{kt} \right].$$
Note that $I_t = 1$ is not a static subpopulation, and we expect prior treatment to affect it.
Modeling assumptions
We consider linear models for each lag $k$ effect:
$$E\left[ E[Y_{t+k} \mid A_t = 1, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right] - E\left[ E[Y_{t+k} \mid A_t = 0, I_t = 1, H_t] \mid I_t = 1, S_{kt} \right] = f_{kt}(S_{kt})^\top \beta_k.$$
Recall that $k = 1$ is the proximal effect.
These models do not constrain each other across $k$ (Robins, Rotnitzky and Scharfstein 2000, Theorem 8.6).
We assume these treatment effect models are correct.
Estimation
What would we like in an estimator? Recall that $H_t$ is high-dimensional (especially in mobile health!).
Our goal was to develop a
Ease of use: familiar and easy-to-use estimation method that allows the scientist to
Parsimony: examine proximal or lagged effects of $A_t$ conditional on any $S_{kt}$, a low-dimensional subset of $H_t$,
Efficiency: while incorporating working knowledge about the association of $H_t$ and $Y_{t+k}$ for statistical power,
Robustness: yet not requiring this working knowledge to be correct, which can be difficult or impossible!
Weighted & Centered least squares
Fix $k$. Just think of the weighted regression
$$Y_{t+k} \overset{W_t}{\sim} g_{kt}(H_t) + \tilde A_t f_k(t, S_{kt}),$$
where $\tilde A_t = A_t - \tilde p_t(1 \mid S_{kt})$ is the centered treatment and
$$W_t = \left( \frac{\tilde p_t(1 \mid S_{kt})}{\rho_t(1 \mid H_t)} \right)^{A_t} \left( \frac{1 - \tilde p_t(1 \mid S_{kt})}{1 - \rho_t(1 \mid H_t)} \right)^{1 - A_t}.$$
$g_{kt}(H_t)^\top \alpha_k$ is a working model for $E[W_t Y_{t+k} \mid H_t]$.
Formally, solve for $(\alpha_k, \beta_k)$ in $0 = P_n U_W(\alpha_k, \beta_k)$, where
$$U_W = \sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - \tilde A_t f_k(t, S_{kt})^\top \beta_k \right) W_t \begin{pmatrix} g_{kt}(H_t) \\ \tilde A_t f_k(t, S_{kt}) \end{pmatrix}.$$
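Because the estimating function is linear in $(\alpha_k, \beta_k)$, setting its sample average to zero amounts to a weighted least-squares solve. The sketch below illustrates this for $k = 1$ using only the Python standard library. All names (`solve_linear`, `wcls`, `simulate`) and the simulated design are hypothetical: the conditional effect is $0.5 + s + x$, which averages to $1 + s$ over $x$, so the target is $\beta = (1, 1)$ with $f = (1, s)$. The working model $g = (1, s)$ deliberately leaves out $x$, the numerator $\tilde p_t$ is fixed at 0.5, and unavailable occasions get weight zero through $I_t W_t$. This is an illustration of the idea, not the authors' software, and it returns point estimates only.

```python
import random

def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def wcls(rows, p_tilde=0.5):
    """Solve sum_t W_t (Y - g'alpha - A_c f'beta) (g, A_c f) = 0 for (alpha, beta)."""
    q, p = len(rows[0]["g"]), len(rows[0]["f"])
    d = q + p
    M = [[0.0] * d for _ in range(d)]
    v = [0.0] * d
    for r in rows:
        a, rho = r["A"], r["rho"]
        w = p_tilde / rho if a == 1 else (1.0 - p_tilde) / (1.0 - rho)
        w *= r["avail"]                      # I_t W_t: unavailable rows drop out
        a_c = a - p_tilde                    # centered treatment
        z = r["g"] + [a_c * fj for fj in r["f"]]
        for i in range(d):
            v[i] += w * z[i] * r["Y"]
            for j in range(d):
                M[i][j] += w * z[i] * z[j]
    theta = solve_linear(M, v)
    return theta[:q], theta[q:]

def simulate(n_people=6000, n_occasions=10, seed=2):
    """Hypothetical data: s is the moderator in S_kt, x is in H_t only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        for _ in range(n_occasions):
            s = 1 if rng.random() < 0.5 else 0
            x = 1 if rng.random() < 0.5 else 0
            avail = 1 if rng.random() < 0.8 else 0
            rho = 0.7 if x == 1 else 0.3     # randomization depends on x
            a = 1 if (avail and rng.random() < rho) else 0
            effect = 0.5 + 1.0 * s + 1.0 * x  # averages to 1 + s over x
            y = 2.0 * x + a * effect + rng.gauss(0.0, 1.0)
            rows.append({"g": [1.0, float(s)], "f": [1.0, float(s)],
                         "A": a, "rho": rho, "avail": avail, "Y": y})
    return rows

alpha_hat, beta_hat = wcls(simulate())
print(beta_hat)
```

In this design both entries of `beta_hat` should be close to 1 even though the working model omits `x`. Standard errors would need a cluster-robust (sandwich) variance computed per person, which this sketch leaves out.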
Weighted & Centered least squares vs Usual GEE
Our proposed weighted and centered estimating function
$$\sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - \tilde A_t f_k(t, S_{kt})^\top \beta_k \right) W_t \begin{pmatrix} g_{kt}(H_t) \\ \tilde A_t f_k(t, S_{kt}) \end{pmatrix}$$
versus the standard, traditional GEE for longitudinal data analyses
$$\sum_{t=1}^{T-k+1} \left( Y_{t+k} - g_{kt}(H_t)^\top \alpha_k - A_t f_k(t, S_{kt})^\top \beta_k \right) \begin{pmatrix} g_{kt}(H_t) \\ A_t f_k(t, S_{kt}) \end{pmatrix},$$
but the latter requires the full conditional mean model $E[Y_{t+k} \mid H_t, A_t] = g_{kt}(H_t)^\top \alpha_k + A_t f_k(t, S_{kt})^\top \beta_k$ to be correct; ours does not!
Implementation is easy
Estimation can be implemented with standard GEE software.
Availability? Just replace $W_t$ with $I_t W_t$.
Only the independence working correlation structure may be employed; alternative structures induce bias.
Extra code (available in R) is needed for standard errors when (i) the numerator or (ii) the denominator of the weights is estimated.
Simulation Experiment
Omitting an underlying moderator variable induces bias in standard GEE but not in our proposed weighting and centering estimator.
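The flavor of this result can be reproduced in a toy setting. The sketch below is not the simulation design from the talk; it is a hypothetical one-occasion example in which a binary moderator `x` drives both the randomization probability (0.2 vs 0.6) and the effect size ($1 + 2x$, so the marginal effect is 2), and both estimators omit `x`. With an intercept-only working model, the unweighted, uncentered fit reduces to a difference in means, and the weighted and centered fit has a closed-form slope.

```python
import random

def compare_estimators(n=100_000, p_tilde=0.5, seed=3):
    """Return (standard estimate, weighted and centered estimate) of the marginal effect."""
    rng = random.Random(seed)
    n1 = n0 = 0
    y1 = y0 = 0.0
    sw = sa = saa = sy = say = 0.0   # weighted sums for the 2-parameter fit
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0      # omitted moderator
        rho = 0.6 if x == 1 else 0.2            # known randomization probability
        a = 1 if rng.random() < rho else 0
        y = x + a * (1.0 + 2.0 * x) + rng.gauss(0.0, 1.0)
        # Unweighted regression of Y on (1, A): slope is a difference in means.
        if a == 1:
            n1 += 1
            y1 += y
        else:
            n0 += 1
            y0 += y
        # Weighted regression of Y on (1, A - p_tilde) with weight W.
        w = p_tilde / rho if a == 1 else (1.0 - p_tilde) / (1.0 - rho)
        a_c = a - p_tilde
        sw += w
        sa += w * a_c
        saa += w * a_c * a_c
        sy += w * y
        say += w * a_c * y
    standard = y1 / n1 - y0 / n0
    weighted_centered = (sw * say - sa * sy) / (sw * saa - sa * sa)
    return standard, weighted_centered

gee_est, wcls_est = compare_estimators()
print(gee_est, wcls_est)
```

Under these assumptions the unweighted estimate drifts well above 2 because treated occasions over-represent `x = 1`, while the weighted and centered estimate stays near 2.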