Bottom-up Estimation and Top-down Prediction for Multi-level Models:
Solar Energy Prediction Combining Information from Multiple Sources

Jae-Kwang Kim
Department of Statistics, Iowa State University

Ross-Royall Symposium, Johns Hopkins University
Feb 26, 2016
Collaborators
◮ Youngdeok Hwang (IBM Research)
◮ Siyuan Lu (IBM Research)
Outline
◮ Introduction
◮ Modeling approach
◮ Application: Solar Energy Prediction
◮ Conclusion
Mountain Climbing for Problem Solving!

[Diagram: Math Problem – Math Solution; Stat Problem – Stat Solution; Real Problem – Real Solution]

We need a map (abstraction) to move from problem to solution!
Real Problem: Solar Energy Prediction
◮ Solar electricity is now projected to supply 14% of the total demand of the contiguous U.S. by 2030, and 27% by 2050.
IBM Solar Forecasting

Figure: Sky camera for short-term forecasting (located at Watson)

◮ Research program funded by the U.S. Department of Energy's SunShot Initiative.
Monitoring Network
◮ Global Horizontal Irradiance (GHI): the total amount of shortwave radiation received from above by a horizontal surface.
◮ GHI measurements are collected every 15 minutes from 1,528 sensor units.
Weather Models
◮ Prediction of GHI from two widely used weather models: the North American Mesoscale Forecast System (NAM) and the Short-Range Ensemble Forecast (SREF).
◮ We want to combine the GHI measurements with the weather model outputs to obtain solar energy predictions.
Statistical Model: Basic Setup
◮ The population is divided into H exhaustive and non-overlapping groups, where group h has n_h units, for h = 1, ..., H.
◮ For group h, n_h units are selected for measurement.
◮ From the i-th unit of group h, the measurements and the associated covariates, (y_hij, x_hij), are available for j = 1, ..., n_hi.
Multi-level Model
◮ Consider the level-one and level-two models,
      y_hi ∼ f_1(y_hi | x_hi; θ_hi),
      θ_hi ∼ f_2(θ_hi | z_hi; ζ_h).
◮ y_hi = (y_hi1, ..., y_hi n_hi)^⊤: observations at unit (hi).
◮ x_hi = (x_hi1^⊤, ..., x_hi n_hi^⊤)^⊤: covariates associated with unit (hi) (= the two weather model outcomes).
◮ z_hi: unit-specific covariate.
◮ Note that θ_hi is a parameter in the level-1 model, but a random variable (latent variable) in the level-2 model.
◮ We can build a level-3 model on ζ_h if necessary:
      ζ_h ∼ f_3(ζ_h | q_h; α).
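To make the two-level specification concrete, here is a minimal simulation sketch assuming a Gaussian linear level-1 model and a Gaussian level-2 model with no z_hi covariate; the dimensions, variances, and variable names are illustrative choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
H, n_h, n_obs, p = 3, 4, 48, 2       # groups, units per group, obs per unit, covariate dim

zeta = rng.normal(size=(H, p))       # level-2 (group-level) parameters
Sigma2 = 0.1 * np.eye(p)             # level-2 covariance of theta_hi around zeta_h

data = []                            # list of (h, i, x_hi, y_hi)
for h in range(H):
    for i in range(n_h):
        theta_hi = rng.multivariate_normal(zeta[h], Sigma2)          # draw from f_2
        x_hi = rng.normal(size=(n_obs, p))                           # weather-model covariates
        y_hi = x_hi @ theta_hi + rng.normal(scale=0.5, size=n_obs)   # draw from f_1
        data.append((h, i, x_hi, y_hi))
```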
Data Structure Under the Two-level Model

[Diagram: ζ_h → (f_2) → θ_h1, θ_h2, θ_h3; each θ_hi → (f_1) → its observations y_hi1, ..., y_hi n_i]
Why Multi-level Models?
1. To reflect reality: to allow for structural heterogeneity (= variety in big data) across areas.
2. To borrow strength: we need to predict at locations with no direct measurement.
Real Problems Become Statistical Problems!
1. Parameter estimation
2. Prediction
3. Uncertainty quantification

Bayesian methods using MCMC computation are a useful tool.
Classical Solutions Do Not Necessarily Work in Reality!
1. No single data file exists; the data are stored in the cloud (Hadoop Distributed File System).
2. Micro-level data are not always available to the analyst, for confidentiality and security reasons.
3. The classical solution, based on an MCMC algorithm, is time-consuming, and the computational cost can be huge for big data.

This is a typical big data problem.
New Solution: Divide-and-Conquer Approach
◮ Three steps for parameter estimation at each level:
  1. Summarization: find a summary (= measurement) for the latent variable to obtain the sampling error model.
  2. Combining: combine the sampling error model with the latent variable model.
  3. Learning: estimate the parameters from the summary data.
◮ Apply the three steps in the level-two model, and then repeat them in the level-three model.
Modeling Structure

[Diagram: Sites 1–3, each with a sensor and local storage holding level-1 data; individual unit summaries are formed per site and combined into a group-level (level-2) summary]
Summarization
◮ Find a measurement for θ_hi.
◮ For each unit, treat (x_hi, y_hi) as a single data set to obtain the best estimator θ̂_hi of θ_hi by treating θ_hi as a fixed parameter.
◮ Obtain the sampling distribution of θ̂_hi as a function of θ_hi:
      θ̂_hi ∼ g_1(θ̂_hi | θ_hi).
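Continuing the Gaussian sketch above, the summarization step reduces each unit's raw data to the pair (θ̂_hi, V̂(θ̂_hi)). In that illustrative setting the best estimator is simply per-unit least squares; the function below is a sketch under those assumptions, not the talk's exact implementation.

```python
import numpy as np

def summarize_unit(x_hi, y_hi):
    """Summarize one unit: OLS estimate of theta_hi and its estimated sampling covariance."""
    xtx_inv = np.linalg.inv(x_hi.T @ x_hi)
    theta_hat = xtx_inv @ (x_hi.T @ y_hi)
    resid = y_hi - x_hi @ theta_hat
    sigma2_hat = resid @ resid / (len(y_hi) - x_hi.shape[1])
    return theta_hat, sigma2_hat * xtx_inv   # sampling error model: theta_hat ~ N(theta_hi, V_hat)

# Applied to the simulated `data` from the earlier sketch:
summaries = [(h, i, *summarize_unit(x, y)) for h, i, x, y in data]
```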
Summarization Step under the Two-Level Model Structure

[Diagram: ζ_h → (f_2) → θ_h1, θ_h2, θ_h3 → (g_1) → θ̂_h1, θ̂_h2, θ̂_h3]

g_1(θ̂_hi | θ_hi): sampling error model, θ̂_hi ∼ N(θ_hi, V̂(θ̂_hi)).
Combining
◮ The marginal distribution of θ̂_hi is

      m_2(θ̂_hi | z_hi; ζ_h) = ∫ g_1(θ̂_hi | θ_hi) f_2(θ_hi | z_hi; ζ_h) dθ_hi,        (1)

  which combines g_1(θ̂_hi | θ_hi) and f_2(θ_hi | z_hi; ζ_h) via the latent variable θ_hi.
◮ Also, the prediction model for the latent variable θ_hi is obtained by Bayes' theorem:

      p_2(θ_hi | θ̂_hi; ζ_h) = g_1(θ̂_hi | θ_hi) f_2(θ_hi | z_hi; ζ_h) / ∫ g_1(θ̂_hi | θ_hi) f_2(θ_hi | z_hi; ζ_h) dθ_hi.        (2)
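When both g_1 and f_2 are Gaussian, as in the sampling error model on the previous slide, the integrals in (1) and (2) are available in closed form. The sketch below assumes exactly that; mu2 stands for the level-2 mean that f_2 assigns to θ_hi given z_hi and ζ_h, and all names are illustrative.

```python
import numpy as np

def combine_gaussian(theta_hat, V_hat, mu2, Sigma2):
    """Combining step when g1: theta_hat ~ N(theta, V_hat) and f2: theta ~ N(mu2, Sigma2).

    Returns the marginal model m2 and the prediction model p2 as (mean, covariance) pairs.
    """
    m2 = (mu2, Sigma2 + V_hat)                      # marginal of theta_hat, eq. (1)
    p2_cov = np.linalg.inv(np.linalg.inv(Sigma2) + np.linalg.inv(V_hat))
    p2_mean = p2_cov @ (np.linalg.solve(Sigma2, mu2) + np.linalg.solve(V_hat, theta_hat))
    return m2, (p2_mean, p2_cov)                    # prediction model for theta, eq. (2)
```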
Combining Step

[Diagram: latent θ_hi linked upward to ζ_h through f_2 and downward to θ̂_hi through g_1; combining yields m_2 and p_2]

Sampling error model (g_1) + Latent variable model (f_2) ⇒ Marginal model (m_2), Prediction model (p_2)
Learning
◮ The level-two model can be learned by the EM algorithm: at the t-th iteration, we update ζ_h by solving

      ζ̂_h^(t+1) ← arg max_{ζ_h} Σ_{i=1}^{n_h} E_{p_2} [ log f_2(θ_hi | z_hi; ζ_h) | θ̂_hi; ζ̂_h^(t) ],

  where the conditional expectation is taken with respect to the prediction model p_2 in (2) evaluated at ζ̂_h^(t), and ζ̂_h^(t) denotes the t-th iterate of the EM algorithm.
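In the all-Gaussian case with a mean-only level-2 model f_2(θ_hi | ζ_h) = N(ζ_h, Σ_2) and Σ_2 treated as known, both EM steps are closed form: the E-step computes each posterior mean under p_2 and the M-step averages them. The sketch below covers only this simplified case; it is not the exact model fitted in the talk (no z_hi covariate, no update of Σ_2).

```python
import numpy as np

def em_level2(theta_hats, V_hats, Sigma2, n_iter=100):
    """EM sketch for zeta_h when f2 is N(zeta_h, Sigma2), using only the unit summaries."""
    zeta = np.mean(theta_hats, axis=0)              # initialize at the naive average
    for _ in range(n_iter):
        # E-step: posterior mean of theta_hi under p2 at the current zeta (closed form)
        post_means = []
        for th, V in zip(theta_hats, V_hats):
            W = np.linalg.inv(np.linalg.inv(Sigma2) + np.linalg.inv(V))
            post_means.append(W @ (np.linalg.solve(Sigma2, zeta) + np.linalg.solve(V, th)))
        # M-step: maximize the expected level-2 log-likelihood in zeta
        zeta = np.mean(post_means, axis=0)
    return zeta
```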
Learning Using the EM Algorithm

[Diagram: E-step uses θ̂_hi and z_hi to predict θ_hi; M-step updates ζ̂_h]
Bayesian Interpretation
◮ The prediction model (2) can be written as
      p_2(θ_hi | θ̂_hi; ζ_h) ∝ g_1(θ̂_hi | θ_hi) f_2(θ_hi | z_hi; ζ_h).
◮ Here, f_2(θ_hi | z_hi; ζ_h) can be treated as a prior distribution, and p_2(θ_hi | θ̂_hi; ζ_h) is a posterior distribution that incorporates the observation of θ̂_hi.
◮ Using g_1(θ̂_hi | θ_hi) instead of the full likelihood simplifies the computation (approximate Bayesian computation).
Extension to the Three-Level Model

  Model     Measurement (data summary)           Latent variable               Parameter
  Level 1   y_hi = (y_hi1, ..., y_hi n_hi)       —                             θ_hi
  Level 2   θ̂_h = (θ̂_h1, ..., θ̂_h n_h)           θ_h = (θ_h1, ..., θ_h n_h)    ζ_h
  Level 3   ζ̂ = (ζ̂_1, ..., ζ̂_H)                 ζ = (ζ_1, ..., ζ_H)           α

We can apply the same three steps to the level-three model.
Bottom-up Estimation

  Level 3: latent variable model f_3(ζ_h | q_h; α); sampling error model ζ̂_h ∼ g_2(ζ̂_h | ζ_h);
           parameter estimation α̂ = arg max_α Σ_{h=1}^{H} log ∫ g_2(ζ̂_h | ζ_h) f_3(ζ_h | q_h; α) dζ_h.
  Level 2: latent variable model f_2(θ_hi | z_hi; ζ_h); sampling error model θ̂_hi ∼ g_1(θ̂_hi | θ_hi);
           parameter estimation ζ̂_h = arg max_{ζ_h} Σ_{i=1}^{n_h} log ∫ g_1(θ̂_hi | θ_hi) f_2(θ_hi | z_hi; ζ_h) dθ_hi.
  Level 1: model f_1(y_hij | x_hij; θ_hi);
           parameter estimation θ̂_hi = arg max_{θ_hi} Σ_{j=1}^{n_hi} log f_1(y_hij | x_hij; θ_hi).

Figure: An illustration of the bottom-up approach to parameter estimation.
Prediction
◮ Our goal is to predict the unobserved y_hij values from the above models using the parameter estimates.
◮ The best prediction of y_hij is

      y*_hij = E_{p_3} [ E_{p_2} { E_{f_1} ( y_hij | x_hij, θ_hi ) | θ̂_hi; ζ_h } | ζ̂_h; α̂ ],

  where
      p_3(ζ_h | ζ̂_h; α̂) = g_2(ζ̂_h | ζ_h) f_3(ζ_h | q_h; α̂) / ∫ g_2(ζ̂_h | ζ_h) f_3(ζ_h | q_h; α̂) dζ_h
  and
      p_2(θ_hi | θ̂_hi; ζ_h) = g_1(θ̂_hi | θ_hi) f_2(θ_hi | z_hi; ζ_h) / ∫ g_1(θ̂_hi | θ_hi) f_2(θ_hi | z_hi; ζ_h) dθ_hi.
◮ The prediction is made in a top-down manner.
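A minimal sketch of the top-down prediction for one unit in the all-Gaussian case is given below. For simplicity the level-3 step is collapsed, i.e. the group-level estimate zeta_hat is plugged in where a full implementation would first draw ζ* from p_3; all names and the Monte Carlo setup are illustrative.

```python
import numpy as np

def top_down_predict(x_new, theta_hat, V_hat, zeta_hat, Sigma2, n_draws=2000, seed=0):
    """Predict E[y_new | x_new] by drawing theta* from p2 and averaging f1 predictions."""
    rng = np.random.default_rng(seed)
    # p2(theta | theta_hat; zeta_hat) under the Gaussian assumptions (see combining step)
    p2_cov = np.linalg.inv(np.linalg.inv(Sigma2) + np.linalg.inv(V_hat))
    p2_mean = p2_cov @ (np.linalg.solve(Sigma2, zeta_hat) + np.linalg.solve(V_hat, theta_hat))
    theta_star = rng.multivariate_normal(p2_mean, p2_cov, size=n_draws)   # draws from p2
    return (x_new @ theta_star.T).mean(axis=1)      # Monte Carlo best prediction
```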
Prediction: Top-down Prediction

[Diagram: α̂ → (p_3) → ζ*_1, ζ*_2, ζ*_3 → (p_2) → θ*_1i, θ*_2i, θ*_3i]

Predict y_hij using f_1(y_hij | x_hij; θ*_hi).
Prediction: Top-down Prediction

  Level   Latent   Prediction Model            Best Prediction
  3       ζ_h      p_3(ζ_h | ζ̂_h; α̂)           ζ*_h ∼ p_3(ζ_h | ζ̂_h; α̂)
  2       θ_hi     p_2(θ_hi | θ̂_hi; ζ_h)       θ*_hi ∼ p_2(θ_hi | θ̂_hi; ζ*_h)
  1       y_hij    f_1(y_hij | x_hij; θ_hi)    y*_hij ∼ f_1(y_hij | x_hij; θ*_hi)

Figure: Top-down approach to prediction.
Case Study: Application to Solar Energy Prediction
◮ We use 15 days of data (12/01/2014 – 12/15/2014) for the analysis.
◮ The states are organized into 12 groups.
◮ The number of sites in each group, m_h, varies between 37 and 321.
Grouping Scheme
◮ Pool data from nearby sites.
◮ Can incorporate complex structure, such as distribution zones.