A Sequential Split-Conquer-Combine Approach for Gaussian Process Modeling in Computer Experiments

Chengrui Li, Department of Statistics and Biostatistics, Rutgers University
Joint work with Ying Hung and Min-ge Xie

2017 QPRC, June 13, 2017
Outline
• Introduction
• A Unified Framework with Theoretical Support
• Simulation Study
• Real Data Example
• Summary
Introduction
Motivating example: Data center thermal management
• A data center is an integrated facility that houses multiple server units and provides application services or management for data processing.
• Goal: design a data center with an efficient heat-removal mechanism.
• Computational Fluid Dynamics (CFD) simulation ($n = 26820$, $p = 9$)
Figure 1: Heat map for IBM T. J. Watson Data Center
Gaussian process model
• Gaussian process (GP) model: $y = X\beta + Z(x)$
• $y$: $n \times 1$ vector of observations (e.g., room temperatures)
• $X$: $n \times p$ design matrix
• $\beta$: $p \times 1$ vector of unknown parameters
• $Z(x)$: a Gaussian process with mean $0$ and covariance $\sigma^2 \Sigma(\theta)$
• $\Sigma(\theta)$: $n \times n$ correlation matrix with correlation parameters $\theta$, whose $ij$th element is defined by a power exponential correlation function
$$\mathrm{corr}(Z(x_i), Z(x_j)) = \exp(-\theta^\top |x_i - x_j|) = \exp\Big(-\sum_{k=1}^{p} \theta_k\, |x_{ik} - x_{jk}|\Big).$$
• Remark: $\sigma$ is assumed known for simplicity in this talk.
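As a concrete reference, here is a minimal sketch of this correlation matrix in Python; the function and variable names are illustrative, not from the talk.

```python
# Minimal sketch of the power exponential correlation above (names are
# illustrative): corr(Z(x_i), Z(x_j)) = exp(-sum_k theta_k |x_ik - x_jk|).
import numpy as np

def correlation_matrix(X, theta):
    """Return the n x n correlation matrix Sigma(theta) for a design X (n x p)."""
    diffs = np.abs(X[:, None, :] - X[None, :, :])        # (n, n, p): |x_ik - x_jk|
    return np.exp(-np.tensordot(diffs, theta, axes=([2], [0])))

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 2))            # 5 design points in p = 2 dimensions
theta = np.array([1.0, 0.5])
Sigma = correlation_matrix(X, theta)    # symmetric, unit diagonal
```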
Estimation and prediction
• Likelihood inference:
$$\ell(\beta, \theta, \sigma) = -\frac{1}{2\sigma^2}(y - X\beta)^\top \Sigma^{-1}(\theta)(y - X\beta) - \frac{1}{2}\log|\Sigma(\theta)| - \frac{n}{2}\log(\sigma^2)$$
So,
$$\hat{\beta} \mid \theta = \arg\max_{\beta}\, \ell(\beta \mid \theta, \sigma^2) = (X^\top \Sigma^{-1}(\theta) X)^{-1} X^\top \Sigma^{-1}(\theta)\, y,$$
$$\hat{\theta} \mid \hat{\beta} = \arg\max_{\theta}\, \ell(\theta \mid \hat{\beta}, \sigma^2).$$
• The GP prediction, say $y_0$, at a new point $x_0$, given parameters $(\beta, \theta)$, follows a normal distribution with mean $p_0(\beta, \theta)$ and variance $m_0(\beta, \theta)$, where
$$p_0(\beta, \theta) = x_0^\top \beta + \gamma(\theta)^\top \Sigma^{-1}(\theta)(y - X\beta),$$
$$m_0(\beta, \theta) = \sigma^2\big(1 - \gamma(\theta)^\top \Sigma^{-1}(\theta)\, \gamma(\theta)\big),$$
and $\gamma(\theta)$ is an $n \times 1$ vector whose $i$th element is $\phi(\|x_i - x_0\|; \theta)$.
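A hedged sketch of these two formulas follows: the GLS estimate of $\beta$ and the plug-in predictor. Function names are illustrative, and $\gamma(\theta)$ is computed with the exponential correlation from the previous slide standing in for the generic $\phi$.

```python
# Sketch of the GLS estimate and the plug-in kriging predictor (illustrative
# names; gamma uses the exponential correlation in place of the generic phi).
import numpy as np

def gls_beta(X_design, y, Sigma):
    """beta_hat | theta = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y."""
    Si_X = np.linalg.solve(Sigma, X_design)
    Si_y = np.linalg.solve(Sigma, y)
    return np.linalg.solve(X_design.T @ Si_X, X_design.T @ Si_y)

def gp_predict(x0, x0_design, X, X_design, y, beta, theta, sigma2, Sigma):
    """Plug-in predictive mean p0 and variance m0 at a new input x0."""
    gamma = np.exp(-np.abs(X - x0) @ theta)            # gamma_i = corr(Z(x_i), Z(x0))
    resid = y - X_design @ beta
    p0 = x0_design @ beta + gamma @ np.linalg.solve(Sigma, resid)
    m0 = sigma2 * (1.0 - gamma @ np.linalg.solve(Sigma, gamma))
    return p0, m0
```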
Two challenges in GP modeling
1. Computational issue:
• Estimation and prediction involve $\Sigma^{-1}$ and $|\Sigma|$, whose computation is of order $O(n^3)$
• Not feasible when $n$ is large
2. Uncertainty quantification of the GP predictor:
• The plug-in predictive distribution is widely used
• It underestimates the uncertainty, since it ignores the variability of the estimated parameters
Existing methods
• For the computational issue:
– Change the model to one that is computationally convenient: Rue and Held (2005), Cressie and Johannesson (2008)
– Approximate the likelihood function: Stein et al. (2004), Furrer et al. (2006), Fuentes (2007), Kaufman et al. (2008)
– These approaches do not focus on uncertainty quantification, and the approximations introduce additional uncertainty
• For uncertainty quantification of the GP predictor:
– Bayesian predictive distribution
– Bootstrap approach (Luna and Young 2003)
– Both require intensive computation
Can we solve both problems with a unified framework? Yes!
A Unified Framework
Introduction to confidence distribution (CD)
Statistical inference (parameter estimation):
• Point estimate
• Interval estimate
• Distribution estimate
Example: $X_1, \ldots, X_n$ i.i.d. $\sim N(\mu, 1)$
• Point estimate: $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Interval estimate: $(\bar{x}_n - 1.96/\sqrt{n},\ \bar{x}_n + 1.96/\sqrt{n})$
• Distribution estimate: $N(\bar{x}_n, \frac{1}{n})$
The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.
• Wide range of examples: bootstrap distributions, (normalized) likelihood functions, p-value functions, fiducial distributions, some informative priors, and Bayesian posteriors, among others (Xie and Singh 2013)
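A small numerical sketch of the three types of estimate for this normal-mean example (variable names illustrative); the frozen scipy distribution plays the role of the CD, from which any confidence interval can be read off.

```python
# Sketch of point, interval, and distribution estimates for N(mu, 1) data;
# the frozen normal below is the CD N(xbar, 1/n) (names are illustrative).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=100)      # simulated data, true mu = 2
n, xbar = len(x), x.mean()

point = xbar                                                      # point estimate
interval = (xbar - 1.96 / np.sqrt(n), xbar + 1.96 / np.sqrt(n))   # 95% interval
cd = norm(loc=xbar, scale=1.0 / np.sqrt(n))                       # distribution estimate

print(point, interval, cd.interval(0.95))   # the CD reproduces the 95% interval
```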
Overview: Sequential Split-Conquer-Combine
[Diagram] The data are split into subsets $D_1, D_2, \ldots, D_m$. Step $a$ sequentially updates $D_a$ into $D_a^*$ and produces a subset estimate $\hat{\theta}_a$; the estimates $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are then combined into $\hat{\theta}_c$.
Figure 2: Sequential Split-Conquer-Combine Approach
Ingredients
1. Split the entire dataset into (correlated) subsets, based on a compactly supported correlation assumption in one dimension
2. Perform a sequential updating to create independent subsets, and estimate the parameters on each updated subset
3. Combine the estimators
4. Quantify the prediction uncertainty
Split
• Split the entire dataset into subsets $y = \{y_a\}$, $a = 1, \ldots, m$. Denote the size of $y_a$ by $n_a$, so that $\sum_a n_a = n$.
• Assumption: compactly supported correlation, so that (after index sorting according to the $X_1$ values) the correlation matrix is block tridiagonal:
$$\Sigma_t = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & O & \cdots & O \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} & \cdots & O \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ O & \cdots & \Sigma_{(m-1)(m-2)} & \Sigma_{(m-1)(m-1)} & \Sigma_{(m-1)m} \\ O & O & \cdots & \Sigma_{m(m-1)} & \Sigma_{mm} \end{pmatrix}_{n \times n}$$
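A hedged sketch of the split step: here compact support is imposed by hard-truncating the exponential correlation beyond a range $\tau$ in the $X_1$ direction, purely for illustration. The talk only assumes a compactly supported correlation, and the values of $\tau$ and $m$ below are hypothetical choices.

```python
# Illustrative split step: truncate the correlation to zero when the first
# coordinates are more than tau apart, sort by X_1, and form m blocks.
# (Hard truncation is for illustration only; a valid compactly supported
# taper function would be used in practice.)
import numpy as np

def tapered_correlation(X_sorted, theta, tau):
    diffs = np.abs(X_sorted[:, None, :] - X_sorted[None, :, :])
    R = np.exp(-np.tensordot(diffs, theta, axes=([2], [0])))
    R[diffs[:, :, 0] > tau] = 0.0          # compact support in the X_1 direction
    return R

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 3))
order = np.argsort(X[:, 0])                     # index sorting according to X_1
Sigma_t = tapered_correlation(X[order], np.ones(3), tau=0.1)
blocks = np.array_split(np.arange(200), 4)      # index sets for y_1, ..., y_m
```

With blocks wide enough relative to $\tau$, the resulting $\Sigma_t$ is block tridiagonal, as displayed above.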
Sequentially update data
• Transform $y$ to $y^*$ by sequentially updating
$$y_a^* = y_a - L_{a(a-1)}\, y_{a-1}^*,$$
where $L_{(a+1)a} = \Sigma_{t,(a+1)a} D_a^{-1}$ and $D_a = \Sigma_{aa} - L_{a(a-1)} D_{a-1} L_{a(a-1)}^\top$.
• The sequential updates are computationally efficient.
• The updated blocks $y_a^*$ are independent.
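The recursion above can be coded directly. The sketch below (illustrative names) assumes `Sig[a][b]` holds the block $\Sigma_{ab}$ of the sorted, block-tridiagonal $\Sigma_t$, and returns the updated blocks together with their covariances $D_a$.

```python
# Sketch of the sequential update: a block forward sweep in which each
# updated block y*_a has covariance D_a and the blocks become independent.
import numpy as np

def sequential_update(y_blocks, Sig):
    """y*_a = y_a - L_{a,a-1} y*_{a-1},  D_a = Sigma_aa - L D_{a-1} L'."""
    m = len(y_blocks)
    y_star, D = [y_blocks[0]], [Sig[0][0]]           # first block is unchanged
    for a in range(1, m):
        L = Sig[a][a - 1] @ np.linalg.inv(D[a - 1])  # L_{a,a-1}
        y_star.append(y_blocks[a] - L @ y_star[a - 1])
        D.append(Sig[a][a] - L @ D[a - 1] @ L.T)     # Cov(y*_a)
    return y_star, D

# Wiring with the previous sketch would be, e.g.:
#   Sig[a][b] = Sigma_t[np.ix_(blocks[a], blocks[b])]
#   y_blocks[a] = y[order][blocks[a]]
```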
Estimation from each subset
Given $\theta$:
• MLE from the $a$th subset: $\hat{\beta}_a = \arg\max_{\beta \mid \theta}\, \ell_t^{(a)}(\beta \mid \theta) = (C_a^\top D_a^{-1} C_a)^{-1} C_a^\top D_a^{-1} y_a^*$.
• An individual CD for the $a$th updated subset is (cf. Xie and Singh 2013): $N_p(\hat{\beta}_a, \mathrm{Cov}(\hat{\beta}_a))$.
Given $\beta$:
• MLE from the $a$th subset: $\hat{\theta}_a = \arg\max_{\theta}\, \ell_t^{(a)}(\theta \mid \beta)$.
• An individual CD for the $a$th updated subset is $N(\hat{\theta}_a, \mathrm{Cov}(\hat{\theta}_a))$.
Significant computational reduction, because each $D_a$ is much smaller than the original covariance matrix.
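A sketch of the per-subset estimate, assuming $C_a$ is the design-matrix block transformed by the same sequential sweep as $y_a$ (so that $E[y_a^*] = C_a \beta$); $W_a$ is the weight matrix reused in the combining step on the next slide.

```python
# Sketch of the subset GLS estimate beta_a = (C' D^{-1} C)^{-1} C' D^{-1} y*
# (illustrative names; C_a assumed to be the sequentially updated design block).
import numpy as np

def subset_estimate(C_a, y_star_a, D_a):
    Di_C = np.linalg.solve(D_a, C_a)                 # D_a^{-1} C_a
    W_a = C_a.T @ Di_C                               # W_a = C_a' D_a^{-1} C_a
    beta_a = np.linalg.solve(W_a, Di_C.T @ y_star_a)
    return beta_a, W_a
```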
CD combining
• Following Singh, Xie and Strawderman (2005), Liu, Liu and Xie (2014), and Yang et al. (2014), a combined CD is $N_p(\hat{\beta}_c, S_c)$, where
$$\hat{\beta}_c = \Big(\sum_a W_a\Big)^{-1} \sum_a W_a \hat{\beta}_a, \qquad W_a = C_a^\top D_a^{-1} C_a, \qquad S_c = \mathrm{Cov}(\hat{\beta}_c).$$
• A similar framework can be applied to all the parameters $(\beta, \theta)$.

Theorem 1. Under some regularity assumptions, when $\tau > O_p(n^{1/2})$ and $n \to \infty$, the SSCC estimator $\hat{\lambda}_c = (\hat{\beta}_c, \hat{\theta}_c)$ is asymptotically as efficient as the MLE $\hat{\lambda}_{mle} = (\hat{\beta}_{mle}, \hat{\theta}_{mle})$.
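A sketch of the combining step: a precision-weighted average of the subset estimates. Under the model with $\sigma$ known (as assumed in this talk), $\mathrm{Cov}(\hat{\beta}_c) = \sigma^2 (\sum_a W_a)^{-1}$; that closed form is derived from the definitions above, not quoted from the slides.

```python
# Sketch of CD combining: beta_c = (sum_a W_a)^{-1} sum_a W_a beta_a, with
# combined covariance sigma^2 (sum_a W_a)^{-1} (derived, not quoted; sigma known).
import numpy as np

def combine(beta_list, W_list, sigma2=1.0):
    W_sum = sum(W_list)
    beta_c = np.linalg.solve(W_sum, sum(W @ b for W, b in zip(W_list, beta_list)))
    S_c = sigma2 * np.linalg.inv(W_sum)              # S_c = Cov(beta_c)
    return beta_c, S_c
```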
GP predictive distribution
• Recall: the GP predictor at a new point $x_0$, given parameters $(\beta, \theta)$, follows a normal distribution with mean $p_0(\beta, \theta)$ and variance $m_0(\beta, \theta)$, where
$$p_0(\beta, \theta) = x_0^\top \beta + \gamma(\theta)^\top \Sigma^{-1}(\theta)(y - X\beta),$$
$$m_0(\beta, \theta) = \sigma^2\big(1 - \gamma(\theta)^\top \Sigma^{-1}(\theta)\, \gamma(\theta)\big),$$
and $\gamma(\theta)$ is an $n \times 1$ vector whose $i$th element is $\phi(\|x_i - x_0\|; \theta)$.
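The extract ends here, so the talk's own prediction-uncertainty construction is not shown. As a purely hypothetical illustration of the idea, the sketch below propagates the combined CD for $\beta$ through the predictor by mixing the conditional normal predictive distributions over draws from $N(\hat{\beta}_c, S_c)$; it reuses `gp_predict` from the earlier sketch.

```python
# Hypothetical sketch (not necessarily the talk's method): mix the plug-in
# predictive normals over parameter draws from the combined CD N(beta_c, S_c).
import numpy as np

def cd_predictive_draws(x0, x0_design, X, X_design, y, beta_c, S_c,
                        theta, sigma2, Sigma, n_draws=1000, seed=3):
    rng = np.random.default_rng(seed)
    betas = rng.multivariate_normal(beta_c, S_c, size=n_draws)
    draws = np.empty(n_draws)
    for i, b in enumerate(betas):
        # gp_predict is the plug-in predictor sketched earlier in this deck
        p0, m0 = gp_predict(x0, x0_design, X, X_design, y, b, theta, sigma2, Sigma)
        draws[i] = rng.normal(p0, np.sqrt(max(m0, 0.0)))   # draw from N(p0, m0)
    return draws           # predictive sample that reflects uncertainty in beta
```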