A model selection algorithm for mixture experiments including process variables
Hugo Maruri and Eva Riccomagno
Department of Statistics, London School of Economics, and Dipartimento di Matematica, Università di Genova
mODa 8, Almagro, Spain
June 2, 2007 marurimoda8.tex
Abstract
Experiments with mixture and process variables are often constructed as the cross product of a mixture design and a factorial design. Often it is not possible to implement all the runs of the cross product design, or the cross product model is too large to be of practical interest. We propose a methodology to select a model with a given number of terms and minimal condition number. The search methodology is based on weighted term orderings and can be extended to consider other statistical criteria.
Contents of the talk
1. Mixture experiments with process variables and their models
2. Homogeneous supports for mixtures
3. An algorithm for model selection
4. Examples
5. Conclusions
Mixture experiments with process variables
• The response is assumed to depend on other factors apart from the mixture components (Cornell, 2002).
• The mixture factors are $x = (x_1, \ldots, x_k)$ and the process variables are $z = (z_1, \ldots, z_q)$.
• For instance, one of the $z_j$ could be the amount of material used, hence the name mixture-amount experiments.
• The design is a finite set of points $D \subset \mathbb{R}^{k+q}$. The projection of $D$ onto the $x$-space is $D_x$, and $D_z$ is its projection onto the $z$-space.
• Often $D_x$ is a simplex centroid or simplex lattice design, while $D_z$ is a full factorial design.
Example 1: The bread data set (Næs et al., 1998)
• Three types of wheat flour $(x_1, x_2, x_3)$ and two process factors $(z_1, z_2)$.
• Response: loaf volume after baking.
• $D_x$ is a simplex lattice design $\{3, 3\}$ and $D_z$ is a $3^2$ factorial. The design $D = D_x \times D_z$ has 90 runs.
[Figure: the designs $D_z$ (axes $z_1$, $z_2$), $D_x$ (vertices $x_1$, $x_2$, $x_3$) and $D_x \times D_z$]
Models for the combined effect of the factors (Prescott, 2004)
Additive regression model
$$y(x, z) = f(x) + g(z) + \varepsilon. \tag{1}$$
Complete cross product model
$$y(x, z) = f(x)\, g(z) + \varepsilon. \tag{2}$$
Intermediate models
$$y(x, z) = f(x) + g(z) + \sum_{i=1}^{k} \sum_{j=1}^{q} f_{ij}(x_i, z_j) + \varepsilon. \tag{3}$$
Often $f$ is taken to be a Scheffé quadratic or cubic polynomial model, in a relevant parametrization, and $g$ is a quadratic or cubic model.
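As a small illustration of the difference between models (1) and (2), consider the special case $k = 2$, $q = 1$, with $f$ a Scheffé quadratic and $g$ linear in $z$. This case and the coefficient names $\beta$, $\alpha$, $\delta$ are assumptions made here for illustration and do not appear in the talk.

```latex
% Additive model (1): four parameters
y(x, z) = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \alpha z + \varepsilon

% Complete cross product model (2): support \{x_1, x_2, x_1 x_2\} \otimes \{1, z\}, six parameters
y(x, z) = (\beta_1 + \delta_1 z)\, x_1 + (\beta_2 + \delta_2 z)\, x_2
        + (\beta_{12} + \delta_{12} z)\, x_1 x_2 + \varepsilon
```

In the cross product form every mixture coefficient is allowed to change with $z$, which is why the number of terms grows as the product of the two support sizes.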
Models for the combined effect of the factors (2)
A mixture amount model of the form
$$y(x, m) = f_0(x) + m f_1(x) + \ldots + m^p f_p(x) + \varepsilon$$
is suggested in (Cornell, 2002), with
$$f_p(x) = \sum_{i} \gamma_i^{(p)} x_i + \sum_{i<j} \gamma_{ij}^{(p)} x_i x_j + \ldots + \sum_{i_1 < \ldots < i_l} \gamma_{i_1, \ldots, i_l}^{(p)} x_{i_1} \cdots x_{i_l},$$
where $p$ is a positive integer, $l \le q$, and the $\gamma^{(p)}$ are regression parameters. The errors $\varepsilon$ are assumed i.i.d.
Proposal: search for a submodel of the complete cross product model using
• a hierarchy (divisibility) condition
• a statistical criterion (minimal condition number)
Homogeneous models in mixtures with CCA (Maruri et al., 2006)
View every $d \in D$ as a point of $P^{k-1}(\mathbb{R})$, i.e. $C_d = \{\alpha d\}$.
1) We construct the homogeneous ideal $I(C_D)$.
2) Given a term order $\tau$, we use Gröbner basis (GB) driven CoCoA code to obtain a model of degree $s$.
[Figure: the design $D$ and its cone $C_D$, axes $x_1$ and $x_2$]
Example 2: Simplex lattice $\{3, 2\}$. A model with $s = 2$ for any $\tau$ is $\{x_1^2, x_2^2, x_3^2, x_1 x_2, x_1 x_3, x_2 x_3\}$.
⇒ Linear relation with K-models (Draper, 1998) and S-models (Scheffé, 1958).
Design (cone) ideal: all polynomials that vanish on the design (cone).
An algorithm for model selection
Cross product support
Consider a product design $D = D_x \times D_z$ with no replicated runs. Let $E_x = \{x^\alpha : \alpha \in L_x\}$ and $E_z = \{z^\alpha : \alpha \in L_z\}$ be sets of linearly independent monomials in $\mathbb{R}[x_1, \ldots, x_k]/I(D_x)$ and $\mathbb{R}[z_1, \ldots, z_q]/I(D_z)$, respectively. Let $E_x \otimes E_z$ be the Kronecker product of $E_x$ and $E_z$. Then $E_x \otimes E_z$ is a set of linearly independent monomials in $\mathbb{R}[x, z]/I(D)$. Moreover, if $D_z$ and $D_x$ also have no replicated points, then it is an $\mathbb{R}$-vector space basis and it has dimension $n_x n_z$, where $n_i$ is the number of points in $D_i$, $i = z, x$.
• Typically $E_x$ and $E_z$ have a simple structure derived from the designs $D_x$ and $D_z$.
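The cross product statement can be checked numerically on a small case. The sketch below is illustrative only: the designs (a simplex lattice {3, 2} and a two-level process variable), the supports, and the helper name `model_matrix` are assumptions chosen here, not taken from the talk. It confirms that the design-model matrix of the product design for the product support is the Kronecker product of the two smaller matrices and has full rank $n_x n_z$.

```python
import itertools
import numpy as np

def model_matrix(points, exponents):
    """Evaluate each monomial (given as an exponent tuple) at each design point."""
    pts = np.asarray(points, dtype=float)
    return np.array([[np.prod(p ** np.array(e)) for e in exponents] for p in pts])

# Assumed D_x: simplex lattice {3, 2}; E_x: the six monomials of degree 2.
D_x = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (.5, .5, 0), (.5, 0, .5), (0, .5, .5)]
E_x = [(2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1)]

# Assumed D_z: one process variable at two levels; E_z = {1, z}.
D_z = [(-1,), (1,)]
E_z = [(0,), (1,)]

X_x = model_matrix(D_x, E_x)
X_z = model_matrix(D_z, E_z)

# Product design and product support, both ordered with x as the slow index.
D = [dx + dz for dx, dz in itertools.product(D_x, D_z)]
E = [ex + ez for ex, ez in itertools.product(E_x, E_z)]
X = model_matrix(D, E)

kron_ok = np.allclose(X, np.kron(X_x, X_z))   # X equals the Kronecker product
rank = np.linalg.matrix_rank(X)               # full rank means E is a basis
print(kron_ok, rank, len(D_x) * len(D_z))
```

Ordering both the runs and the monomials with the mixture part as the slow index is what makes the matrix match `np.kron(X_x, X_z)` entry by entry; any other consistent ordering gives the same rank.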
An algorithm for model selection (2)
Minimal condition number
The condition number is defined as
$$\lambda = \frac{\lambda_{\max}}{\lambda_{\min}}, \tag{4}$$
where $\lambda_{\max}$ and $\lambda_{\min} \ge 0$ are the maximum and minimum eigenvalues of the information matrix $X_L^T X_L$, and $X_L$ is the design-model matrix for the model $L$.
• Large values of $\lambda$ indicate that $X_L^T X_L$ is close to singular, i.e. $\lambda_{\min} \approx 0$.
• A small condition number $\lambda$ indicates more stable least squares estimates and smaller variance inflation factors than a large condition number.
• Useful when searching among homogeneous models, as it favours Kronecker models, which are conjectured to be robust to misspecification of the information matrix in mixtures (Prescott et al., 2002).
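Definition (4) translates directly into a few lines of NumPy. The following is a minimal sketch: the function names, the singularity tolerance `1e-12`, and the choice of the simplex lattice {3, 2} as a test design are assumptions made for illustration. It compares a Scheffé quadratic support with the homogeneous (Kronecker-type) degree 2 support on that lattice.

```python
import numpy as np

def model_matrix(points, exponents):
    """Evaluate each monomial (an exponent tuple) at each design point."""
    pts = np.asarray(points, dtype=float)
    return np.array([[np.prod(p ** np.array(e)) for e in exponents] for p in pts])

def condition_number(X):
    """lambda_max / lambda_min of X^T X; inf when X^T X is numerically singular."""
    eig = np.linalg.eigvalsh(X.T @ X)      # eigenvalues in ascending order
    lam_min, lam_max = eig[0], eig[-1]
    if lam_min <= 1e-12 * lam_max:         # assumed tolerance for "singular"
        return np.inf
    return lam_max / lam_min

# Assumed test design: simplex lattice {3, 2}.
lattice = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (.5, .5, 0), (.5, 0, .5), (0, .5, .5)]
scheffe = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
kronecker = [(2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1)]

cond_S = condition_number(model_matrix(lattice, scheffe))
cond_K = condition_number(model_matrix(lattice, kronecker))
print(f"Scheffe: {cond_S:.2f}  Kronecker: {cond_K:.2f}")
```

On this small design the homogeneous support gives the smaller condition number, which is in line with the remark that the criterion favours Kronecker models.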
An algorithm for model selection (3)
Algorithm
Input: a fraction $F \subseteq D_x \times D_z$; the designs $D_x$ and $D_z$ and the supports $E_x$ and $E_z$; $n$ = number of final terms. For identifiability, $n \le \#F$ must hold.
Output: a submodel $L_0$ with minimal condition number $\lambda_0$, formed with the smallest terms of $E_x \times E_z$ with respect to a weighted order.
Technique: generate candidate submodels by ordering $E_x \times E_z$ (complete cross product) with weight vectors $w \in W^+$, and look for the candidate with the smallest condition number.
• The search is driven by a finite set of weights $W^+$, so it terminates.
• The model $L_0$ respects a hierarchical structure.
• Arbitrary supports $E_x$ and $E_z$ can be used.
• The Algorithm is of order $O((n_x n_z)^{2(qk-1)} n^2) = \mathrm{poly}(n_x n_z)$.
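The search loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weight set is a short hand-picked list rather than the exact set $W^+$, ties in weighted degree are broken by comparing exponent tuples (an assumed convention), and the small design, the supports and the function names are hypothetical.

```python
import numpy as np

def model_matrix(points, exponents):
    """Evaluate each monomial (an exponent tuple) at each design point."""
    pts = np.asarray(points, dtype=float)
    return np.array([[np.prod(p ** np.array(e)) for e in exponents] for p in pts])

def condition_number(X):
    """lambda_max / lambda_min of X^T X; inf when numerically singular."""
    eig = np.linalg.eigvalsh(X.T @ X)
    return np.inf if eig[0] <= 1e-12 * eig[-1] else eig[-1] / eig[0]

def weighted_degree(w, e):
    return sum(wi * ei for wi, ei in zip(w, e))

def select_model(F, candidates, weights, n):
    """Return (lambda_0, L_0, w): best candidate among those induced by weights."""
    best = (np.inf, None, None)
    seen = set()
    for w in weights:
        # Weighted degree first; exponent tuple as an assumed tie-breaker.
        ordered = sorted(candidates, key=lambda e: (weighted_degree(w, e), e))
        L = tuple(ordered[:n])             # the n smallest terms for this w
        if L in seen:                      # different weights, same submodel
            continue
        seen.add(L)
        lam = condition_number(model_matrix(F, L))
        if lam < best[0]:
            best = (lam, L, tuple(w))
    return best

# Hypothetical example: two mixture components, one two-level process variable.
F = [(1, 0, -1), (1, 0, 1), (0, 1, -1), (0, 1, 1), (.5, .5, -1), (.5, .5, 1)]
E_x = [(1, 0), (0, 1), (1, 1)]             # x1, x2, x1*x2
E_z = [(0,), (1,)]                         # 1, z
candidates = [ex + ez for ex in E_x for ez in E_z]
weights = [(1, 2, 1), (2, 1, 1), (1, 1, 3), (3, 1, 1)]

lam0, L0, w0 = select_model(F, candidates, weights, n=4)
print(lam0, L0, w0)
```

Because each candidate consists of the smallest terms under a weighted order with positive weights, a term can only enter after all of its divisors within the candidate set, which is the hierarchy property mentioned above. Other criteria could be substituted by replacing `condition_number`.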
Example 3: Mixture amount design
Factors $x = (x_1, x_2)$, $z = (m)$, listed as $(x_1, x_2, m)$.

x_1  x_2  m  |  x_1^2  x_1x_2  x_2^2  mx_1^2  mx_1x_2  mx_2^2
 0    1   1  |    0      0       1      0       0        1
 0    2   2  |    0      0       4      0       0        8
 1    1   2  |    1      1       1      2       2        2
 2    0   2  |    4      0       0      8       0        0

We have $E_x = \{x_1^2, x_1 x_2, x_2^2\}$ and $E_z = \{1, m\}$. The algorithm returns the support for a mixture amount model
$$L_0 = \{x_1^2, x_2^2, x_1 x_2, m x_2^2\} \quad \text{for } w = (1, 2, 3).$$
• The set of representatives $W^+$ can be expensive to compute; an approximate set $\tilde{W}^+$, simulated over the $(q + k - 1)$-simplex, is used instead.
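The columns of the table in this example can be recomputed from the four design points, and an approximate weight set can be simulated. The sketch below assumes that weights are drawn uniformly on the simplex via a Dirichlet(1, ..., 1) distribution; the talk does not state how the simulation was done, so the sampler, its name `sample_weights` and the seed are hypothetical.

```python
import numpy as np

# Design points listed as (x1, x2, m), taken from the table of the example.
points = np.array([(0, 1, 1), (0, 2, 2), (1, 1, 2), (2, 0, 2)], dtype=float)

E_x = [(2, 0), (1, 1), (0, 2)]     # x1^2, x1*x2, x2^2
E_z = [(0,), (1,)]                 # 1, m
# m is the slow index so that columns follow the order used in the table.
support = [ex + ez for ez in E_z for ex in E_x]

# Design-model matrix for the complete cross product support.
X = np.array([[np.prod(p ** np.array(e)) for e in support] for p in points])
print(X)

def sample_weights(dim, size, seed=0):
    """Assumed sampler: uniform draws on the simplex of weight vectors."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(dim), size=size)

# q + k = 1 + 2 = 3 coordinates, i.e. points of the (q + k - 1)-simplex.
W_tilde = sample_weights(dim=3, size=5)
print(W_tilde)
```

Each sampled row is a strictly positive weight vector summing to one, so it can be used directly to order the six monomials by weighted degree.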
Example 1 (cont.): Bread data set
Analysis in (Prescott, 2004).
• Final model with 15 terms, $R^2 = 0.998$, $\hat{\sigma} = 21.04$.
• Condition number $\lambda = 86.83$.
• Fitted model
$$\hat{Y} = x_1 (522.8 + 13.0 z_1 + 56.3 z_2 - 39.4 z_1^2 - 10.2 z_2^2) + x_2 (448.1 + 1.7 z_1 + 37.2 z_2 + 3.7 z_1^2 - 28.4 z_2^2) + x_3 (599.3 + 54.3 z_1 + 73.8 z_2 - 46.0 z_1^2 + 1.0 z_2^2),$$
i.e. a mixture of predictive models, one for every type of flour.
• Symmetric support.
Example 1 (cont.): Bread data set (using the Algorithm)
Factors listed as $(x_1, x_2, x_3, z_1, z_2)$.
• Scheffé model $E_x = \{x_1, x_2, x_3, x_1^2, x_2^2, x_3^2, x_1 x_2, x_1 x_3, x_2 x_3\}$ and full product model $E_z = \{1, z_1, z_2, z_1^2, z_1 z_2, z_2^2\}$.
• Model with $\lambda_0 = 47.47$ and support $L_0 = \{x_1, x_2, x_3\} \otimes \{1, z_1, z_2\} \cup \{x_2, x_3\} \otimes \{z_1^2, z_1 z_2, z_2^2\}$ for $w = (17, 12, 10, 3, 2) \in \tilde{W}$.
• Fitted model with $\hat{\sigma} = 22.7$ and $R^2 = 0.998$:
$$\hat{Y} = x_1 (489.7 + 13.0 z_1 + 56.3 z_2) + x_2 (467.9 + 1.7 z_1 + 37.1 z_2) + x_3 (619.1 + 54.2 z_1 + 73.8 z_2) + x_2 (-19.9 z_1^2 + 3.6 z_1 z_2 - 34.6 z_2^2) + x_3 (-69.6 z_1^2 + 13.3 z_1 z_2 - 5.1 z_2^2)$$
• A slight asymmetry allows for a reduction in the condition number.
Final comments
• The Algorithm blends the change of basis (Faugère et al., 1993) with a statistical criterion.
• The search space of the Algorithm presented is much smaller than that of a full search.
• It can be adapted to consider other criteria or even composite criteria. For example, it could be used for hierarchical model selection (Peixoto, 1987) (Bates et al., 2003).
• Computing the set of weights $W$ is expensive, but the approximate set $\tilde{W}$ allows a fast search. The stopping rule is still empirical.
• A possible drawback is the potential exclusion of symmetric models. This is inherent in the use of term orders ($w$-order), e.g. there is no term order such that $x_1^2 \succ x_2^2 \succ x_1 x_2$.
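The last claim follows from the two defining properties of a term order, totality and compatibility with multiplication. A short argument, added here as a worked step and not taken from the talk:

```latex
% Suppose a term order satisfies x_1^2 \succ x_2^2 \succ x_1 x_2.
% The order is total, so either x_1 \succ x_2 or x_2 \succ x_1.
\begin{align*}
x_1 \succ x_2 &\;\Longrightarrow\; x_1 x_2 \succ x_2^2
   && \text{(multiply by } x_2\text{), contradicting } x_2^2 \succ x_1 x_2, \\
x_2 \succ x_1 &\;\Longrightarrow\; x_2^2 \succ x_1 x_2 \succ x_1^2
   && \text{(multiply by } x_2 \text{ and by } x_1\text{), contradicting } x_1^2 \succ x_2^2.
\end{align*}
```

In both cases the mixed term $x_1 x_2$ lies between the two squares, so no term order places both squares ahead of it.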
References
Bates et al. (2003). Technometrics 45, 246-255.
Cornell (2002). Experiments with mixtures.
Draper, Pukelsheim (1998). JSPI 71 (1-2), 303-311.
Faugère et al. (1993). Jour. Symb. Comp. 16 (4), 329-344.
Maruri, Notari, Riccomagno (2006). Statistica Sinica (in print).
Næs et al. (1998). Chem. Int. Lab. Syst. 41, 221-235.
Peixoto (1987). Am. Stat. 41 (4), 311-313.
Prescott et al. (2002). Technometrics 44 (3), 260-268.
Prescott (2004). Qual. Tech. & Qual. Manag. 1 (1), 87-103.
Scheffé (1958). JRSS B 20, 344-360.