COMPUTATIONAL ASPECTS OF SELECTION OF EXPERIMENTS
Yining Wang, Machine Learning Department, Carnegie Mellon University
Joint work with Zeyuan Allen-Zhu, Yuanzhi Li and Aarti Singh
arXiv:1711.05174
Talk given at Georgia Institute of Technology, Atlanta GA, USA
MOTIVATING APPLICATION
Worst-case structural analysis
- Maximum stress resulting from worst-case external forces
- Example application: lightweight structural design in automated fiber process
MOTIVATING APPLICATION
Worst-case structural analysis
- Challenge: running Finite Element Analysis (FEA) for every external force location would be computationally too expensive
- Justification for a single, normal, compressive load can be found in Ulu et al. '17, based on Rockafellar's Theorem
MOTIVATING APPLICATION
Worst-case structural analysis
- Idea: sample a few "representative" force locations and build a predictive model for the remaining locations
- Challenge: how to determine the "best" representative locations?
(figure: full mesh with ~4000 nodes vs. 200 selected nodes)
PROBLEM FORMULATION
Linear regression model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$
- $y_i$: max. stress response at force location $i$
- $x_i$ (dimension $p$): features of force location $i$, given by top eigenvectors of the surface Laplacian
- $\theta_0$: unknown regression model; $\varepsilon_i$: modeling error
Experiment selection: from the full design $X \in \mathbb{R}^{n \times p}$ (one row $x_i$ per force location), select $k$ rows $X_S \in \mathbb{R}^{k \times p}$ and observe only their responses $y_1, \dots, y_k$.
(figure: ~4000 candidate nodes, 200 selected nodes)
PROBLEM FORMULATION
Linear regression model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$
Ordinary Least Squares: $\hat\theta = \big(\sum_{i \in S} x_i x_i^\top\big)^{-1} \big(\sum_{i \in S} y_i x_i\big)$
- By CLT: $\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N\big(0, (\sum_{i \in S} x_i x_i^\top)^{-1}\big)$, where $\sum_{i \in S} x_i x_i^\top$ is the (scaled) sample covariance, i.e., Fisher's information
Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$ for some "optimality criterion" $f$
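As a concrete illustration of the estimator above, here is a minimal numpy sketch; the dimensions, noise level, and subset choice are made up for illustration and are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 3, 20
X = rng.standard_normal((n, p))          # candidate design points x_i
theta0 = np.array([1.0, -2.0, 0.5])      # "unknown" regression model
y = X @ theta0 + 0.01 * rng.standard_normal(n)

S = list(range(k))                        # some selected subset of size k
XS, yS = X[S], y[S]

# theta_hat = (sum_{i in S} x_i x_i^T)^{-1} (sum_{i in S} y_i x_i)
Sigma_S = XS.T @ XS
theta_hat = np.linalg.solve(Sigma_S, XS.T @ yS)

# The (asymptotic) covariance of theta_hat is Sigma_S^{-1},
# the inverse Fisher information of the selected subset.
cov = np.linalg.inv(Sigma_S)
```

The design objective on the following slides is exactly to choose `S` so that a scalar functional of `Sigma_S` (equivalently of `cov`) is as favorable as possible.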
PROBLEM FORMULATION
Predictive model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$
Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$
Example "optimality criteria":
- A-optimality: $f_A(\Sigma) = \mathrm{tr}(\Sigma^{-1})/p$ — minimizes the MSE $\mathbb{E}\|\hat\theta - \theta_0\|_2^2$
- D-optimality: $f_D(\Sigma) = \det(\Sigma)^{-1/p}$ — "scale invariant"
- E-optimality: $f_E(\Sigma) = \|\Sigma^{-1}\|_{\mathrm{op}} = 1/\lambda_{\min}(\Sigma)$
- V-optimality, ...
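The three criteria above are easy to state in code. A minimal sketch, written as minimization objectives on the design covariance $\Sigma = \sum_{j \in S} x_j x_j^\top$; the function names are illustrative.

```python
import numpy as np

def f_A(Sigma):
    """A-optimality: tr(Sigma^{-1}) / p."""
    return np.trace(np.linalg.inv(Sigma)) / Sigma.shape[0]

def f_D(Sigma):
    """D-optimality: det(Sigma)^{-1/p}."""
    return np.linalg.det(Sigma) ** (-1.0 / Sigma.shape[0])

def f_E(Sigma):
    """E-optimality: ||Sigma^{-1}||_op = 1 / lambda_min(Sigma)."""
    return 1.0 / np.linalg.eigvalsh(Sigma)[0]   # eigvalsh is ascending

Sigma = np.diag([4.0, 1.0])
# Reciprocal linearity (assumption (A3) on a later slide):
# f(t * Sigma) = f(Sigma) / t holds for all three criteria.
for f in (f_A, f_D, f_E):
    assert np.isclose(f(2.0 * Sigma), f(Sigma) / 2.0)
```

All three are convex and monotone in $\Sigma$, which is what makes them "regular" in the sense defined later in the talk.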
PROBLEM FORMULATION
Predictive model: $y_i = \langle x_i, \theta_0 \rangle + \varepsilon_i$
Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$
Objective: efficient approximation algorithms, i.e., output $\hat S$ with
$f\Big(\sum_{j \in \hat S} x_j x_j^\top\Big) \le C(n, p) \cdot \min_{|S| \le k} f\Big(\sum_{j \in S} x_j x_j^\top\Big)$
where $C(n, p)$ is the "approximation ratio"
EXISTING RESULTS
Existing positive results
- $O(1)$ approximation for D-optimality (Nikolov & Singh, STOC '15)
- $O(n/k)$ approximation for A-optimality (Avron & Boutsidis, SIMAX '13)
Existing negative results
- NP-hard to optimize D/E-optimality exactly (Summa et al., SODA '15)
- NP-hard to $(1+\varepsilon)$-approximate D-optimality when $k = p$ (Cerny & Hladik, Comput. Optim. Appl. '12)
Limitation: each result applies to only one or two criteria $f$
REGULAR CRITERIA
Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$
"Regular" criteria:
- (A1) Convexity: $f$ (or its surrogate) is convex;
- (A2) Monotonicity: $A \succeq B \Rightarrow f(A) \le f(B)$;
- (A3) Reciprocal linearity: $f(tA) = t^{-1} f(A)$.
All popular optimality criteria are "regular", e.g., A/D/E/V/G-optimality
OUR RESULT
Theorem. For every regular criterion $f$, there exists a polynomial-time $(1+\varepsilon)$-approximation algorithm provided that $k = \Omega(p/\varepsilon^2)$, where $k$ is the size of the design subset and $p$ is the number of variables (the dimension).
- Remark 1: concurrently with or after our work, $(1+\varepsilon)$ approximations for D/A-optimality were obtained under the condition $k = \Omega(p/\varepsilon + 1/\varepsilon^2)$ (Singh & Xie, SODA '18; Nikolov et al., arXiv '18)
- Remark 2: the condition $k = \Omega(p/\varepsilon^2)$ is tight for E-optimality and for continuous-relaxation-type methods (Nikolov et al., arXiv '18)
ALGORITHMIC FRAMEWORK
1. Continuous relaxation of the discrete problem
2. Whitening of candidate design points
3. Regret-minimization characterization of least eigenvalues
4. Greedy swapping based on FTRL potential functions
CONTINUOUS RELAXATION
Optimal experimental design: find a subset $S \subseteq [n]$, $|S| \le k$, so as to minimize $f\big(\sum_{j \in S} x_j x_j^\top\big)$
- Equivalent formulation:
$\min_{s_1, \dots, s_n} f\Big(\sum_{i=1}^n s_i x_i x_i^\top\Big) \quad \text{s.t.} \quad \sum_{i=1}^n s_i \le k, \; s_i \in \{0, 1\}$
- Relaxation: $0 \le s_i \le 1$
- Convex! Can be solved using classical methods (e.g., projected gradient / mirror descent)
- Question: how to round $\{s_i\}$ back to integer values?
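A minimal sketch of solving the relaxation with projected gradient descent, specialized to the A-optimality objective $\mathrm{tr}\big((\sum_i s_i x_i x_i^\top)^{-1}\big)$. The bisection-based projection, step size, and iteration count are illustrative implementation choices, not the talk's.

```python
import numpy as np

def project(v, k):
    # Euclidean projection onto {0 <= s_i <= 1, sum_i s_i = k},
    # by bisection on the Lagrange multiplier lam.
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(60):
        lam = (lo + hi) / 2.0
        if np.clip(v - lam, 0.0, 1.0).sum() > k:
            lo = lam
        else:
            hi = lam
    return np.clip(v - (lo + hi) / 2.0, 0.0, 1.0)

def relax_A(X, k, iters=300, lr=0.05):
    # Projected gradient on s -> tr((sum_i s_i x_i x_i^T)^{-1});
    # the gradient is d/ds_i = -x_i^T M^{-2} x_i = -||M^{-1} x_i||^2.
    n, _ = X.shape
    s = np.full(n, k / n)                    # feasible uniform start
    for _ in range(iters):
        Minv = np.linalg.inv((X * s[:, None]).T @ X)
        grad = -np.sum((X @ Minv) ** 2, axis=1)
        s = project(s - lr * grad, k)
    return s
```

Since every regular $f$ is monotone, an optimal relaxed solution uses the full budget, which is why projecting onto $\sum_i s_i = k$ (rather than $\le k$) loses nothing.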
WHITENING
Rounding problem. Given the optimal continuous solution $\pi$, round it to $\hat s \in \{0, 1\}^n$ with $\sum_i \hat s_i \le k$ such that
$f\Big(\sum_i \hat s_i x_i x_i^\top\Big) \le (1 + O(\varepsilon)) \cdot f\Big(\sum_i \pi_i x_i x_i^\top\Big)$
- Whitening: $\tilde x_i = W^{-1/2} x_i$, where $W = \sum_i \pi_i x_i x_i^\top$
- By monotonicity of $f$, the rounding problem reduces to ensuring $\lambda_{\min}\big(\sum_i \hat s_i \tilde x_i \tilde x_i^\top\big) \ge 1 - O(\varepsilon)$
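The whitening step can be sketched in a few lines of numpy: after mapping $x_i \mapsto W^{-1/2} x_i$, the $\pi$-weighted covariance of the whitened points is exactly the identity, which is what makes the $\lambda_{\min} \ge 1 - O(\varepsilon)$ target meaningful. The random stand-in for $\pi$ below is illustrative, not an actual relaxed solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.standard_normal((n, p))
pi = rng.uniform(0.1, 1.0, size=n)        # stand-in for the relaxed solution

# W = sum_i pi_i x_i x_i^T, and W^{-1/2} via eigendecomposition
W = (X * pi[:, None]).T @ X
vals, vecs = np.linalg.eigh(W)
W_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

Xt = X @ W_inv_half                        # whitened points: xt_i = W^{-1/2} x_i
# Now sum_i pi_i * xt_i xt_i^T == I by construction.
```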
REGRET MINIMIZATION
Matrix linear bandit / online learning:
- Action space: $\Delta_p = \{A \succeq 0,\ \mathrm{tr}(A) = 1\}$
- At each time $t$, the player picks an action $A_t \in \Delta_p$, observes a reference $F_t$, and suffers loss $\langle A_t, F_t \rangle$
- Objective: minimize the regret of the action sequence
$R(A) := \sum_{t=1}^T \langle F_t, A_t \rangle - \inf_{U \in \Delta_p} \sum_{t=1}^T \langle F_t, U \rangle$
- Note: the benchmark term $\inf_{U \in \Delta_p} \sum_t \langle F_t, U \rangle$ is precisely $\lambda_{\min}\big(\sum_t F_t\big)$
REGRET MINIMIZATION
Matrix linear bandit / online learning:
- At each time $t$, the player picks an action $A_t \in \Delta_p$, observes a reference $F_t$, and suffers loss $\langle A_t, F_t \rangle$
- Objective: minimize the regret $R(A)$ of the action sequence
- Follow-The-Regularized-Leader (FTRL) policy, with "regularizer" $w$:
$A_t = \arg\min_{A \in \Delta_p} \Big\{ w(A) + \alpha \cdot \sum_{\tau=1}^{t-1} \langle F_\tau, A \rangle \Big\}$
Example regularizers:
1. MWU: $w(A) = \mathrm{tr}(A(\log A - I))$ gives $A_t = \exp\big\{cI - \alpha \sum_{\tau=1}^{t-1} F_\tau\big\}$
2. $\ell_{1/2}$-regularization: $w(A) = -2\,\mathrm{tr}(A^{1/2})$ gives $A_t = \big(cI - \alpha \sum_{\tau=1}^{t-1} F_\tau\big)^{-2}$
(in both cases $c$ is chosen so that $\mathrm{tr}(A_t) = 1$)
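The $\ell_{1/2}$ case is concrete enough to sketch. A minimal implementation, assuming the closed form $A_t = (cI - \alpha \sum_\tau F_\tau)^{-2}$ with $c$ chosen (here by bisection, an illustrative choice) above $\lambda_{\max}(\alpha \sum_\tau F_\tau)$ so that $\mathrm{tr}(A_t) = 1$:

```python
import numpy as np

def ftrl_action(F_sum, alpha):
    # FTRL action under w(A) = -2 tr(A^{1/2}):
    #   A_t = (c I - alpha * F_sum)^{-2},  tr(A_t) = 1.
    # tr((cI - B)^{-2}) = sum_i (c - lam_i)^{-2} is decreasing in c,
    # blows up as c -> lam_max and is <= 1 at c = lam_max + p,
    # so bisection on c finds the unique root of tr = 1.
    vals, vecs = np.linalg.eigh(alpha * F_sum)   # ascending eigenvalues
    lo, hi = vals[-1] + 1e-12, vals[-1] + len(vals)
    for _ in range(80):
        c = (lo + hi) / 2.0
        if np.sum((c - vals) ** -2.0) > 1.0:
            lo = c
        else:
            hi = c
    c = (lo + hi) / 2.0
    return vecs @ np.diag((c - vals) ** -2.0) @ vecs.T
```

The returned matrix is symmetric positive definite with unit trace, i.e., a valid point of $\Delta_p$.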
REGRET MINIMIZATION
Regret lemma. Suppose $F_t = u_t u_t^\top - v_t v_t^\top$ (a swap of two design points). Then
$\inf_{U \in \Delta_p} \sum_{t=1}^k \langle F_t, U \rangle \ge -\frac{2\sqrt{p}}{\alpha} + \sum_{t=1}^k \left( \frac{u_t^\top A_t u_t}{1 + 2\alpha\, u_t^\top A_t^{1/2} u_t} - \frac{v_t^\top A_t v_t}{1 - 2\alpha\, v_t^\top A_t^{1/2} v_t} \right)$
where $\alpha$ is the penalty parameter in FTRL and $A_t$ is the FTRL solution at time $t$
- Proved using the classical regret analysis of FTRL policies
- $F_t$: a swap of two design points from the pool
GREEDY SWAPPING
From the regret lemma (with $F_t = u u^\top - v v^\top$), define the "potential" function:
$\psi(u, v; A) := \frac{u^\top A u}{1 + 2\alpha\, u^\top A^{1/2} u} - \frac{v^\top A v}{1 - 2\alpha\, v^\top A^{1/2} v}$
GREEDY SWAPPING
The "greedy swapping" algorithm:
- Start with an arbitrary set $S_0 \subseteq [n]$ of size $k$
- At each $t$, find $i_t \in S_{t-1}$, $j_t \notin S_{t-1}$ that maximize $\psi(x_{j_t}, x_{i_t}; A_{t-1})$
- Greedy swap: $S_t \leftarrow S_{t-1} \cup \{j_t\} \setminus \{i_t\}$
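One swap step of the above can be sketched as follows. This is a simplified illustration, not the full algorithm: the action matrix `A` is taken as a fixed element of $\Delta_p$ rather than the evolving FTRL solution, and `alpha` is assumed small enough that the $v$-denominator of $\psi$ stays positive.

```python
import numpy as np

def mat_sqrt(A):
    # Symmetric PSD square root via eigendecomposition
    vals, vecs = np.linalg.eigh(A)
    return vecs @ np.diag(np.sqrt(np.maximum(vals, 0.0))) @ vecs.T

def psi(u, v, A, alpha):
    # Potential from the regret lemma for the swap u in, v out
    A_half = mat_sqrt(A)
    return (u @ A @ u) / (1.0 + 2.0 * alpha * (u @ A_half @ u)) \
         - (v @ A @ v) / (1.0 - 2.0 * alpha * (v @ A_half @ v))

def greedy_swap_step(X, S, A, alpha):
    # Pick i in S and j outside S maximizing psi(x_j, x_i; A), then swap.
    best, pair = -np.inf, None
    for i in S:
        for j in set(range(len(X))) - S:
            val = psi(X[j], X[i], A, alpha)
            if val > best:
                best, pair = val, (i, j)
    i, j = pair
    return (S - {i}) | {j}
```

Each step keeps $|S|$ fixed at $k$ while greedily maximizing the per-step term in the regret lemma's lower bound.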
GREEDY SWAPPING
Proof framework:
- If $k \ge 5p/\varepsilon^2$ and $\alpha = \sqrt{p}/\varepsilon$, then the "progress" of each swap is lower bounded by $\varepsilon/k$ until $\lambda_{\min} \ge 1 - O(\varepsilon)$
- Repeat the swapping for at most $O(k/\varepsilon)$ iterations until done