Why adjoint based least squares solving ought to be optimal

Why adjoint based least squares solving ought to be optimal
Andreas Griewank
Department of Mathematics, Humboldt-Universität zu Berlin, Germany
School of Information Sciences, Yachaytech, Ibarra, Ecuador
September 2, 2015
Numerical Methods for Large-Scale Nonlinear Problems and their Applications, ICERM, Brown University, Providence, RI
with thanks to Andrea Walther (PAB) and Sebastian Schlenkrich (TUD), Sandra Schneider (HUB) and Claudia Tutsch (CLU)
Andreas Griewank (HU-Berlin), Theoretically optimal least squares solving, 2. September, 1 / 4

Setting
Problem: for $F : \mathbb{R}^n \mapsto \mathbb{R}^m$ with $n \le m$,
$$\min_x \; \varphi(x) \equiv \tfrac{1}{2}\,\|F(x)\|_2^2$$
First order optimality condition (necessary):
$$0 = \nabla\varphi(x_*) \equiv F(x_*)^\top F'(x_*) \in \mathbb{R}^n$$
Second order optimality condition (sufficient):
$$1 > \kappa_* \equiv \Big\| R_*^{-\top} \sum_{i=1}^{m} F_i(x_*)\,\nabla^2 F_i(x_*)\, R_*^{-1} \Big\|_2 \quad\text{with}\quad F'(x_*) = Q_* R_*$$
Derivative availability and cost:
$$\frac{\mathrm{OPS}\{\dot y \equiv F'(x)\,\dot x\}}{\mathrm{OPS}\{y \equiv F(x)\}} \le 3, \qquad \frac{\mathrm{OPS}\{\bar x^\top \equiv \bar y^\top F'(x)\}}{\mathrm{OPS}\{y \equiv F(x)\}} \le 4$$
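The setting above can be illustrated with a minimal NumPy sketch. The residual map `F` and its Jacobian below are hypothetical examples chosen only for illustration, not taken from the talk. The Jacobian is formed explicitly just for brevity; an AD tool in reverse mode would deliver the adjoint product $F(x)^\top F'(x)$ without building it.

```python
import numpy as np

def F(x):
    """Hypothetical residual map F: R^2 -> R^3 (illustration only)."""
    return np.array([x[0] ** 2 - x[1], np.sin(x[0]) + x[1], x[0] * x[1] - 1.0])

def jacobian(x):
    """Hand-coded Jacobian F'(x) of the example above, shape (3, 2)."""
    return np.array([[2.0 * x[0], -1.0],
                     [np.cos(x[0]), 1.0],
                     [x[1], x[0]]])

def phi(x):
    """Least squares objective phi(x) = 0.5 * ||F(x)||_2^2."""
    r = F(x)
    return 0.5 * (r @ r)

def grad_phi(x):
    """Gradient as the adjoint product F(x)^T F'(x)."""
    return F(x) @ jacobian(x)

def fd_grad(x, h=1e-6):
    """Central finite differences, used only to cross-check grad_phi."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (phi(x + e) - phi(x - e)) / (2.0 * h)
    return g

x0 = np.array([0.7, -0.3])
print(grad_phi(x0), fd_grad(x0))
```

The two printed vectors should agree up to finite-difference error, which is a quick sanity check on the first order optimality expression.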

Gauss Adjoint Broyden Method
Tangent conditions for $B \approx F'$:
$$B_+ s = y \equiv F'(x_+)\,s \in \mathbb{R}^m \quad\text{and}\quad B_+^\top \sigma = F'(x_+)^\top \sigma \in \mathbb{R}^n$$
Transposed Broyden Update:
$$B_+ = B + \frac{\sigma\sigma^\top}{\sigma^\top\sigma}\,\big(F'(x_+) - B\big)$$
Applying it for $\sigma = y$ and $\sigma = r \equiv y - Bs$ yields a rank-two update, which can be implemented in $O(mn)$ operations.
Resulting Properties: Frobenius norm change minimality, domain transformation invariance, and heredity on affine systems $F(x) = Ax - b$.
Quasi-Gauss-Newton Iteration:
$$x_+ = x - \alpha\,(B^\top B)^{-1}\,\nabla\varphi(x)$$
with $\alpha$ by Andersen (m=1).
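A minimal NumPy sketch of the transposed Broyden update follows. It rests on two assumptions that the slide does not spell out: a random matrix `J_plus` stands in for $F'(x_+)$, and the two updates are applied one after the other, with the residual $r$ recomputed against the intermediate matrix. Only the row vector $\sigma^\top F'(x_+)$ enters the formula, which is what a single adjoint sweep would provide.

```python
import numpy as np

def transposed_broyden_update(B, J_plus, sigma):
    """Return B + sigma sigma^T (J_plus - B) / (sigma^T sigma)."""
    adjoint_row = sigma @ J_plus           # sigma^T F'(x_+), one adjoint product
    correction = adjoint_row - sigma @ B   # sigma^T (F'(x_+) - B)
    return B + np.outer(sigma, correction) / (sigma @ sigma)

rng = np.random.default_rng(0)
m, n = 5, 3
J_plus = rng.standard_normal((m, n))   # stand-in for F'(x_+), illustration only
B = rng.standard_normal((m, n))        # current approximation
s = rng.standard_normal(n)             # step direction
y = J_plus @ s                         # tangent information y = F'(x_+) s

# First update with sigma = y, then with the residual of the intermediate matrix.
B_mid = transposed_broyden_update(B, J_plus, y)
r = y - B_mid @ s
B_plus = transposed_broyden_update(B_mid, J_plus, r)

print(np.linalg.norm(B_plus @ s - y))               # tangent condition residual
print(np.linalg.norm(B_plus.T @ y - J_plus.T @ y))  # adjoint condition residual
print(np.linalg.matrix_rank(B_plus - B))            # rank of the total change
```

In this sketch the second update does not destroy the adjoint condition obtained from the first one, because the recomputed residual turns out to be orthogonal to $y$.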

Provable Properties
Global convergence:
$$0 = \inf_k \|\nabla\varphi(x_k)\| \;\Longleftarrow\; x_0 \in \{\varphi(x) \le c\} \text{ compact and } \operatorname{rank}(F'(x)) = n$$
Asymptotic R-rate in overdetermined case ($m > n$):
$$0 = \inf_k \|x_k - x_*\| \;\Longrightarrow\; \limsup_{k\to\infty} \|x_k - x_*\|^{1/k} \le \kappa_* < 1$$
Asymptotic order in consistent case ($m = n$):
$$\liminf_{k\to\infty} \big|\log(\|x_k - x_*\|)\big|^{1/k} \ge \rho_n \approx 1 + \frac{\log(n)}{n} \quad\text{with}\quad 1 = \rho_n^{\,n+1} - \rho_n^{\,n}$$
On affine problems: finite termination in $\le n$ steps (à la GMRES when $m = n$ and $B_0 = I$).
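The order bound $\rho_n$ is defined only implicitly, so a small numerical sketch may help to get a feeling for it. The bisection routine below is an illustrative helper, not part of the talk. It solves $\rho^{n+1} - \rho^n = 1$ on the interval $[1, 2]$ and prints the root next to the asymptotic estimate $1 + \log(n)/n$.

```python
import math

def rho(n, tol=1e-14):
    """Root rho > 1 of rho**(n+1) - rho**n = 1, found by bisection on [1, 2]."""
    def g(t):
        return t ** (n + 1) - t ** n - 1.0
    lo, hi = 1.0, 2.0          # g(1) = -1 < 0 and g(2) = 2**n - 1 > 0 for n >= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (1, 2, 5, 10, 100):
    print(n, rho(n), 1.0 + math.log(n) / n)
```

For $n = 1$ the defining equation reduces to $\rho^2 - \rho = 1$, whose positive root is the golden ratio.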

Piecewise linearizations of nonsmooth equations and their numerical solution
Andreas Griewank (1,2), Tom Streubel (1), Richard Hasenfelder (1)
1) Department of Mathematics, Humboldt University at Berlin
2) School of Information Sciences, Yachaytech, Ibarra, Ecuador
Numerical Methods for Large-Scale Nonlinear Problems and Their Applications, ICERM at Brown University

Evaluation procedures
Function expression: assumed to be a chain of functions from some library and the absolute value function.
The expression can be recast as single assignment code.
Here the dependence relation generates a partial order.
Figure labels: single assignment code; acyclic directed computational graph

Algorithmic Piecewise Linearization - I
basic idea
• propagate piecewise linear rather than linear approximations
• therefore replace differentiable elementals by their linear tangent/secant models
• as well as the absolute value function by itself
Figure panels: secant mode, tangent mode

Algorithmic Piecewise Linearization - II
either choose
• one reference point (tangent mode)
• two reference points (secant mode)
For any single assignment evaluate an increment. These increments depend on the reference point(s) and preceding increments. So we write
• (tangent mode)
• (secant mode)
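The tangent-mode increment propagation can be sketched on a tiny single assignment code. The test function $f(x_1, x_2) = |x_1^2 - x_2| + \sin(x_1)$ is a hypothetical example, not from the slides. Smooth elementals are replaced by their tangent at the reference point, while the absolute value is kept as it is, so the increment is $|\mathring v + \Delta v| - |\mathring v|$. The secant mode is not covered here.

```python
import math

def f(x1, x2):
    """Hypothetical test function with one kink: |x1^2 - x2| + sin(x1)."""
    return abs(x1 * x1 - x2) + math.sin(x1)

def tangent_pl_increment(x1_ref, x2_ref, dx1, dx2):
    """Piecewise linear increment of f at the reference point (tangent mode)."""
    v2_ref = x1_ref * x1_ref - x2_ref          # reference value entering abs()
    dv1 = 2.0 * x1_ref * dx1                   # tangent of v1 = x1 * x1
    dv2 = dv1 - dx2                            # v2 = v1 - x2 is already linear
    dv3 = abs(v2_ref + dv2) - abs(v2_ref)      # abs() is propagated unchanged
    return dv3 + math.cos(x1_ref) * dx1        # tangent of sin(x1) added

def model_error(t, ref=(1.0, 0.9), direction=(-0.3, 0.2)):
    """Gap between the true increment of f and the PL increment for step t."""
    dx1, dx2 = t * direction[0], t * direction[1]
    exact = f(ref[0] + dx1, ref[1] + dx2) - f(ref[0], ref[1])
    return abs(exact - tangent_pl_increment(ref[0], ref[1], dx1, dx2))

for t in (1.0, 0.1, 0.01):
    print(t, model_error(t))
```

The chosen direction crosses the kink for the largest step, yet the error still shrinks roughly quadratically with the step length, which is the behaviour a plain linearization cannot offer across a kink.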

Algorithmic Piecewise Linearization - III
is called tangent piecewise linear model of at and satisfies
Inhomogeneous tangent model
is called secant piecewise linear model of at if where
Inhomogeneous secant model

Algorithmic Piecewise Linearization - IV
• Algorithmic piecewise linearization can be performed by slight modifications of common AD tools (e.g. Adol-C), see autodiff.org
• general properties of PL functions:
• Lipschitz continuous
• consist of linear and absolute value functions
• correspond to a polyhedral subdivision
• a polyhedron with nonempty interior is called essential
• Implication chain (by S. Scholtes):
• openness is equivalent to coherent orientation:

Approximation properties of PL models
For some (algorithmically computable) Lipschitz constant
simplifies to, if
For some (algorithmically computable) Lipschitz constant
Implications:

Newton via successive piecewise linearization I
Tangent mode: Let be a root of an algorithm. If for a fixed radius then is called feasible tangent mode iteration.
Secant mode: if again for a fixed radius then is called feasible secant mode iteration, where and set-valued inverses

Quadratic or golden ratio convergence rate
Tangent mode: assume feasibility of the tangent mode iteration as well as (local strong metric regularity) satisfied, then the tangent mode iteration converges quadratically (rate 2) to the root.
Secant mode: assume feasibility of the secant mode iteration as well as (local strong metric regularity) satisfied, then the secant mode iteration converges with golden ratio rate to the root.

Newton via successive piecewise linearization II
strong metric regularity in i.e. is implied by openness of the restriction of to
So far we know: feasibility of both iterations is implied by injectivity of
Open Newton Conjecture: feasibility is already guaranteed in case of openness of

A 2D oscillating test example
• For any vector take its angle from the polar coordinate representation and map it by some differentiable (right picture) or bijective function (picture below)
• thereby preserve its Euclidean norm

A 2D oscillating test example – homogeneous part
• upper half of is stretched (blue)
• lower half is compressed (red)
• is bijective and Lipschitz continuous
• the line is kept fixed
• almost everywhere differentiable, but not at the origin

A 2D oscillating test example

Piecewise linear subproblem I
Definition: Abs-normal Form PL
• Any piecewise linear function can be represented this way
• the matrix is of strictly lower triangular form, thus can be evaluated explicitly and element-wise
• the abs-normal form is numerically stable: use it as data structure
• the signature of is defined as follows: each one corresponds to a polyhedron from the polyhedral subdivision of
• Task: search a root such that
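The defining formula did not survive on this slide, so the following sketch relies on an assumption: it uses the notation common in the abs-normal form literature, $z = c + Zx + L|z|$ and $y = b + Jx + Y|z|$ with $L$ strictly lower triangular, and takes the signature to be $\operatorname{sign}(z)$. The names `c`, `b`, `Z`, `L`, `J`, `Y` and the small example function are illustrative only.

```python
import numpy as np

def eval_abs_normal(c, b, Z, L, J, Y, x):
    """Evaluate z = c + Z x + L|z|, y = b + J x + Y|z| and the signature sign(z)."""
    s = len(c)
    z = np.zeros(s)
    for i in range(s):
        # L is strictly lower triangular, so z[i] only needs z[0..i-1].
        z[i] = c[i] + Z[i] @ x + L[i, :i] @ np.abs(z[:i])
    y = b + J @ x + Y @ np.abs(z)
    return z, y, np.sign(z)

# Hypothetical example: f(x1, x2) = | x1 + |x2| | - 1 with two switching variables
# z1 = x2 and z2 = x1 + |z1|, so that y = |z2| - 1.
c = np.zeros(2)
Z = np.array([[0.0, 1.0], [1.0, 0.0]])
L = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([-1.0])
J = np.zeros((1, 2))
Y = np.array([[0.0, 1.0]])

z, y, signature = eval_abs_normal(c, b, Z, L, J, Y, np.array([-3.0, 2.0]))
print(z, y, signature)
```

The point $x = (-3, 2)$ is a root of this example function, and its signature identifies the polyhedron of the subdivision that contains it.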

Piecewise linear subproblem II
• one can simplify the polyhedral structure of a given problem: Find (we refer to this as the original piecewise linear problem, or OPL for short)
• evaluate the Schur complement of and define: Find (we refer to this as the complementary piecewise linear problem, or CPL for short)
• CPLs and LCPs are equivalent formulations via Möbius transformation
• there is a one-to-one solution correspondence between OPL and CPL
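The reduction from OPL to CPL can be sketched under stated assumptions, since the slide formulas are missing. With the assumed abs-normal notation $z = c + Zx + L|z|$, $0 = b + Jx + Y|z|$ and a regular square $J$, eliminating $x$ gives $z = \hat c + S|z|$ with $S = L - ZJ^{-1}Y$ and $\hat c = c - ZJ^{-1}b$. The names `S` and `c_hat` and the random test data are illustrative choices.

```python
import numpy as np

def opl_to_cpl(c, b, Z, L, J, Y):
    """Eliminate x from 0 = b + J x + Y|z| (J regular) to get z = c_hat + S|z|."""
    S = L - Z @ np.linalg.solve(J, Y)
    c_hat = c - Z @ np.linalg.solve(J, b)
    return c_hat, S

def recover_x(b, J, Y, z):
    """Map a CPL solution z back to the OPL solution x."""
    return -np.linalg.solve(J, b + Y @ np.abs(z))

rng = np.random.default_rng(1)
n, s = 3, 4
Z = rng.standard_normal((s, n))
L = np.tril(rng.standard_normal((s, s)), k=-1)       # strictly lower triangular
J = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
Y = rng.standard_normal((n, s))
c = rng.standard_normal(s)

# Manufacture an OPL root: pick x_star, compute z_star by forward substitution,
# then choose b such that the output y vanishes at x_star.
x_star = rng.standard_normal(n)
z_star = np.zeros(s)
for i in range(s):
    z_star[i] = c[i] + Z[i] @ x_star + L[i, :i] @ np.abs(z_star[:i])
b = -(J @ x_star + Y @ np.abs(z_star))

c_hat, S = opl_to_cpl(c, b, Z, L, J, Y)
print(np.linalg.norm(z_star - (c_hat + S @ np.abs(z_star))))   # CPL residual
print(np.linalg.norm(recover_x(b, J, Y, z_star) - x_star))     # recovered x
```

Both printed residuals should be at rounding level, which illustrates the solution correspondence in one direction for this manufactured example.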

Full step Newton method I
By the one-to-one solution correspondence, search a root of one of the two systems
(OPL) where
(CPL) where
• both are generalized Newton methods in the sense of Qi and Sun
• But we seek global rather than local convergence criteria
• Converges from every starting point towards a solution if either or is satisfied and the root is unique
• for an essential signature is always a limiting Jacobian of the underlying PL function
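A full step generalized Newton iteration for the CPL can be sketched as follows, assuming the form $z = \hat c + S|z|$ (notation not recoverable from the slide). With $\Sigma = \operatorname{diag}(\operatorname{sign}(z))$ the current piece is linear, and one step solves $(I - S\Sigma)\,z_+ = \hat c$. The demo scales a random $S$ to spectral norm 0.2, an arbitrary choice made so that the iteration contracts; it is not claimed to be the condition stated in the talk.

```python
import numpy as np

def full_step_newton_cpl(c_hat, S, z0, max_iter=50):
    """Iterate z_+ = (I - S Sigma)^{-1} c_hat with Sigma = diag(sign(z))."""
    z = np.asarray(z0, dtype=float).copy()
    identity = np.eye(len(c_hat))
    for k in range(max_iter):
        Sigma = np.diag(np.sign(z))
        z_new = np.linalg.solve(identity - S @ Sigma, c_hat)
        if np.array_equal(np.sign(z_new), np.sign(z)):
            # The signature is reproduced, so z_new solves z = c_hat + S|z|.
            return z_new, k + 1
        z = z_new
    return z, max_iter

rng = np.random.default_rng(2)
s = 6
S = rng.standard_normal((s, s))
S *= 0.2 / np.linalg.norm(S, 2)        # demo assumption: small spectral norm
c_hat = rng.standard_normal(s)

z_root, steps = full_step_newton_cpl(c_hat, S, np.zeros(s))
print(steps, np.linalg.norm(z_root - (c_hat + S @ np.abs(z_root))))
```

Because each step solves the linear system of one piece exactly, the iteration stops as soon as the signature repeats, which is why termination after finitely many steps can be expected in such well-behaved cases.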

Full step Newton method II
conditions for contractivity: or
Verifying the conditions is NP-hard, but one can find sufficient conditions:
• OPL: Assume from the abs-normal form to be regular, then if both conditions are satisfied.
• CPL: both conditions are satisfied if or

Restricted Newton method
Under the assumption of coherent orientation (c.o.): Piecewise-Newton (OPL) (CPL)
• here is called critical multiplier and is maximal s.t. the Newton step doesn't leave the closure of the polyhedron corresponding to the chosen essential signature
• the step is shrunk by nonsmoothness arising along its direction
• the paths are bifurcation free for almost all starting points, and also for the CPL
• if the problem is c.o. then the piecewise Newton method converges from everywhere to a root

Outlook
• prove the open Newton conjecture
• further develop PL Algebra Package Plan-C (C++) → method optimization and comparison
• Branin's modification for PL-Newton on PL equation systems (for non-open problems)
• use clipped models to preserve global properties (i.e. symmetric, bounded)
• extension to Euclidean norm or algebraic inclusion
