Introduction: Why Optimization?
Geoff Gordon & Ryan Tibshirani
Optimization 10-725 / 36-725
Where this course fits in

In many ML/statistics/engineering courses, you learn how to translate a question or idea into an optimization problem:

    Question/idea  →  min f(x)

In this course, you'll learn that min f(x) is not the end of the story, i.e., you'll learn
• Algorithms for solving min f(x), and how to choose between them
• How knowledge of algorithms for min f(x) can influence the choice of translation
• How knowledge of algorithms for min f(x) can help you understand things about the problem
Optimization in statistics

A huge number of statistics problems can be cast as optimization problems, e.g.,
• Regression
• Classification
• Maximum likelihood

But a lot of problems cannot, and are based directly on algorithms or procedures, e.g.,
• Clustering
• Correlation analysis
• Model assessment

Not to say one camp is better than the other ... but if you can cast something as an optimization problem, it is often worthwhile
Sparse linear regression

Given response y ∈ R^n and predictors A = (A_1, ..., A_p) ∈ R^{n×p}, we consider the model

    y ≈ Ax

But n ≪ p, and we think many of the variables A_1, ..., A_p could be unimportant, i.e., we want many components of x to be zero.

E.g., size of tumor ≈ linear combination of genetic information, but not all gene expression measurements are relevant
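To make the setup concrete, here is a toy simulation of this kind of data in Python; the dimensions, noise level, and variable names are illustrative assumptions, not from the slides.

    # Toy version of the sparse regression setup: n observations,
    # p >> n predictors, true coefficients with only a few nonzeros.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, k = 50, 200, 5                              # made-up dimensions
    A = rng.standard_normal((n, p))                   # predictor matrix
    x_true = np.zeros(p)
    x_true[:k] = rng.standard_normal(k)               # only k relevant variables
    y = A @ x_true + 0.1 * rng.standard_normal(n)     # y ≈ A x, plus noise

The sketches on the next few slides can all be run on a pair (A, y) generated this way.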
Three methods

Solving the usual linear regression problem

    min_{x ∈ R^p} ‖y − Ax‖₂²

would return a dense x (and the solution is not well-defined if p > n). We want a sparse x. How? Three methods:
• Best subset selection – a nonconvex optimization problem
• Forward stepwise regression – an algorithm
• Lasso – a convex optimization problem
Best subset selection

A natural idea: we solve

    min_{x ∈ R^p} ‖y − Ax‖₂²  subject to  ‖x‖₀ ≤ k

where ‖x‖₀ = number of nonzero components in x, a nonconvex "norm"

[Figure: the constraint set {x ∈ R² : ‖x‖₀ ≤ 1}, i.e., the union of the two coordinate axes]

• Problem is NP-hard
• In practice, the solution cannot be computed for p ≳ 40
• Very little is known about properties of the solution
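For intuition, here is a minimal brute-force sketch of best subset selection (the one algorithm the slide credits it with); the function name and use of numpy are my own assumptions.

    import itertools
    import numpy as np

    def best_subset(A, y, k):
        # Enumerate every size-k subset of columns, fit least squares on
        # each, and keep the subset with the smallest residual sum of squares.
        n, p = A.shape
        best_rss, best_x = np.inf, None
        for S in itertools.combinations(range(p), k):
            cols = list(S)
            coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            rss = np.sum((y - A[:, cols] @ coef) ** 2)
            if rss < best_rss:
                best_rss = rss
                best_x = np.zeros(p)
                best_x[cols] = coef
        return best_x

There are (p choose k) subsets to check, so the cost grows combinatorially in p, which is exactly why the computation is hopeless beyond roughly p ≳ 40.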
Forward stepwise regression

Also a natural idea: start with x = 0, then
• Find the variable j such that |A_j^T y| is largest (note: if variables have been centered and scaled, then A_j^T y = cor(A_j, y))
• Update x_j by regressing y onto A_j, i.e., solve min_{x_j ∈ R} ‖y − A_j x_j‖₂²
• Now find the variable k ≠ j such that |A_k^T r| is largest, where r = y − A_j x_j (i.e., |cor(A_k, r)| is largest)
• Update x_j, x_k by regressing y onto A_j, A_k
• Repeat

Some properties of this estimate are known, but not many; proofs are (relatively) complicated
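A rough Python sketch of this greedy procedure, under the assumption that we stop after k steps; the stopping rule, function name, and refitting details are my choices, not from the slides.

    import numpy as np

    def forward_stepwise(A, y, k):
        n, p = A.shape
        active = []                       # indices selected so far
        r = y.copy()                      # current residual
        x = np.zeros(p)
        for _ in range(k):
            scores = np.abs(A.T @ r)      # |A_j^T r| for every variable
            scores[active] = -np.inf      # never re-select an active variable
            j = int(np.argmax(scores))
            active.append(j)
            # Refit y on all active variables (joint least squares)
            coef, *_ = np.linalg.lstsq(A[:, active], y, rcond=None)
            x = np.zeros(p)
            x[active] = coef
            r = y - A[:, active] @ coef
        return x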
Lasso

We solve

    min_{x ∈ R^p} ‖y − Ax‖₂²  subject to  ‖x‖₁ ≤ t

where ‖x‖₁ = ∑_{i=1}^p |x_i|, a convex norm

[Figure: the constraint set {x ∈ R² : ‖x‖₁ ≤ 1}, the ℓ₁ ball, a diamond]

• Delivers exact zeros in the solution – the lower t, the more zeros
• Problem is convex and readily solved
• Many properties are known about the solution
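One of many ways to solve it (a technique covered later in the course, not specified on this slide) is proximal gradient descent on the penalized form min_x ½‖y − Ax‖₂² + λ‖x‖₁. The sketch below, with my own function names and a fixed iteration count, is illustrative rather than a production solver.

    import numpy as np

    def soft_threshold(z, tau):
        # Proximal operator of tau * ||.||_1
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def lasso_prox_grad(A, y, lam, n_iters=500):
        n, p = A.shape
        step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / largest eigenvalue of A^T A
        x = np.zeros(p)
        for _ in range(n_iters):
            grad = A.T @ (A @ x - y)              # gradient of the smooth part
            x = soft_threshold(x - step * grad, step * lam)
        return x                                  # larger lam => more exact zeros

Under mild conditions, each value of λ > 0 in the penalized form corresponds to some value of t in the constrained form above, so the two views give the same family of solutions.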
Comparison

                                 # of Google      # of algorithms     Properties
                                 Scholar hits                         known
    Best subset selection        2274             1 (brute force)     Little
    Forward stepwise regression  7207             1 (itself)          Some
    Lasso [1]                    13,100           ≥ 10                Lots

[1] I searched for 'lasso + statistics' because 'lasso' alone resulted in nearly 8 times as many hits. I also tried to be fair, and searched for best subset selection and forward stepwise regression under their alternative names. Searched on August 27, 2010.