COMPSCI 514: Algorithms for Data Science




1. COMPSCI 514: Algorithms for Data Science. Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 18.

2. Logistics
• Problem Set 3 on Spectral Methods due this Friday at 8pm.
• Can turn in without penalty until Sunday at 11:59pm.

3. Summary
Last Class:
• Power method for computing the top singular vector of a matrix.
• Power method is an iterative algorithm for solving the non-convex optimization problem $\max_{\vec{v}:\ \|\vec{v}\|_2^2 \le 1} \vec{v}^T X^T X \vec{v}$ (a minimal code sketch follows below).
• High-level discussion of Krylov methods and block versions for computing more singular vectors.
This Class (and until Thanksgiving):
• More general iterative algorithms for optimization, specifically gradient descent and its variants.
• What are these methods, when are they applied, and how do you analyze their performance?
• A small taste of what you can find in COMPSCI 590OP or 690OP.
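A minimal Python sketch of the power method discussed last class, assuming the matrix X is stored as a NumPy array; the iteration count, initialization seed, and the small test matrix at the end are illustrative choices, not part of the course material.

import numpy as np

def power_method(X, iters=100, seed=0):
    """Approximate the top right singular vector of X by power iteration on X^T X."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = X.T @ (X @ v)         # apply X^T X without forming it explicitly
        v /= np.linalg.norm(v)    # re-normalize so that ||v||_2 = 1
    return v

# Quick check against NumPy's SVD on a random matrix.
X = np.random.default_rng(1).standard_normal((200, 50))
v_hat = power_method(X, iters=200)
v_top = np.linalg.svd(X, full_matrices=False)[2][0]
print(abs(v_hat @ v_top))  # close to 1 when the estimate aligns with the top singular vector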

4. Discrete vs. Continuous Optimization
Discrete (Combinatorial) Optimization: (traditional CS algorithms)
• Graph problems: min-cut, max flow, shortest path, matchings, maximum independent set, traveling salesman problem.
• Problems with discrete constraints or outputs: bin-packing, scheduling, sequence alignment, submodular maximization.
• Generally searching over a finite but exponentially large set of possible solutions. Many of these problems are NP-Hard.
Continuous Optimization: (not covered in the core CS curriculum; touched on in ML/advanced algorithms courses, maybe)
• Unconstrained convex and non-convex optimization.
• Linear programming, quadratic programming, semidefinite programming (a toy linear program is sketched below).
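As a small, concrete instance of the continuous side, here is a toy linear program solved with scipy.optimize.linprog; the objective and constraints are made up purely for illustration.

import numpy as np
from scipy.optimize import linprog

# Toy LP: minimize c^T theta subject to A theta <= b and theta >= 0.
c = np.array([-1.0, -2.0])      # minimizing -theta_1 - 2*theta_2, i.e. maximizing theta_1 + 2*theta_2
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, res.fun)           # optimal point and optimal objective value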

5. Continuous Optimization Examples
(Figure slide: examples shown as images; no text to recover.)

6. Mathematical Setup
Given some function $f : \mathbb{R}^d \rightarrow \mathbb{R}$, find $\vec{\theta}^\star$ with
$f(\vec{\theta}^\star) = \min_{\vec{\theta} \in \mathbb{R}^d} f(\vec{\theta})$,
typically up to some small approximation factor, i.e., $f(\vec{\theta}^\star) \le \min_{\vec{\theta} \in \mathbb{R}^d} f(\vec{\theta}) + \epsilon$.
Often under some constraints, e.g.:
• $\|\vec{\theta}\|_2 \le 1$ or $\|\vec{\theta}\|_1 \le 1$.
• $A\vec{\theta} \le \vec{b}$ or $\vec{\theta}^T A \vec{\theta} \ge 0$.
• $\vec{1}^T \vec{\theta} = \sum_{i=1}^{d} \vec{\theta}(i) \le c$.
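To make this setup concrete, here is a hedged sketch of projected gradient descent (a preview of the gradient descent methods this class covers) for minimizing a function subject to the constraint $\|\vec{\theta}\|_2 \le 1$; the least-squares-style objective, step size, and iteration count are assumptions chosen only for illustration.

import numpy as np

# Made-up objective: f(theta) = ||M theta - y||_2^2 for fixed M and y.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 5))
y = rng.standard_normal(30)

def f(theta):
    return np.sum((M @ theta - y) ** 2)

def grad_f(theta):
    return 2 * M.T @ (M @ theta - y)

theta = np.zeros(5)
eta = 0.001                                  # small fixed step size (assumed, not tuned)
for _ in range(500):
    theta = theta - eta * grad_f(theta)      # gradient step
    norm = np.linalg.norm(theta)
    if norm > 1:                             # project back onto the constraint set ||theta||_2 <= 1
        theta = theta / norm
print(f(theta))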

7. Why Continuous Optimization?
Modern machine learning centers around continuous optimization.
Typical Set Up: (supervised machine learning)
• Have a model, which is a function mapping inputs to predictions (neural network, linear function, low-degree polynomial, etc.).
• The model is parameterized by a parameter vector (weights in a neural network, coefficients in a linear function or polynomial).
• Want to train this model on input data by picking a parameter vector such that the model does a good job mapping inputs to predictions on your training data.
This training step is typically formulated as a continuous optimization problem.

8. Optimization in ML
Example 1: Linear Regression
Model: $M_{\vec{\theta}} : \mathbb{R}^d \rightarrow \mathbb{R}$ with $M_{\vec{\theta}}(\vec{x}) = \langle \vec{\theta}, \vec{x} \rangle = \vec{\theta}(1) \cdot \vec{x}(1) + \ldots + \vec{\theta}(d) \cdot \vec{x}(d)$.
Parameter Vector: $\vec{\theta} \in \mathbb{R}^d$ (the regression coefficients).
Optimization Problem: Given data points (training points) $\vec{x}_1, \ldots, \vec{x}_n$ (the rows of the data matrix $X \in \mathbb{R}^{n \times d}$) and labels $y_1, \ldots, y_n$, find $\vec{\theta}$ minimizing the loss function
$L_X(\vec{\theta}) = \sum_{i=1}^{n} \ell(M_{\vec{\theta}}(\vec{x}_i), y_i)$,
where $\ell(M_{\vec{\theta}}(\vec{x}_i), y_i)$ is some measurement of how far $M_{\vec{\theta}}(\vec{x}_i)$ is from $y_i$. For example:
• $\ell(M_{\vec{\theta}}(\vec{x}_i), y_i) = (M_{\vec{\theta}}(\vec{x}_i) - y_i)^2$ (least squares regression).
• $\ell(M_{\vec{\theta}}(\vec{x}_i), y_i) = \ln(1 + \exp(-y_i \cdot M_{\vec{\theta}}(\vec{x}_i)))$ for $y_i \in \{-1, 1\}$ (logistic regression).
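A short sketch of the two per-example losses above on synthetic data, with the linear model $M_{\vec{\theta}}(\vec{x}) = \langle \vec{\theta}, \vec{x} \rangle$ implemented as a matrix-vector product; the data, labels, and candidate $\vec{\theta}$ are placeholders.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))      # rows are the training points x_1, ..., x_n
theta = rng.standard_normal(d)       # a candidate vector of regression coefficients

def least_squares_loss(theta, X, y):
    """Sum over i of (M_theta(x_i) - y_i)^2."""
    predictions = X @ theta
    return np.sum((predictions - y) ** 2)

def logistic_loss(theta, X, y):
    """Sum over i of ln(1 + exp(-y_i * M_theta(x_i))), with labels y_i in {-1, +1}."""
    margins = y * (X @ theta)
    return np.sum(np.log1p(np.exp(-margins)))

y_real = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)   # real-valued labels
y_sign = np.sign(X @ rng.standard_normal(d))                         # labels in {-1, +1}
print(least_squares_loss(theta, X, y_real))
print(logistic_loss(theta, X, y_sign))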


