Estimate Sequences for Variance-Reduced Stochastic Composite Optimization

Andrei Kulunchakov (andrei.kulunchakov@inria.fr)
Julien Mairal (julien.mairal@inria.fr)

International Conference on Machine Learning, 2019
Poster event-4062 (Jun 12th, Pacific Ballroom 204)
Problem statement

Assumptions
We solve a stochastic composite optimization problem
\[
F(x) = f(x) + \psi(x) \quad \text{with} \quad f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad f_i(x) = \mathbb{E}_{\xi}\big[\tilde f_i(x,\xi)\big],
\]
where ψ(x) is a convex penalty and each f_i is L-smooth and µ-strongly convex.

Variance in gradient estimates
Stochastic realizations of gradients are available for each i:
\[
\tilde\nabla f_i(x) = \nabla f_i(x) + \xi_i \quad \text{with} \quad \mathbb{E}[\xi_i] = 0 \quad \text{and} \quad \mathrm{Var}[\xi_i] \le \sigma^2.
\]
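To make this setting concrete, here is a minimal Python sketch of such a composite objective with a noisy first-order oracle. The quadratic losses, the ℓ1 penalty, and all names (A, b, lam, sigma, noisy_grad_fi, prox_psi) are illustrative assumptions, not objects defined in the paper.

```python
# Minimal sketch of the problem setup: a finite sum of mu-strongly convex,
# L-smooth losses f_i plus a convex penalty psi, with a noisy gradient oracle.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
lam, mu, sigma = 0.1, 0.01, 0.05  # penalty weight, strong convexity, noise level

def grad_fi(x, i):
    """Exact gradient of f_i(x) = 0.5*(a_i^T x - b_i)^2 + 0.5*mu*||x||^2."""
    return A[i] * (A[i] @ x - b[i]) + mu * x

def noisy_grad_fi(x, i):
    """Stochastic oracle: grad f_i(x) + xi_i with E[xi_i] = 0 and E||xi_i||^2 <= sigma^2."""
    return grad_fi(x, i) + sigma * rng.standard_normal(x.shape) / np.sqrt(x.size)

def prox_psi(x, step):
    """Proximal operator of psi(x) = lam * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)
```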
Main contribution (I)

Optimal incremental algorithm robust to noise
Optimal incremental algorithm with a complexity
\[
O\!\left(\left(n + \sqrt{\frac{nL}{\mu}}\right)\log\left(\frac{F(x_0) - F^\star}{\varepsilon}\right)\right) + O\!\left(\frac{\sigma^2}{\mu\varepsilon}\right),
\]
based on the SVRG gradient estimator with random sampling.

Algorithm
Briefly, the algorithm is an incremental hybrid of the heavy-ball method with a randomly updated SVRG anchor point and two auxiliary sequences controlling the extrapolation.
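As an illustration of the building block named above, here is a sketch of a plain proximal SVRG loop whose anchor point is refreshed with probability 1/n at each iteration rather than at fixed epoch boundaries. It reuses noisy_grad_fi and prox_psi from the setup sketch; the paper's accelerated method additionally maintains the two auxiliary heavy-ball/extrapolation sequences, which are omitted here.

```python
def rand_svrg(x0, n_iters, step):
    """Sketch of prox-SVRG with a randomly updated anchor (non-accelerated variant)."""
    x, anchor = x0.copy(), x0.copy()
    # (Noisy) full gradient at the anchor point.
    g_anchor = np.mean([noisy_grad_fi(anchor, j) for j in range(n)], axis=0)
    for _ in range(n_iters):
        i = rng.integers(n)
        # Variance-reduced gradient estimate at the current iterate.
        g = noisy_grad_fi(x, i) - noisy_grad_fi(anchor, i) + g_anchor
        x = prox_psi(x - step * g, step)
        if rng.random() < 1.0 / n:  # refresh the anchor with probability 1/n
            anchor = x.copy()
            g_anchor = np.mean([noisy_grad_fi(anchor, j) for j in range(n)], axis=0)
    return x
```

Constant step sizes such as 1/(12L) or 1/(3L), as used for the rand-SVRG curves in the experiments slide, are natural choices for `step` in this sketch.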
Main contribution (II)

Novelty
• When σ² = 0, we recover the same complexity as Katyusha [Allen-Zhu, 2017].
• Novelty: an accelerated incremental algorithm robust to σ² > 0 with the optimal term σ²/(µε).

Other contributions
• Generic proofs for incremental methods (SVRG, SAGA, MISO, SDCA) showing their robustness to noise:
\[
O\!\left(\left(n + \frac{L}{\mu}\right)\log\left(\frac{F(x_0) - F^\star}{\varepsilon}\right)\right) + O\!\left(\frac{\sigma^2}{\mu\varepsilon}\right).
\]
• When µ = 0, we recover optimal rates for a fixed horizon and known σ².
• Support for non-uniform sampling.
Side contributions

Adaptivity to the strong convexity parameter µ
When σ = 0, we show adaptivity to µ for all above-mentioned non-accelerated methods. This property is new for SVRG.

Accelerated SGD
A version of robust accelerated SGD with complexity similar to [Ghadimi and Lan, 2012, 2013]:
\[
O\!\left(\sqrt{\frac{L}{\mu}}\,\log\left(\frac{F(x_0) - F^\star}{\varepsilon}\right)\right) + O\!\left(\frac{\sigma^2 + \sigma_n^2}{\mu\varepsilon}\right),
\]
where σ_n² is due to sampling the data points.
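For comparison, a generic accelerated proximal SGD iteration (extrapolation followed by a stochastic proximal-gradient step) looks as follows. This is a standard sketch under the assumptions of the setup above, not the paper's exact robust variant, which additionally controls the σ² terms (e.g., via decreasing step sizes or mini-batches).

```python
def acc_sgd(x0, n_iters, step, mu):
    """Sketch of accelerated proximal SGD for a mu-strongly convex objective."""
    x_prev, x = x0.copy(), x0.copy()
    beta = (1 - np.sqrt(mu * step)) / (1 + np.sqrt(mu * step))  # momentum weight
    for _ in range(n_iters):
        y = x + beta * (x - x_prev)  # extrapolation (look-ahead) point
        i = rng.integers(n)
        x_prev, x = x, prox_psi(y - step * noisy_grad_fi(y, i), step)
    return x
```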
Experiments

Three datasets in the experiments:
— Pascal Large Scale Learning Challenge (n = 25·10⁴)
— Light gene expression data for breast cancer (n = 295)
— CIFAR-10 (images represented by features from a network) with n = 5·10⁴

Examples with zero noise (σ = 0) and stochastic case (σ > 0).

[Figure: log(F/F⋆ − 1) versus effective passes over the data on CIFAR-10 and the Pascal Challenge, comparing rand-SVRG (step sizes 1/12L, 1/3L), acc-SVRG (1/3L), SGD (1/L), and decreasing-step-size variants (rand-SVRG-d, acc-SVRG-d, SGD-d, acc-SGD-d, acc-mb-SGD-d).]