Local Loss Optimization in Operator Models: A New Insight into Spectral Learning
Borja Balle, Ariadna Quattoni, Xavier Carreras
ICML 2012, June 2012, Edinburgh
This work is partially supported by the PASCAL2 Network and a Google Research Award
A Simple Spectral Method [HKZ09]

- Discrete homogeneous Hidden Markov Model with hidden chain $Y_1, Y_2, Y_3, Y_4, \dots$ and observations $X_1, X_2, X_3, X_4, \dots$
- $n$ states: $Y_t \in \{1, \dots, n\}$
- $k$ symbols: $X_t \in \{\sigma_1, \dots, \sigma_k\}$ (for now assume $n \le k$)
- Forward-backward equations with $A_\sigma \in \mathbb{R}^{n \times n}$:
  $\Pr[X_{1:t} = w] = \alpha_1^\top A_{w_1} \cdots A_{w_t} \alpha_\infty$
- Probabilities arranged into matrices $H, H_{\sigma_1}, \dots, H_{\sigma_k} \in \mathbb{R}^{k \times k}$:
  $H(i, j) = \Pr[X_1 = \sigma_i, X_2 = \sigma_j]$
  $H_\sigma(i, j) = \Pr[X_1 = \sigma_i, X_2 = \sigma, X_3 = \sigma_j]$
- Spectral learning algorithm for $B_\sigma = Q A_\sigma Q^{-1}$:
  1. Compute the SVD $H = U D V^\top$ and take the top $n$ right singular vectors $V_n$
  2. $B_\sigma = (H V_n)^+ H_\sigma V_n$

(For simplicity, in this talk we ignore learning of the initial and final vectors)
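The two steps above can be written as a few lines of linear algebra. Below is a minimal NumPy sketch, assuming the matrices $H$ and $H_\sigma$ (or empirical estimates of them) are given; the function name and argument layout are illustrative, not taken from the paper.

```python
import numpy as np

def spectral_operators(H, H_sigmas, n):
    """Spectral method of [HKZ09] as described above (sketch).

    H        : (k, k) matrix with H[i, j] ~ P[X1 = s_i, X2 = s_j]
    H_sigmas : dict mapping each symbol to its (k, k) matrix with
               H_sigma[i, j] ~ P[X1 = s_i, X2 = sigma, X3 = s_j]
    n        : number of hidden states
    Returns the observable operators B_sigma = Q A_sigma Q^{-1}.
    """
    # 1. SVD of H; V_n holds the top-n right singular vectors as columns.
    U, D, Vt = np.linalg.svd(H)
    V_n = Vt[:n].T                      # shape (k, n)

    # 2. B_sigma = (H V_n)^+ H_sigma V_n for every symbol.
    HVn_pinv = np.linalg.pinv(H @ V_n)  # shape (n, k)
    return {s: HVn_pinv @ (Hs @ V_n) for s, Hs in H_sigmas.items()}
```

In practice the exact probabilities are replaced by empirical bigram and trigram frequencies; the initial and final vectors, ignored here as in the talk, would be estimated from unigram and bigram statistics.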
A Local Approach to Learning?

- Maximum likelihood uses the whole sample $S = \{w^1, \dots, w^N\}$ and is always consistent in the realizable case:
  $\max_{\alpha_1, \{A_\sigma\}} \frac{1}{N} \sum_{i=1}^{N} \log\bigl(\alpha_1^\top A_{w^i_1} \cdots A_{w^i_{t_i}} \alpha_\infty\bigr)$
- The spectral method only uses local information from the sample, through $\hat{H}, \hat{H}_a, \hat{H}_b$, and its consistency depends on properties of $H$
  $S = \{abbabba, aabaa, baaabbbabab, bbaaba, bababbabbaaaba, abbb, \dots\}$

Questions
- Is the spectral method minimizing a "local" loss function?
- When does this minimization yield a consistent algorithm?
Outline
- Spectral Learning as Local Loss Optimization
- A Convex Relaxation of the Local Loss
- Choosing a Consistent Local Loss
Loss Function of the Spectral Method

- Both ingredients in the spectral method have optimization interpretations:
  - SVD: $\min_{V_n^\top V_n = I} \|H V_n V_n^\top - H\|_F$
  - Pseudo-inverse: $\min_{B_\sigma} \|H V_n B_\sigma - H_\sigma V_n\|_F$
- Can formulate a joint optimization for the spectral method:
  $\min_{\{B_\sigma\},\ V_n^\top V_n = I} \ \sum_{\sigma \in \Sigma} \|H V_n B_\sigma - H_\sigma V_n\|_F^2$
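For concreteness, the joint objective above can be evaluated for any candidate $(V, \{B_\sigma\})$ with a couple of lines of NumPy; this is only an evaluation helper with illustrative names, not part of the original method.

```python
import numpy as np

def local_loss(H, H_sigmas, V, B_sigmas):
    """Joint local loss  sum_sigma ||H V B_sigma - H_sigma V||_F^2  (sketch)."""
    return sum(np.linalg.norm(H @ V @ B_sigmas[s] - H_sigmas[s] @ V, 'fro') ** 2
               for s in H_sigmas)
```

The spectral method corresponds to first fixing $V_n$ via the SVD and then minimizing this loss over $\{B_\sigma\}$ by least squares, as the next slide makes precise.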
Properties of the Spectral Optimization

$\min_{\{B_\sigma\},\ V_n^\top V_n = I} \ \sum_{\sigma \in \Sigma} \|H V_n B_\sigma - H_\sigma V_n\|_F^2$

- Theorem: the optimization is consistent under the same conditions as the spectral method
- The loss is non-convex due to the product $V_n B_\sigma$ and the constraint $V_n^\top V_n = I$
- The spectral method is equivalent to:
  1. Choosing $V_n$ using the SVD
  2. Optimizing $\{B_\sigma\}$ with $V_n$ fixed

Intuition about the Loss Function
- Minimize the $\ell_2$ norm of the unexplained (finite set of) futures when a symbol $\sigma$ is generated and the transition is explained using $B_\sigma$ (over a finite set of pasts)
- Strongly based on the Markovianity of the process, which generic maximum likelihood does not exploit
A Convex Relaxation of the Local Loss

- For algorithmic purposes a convex local loss function is more desirable
- A relaxation of
  $\min_{\{B_\sigma\},\ V_n^\top V_n = I} \ \sum_{\sigma \in \Sigma} \|H V_n B_\sigma - H_\sigma V_n\|_F^2$
  can be obtained by replacing the projection $V_n$ with a regularization term:
  1. Fix $n = |S|$ and take $V_n = I$
  2. Write $B_\Sigma = [B_{\sigma_1} | \cdots | B_{\sigma_k}]$ and $H_\Sigma = [H_{\sigma_1} | \cdots | H_{\sigma_k}]$
  3. Regularize via the nuclear norm to emulate $V_n$:
     $\min_{B_\Sigma} \|H B_\Sigma - H_\Sigma\|_F^2 + \tau \|B_\Sigma\|_*$
- This optimization is convex and has some interesting theoretical (see paper) and empirical properties
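One standard way to solve a nuclear-norm-regularized least-squares problem of this form is proximal gradient descent with singular value thresholding. The sketch below illustrates that approach; the step size, iteration count, and function names are assumptions for illustration, not the solver used in the paper.

```python
import numpy as np

def svt(M, thresh):
    """Soft-threshold the singular values of M (prox operator of the nuclear norm)."""
    U, d, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(d - thresh, 0.0)) @ Vt

def convex_local_loss(H, H_Sigma, tau, iters=500):
    """Minimize ||H B - H_Sigma||_F^2 + tau * ||B||_*  by proximal gradient (sketch).

    H       : empirical Hankel-type matrix (V_n = I, i.e. no projection)
    H_Sigma : horizontal stacking [H_sigma1 | ... | H_sigmak]
    """
    B = np.zeros((H.shape[1], H_Sigma.shape[1]))
    step = 1.0 / (2 * np.linalg.norm(H, 2) ** 2)    # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = 2 * H.T @ (H @ B - H_Sigma)          # gradient of the smooth part
        B = svt(B - step * grad, step * tau)        # proximal (thresholding) step
    return B
```

The nuclear-norm penalty plays the role of the rank-$n$ projection $V_n$: larger $\tau$ drives more singular values of $B_\Sigma$ to zero, yielding lower-rank operator models.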
Experimental Results with the Convex Local Loss

Performing experiments with synthetic targets, the following facts are observed:
- By tuning the regularization parameter $\tau$, a better trade-off between generalization and model complexity can be achieved
- The largest gains when using the convex relaxation are attained for targets that are supposedly hard for the spectral method

[Figures: L1 error vs. regularization parameter $\tau$ for the convex optimization (CO) and the spectral method (SVD, $n = 1,\dots,5$); L1-error difference between SVD and CO vs. the minimum singular value of the target model]
The Hankel Matrix

For any function $f : \Sigma^* \to \mathbb{R}$, its Hankel matrix $H_f \in \mathbb{R}^{\Sigma^* \times \Sigma^*}$ is defined as $H_f(p, s) = f(p \cdot s)$

         λ      a      b      aa     ab    ...
   λ     1      0.3    0.7    0.05   0.25  ...
   a     0.3    0.05   0.25   0.02   0.03  ...
   b     0.7    0.6    0.1    0.03   0.2   ...
   aa    0.05   0.02   0.03   0.017  0.003 ...
   ab    0.25   0.23   0.02   0.11   0.12  ...
   ...

(sub-blocks such as $H$ and $H_a$ are obtained by selecting rows and columns of $H_f$)

- Blocks are defined by sets of rows (prefixes $P$) and columns (suffixes $S$)
- Can parametrize the spectral method by $P$ and $S$, taking $H \in \mathbb{R}^{P \times S}$
- Each pair $(P, S)$ defines a different local loss function
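A sketch of how the empirical sub-blocks $\hat{H}$ and $\hat{H}_\sigma$ over chosen prefixes $P$ and suffixes $S$ might be estimated from a sample of strings. Taking $\hat{f}$ to be the empirical string probability is an illustrative assumption; other statistics (prefix or substring probabilities) fit the same scheme.

```python
import numpy as np
from collections import Counter

def hankel_blocks(sample, prefixes, suffixes, alphabet):
    """Empirical Hankel blocks over prefixes P and suffixes S (sketch).

    sample : list of strings drawn from the target
    Returns H with H[p, s] = f_hat(p + s) and, for each symbol a, the block
    H_a with H_a[p, s] = f_hat(p + a + s), where f_hat is the empirical
    string probability in the sample.
    """
    counts = Counter(sample)
    N = len(sample)
    f_hat = lambda w: counts[w] / N

    H = np.array([[f_hat(p + s) for s in suffixes] for p in prefixes])
    H_sigmas = {a: np.array([[f_hat(p + a + s) for s in suffixes] for p in prefixes])
                for a in alphabet}
    return H, H_sigmas
```

These blocks can then be fed to the spectral method or to the convex relaxation sketched earlier.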
Consistency of the Local Loss

Theorem (Schützenberger '61): $\mathrm{rank}(H_f) = n$ iff $f$ can be computed with operators $A_\sigma \in \mathbb{R}^{n \times n}$

Consequences
- The spectral method is consistent iff $\mathrm{rank}(H) = \mathrm{rank}(H_f) = n$
- There always exist $P, S$ with $|P| = |S| = n$ and $\mathrm{rank}(H) = n$

Trade-off
- Larger $P$ and $S$ are more likely to give $\mathrm{rank}(H) = n$, but also require larger samples for a good estimate $\hat{H}$

Question
- Given a sample, how to choose good $P$ and $S$?

Answer
- Random sampling succeeds w.h.p. with $|P|$ and $|S|$ depending polynomially on the complexity of the target (see the sketch below)
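One natural reading of "random sampling" is to draw strings from the sample and split them at random positions into a prefix and a suffix. The sketch below implements that scheme; the exact sampling distribution analyzed in the paper may differ, and all names here are illustrative.

```python
import random

def sample_basis(sample, num_prefixes, num_suffixes, seed=0):
    """Pick prefix/suffix sets by splitting sampled strings at random positions (sketch)."""
    rng = random.Random(seed)
    prefixes, suffixes = set(), set()
    # Bounded number of draws; assumes the sample is rich enough to supply
    # the requested number of distinct prefixes and suffixes.
    for _ in range(1000 * (num_prefixes + num_suffixes)):
        w = rng.choice(sample)
        i = rng.randint(0, len(w))       # random split point (0..len(w))
        prefixes.add(w[:i])              # may include the empty prefix (lambda)
        suffixes.add(w[i:])
        if len(prefixes) >= num_prefixes and len(suffixes) >= num_suffixes:
            break
    return sorted(prefixes)[:num_prefixes], sorted(suffixes)[:num_suffixes]
```

The resulting $P$ and $S$ can be passed to a Hankel-block estimator such as the one sketched after the Hankel matrix slide.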
Visit us at poster 53