Boosting Algorithm with Sequence-loss Cost Function for Structured Prediction
Tomasz Kajdanowicz, Przemysław Kazienko, Jan Kraszewski
Wroclaw University of Technology, Poland
Outline
1. Introduction to Structured Prediction
2. Problem Description
3. The Concept of AdaBoostSeq
4. Experiments
Structured prediction
Single value prediction:
• a function f maps an input to a simple output (binary classification, multiclass classification, or regression)
• Example: predicting whether the next day will or will not be rainy, on the basis of historical weather data
Structured prediction:
• prediction problems with more complex, structured outputs
• Example: predicting the weather for the next few days
Structured prediction
• Structured prediction is a cost-sensitive prediction problem where the output has a structure of elements decomposing into variable-length vectors [Daume]
• Vector notation is treated as a useful encoding not only for sequence labeling problems
(illustration: a binary output vector, e.g. 0 1 0 1 1 1)
• Input = original input + partially produced output (an extended notion of the feature input space)
Structured prediction algorithms
• Most algorithms are based on well-known binary classification, adapted in a specific way [Nguyen et al.]
• Structured perceptron [Collins]
– minimal requirements on the shape of the output space
– easy to implement
– poor generalization
• Max-margin Markov Nets [Taskar et al.]
– very useful
– very slow to train
– limited to the Hamming loss function
Structured prediction algorithms
• Conditional Random Fields [Lafferty et al.]
– extension of logistic regression to structured outputs
– probabilistic outputs
– good generalization
– relatively slow
• Support Vector Machine for Interdependent and Structured Outputs (SVMstruct) [Tsochantaridis et al.]
– supports more loss functions
Ensembles
• Combined may be better
– the goal is to select the right components for building a good hybrid system
– Lotfi Zadeh is reputed to have said:
A good combined system is like: British Police, German Mechanics, French Cuisine, Swiss Banking, Italian Love.
A bad combined system is like: British Cuisine, German Police, French Mechanics, Italian Banking, Swiss Love.
Problem Description
• Prediction of sequential values: for a single case, a vector of input attributes produces a sequence of output values
Problem Statement
• Binary sequence classification problem
$$f : X \to Y$$
where X is the vector input and Y is a variable-length output vector $(y^1, y^2, \dots, y^T)$ with $y_i^\mu \in \{-1, 1\}$
• i = 1, 2, ..., N (number of observations); μ = 1, 2, ..., T (position in a sequence of length T)
Problem Statement
• Goal: T classifiers combined, one per sequence position
– each is an optimally designed linear combination of K base classifiers of the form
$$F(x) = \sum_{k=1}^{K} \alpha_k \Phi(x; \Theta_k)$$
where Φ(x; Θ_k) is the k-th base classifier, Θ_k are the parameters of the k-th classifier, and α_k is the weight associated with the k-th classifier
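A minimal sketch of this combination rule in Python; the function and variable names are illustrative (not from the paper), and the base classifiers are assumed to be callables returning labels in {−1, +1}:

```python
import numpy as np

def ensemble_predict(x, classifiers, alphas):
    """Weighted vote F(x) = sign(sum_k alpha_k * Phi(x; Theta_k)).

    classifiers: list of callables, each returning a label in {-1, +1};
    alphas: list of the corresponding classifier weights alpha_k.
    """
    score = sum(alpha * clf(x) for alpha, clf in zip(alphas, classifiers))
    return np.sign(score)
```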
General Idea of AdaBoostSeq
(illustration: a data matrix of N cases; attribute columns 1–8 form the input, and the sequence of output values forms the target)
AdaBoostSeq
• A novel algorithm for sequence prediction
• Optimization for each sequence item:
$$\arg\min_{\alpha_k, \Theta_k;\; k=1,\dots,K} \sum_{i=1}^{N} \exp\big(-y_i F(x_i)\big)$$
• Direct optimization of this equation is highly complex, so a stage-wise suboptimal method is performed
AdaBoostSeq
• By definition of the m-th partial sum:
$$F_m(x) = \sum_{k=1}^{m} \alpha_k \Phi(x; \Theta_k), \quad m = 1, 2, \dots, K$$
• The recurrence follows directly:
$$F_m(x) = F_{m-1}(x) + \alpha_m \Phi(x; \Theta_m)$$
• Stage-wise optimization: at the m-th step, F_{m−1}(x) is known from the previous step, and the new target is:
$$(\alpha_m, \Theta_m) = \arg\min_{\alpha, \Theta} J(\alpha, \Theta)$$
AdaBoostSeq
$$J(\alpha, \Theta) = \sum_{i=1}^{N} \exp\Big(-y_i F_{m-1}(x_i) - \alpha\big[\, y_i \Phi(x_i; \Theta) + (1-\xi)\, R_\mu(x_i) \,\big]\Big)$$
where R_μ is the impact function denoting the influence of the quality of the preceding sequence labels' prediction:
$$R_\mu(x_i) = \frac{1}{\mu - 1} \sum_{\mu'=1}^{\mu-1} R^{\mu'}(x_i), \qquad R^{\mu'}(x_i) = \begin{cases} 1 & \text{if } y_i^{\mu'} \sum_{j=1}^{K} \alpha_j \Phi(x_i; \Theta_j) > 0 \\ -1 & \text{otherwise} \end{cases}$$
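A small sketch of the impact function under the reconstruction above; the ±1 convention for R^μ' and the neutral value for μ = 1 are assumptions, and the names are ours:

```python
import numpy as np

def impact(prev_labels, prev_preds):
    """R_mu(x_i): mean of +/-1 agreement scores over the mu-1 preceding
    sequence positions of one case (+1 where the earlier label was
    predicted correctly, -1 where it was not).

    prev_labels, prev_preds: arrays of shape (mu-1,) with values in {-1, +1}.
    """
    if len(prev_labels) == 0:
        return 0.0  # mu = 1: no preceding labels; neutral impact (assumption)
    return float(np.mean(np.asarray(prev_labels) * np.asarray(prev_preds)))
```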
AdaBoostSeq
• For a given α:
$$\Theta_m = \arg\min_{\Theta} \sum_{i=1}^{N} w_i^{(m)} \exp\big(-y_i\, \alpha\, \Phi(x_i; \Theta)\big)$$
$$w_i^{(m)} = \exp\big(-y_i F_{m-1}(x_i) - (1-\xi)\, R_\mu(x_i)\big)$$
• Because w_i^{(m)} depends neither on α nor on Φ(x_i; Θ), it can be treated as a weight of x_i
• By the binary nature of the base classifier:
$$\Theta_m = \arg\min_{\Theta} P_m, \qquad P_m = \sum_{i=1}^{N} w_i^{(m)}\, I\big(1 - y_i \Phi(x_i; \Theta)\big), \qquad I(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases}$$
where P_m is the weighted empirical error
AdaBoostSeq
• Computing the base classifier at step m:
$$P_m = \sum_{i:\; y_i \Phi(x_i; \Theta_m) < 0} w_i^{(m)}, \qquad 1 - P_m = \sum_{i:\; y_i \Phi(x_i; \Theta_m) > 0} w_i^{(m)}$$
AdaBoostSeq
• Putting the equations together:
$$\alpha_m = \arg\min_{\alpha} \Big[ \exp(-\alpha)(1 - P_m) + \exp(\alpha)\, P_m \Big]$$
• Setting the derivative with respect to α to zero gives:
$$\alpha_m = \frac{1}{2} \ln \frac{1 - P_m}{P_m}$$
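A one-line sketch of this closed-form weight; the numerical guard for the degenerate cases P_m ∈ {0, 1} is our addition, not part of the slides:

```python
import numpy as np

def classifier_weight(P_m, eps=1e-12):
    """alpha_m = 0.5 * ln((1 - P_m) / P_m) for weighted error P_m."""
    P_m = np.clip(P_m, eps, 1.0 - eps)  # guard against log(0) / division by zero
    return 0.5 * np.log((1.0 - P_m) / P_m)
```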
AdaBoostSeq
• Weight of the i-th case:
$$w_i^{(m+1)} = \frac{w_i^{(m)} \exp\big(-y_i \Phi(x_i; \Theta_m)\, \alpha_m - (1-\xi)\, R_\mu(x_i)\, \alpha_m\big)}{Z_m}$$
• Z_m is the normalizing constant:
$$Z_m = \sum_{i=1}^{N} w_i^{(m)} \exp\big(-y_i \Phi(x_i; \Theta_m)\, \alpha_m - (1-\xi)\, R_\mu(x_i)\, \alpha_m\big)$$
Algorithm AdaBoostSeq
• For each sequence position (μ = 1 to T):
– Initialization: w_i^{(1)} = 1/N, i = 1, 2, ..., N; m = 1
– While the termination criterion is not met:
• obtain the optimal Θ_m and Φ(·; Θ_m) (minimizing P_m)
• compute the resulting weighted error P_m
• α_m = ½ ln((1 − P_m)/P_m)
• Z_m = 0
• For i = 1 to N:
– w_i^{(m+1)} = w_i^{(m)} exp(−y_i α_m Φ(x_i; Θ_m) − (1 − ξ) α_m R_μ(x_i))
– Z_m = Z_m + w_i^{(m+1)}
• For i = 1 to N:
– w_i^{(m+1)} = w_i^{(m+1)} / Z_m
• K = m; m = m + 1
– f_μ(·) = sign(Σ_{k=1}^{K} α_k Φ(·; Θ_k))
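The inner loop above, sketched in Python for a single sequence position μ. This is a minimal reading of the pseudocode, not the authors' implementation: scikit-learn decision stumps stand in for the generic base classifier Φ, the termination criterion is simplified to a fixed number of rounds, and all names are ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # depth-1 tree = decision stump

def adaboost_seq_position(X, y_mu, R_mu, xi=0.6, K=50):
    """Boosting for one sequence position mu.

    X: (N, d) inputs; y_mu: (N,) labels in {-1, +1} for position mu;
    R_mu: (N,) impact values computed from the preceding positions;
    xi in (0, 1] (xi = 1 recovers standard AdaBoost).
    Returns the fitted stumps and their weights alpha.
    """
    N = X.shape[0]
    w = np.full(N, 1.0 / N)                      # w_i^(1) = 1/N
    stumps, alphas = [], []
    for m in range(K):                           # simplified termination: K rounds
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y_mu, sample_weight=w)      # approximately minimizes P_m
        pred = stump.predict(X)
        P_m = w[pred != y_mu].sum()              # weighted empirical error
        if P_m <= 0.0 or P_m >= 0.5:             # stop if perfect or no better than chance
            break
        alpha = 0.5 * np.log((1.0 - P_m) / P_m)
        # weight update with the sequence-loss term (1 - xi) * alpha_m * R_mu
        w = w * np.exp(-y_mu * alpha * pred - (1.0 - xi) * alpha * R_mu)
        w /= w.sum()                             # Z_m normalization
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```

The outer loop would run this routine for μ = 1, ..., T, recomputing R_μ for each case from the already-fitted classifiers of the preceding positions before moving to the next position.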
Profile of AdaBoostSeq
• A new algorithm for sequence prediction
• For each sequence item:
– AdaBoostSeq also considers the prediction errors made on all previous items in the sequence within the boosting algorithm
– the more errors on previous sequence items, the stronger the focus on bad cases at the current item
• Self-adaptive
Experiments
• 4019 cases in the dataset
• 20 input features
• Sequence length = 10
• Decision stump as the base classifier
• 10-fold cross-validation
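The evaluation measure in the following slides is the sequence mean absolute error. A sketch of our reading of it; the exact definition is an assumption, since the slides do not spell it out:

```python
import numpy as np

def sequence_mae(Y_true, Y_pred):
    """Mean absolute error averaged over all cases and all T sequence
    positions; for labels in {-1, +1} this equals twice the error rate.

    Y_true, Y_pred: arrays of shape (N, T).
    """
    return float(np.abs(np.asarray(Y_true) - np.asarray(Y_pred)).mean())
```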
AdaBoost vs. AdaBoostSeq (with ξ)
(bar chart: sequence mean absolute error for ξ = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; observed values 0.0762, 0.0810, 0.0828, 0.0872, 0.0922, 0.0945, 0.1011)
• ξ = 0.6 is the best (error 0.0762)
• For ξ = 1 the method reduces to standard AdaBoost, which is the worst (error 0.1011)
Summary of the Experiments
(line chart: mean absolute error per sequence item, items 1 to 10, for ξ = 0.4, 0.6, 0.8, 1)
• For item 2 and later, the error is reduced dramatically (up to 6 times!), since the method respects errors on previous items
• ξ influences the error
• For ξ = 0.6 the error decreases by 24% over the whole sequence compared to the standard approach (ξ = 1)
Conclusions and Future Work
• AdaBoostSeq: a new algorithm for sequence prediction based on AdaBoost
• When predicting the following items in a sequence, the errors from the previous items are utilized
• Much more accurate than AdaBoost applied to sequence items independently
• Parameterized by ξ, which controls how much previous errors are respected
• Recent application: prediction for debt valuation
• Future work: new cost functions (on an HMM canvas)