splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications

Yoav Goldberg and Michael Elhadad
ACL 2008, Columbus, Ohio
Introduction

Support Vector Machines
- SVMs are supervised binary classifiers
- Max-margin linear classification
- Can perform non-linear classification through a kernel function

SVMs in NLP
- SVM classifiers are used in many NLP applications
- Such applications usually involve a large number of binary-valued features
- Using a d-th-order polynomial kernel amounts to effectively considering all d-tuples of features
- Low-degree (2-3) polynomial kernels consistently produce state-of-the-art results
The Problem

Kernel SVMs are slow!
- Computing the decision of a kernel-based classifier is expensive: it can grow linearly with the size of the training data
- Non-kernel classifiers are orders of magnitude faster
- We are not talking about learning; we are talking about the decision for a given, already-trained model

Enter splitSVM
- We propose a method for speeding up the computation of low-degree polynomial kernel classifiers for NLP applications, while still computing the exact decision function, and with a modest memory overhead
Kernel Decision Function Computation

    y(x) = sgn( Σ_{x_j ∈ SV} y_j α_j K(x_j, x) + b )

SV – the set of support vectors
- Each support vector is a weighted instance from the training set
- There typically are many such vectors

In every classification, the kernel function must be computed for each support vector.
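To make the cost concrete, here is a minimal Java sketch of this computation (illustrative only, not the authors' implementation; the class, field, and method names are made up for this example):

```java
import java.util.List;

/** Illustrative sketch of kernel-SVM decision computation. */
class KernelDecision {
    /** Kernel function over sparse binary feature vectors (sorted feature indices). */
    interface Kernel {
        double compute(int[] x, int[] xj);
    }

    /** A support vector: its signed weight y_j * alpha_j and its feature indices. */
    static class SupportVector {
        final double signedAlpha;  // y_j * alpha_j
        final int[] features;      // sorted indices of the binary features that are "on"
        SupportVector(double signedAlpha, int[] features) {
            this.signedAlpha = signedAlpha;
            this.features = features;
        }
    }

    /** y(x) = sgn( sum_{x_j in SV} y_j alpha_j K(x_j, x) + b ). */
    static int decide(int[] x, List<SupportVector> svs, Kernel k, double b) {
        double sum = b;
        for (SupportVector sv : svs) {   // one kernel evaluation per support vector
            sum += sv.signedAlpha * k.compute(x, sv.features);
        }
        return sum >= 0 ? +1 : -1;
    }
}
```

The inner loop runs once per support vector, which is why decision time can grow with the size of the training data.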
Decision Function Computation – Polynomial Kernel

    y(x) = sgn( Σ_{x_j ∈ SV} y_j α_j (γ x · x_j + c)^d + b )

The polynomial kernel of degree d
- Proportional to the number of d-tuples of features the classified item and the SV have in common
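Since the features are binary-valued, x · x_j is simply the number of features the classified item and the support vector share, so a degree-d polynomial kernel can be sketched as follows (again an illustrative sketch under that assumption, not code from the paper):

```java
/** Illustrative degree-d polynomial kernel over sparse binary feature vectors. */
class PolyKernel {
    /** K(x, x_j) = (gamma * x·x_j + c)^d; for binary features, x·x_j = number of shared features. */
    static double compute(int[] x, int[] xj, double gamma, double c, int d) {
        int shared = 0, i = 0, j = 0;
        while (i < x.length && j < xj.length) {   // merge over sorted feature indices
            if (x[i] == xj[j]) { shared++; i++; j++; }
            else if (x[i] < xj[j]) i++;
            else j++;
        }
        return Math.pow(gamma * shared + c, d);
    }
}
```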
Polynomial Kernel Speedup 1

    y(x) = sgn( Σ_{x_j ∈ SV} y_j α_j (γ x · x_j + c)^d + b )

Speedup method 1 – PKI (Kudo and Matsumoto, 2003)
- Feature vectors are sparse
- If the classified item and an SV don't share any features, we can skip the kernel computation for this SV
- ⇒ Keep an inverted index of feature → SV, and use it to find only the relevant SVs for each item

Problem: the Zipfian distribution of language
- Language data has a Zipfian distribution
- ⇒ There is a small number of very frequent features: W:’a’, POS:NN, POS:VB
- ⇒ PKI pruning does not remove many SVs . . .
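A minimal sketch of the PKI idea, assuming sparse binary feature vectors (the names and data structures are our own illustration, not Kudo and Matsumoto's implementation): index each support vector under its features, and evaluate the kernel only for support vectors that share at least one feature with the classified item.

```java
import java.util.*;

/** Illustrative PKI-style pruning: inverted index from feature id to support-vector ids. */
class InvertedIndex {
    private final Map<Integer, List<Integer>> featureToSvs = new HashMap<>();

    /** Index support vector svId under each of its (binary) features. */
    void add(int svId, int[] features) {
        for (int f : features) {
            featureToSvs.computeIfAbsent(f, k -> new ArrayList<>()).add(svId);
        }
    }

    /** Ids of support vectors sharing at least one feature with x; the kernel is skipped for all others. */
    Set<Integer> candidates(int[] x) {
        Set<Integer> result = new HashSet<>();
        for (int f : x) {
            List<Integer> svs = featureToSvs.get(f);
            if (svs != null) result.addAll(svs);
        }
        return result;
    }
}
```

Because a few very frequent features appear in almost every support vector, the candidate set returned here stays close to the full SV set, which is exactly the limitation noted above.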
Polynomial Kernel Speedup 2

    y(x) = sgn( w · x_d + b )

Speedup method 2 – Kernel Expansion (Isozaki and Kazawa, 2002)
- ⇒ Transform the d-degree polynomial classifier into a linear one in the kernel space
- At classification time: transform the instance to be classified into the d-tuple space (x_d above) and perform linear classification (each weight in w corresponds to a specific d-tuple)

Problem: the Zipfian distribution of language
- Language data has a Zipfian distribution
- ⇒ There is a huge number of very infrequent features: W:calculation, W:polynomial, W:ACL
- ⇒ The number of d-tuples is huge!
- ⇒ Storing w is impractical
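An illustrative sketch of kernel expansion for d = 2 (the pair encoding and class names are assumptions for this example, not Isozaki and Kazawa's code): the model stores one weight per feature pair, and classification expands the instance into its pairs and sums their weights.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative kernel expansion for d = 2: one weight per feature pair (a (f, f) "pair" stands for a single feature). */
class ExpandedLinearClassifier {
    private final Map<Long, Double> weights = new HashMap<>();  // key encodes an ordered feature pair
    private final double bias;

    ExpandedLinearClassifier(double bias) { this.bias = bias; }

    private static long pairKey(int f1, int f2) {               // expects f1 <= f2
        return ((long) f1 << 32) | (f2 & 0xffffffffL);
    }

    void setWeight(int f1, int f2, double w) {
        weights.put(pairKey(Math.min(f1, f2), Math.max(f1, f2)), w);
    }

    /** Expand x into its 2-tuples and sum the corresponding weights: y(x) = sgn(w · x_2 + b). */
    int decide(int[] x) {                                        // x: sorted binary feature indices
        double sum = bias;
        for (int i = 0; i < x.length; i++) {
            for (int j = i; j < x.length; j++) {                 // j = i covers single features
                Double w = weights.get(pairKey(x[i], x[j]));
                if (w != null) sum += w;
            }
        }
        return sum >= 0 ? +1 : -1;
    }
}
```

Classification is fast, but the weight table must cover every pair that can receive non-zero weight, and with the huge tail of infrequent features that table becomes impractically large.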
Our Solution: splitSVM

This work: splitSVM
- Features have a Zipfian distribution ⇒ split the features into rare and common features
- Perform PKI inverted indexing on the rare features
- Perform kernel expansion on the common features
- Combine the results into a single decision (see the sketch below; for the math, see the paper)
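A structural sketch of how the two parts fit together at classification time, reusing the types from the earlier sketches (this is not the actual splitSVM code; the exact combination and correction terms are derived in the paper, and the placeholder methods below only mark where they go):

```java
import java.util.List;
import java.util.Set;

/** Structural sketch of splitSVM classification: an expanded-weight linear part over common
 *  features, plus inverted-index lookups over rare features. Placeholders stand in for the
 *  exact terms given in the paper. */
class SplitSvmSketch {
    private final InvertedIndex rareIndex;                       // rare feature -> ids of SVs containing it
    private final List<KernelDecision.SupportVector> svs;        // svId is an index into this list
    private final double bias;

    SplitSvmSketch(InvertedIndex rareIndex, List<KernelDecision.SupportVector> svs, double bias) {
        this.rareIndex = rareIndex;
        this.svs = svs;
        this.bias = bias;
    }

    int decide(int[] commonFeats, int[] rareFeats) {
        // 1) Fast linear score over the expanded common-feature d-tuples. The weight table stays
        //    bounded because only the small set of common features is expanded.
        double score = bias + expandedCommonScore(commonFeats);
        // 2) Rare features are handled PKI-style: only SVs sharing at least one rare feature with
        //    the instance contribute an exact correction to the score.
        Set<Integer> candidates = rareIndex.candidates(rareFeats);
        for (int svId : candidates) {
            score += rareCorrection(svs.get(svId), commonFeats, rareFeats);
        }
        return score >= 0 ? +1 : -1;
    }

    // Placeholders: the concrete computations are spelled out in the paper.
    private double expandedCommonScore(int[] commonFeats) { return 0.0; }
    private double rareCorrection(KernelDecision.SupportVector sv, int[] c, int[] r) { return 0.0; }
}
```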
Software Toolkit

Java Software Available
- We provide a Java implementation: splitSVM
- It exposes the same interface as common SVM packages (libsvm, yamcha)

To use splitSVM in your application:
1. Train a libsvm / SVMlight / TinySVM / yamcha model as you did before
2. Convert the model to our splitSVM format
3. Change 2 lines in your code
A Testcase – Speeding up MaltParser

MaltParser (Nivre et al., 2006)
- A state-of-the-art dependency parser
- Java implementation is freely available
- Uses a 2nd-degree polynomial kernel for classification
- Uses libsvm as its classification engine
- . . . and is a bit slow . . .

Enter splitSVM
- We use the pre-trained English models
- We replaced the libsvm classifier with splitSVM (rare features: those appearing in less than 0.5% of the SVs)
A Testcase – Speeding up MaltParser

Method       Memory   Parsing Time   Sents/Sec
libsvm       240 MB   2166 sec       1.73
This paper   750 MB   70 sec         53

Table: Parsing time for WSJ Sections 23-24 (3762 sentences), on a Pentium M, 1.73 GHz

- Only a 3-fold memory increase
- ~30 times faster
- A Java-based parser parsing > 50 sentences per second!
To Conclude

- Simple idea. Works great. Simple to use.
- Use it: http://www.cs.bgu.ac.il/~nlpproj/splitsvm