Towards an evolutionary-based approach for natural language processing
Luca Manzoni, Domagoj Jakobovic, Luca Mariot, Stjepan Picek, Mauro Castelli
l.mariot@tudelft.nl
GECCO 2020, 8–12 July 2020

Next Word Prediction (NWP)
◮ Task: given an initial sequence of k words $w_1, \dots, w_k$, complete the sentence by predicting the last word $w_{k+1}$
◮ Exact or plausible prediction?
  Original completion: table
  Plausible prediction: chair
  Plausible (?) prediction: tractor
◮ We consider the setting of plausible word predictions with Genetic Programming (GP)

NWP with Genetic Programming (GP)
To cast NWP as a learning task for GP we need to consider:
◮ Input representation. How can the input words be represented in a suitable way for GP?
◮ Functional operators. What operations can be performed on the representation of the words?
◮ Output interpretation. How can we decode the output of a GP individual and interpret it as a word?

GP Input: word2vec embedding
◮ word2vec: a NN-based model that learns a word embedding of a vocabulary over the vector space $\mathbb{R}^d$
[Figure: two vectors $\vec{u} = (u_1, u_2)$ and $\vec{v} = (v_1, v_2)$ in the plane, separated by the angle $\theta$]
◮ Similar words $u, v$ are mapped to vectors $\vec{u}, \vec{v} \in \mathbb{R}^d$ with a high cosine similarity:
$$\mathrm{sim}(\vec{u}, \vec{v}) = \frac{\sum_{i=1}^{d} u_i v_i}{\|\vec{u}\|_2 \cdot \|\vec{v}\|_2}$$
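The cosine similarity formula can be sketched in a few lines of plain Python. This is only an illustration: the function name `cosine_similarity` and the convention of returning 0.0 for a zero vector are assumptions, not something the slides specify.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    # Assumed convention: a zero vector has similarity 0.0 with everything.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Orthogonal vectors have similarity 0; parallel vectors have similarity close to 1.
print(cosine_similarity([0.0, 1.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

The value depends only on the angle between the vectors, not on their lengths, which is why it suits the comparison of word embeddings.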

Evaluation of a GP tree
(1) The input words are converted to vectors through the word2vec embedding
(2) The vectors of the input words are fed to the GP tree, and the output vector is evaluated at the root node
(3) The output vector is converted to the most similar word occurring in the vocabulary learned by word2vec
(4) Compute the similarity between the original (target) word and the word predicted by GP
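The four evaluation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the toy two-dimensional `embedding`, the nested-tuple tree encoding, the element-wise application of the operators, and the protected division and square root are all hypothetical choices, not details given in the slides.

```python
import math
from typing import Dict, List, Tuple, Union

Vector = List[float]
# A tree is either an input index i (terminal w_i) or a tuple (operator, child, ...).
Tree = Union[int, Tuple]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


# Operators are applied component by component. Protecting division and the
# square root is an assumption made so the sketch never raises an error.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}
UNARY = {
    "sq": lambda a: a * a,
    "sqrt": lambda a: math.sqrt(abs(a)),
}


def evaluate(tree: Tree, inputs: List[Vector]) -> Vector:
    """Recursively evaluate the tree; the value at the root is the output vector."""
    if isinstance(tree, int):
        return inputs[tree]
    op, *children = tree
    args = [evaluate(child, inputs) for child in children]
    if op in UNARY:
        return [UNARY[op](a) for a in args[0]]
    return [BINARY[op](a, b) for a, b in zip(args[0], args[1])]


def nearest_word(vec: Vector, embedding: Dict[str, Vector]) -> str:
    """Decode a vector as the vocabulary word with the highest cosine similarity."""
    return max(embedding, key=lambda word: cosine(vec, embedding[word]))


# Hypothetical toy embedding, standing in for a trained word2vec model.
embedding = {
    "rain": [1.0, 0.0],
    "floods": [0.9, 0.1],
    "towns": [0.0, 1.0],
    "homes": [0.1, 0.9],
}
sentence, target = ["rain", "floods", "towns"], "homes"
tree = ("-", 2, ("*", 0, 1))  # w_2 - w_0 * w_1

inputs = [embedding[w] for w in sentence]                 # step (1)
output = evaluate(tree, inputs)                           # step (2)
predicted = nearest_word(output, embedding)               # step (3)
score = cosine(embedding[target], embedding[predicted])   # step (4)
print(predicted, round(score, 3))
```

In this toy run the tree predicts a word different from the target, yet the similarity score is still high, which is the sense of "plausible" prediction used in the talk.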

Fitness Function
◮ The fitness is computed over a training set $S$ of sentences, all with the same number of words $k + 1$
◮ A fitness case is thus defined as a pair $c = ((w_1, \dots, w_k), w_{k+1})$
◮ Each word $w_i$ is represented by the vector $\vec{w}_i$ produced by the word2vec embedding
◮ Fitness of a GP individual $T$: similarity between the target $\vec{w}_{k+1}$ and the output vector $\vec{p}_{k+1}$, averaged over all fitness cases
$$\mathrm{fit}(T) = \frac{1}{|S|} \cdot \sum_{c \in S} \mathrm{sim}(\vec{w}_{k+1}, \vec{p}_{k+1})$$
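As a rough sketch, the fitness can be computed by averaging the cosine similarity over the fitness cases. Here a GP individual is modelled as any Python callable that maps the input vectors to an output vector. That modelling choice, the names `fitness` and `last_word`, and the toy cases are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A fitness case: the vectors of (w_1, ..., w_k) and the vector of the target w_{k+1}.
FitnessCase = Tuple[List[Vector], Vector]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def fitness(individual: Callable[[List[Vector]], Vector],
            cases: Sequence[FitnessCase]) -> float:
    """Mean similarity between the target vector and the individual's output vector."""
    total = sum(cosine(target, individual(inputs)) for inputs, target in cases)
    return total / len(cases)


# Hypothetical individual that simply returns the last input vector.
def last_word(inputs: List[Vector]) -> Vector:
    return inputs[-1]


cases = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0]),  # output parallel to target: similarity 1
    ([[0.0, 1.0], [1.0, 0.0]], [0.0, 1.0]),  # output orthogonal to target: similarity 0
]
print(fitness(last_word, cases))  # 0.5
```

Because the similarity is taken on the raw output vector, the fitness varies smoothly with the output and does not require decoding to a word during training.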

Training Phase – Experimental Settings
Common Parameters:
◮ Dataset: Million News Headlines (MNH)
◮ Headlines length: 6 words (267 292 instances in MNH)
◮ word2vec embedding dimensions: $d \in \{10, 15, 20, 25, 50, 100\}$
◮ Training set size per GP run: 2672 (randomly selected from the 267 292 6-word headlines)
GP Parameters:
◮ Functional set: $+$, $-$, $\times$, $/$, $(\cdot)^2$, $\sqrt{\cdot}$
◮ Population size: 500 individuals
◮ Selection operator: steady-state with 3-tournament operator
◮ Mutation probability: $p_m = 0.3$
◮ Termination criterion: 100 000 fitness evaluations
◮ Number of independent runs: 30
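One common reading of a steady-state scheme with a 3-tournament operator is sketched below: three individuals are sampled, the worst is replaced by the offspring of the other two, and the offspring is mutated with probability $p_m$. The slides do not spell out the exact variant, so this reading is an assumption. The float genomes, `crossover`, and `mutate` are hypothetical stand-ins for GP trees and tree operators.

```python
import random
from typing import Callable, List


def steady_state_step(
    population: List[float],
    fitness: Callable[[float], float],
    crossover: Callable[[float, float], float],
    mutate: Callable[[float], float],
    rng: random.Random,
    p_mut: float = 0.3,
) -> None:
    """Sample 3 individuals and replace the worst with a child of the other two."""
    contestants = rng.sample(range(len(population)), 3)
    contestants.sort(key=lambda i: fitness(population[i]), reverse=True)
    best, second, worst = contestants
    child = crossover(population[best], population[second])
    if rng.random() < p_mut:
        child = mutate(child)
    population[worst] = child  # population size stays constant


# Toy problem (hypothetical): evolve a number towards 1.0.
rng = random.Random(0)
fitness = lambda x: -abs(x - 1.0)
crossover = lambda a, b: (a + b) / 2.0
mutate = lambda x: x + rng.gauss(0.0, 0.1)

population = [rng.uniform(-10.0, 10.0) for _ in range(50)]
best_before = max(fitness(x) for x in population)
for _ in range(2000):
    steady_state_step(population, fitness, crossover, mutate, rng)
best_after = max(fitness(x) for x in population)
print(best_before, best_after)
```

Since only the tournament loser is overwritten, the best fitness in the population never decreases from one step to the next.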

Is GP learning a language model?
◮ Idea: compare the best GP individuals at the first and last generation, and GP with a random predictor
[Figure panels: GP vs. Random predictor; GP First/Last Generations]
◮ Main finding: The GP evolutionary process is able to learn, to a certain extent, a representation of the MNH dataset

What is the influence of the word2vec embedding?
◮ Idea: compare the best GP individual with the "trivial" predictors that always generate the first or the last word
[Figure panels: GP vs. First predictor; GP vs. Last predictor]
◮ Main finding: Lower embedding dimensions work better. For higher ones, the GP behavior approaches the trivial predictors
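The trivial baselines can be sketched as predictors that return the vector of the first or the last input word, scored with the same mean similarity as a GP individual. Reading "first" and "last" as positions in the input sequence is an assumption, and the toy cases are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Case = Tuple[List[Vector], Vector]  # (input word vectors, target word vector)


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def first_predictor(inputs: List[Vector]) -> Vector:
    """Always predict the first input word."""
    return inputs[0]


def last_predictor(inputs: List[Vector]) -> Vector:
    """Always predict the last input word."""
    return inputs[-1]


def mean_similarity(predictor: Callable[[List[Vector]], Vector],
                    cases: Sequence[Case]) -> float:
    return sum(cosine(t, predictor(x)) for x, t in cases) / len(cases)


cases = [
    ([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 1.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0]),
]
print(mean_similarity(first_predictor, cases))
print(mean_similarity(last_predictor, cases))
```

A GP individual whose score is no better than these baselines has effectively learned to copy one of its inputs.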

Testing Phase – Experimental Settings
◮ Selected the best GP tree out of 30 runs for each dimension:

  d      10   15   20   25   50   100
  size   27   38   39   48   36   27

◮ Each selected tree was tested over a random sample of 10 000 6-word headlines from the MNH dataset
◮ As in the training phase, the task was to predict the sixth word given the first five as input
◮ For each sentence, we computed the similarity between the predicted and the original word

Example of tree evolved by GP
◮ Example of best individual evolved by GP for embedding dimension $d = 10$:
[Figure: GP tree with 27 nodes; the internal nodes are $+$, $-$, $\times$, $(\cdot)^2$ and $\sqrt{\cdot}$, and the leaves are the input words $w_0, \dots, w_4$]

Testing Results
◮ Distributions of similarity between predicted and original word over the test set
[Figure: similarity distributions]

Examples of sentences completed by GP
◮ Examples of test headlines completed by the best GP individual for embedding dimension $d = 10$:

  Predicted headline                               Original
  Regional education to fund youth preschool       allowance
  Aerial footage of flooded Townsville houses      homes
  Greens renew call for tax changes                review
  Napthine to launch new Portland rail             marina
  4 charged over 10000 jewellery robberies         heist
  Vanstone defends land rights act overhaul        changes
  Community urged to seek infrastructure funds     funding
  Govt. pressured on company tax bureaucracy       rates
  Petition urges probe into abattoir maintenance   closure
  Rain does little for central towns               Victoria

Wrapping Up
Learning vs. Exact Prediction:
◮ GP usually predicts a different word than the original one
◮ Not necessarily a drawback: a sentence can have many different meaningful completions
◮ GP can navigate the word2vec embedding and predict words that are aligned with the semantics of the sentence
Dimensionality and Fitness:
◮ The embedding dimension has a significant impact on the GP performance: the higher the dimension, the lower the fitness
◮ Neural network-based models usually employ embeddings with hundreds of dimensions

Future Directions
◮ Use vector-oriented operators as GP functionals (e.g., rotations)
◮ Probabilistic generation: use an ensemble of GP trees to induce a probability distribution on the word to predict
◮ Extend the approach to text generation (e.g., by using a sliding window approach)
◮ Co-evolve a population of GP generators and a population of GP discriminators, to distinguish real words from GP ones
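The sliding window idea could look roughly like the sketch below, where the last k words are repeatedly fed to a next-word predictor and the prediction is appended to the text. This is a speculative illustration of a proposed direction, not something the talk implements. The function `generate` and the lookup-table predictor are hypothetical stand-ins for an evolved GP tree.

```python
from typing import Callable, List


def generate(seed: List[str],
             predict_next: Callable[[List[str]], str],
             k: int,
             n_new: int) -> List[str]:
    """Extend the seed by n_new words, predicting each from the last k words."""
    words = list(seed)
    for _ in range(n_new):
        window = words[-k:]  # the window slides forward by one word per step
        words.append(predict_next(window))
    return words


# Hypothetical stand-in predictor: a lookup on the last word of the window.
toy_next = {"rain": "floods", "floods": "central", "central": "towns", "towns": "rain"}


def predict(window: List[str]) -> str:
    return toy_next[window[-1]]


print(generate(["rain", "floods"], predict, k=2, n_new=3))
# ['rain', 'floods', 'central', 'towns', 'rain']
```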

Thank you for your attention!
