“de-li-de ji-we-ji …”
Pre-Wiring & Pre-Training: What does a neural network need to learn truly general identity rules?
Raquel G. Alhama, Willem Zuidema
The Empirical Data: Do infants generalize identity rules? [Marcus et al. 1999]
PARTICIPANTS: 7-month-old infants
FAMILIARIZATION: either ABA strings (“wi-je-wi le-di-le ji-li-ji …”) or ABB strings (“wi-je-je le-di-di ji-li-li …”)
TEST: strings built from novel syllables, from both grammars: “ba-po-ba (ABA), ko-ga-ga (ABB), ba-po-po (ABB), …”
The Empirical Data: Do infants generalize identity rules? [Marcus et al. 1999]
RESULT: Differential attention between grammars. Test items consistent with the familiarized grammar (e.g. ba-po-ba, which follows ABA like wi-je-wi) are treated differently from inconsistent ones (e.g. ba-po-po and ko-ga-ga, which follow ABB like wi-je-je).
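To make the stimulus design concrete, here is a minimal Python sketch that generates ABA and ABB triples with separate familiarization and test vocabularies. This is only an illustration: the syllable lists and function names are made up and do not come from the authors' materials.

```python
import itertools

# Illustrative syllable inventories; Marcus et al. used disjoint
# vocabularies for familiarization and test, so test items are novel.
FAM_A = ["wi", "le", "ji", "de"]
FAM_B = ["je", "di", "li", "we"]
TEST_A = ["ba", "ko"]
TEST_B = ["po", "ga"]

def make_items(a_syllables, b_syllables, grammar):
    """Generate all X-Y-Z triples for grammar 'ABA' or 'ABB'."""
    items = []
    for a, b in itertools.product(a_syllables, b_syllables):
        if grammar == "ABA":
            items.append((a, b, a))   # third syllable repeats the first
        elif grammar == "ABB":
            items.append((a, b, b))   # third syllable repeats the second
    return items

familiarization_aba = make_items(FAM_A, FAM_B, "ABA")
test_abb = make_items(TEST_A, TEST_B, "ABB")
print(familiarization_aba[:3])  # e.g. [('wi', 'je', 'wi'), ...]
print(test_abb[:3])             # e.g. [('ba', 'po', 'po'), ...]
```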
Modelling the Results [Marcus et al. 1999]
● Symbolic Cognition: “XYZ: X is the same as Z” (X = Z)
● Simple Recurrent Network (SRN)
  – Trained to predict the next syllable
  – Fails to predict novel (test) items [Evaluation: % correct in predicting the third syllable]
A generalizing solution is in the hypothesis space of the SRN; why doesn't it find it?
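For reference, a minimal sketch of a Simple Recurrent (Elman) Network trained on next-syllable prediction, in plain NumPy. The layer sizes, one-hot encoding, learning rate, and single-step gradient truncation are illustrative assumptions, not the settings of the original simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: one-hot syllable vectors, small hidden layer.
n_syllables, n_hidden, lr = 12, 20, 0.1

W_xh = rng.normal(0, 0.1, (n_hidden, n_syllables))  # input -> hidden
W_hh = rng.normal(0, 0.1, (n_hidden, n_hidden))     # context -> hidden
W_hy = rng.normal(0, 0.1, (n_syllables, n_hidden))  # hidden -> output

def one_hot(i):
    v = np.zeros(n_syllables)
    v[i] = 1.0
    return v

def train_step(sequence):
    """One pass over a sequence of syllable ids, predicting each next
    syllable; gradients are truncated at one time step (Elman-style)."""
    global W_xh, W_hh, W_hy
    h = np.zeros(n_hidden)
    for x_id, y_id in zip(sequence[:-1], sequence[1:]):
        x = one_hot(x_id)
        h_new = np.tanh(W_xh @ x + W_hh @ h)
        logits = W_hy @ h_new
        p = np.exp(logits - logits.max())
        p /= p.sum()
        d_logits = p - one_hot(y_id)                  # cross-entropy gradient
        d_h = (W_hy.T @ d_logits) * (1 - h_new ** 2)  # backprop through tanh
        W_hy -= lr * np.outer(d_logits, h_new)
        W_xh -= lr * np.outer(d_h, x)
        W_hh -= lr * np.outer(d_h, h)
        h = h_new  # context for the next time step

# Example: one ABA triple encoded as syllable ids.
train_step([0, 5, 0])
```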
Simulations with a Simple Recurrent Network
[Figure: proportion of statistically significant responses to the different grammar conditions, out of 400 runs of the model with different parameter settings]
What is missing in the SRN simulations?
● The SRN was simulated as a tabula rasa: it starts learning from a random state
● Pre-Wiring: what would be a more cognitively plausible initial state?
● Pre-Training: what is the role of prior experience?
Implementation: the Echo State Network (ESN, Jaeger 2001)
● Same hypothesis space as the SRN
● Reservoir Computing approach: only the weights in the output layer are trained (generally with Ridge Regression, but we use Gradient Descent)
● The weights in the reservoir are randomly initialized (with spectral radius < 1)
How can we pre-wire it for this task?
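A minimal ESN sketch along these lines: a random reservoir rescaled to a spectral radius below 1, with only the output (readout) weights trained. For brevity the readout here is fitted with ridge regression, whereas the slide notes the authors use gradient descent; all sizes and scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res, n_out = 12, 100, 12          # illustrative sizes
spectral_radius = 0.9

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W_res = rng.normal(0, 1, (n_res, n_res))
# Rescale so the largest eigenvalue magnitude (spectral radius) is < 1.
W_res *= spectral_radius / max(abs(np.linalg.eigvals(W_res)))

def run_reservoir(inputs):
    """Collect reservoir states for a sequence of input vectors."""
    h = np.zeros(n_res)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_res @ h)
        states.append(h.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-4):
    """Only the output weights are trained (here: ridge regression)."""
    return np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                           states.T @ targets).T   # shape (n_out, n_res)
```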
Pre-Wiring: Delay Line Memory
● DELAY LINE MEMORY: a mechanism that preserves the input by propagating it along a path with a delay (t=0 → t=1 → t=2 → …)
● Implementation:
  – “Feed-forward” structure in the reservoir
  – Strict or approximated copy
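One way to pre-wire such a delay line, sketched under assumptions: impose a feed-forward chain of identity (or noisy near-identity) blocks on the reservoir matrix, so that the block of units holding the input is copied one block further along the chain at every time step. The block layout, noise level, and function name below are illustrative, not the authors' exact construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def delay_line_reservoir(n_in, n_steps, approximate=False, noise=0.05):
    """Reservoir whose recurrent matrix propagates a block of n_in units
    forward by one block per time step (a delay line of length n_steps)."""
    n_res = n_in * n_steps
    W_res = np.zeros((n_res, n_res))
    for t in range(1, n_steps):
        src = slice((t - 1) * n_in, t * n_in)
        dst = slice(t * n_in, (t + 1) * n_in)
        block = np.eye(n_in)                       # strict copy
        if approximate:                            # approximated copy
            block += rng.normal(0, noise, (n_in, n_in))
        W_res[dst, src] = block
    # The input is written only into the first block of units.
    W_in = np.zeros((n_res, n_in))
    W_in[:n_in, :] = np.eye(n_in)
    return W_in, W_res
```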
Pre-Wiring: Delay Line Memory Does the model learn the generalized solution?
Simulations with the Delay Line
[Figure: results for the Original and the Extended condition]
Pre-Training
● There are many solutions that fit the training data (non-generalizing solutions)
● Where does the pressure to find a general solution come from?
  – Hypothesis: prior experience with environmental data may have created a domain-general bias for abstract solutions
→ PRE-TRAINING: Incremental Novelty Exposure
Pre-Training: Incremental Novelty Exposure
TRAINING: items of the form A_i B_j A_i, drawn from a vocabulary window that shifts over time: first {A1…A4} × {B1…B4}, then {A2…A5} × {B2…B5}, …, up to {A_{k-3}…A_k} × {B_{k-3}…B_k}.
TEST: items of the form C_i D_j C_i, built from entirely novel syllables {C1…C4} × {D1…D4}.
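A sketch of this training regime as a loop. The syllable names, window size, number of phases, and the train_model callback are all placeholders for illustration; only the shifting-window structure reflects the scheme above.

```python
def incremental_novelty_exposure(train_model, n_phases, window=4):
    """Pre-train on ABA-style triples while gradually shifting the vocabulary:
    at each phase one novel A and one novel B syllable enter the window and
    the oldest ones drop out, so the model keeps encountering new tokens."""
    # Hypothetical open-ended vocabularies a0, a1, ... and b0, b1, ...
    A = [f"a{i}" for i in range(n_phases + window)]
    B = [f"b{i}" for i in range(n_phases + window)]
    for phase in range(n_phases):
        a_window = A[phase:phase + window]
        b_window = B[phase:phase + window]
        items = [(a, b, a) for a in a_window for b in b_window]  # A_i B_j A_i
        train_model(items)          # placeholder call to train the network
    # Generalization is then tested on C_i D_j C_i with entirely new syllables.
```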
Simulations with Incremental Novelty Exposure
[Figure: prediction results, compared against the Random conditions]
Huge increase in % of correct predictions!
Conclusions
● Finally, simulations of a recurrent network successfully solving the task of Marcus et al. (1999)
● This simple learning problem might hold lessons for more complex architectures solving more complex tasks
● Crucial for success are:
  (i) Pre-Wiring with a structure that improves memory (cf. LSTM, Memory Networks)
  (ii) Pre-Training with training regimes that favour generalization (cf. Dropout)
Contact: rgalhama@uva.nl https://staff.fnwi.uva.nl/r.garridoalhama/
Effect of the Delay Line
[Figure: predictions without vs. with the Delay Line]