Analysis of patterns and minimal embeddings of non-Markovian - PowerPoint PPT Presentation

Analysis of patterns and minimal embeddings of non-Markovian sequences
Manuel.Lladser@Colorado.EDU
Department of Applied Mathematics, University of Colorado Boulder
AofA, April 13, 2008

NOTATION & TERMINOLOGY.
• $\mathcal{A}$ is a finite alphabet
• $\mathcal{A}^*$ is the set of all words of finite length
• A language is a set $\mathcal{L} \subset \mathcal{A}^*$
• $X = (X_n)_{n \ge 1}$ is a sequence of $\mathcal{A}$-valued random variables; $X$ may be non-Markovian
• $X_1 \cdots X_l$ models a random word of length $l$

PARADIGM. For various probabilistic models for $X$ and languages $\mathcal{L}$, the frequency statistics of $\mathcal{L}$ are asymptotically normal.
$$S_n^{\mathcal{L}} := \left(\text{number of prefixes in } X_1 \cdots X_n \text{ that belong to the language } \mathcal{L}\right)$$
The paradigm applies for:
• generalized patterns ⊕ i.i.d. models [BenKoch93]
• simple patterns ⊕ stationary Markovian models [RegSzp98]
• primitive patterns ⊕ k-order Markovian models [NicSalFla02, Nic03]
• primitive patterns ⊕ nice dynamical sources [BouVal02, BouVal06]
• hidden patterns ⊕ i.i.d. models [FlaSpaVal06]

THE MARKOV CHAIN EMBEDDING TECHNIQUE.
IF $X$ is a homogeneous Markov chain,
IF $\mathcal{L}$ is a regular language,
IF $G = (V, \mathcal{A}, f, q, T)$ is a DFA that recognizes $\mathcal{L}$,
IF the embedding of $X$ into $G$, i.e. the stochastic process $X_n^G := f(q, X_1 \cdots X_n)$, is a first-order homogeneous Markov chain,
THEN
$$S_n^{\mathcal{L}} = \left(\text{number of visits the embedded process } X^G \text{ makes to } T \text{ in the first } n \text{ steps}\right)$$
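For a single realization of the sequence, the identity above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_matches`, the state names and the small DFA for the language {a,b}*{ba} are hypothetical and do not come from the slides.

```python
def count_matches(word, delta, start, accepting):
    """Number of non-empty prefixes of `word` accepted by the DFA.

    This equals the number of visits to the accepting set T made by the
    embedded path f(q, x_1 ... x_k), k = 1, ..., len(word).
    """
    state, visits = start, 0
    for letter in word:
        state = delta[(state, letter)]  # one step of the embedded process
        if state in accepting:
            visits += 1
    return visits


# Hypothetical DFA for the language {a,b}*{ba} (words ending in "ba").
# States: "s" (no progress), "b" (last letter was b), "ba" (accepting).
DELTA = {
    ("s", "a"): "s", ("s", "b"): "b",
    ("b", "a"): "ba", ("b", "b"): "b",
    ("ba", "a"): "s", ("ba", "b"): "b",
}

# The prefixes "abba" and "abbaba" end in "ba".
print(count_matches("abbaba", DELTA, "s", {"ba"}))  # → 2
```

Feeding the function a simulated path of $X$ instead of a fixed word gives one sample of $S_n^{\mathcal{L}}$.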

EXAMPLE. Consider a first-order Markov chain $X$ such that
$P[X_1 = a] = \mu$; $P[X_1 = b] = 1 - \mu$;
$P[X_{n+1} = a \mid X_n = a] = p$; $P[X_{n+1} = b \mid X_n = a] = 1 - p$;
$P[X_{n+1} = a \mid X_n = b] = q$; $P[X_{n+1} = b \mid X_n = b] = 1 - q$.
Then the embedding of $X$ into the Aho-Corasick automaton
Figure. Aho-Corasick automaton with states $\epsilon$, a, b, ab, ba, abb, abba and transitions labeled a and b
that recognizes matches with the regular expression $\{a, b\}^* \{ba, abba\}$, i.e. all words of the form $x = \ldots ba$ or $x = \ldots abba$, is a first-order Markov chain.

Figure. The same Aho-Corasick automaton (states $\epsilon$, a, b, ab, ba, abb, abba), with each edge labeled by its transition probability: $\mu$, $1 - \mu$, $p$, $1 - p$, $q$ or $1 - q$
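As a sketch of how the embedded chain of this example could be tabulated, the Python below builds the Aho-Corasick transition function for the patterns ba and abba and attaches the probabilities $\mu$, $p$, $q$ to its edges. The names `step`, `ACCEPTING` and `embedded_matrix` are hypothetical. The construction relies on the fact that both letters are prefixes of some pattern, so every non-initial state ends with the last letter read.

```python
from fractions import Fraction

PATTERNS = ("ba", "abba")
# States are the prefixes of the patterns: '', a, b, ab, ba, abb, abba.
STATES = sorted({w[:i] for w in PATTERNS for i in range(len(w) + 1)},
                key=lambda s: (len(s), s))
# A state is accepting when some pattern is a suffix of it.
ACCEPTING = {s for s in STATES if any(s.endswith(w) for w in PATTERNS)}


def step(state, letter):
    """Longest suffix of state+letter that is a prefix of some pattern."""
    w = state + letter
    for i in range(len(w) + 1):
        if w[i:] in STATES:
            return w[i:]  # always reached, since '' is a state


def embedded_matrix(mu, p, q):
    """Transition probabilities of the embedded chain as a dict of dicts."""
    P = {s: {} for s in STATES}
    for s in STATES:
        # Probability that the next letter is 'a', read off the last letter.
        if s == "":
            prob_a = mu
        elif s[-1] == "a":
            prob_a = p
        else:
            prob_a = q
        for letter, pr in (("a", prob_a), ("b", 1 - prob_a)):
            t = step(s, letter)
            P[s][t] = P[s].get(t, 0) + pr
    return P


P = embedded_matrix(Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
print(P["abba"])  # from abba: letter a leads to 'a', letter b leads to 'ab'
```

Because the next-letter distribution depends only on the last letter of the current state, each row of `P` depends on the state alone, which is the first-order Markov property claimed in the example.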

What about a completely general sequence $X$?

EXAMPLE. A seemingly unbiased coin. Let $0 < p < 1/2$. Consider the random binary sequence $X = (X_n)_{n \ge 1}$ such that
$$X_{n+1} \overset{d}{=} \begin{cases}
\text{Bernoulli}(p), & \frac{1}{n} \sum_{i=1}^{n} X_i > \frac{1}{2} \\
\text{Bernoulli}(1/2), & \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{2} \\
\text{Bernoulli}(1 - p), & \frac{1}{n} \sum_{i=1}^{n} X_i < \frac{1}{2}
\end{cases}$$
Question. Is there a Markovian structure into which $X$ can be embedded for analyzing the asymptotic distribution of the frequency statistics of a given language?
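A minimal simulation sketch of this coin, in Python. Two points are assumptions rather than part of the slide: the function name `simulate_coin`, and the convention that the first toss $X_1$ is Bernoulli(1/2), since the empirical mean is undefined for $n = 0$.

```python
import random


def simulate_coin(n, p, seed=0):
    """Simulate X_1, ..., X_n of the self-correcting coin, 0 < p < 1/2."""
    rng = random.Random(seed)
    xs, ones = [], 0
    for k in range(n):  # k tosses have been made so far
        # Compare the empirical mean ones/k with 1/2 using integers only.
        if 2 * ones > k:
            bias = p            # too many ones: a one is now less likely
        elif 2 * ones == k:
            bias = 0.5          # balanced (also the assumed rule for k = 0)
        else:
            bias = 1 - p        # too few ones: a one is now more likely
        x = 1 if rng.random() < bias else 0
        xs.append(x)
        ones += x
    return xs


xs = simulate_coin(20000, 0.2, seed=1)
print(sum(xs) / len(xs))  # empirical frequency of ones, close to 1/2
```

The next-toss distribution depends on the whole past through the running count of ones, so the sequence is not a Markov chain of any fixed order, even though its long-run frequency of ones looks unbiased.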

GENERAL SETTING. Given
• a possibly non-Markovian sequence $X$
• a possibly non-regular language $\mathcal{L}$
• a transformation $R : \mathcal{A}^* \to S$
define $X^R$ to be the stochastic process $X_n^R := R(X_1 \cdots X_n)$.
Question 1. What conditions are necessary and sufficient in order for $X^R$ to be Markovian?
Question 2. Given a pattern $\mathcal{L}$, is there a transformation $R$ such that $X^R$ is Markovian but also informative of the distribution of the frequency statistics of $\mathcal{L}$?

REMARK. The Markovianity or non-Markovianity of $X_n^R := R(X_1 \cdots X_n)$, $n \ge 1$, does not really depend on the range of $R$.
This motivates thinking of $R : \mathcal{A}^* \to S$ as an equivalence relation over $\mathcal{A}^*$: $u \, R \, v \iff R(u) = R(v)$
• $R(u)$ is the unique equivalence class of $R$ that contains $u$
• $c \in R$ means that $c$ is an equivalence class of $R$

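DEFINITION. $X$ is embeddable w.r.t. $R$ provided that for all $u, v \in \mathcal{A}^*$ and $c \in R$, if $u \, R \, v$ then
$$\sum_{\alpha \in \mathcal{A} : R(u\alpha) = c} P[X = u\alpha\ldots \mid X = u\ldots] = \sum_{\alpha \in \mathcal{A} : R(v\alpha) = c} P[X = v\alpha\ldots \mid X = v\ldots]$$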
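The condition in this definition can be tested on words up to a bounded length, which can refute embeddability but never prove it. The Python sketch below does this for the seemingly unbiased coin. All names (`coin_next_prob`, `is_embeddable`) are hypothetical, and the check assumes every word has positive probability so that the conditional probabilities are defined.

```python
from fractions import Fraction
from itertools import product


def coin_next_prob(u, a, p=Fraction(1, 5)):
    """P[X = ua... | X = u...] for the seemingly unbiased coin."""
    ones = u.count("1")
    if 2 * ones > len(u):
        q = p
    elif 2 * ones == len(u):
        q = Fraction(1, 2)
    else:
        q = 1 - p
    return q if a == "1" else 1 - q


def is_embeddable(next_prob, R, alphabet, max_len):
    """Bounded check: words in the same class must send the same
    probability mass to each class after one more letter."""
    seen = {}  # class of u -> {class of ua: total probability}
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            u = "".join(letters)
            row = {}
            for a in alphabet:
                c = R(u + a)
                row[c] = row.get(c, 0) + next_prob(u, a)
            if seen.setdefault(R(u), row) != row:
                return False  # two equivalent words disagree
    return True


# Classes given by (number of ones) - (number of zeros): passes the check.
print(is_embeddable(coin_next_prob, lambda x: x.count("1") - x.count("0"), "01", 6))
# Classes given by the last letter only: fails, e.g. u = "1" versus v = "01".
print(is_embeddable(coin_next_prob, lambda x: x[-1:], "01", 6))
```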

Figure. Schematic partition of $\{0, 1, 2\}^*$ into equivalence classes (the diagram marks two equivalent words $u$ and $v$, their one-letter extensions $u0, u1, u2$ and $v0, v1, v2$, and edge weights .3, .4, .7)

THEOREM A. $X$ is embeddable w.r.t. $R$ if and only if, for every $x \in \mathcal{A}^*$, conditioned on $X = x\ldots$, the stochastic process $X_n^R := R(X_1 \cdots X_n)$, $n \ge |x|$, is a first-order homogeneous Markov chain with transition probabilities that do not depend on $x$.
THEOREM B. For each equivalence relation $R$ in $\mathcal{A}^*$, there exists a unique coarsest refinement $R'$ of $R$ w.r.t. which $X$ is embeddable.

APPLICATION/QUESTION. What is the smallest state space for studying the frequency statistics of a language $\mathcal{L}$ in $X$?
$X = a \ b \ b \ a \ b \ldots$ (original sequence)
$X^R = 1 \ 0 \ 0 \ 1 \ 0 \ldots$ (non-Markovian encoding)
$X^{R'} = 0 \ 4 \ 6 \ 3 \ 4 \ldots$ (optimal Markovian encoding)
$X^Q = 6 \ 3 \ 18 \ 15 \ 10 \ldots$ (any other Markovian encoding)
Figure. Partition $R = \{\mathcal{L}, \mathcal{A}^* \setminus \mathcal{L}\}$ s.t. $X^R$ is non-Markovian (the diagram places the words a, ab, abb, abba, abbab in the two classes)
Figure. Coarsest refinement $R'$ of $R$ w.r.t. which $X$ is embeddable (classes labeled (0) to (6))
Figure. Arbitrary refinement $Q$ of $R$ w.r.t. which $X$ is embeddable (classes labeled (0) to (18))

REMARK. The optimal refinement $R'$ of $R$ w.r.t. which $X$ is embeddable is obtained through a limiting process, which makes it almost impossible to characterize the equivalence classes of $R'$.
Motivated by this, we will introduce an embedding which, while not optimal, is analytically tractable (!)

DEFINITION. The Markov relation induced by $X$ into $\mathcal{A}^*$ is the equivalence relation defined as
$$u \, R_X \, v \iff (\forall w \in \mathcal{A}^*): \ P[X = uw\ldots \mid X = u\ldots] = P[X = vw\ldots \mid X = v\ldots]$$

Figure. Weighted tree visualization of the definition with $\mathcal{A} = \{0, 1\}$ (a binary tree rooted at $\epsilon$ with nodes 0, 1, 00, 01, 10, 11, edge weights .8, .2, .4, .6, .5, .5, and two marked words $u$ and $v$)

An equivalence relation $R$ is said to be right-invariant if for all $u, v \in \mathcal{A}^*$ and $\alpha \in \mathcal{A}$: $R(u) = R(v) \implies R(u\alpha) = R(v\alpha)$
THEOREM C. $X$ is embeddable w.r.t. any right-invariant equivalence relation that is a refinement of $R_X$; in particular, $X$ is embeddable w.r.t. $R_X$
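Right-invariance says that the class of $u\alpha$ is determined by the class of $u$ and the letter $\alpha$. A bounded check of this property can be sketched in Python as follows. The helper name `is_right_invariant` and the two example maps are illustrative assumptions, and a check up to a finite length can only refute the property.

```python
from itertools import product


def is_right_invariant(R, alphabet, max_len):
    """Check R(u) = R(v) => R(ua) = R(va) for all words up to max_len."""
    successors = {}  # class of u -> tuple of classes of ua, one per letter
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            u = "".join(letters)
            image = tuple(R(u + a) for a in alphabet)
            if successors.setdefault(R(u), image) != image:
                return False  # same class, different successor classes
    return True


# (number of ones) - (number of zeros) is right-invariant.
print(is_right_invariant(lambda x: x.count("1") - x.count("0"), "01", 6))
# Membership in {0,1}*{11} is not: "0" and "1" are equivalent,
# but "01" and "11" are not.
print(is_right_invariant(lambda x: x.endswith("11"), "01", 4))
```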

EXAMPLE. Back to the seemingly unbiased coin. For $0 < p < 1/2$, define
$$X_{n+1} \overset{d}{=} \begin{cases}
\text{Bernoulli}(p), & \frac{1}{n} \sum_{i=1}^{n} X_i > \frac{1}{2} \\
\text{Bernoulli}(1/2), & \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{2} \\
\text{Bernoulli}(1 - p), & \frac{1}{n} \sum_{i=1}^{n} X_i < \frac{1}{2}
\end{cases}$$
We aim to understand the frequency statistics of
$$\mathcal{L}_1 = \{0, 1\}^* \{1\}, \qquad \mathcal{L}_2 = \{0\}^* \{1\} \{0\}^* \left( \{1\} \{0\}^* \{1\} \{0\}^* \right)^*$$
within $X$.

PROPOSITION. $R : \{0, 1\}^* \to \mathbb{Z}$ defined as
$$R(x) = 2 \left\{ \sum_{i=1}^{|x|} x_i - \frac{|x|}{2} \right\} = \sum_{i=1}^{|x|} x_i - \sum_{i=1}^{|x|} (1 - x_i)$$
is a right-invariant refinement of $R_X$. In particular, $X_n^R := R(X_1 \cdots X_n)$ is a first-order homogeneous Markov chain.
Figure. Transition diagram of $X^R$ on $\mathbb{Z}$, split into the regions $n < 0$, $0$ and $n > 0$, with edges labeled $p$, $(1 - p)$ and $1/2$
$X^R$ is recurrent, with period 2. Because $0 < p < 1/2$, $X^R$ is positive recurrent; in particular, there exists a stationary distribution $\pi$. Observe that
$$S_n^{\mathcal{L}_1} = \sum_{i=1}^{n} X_i$$
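The stationary distribution $\pi$ is not written out on the slide. As a sketch, the Python below uses a closed form obtained from detailed balance for this birth-death chain, namely $\pi(0) = \frac{1 - 2p}{2(1 - p)}$ and $\pi(k) = \frac{\pi(0)}{2(1 - p)} \left( \frac{p}{1 - p} \right)^{|k| - 1}$ for $k \ne 0$, and verifies it exactly with rational arithmetic. The transition rule is read off the coin's definition (from $k > 0$ the chain moves up with probability $p$, from $k < 0$ with probability $1 - p$, from $0$ with probability $1/2$); the function names are hypothetical.

```python
from fractions import Fraction


def step_prob(k, j, p):
    """P[X^R_{n+1} = j | X^R_n = k] for the coin's embedded walk on Z."""
    if abs(j - k) != 1:
        return Fraction(0)
    if k == 0:
        up = Fraction(1, 2)
    elif k > 0:
        up = p          # more ones than zeros: next toss is Bernoulli(p)
    else:
        up = 1 - p      # fewer ones than zeros: Bernoulli(1 - p)
    return up if j == k + 1 else 1 - up


def stationary_pi(k, p):
    """Closed form derived from detailed balance (not from the slides)."""
    pi0 = (1 - 2 * p) / (2 * (1 - p))
    if k == 0:
        return pi0
    return pi0 / (2 * (1 - p)) * (p / (1 - p)) ** (abs(k) - 1)


p = Fraction(1, 5)
# Stationarity: pi(j) = pi(j-1) P(j-1, j) + pi(j+1) P(j+1, j), checked exactly.
balanced = all(
    stationary_pi(j, p)
    == stationary_pi(j - 1, p) * step_prob(j - 1, j, p)
    + stationary_pi(j + 1, p) * step_prob(j + 1, j, p)
    for j in range(-10, 11)
)
total = sum(stationary_pi(k, p) for k in range(-200, 201))
print(balanced, float(total))
```

The geometric ratio $p / (1 - p)$ is below 1 exactly when $p < 1/2$, which is where the positive recurrence stated in the proposition comes from.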

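COROLLARY A. If $U$ and $V$ are $\mathbb{Z}$-valued random variables such that
$P[U = n] = 2 \cdot \pi(n)$, $n \equiv 0 \pmod 2$;
$P[V = n] = 2 \cdot \pi(n)$, $n \equiv 1 \pmod 2$;
then for $\mathcal{L}_1 := \{0, 1\}^* \{1\}$ it holds that
$$\lim_{\substack{n \to \infty \\ n \equiv 0 \ (\mathrm{mod}\ 2)}} 2n \cdot \left\{ \frac{S_n^{\mathcal{L}_1}}{n} - \frac{1}{2} \right\} \overset{d}{=} U;$$
$$\lim_{\substack{n \to \infty \\ n \equiv 1 \ (\mathrm{mod}\ 2)}} 2n \cdot \left\{ \frac{S_n^{\mathcal{L}_1}}{n} - \frac{1}{2} \right\} \overset{d}{=} V.$$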
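One way to read the corollary, offered as a short check rather than as the argument given in the talk: the rescaled statistic is exactly the embedded chain, because

```latex
\[
2n \left\{ \frac{S_n^{\mathcal{L}_1}}{n} - \frac{1}{2} \right\}
= 2 \sum_{i=1}^{n} X_i - n
= \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} (1 - X_i)
= R(X_1 \cdots X_n)
= X_n^R .
\]
```

Since $X^R$ starts at 0 and has period 2, it takes even values at even times and odd values at odd times. Each of the two cyclic classes carries stationary mass $1/2$, which is consistent with the factor 2 in the laws of $U$ and $V$.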
