Analysis of patterns and minimal embeddings of non-Markovian sequences
Manuel.Lladser@Colorado.EDU
Department of Applied Mathematics, University of Colorado Boulder
AofA - April 13 2008
NOTATION & TERMINOLOGY.
• $\mathcal{A}$ is a finite alphabet
• $\mathcal{A}^*$ is the set of all words of finite length
• A language is a set $\mathcal{L} \subset \mathcal{A}^*$
• $X = (X_n)_{n \ge 1}$ is a sequence of $\mathcal{A}$-valued random variables; $X$ may be non-Markovian
• $X_1 \cdots X_l$ models a random word of length $l$
PARADIGM. For various probabilistic models for $X$ and languages $\mathcal{L}$, the frequency statistics of $\mathcal{L}$ are asymptotically normal.
$$S_n^{\mathcal{L}} := \left(\begin{array}{c} \text{number of prefixes in } X_1 \cdots X_n \\ \text{that belong to the language } \mathcal{L} \end{array}\right)$$
The paradigm applies for:
• generalized patterns ⊕ i.i.d. models [BenKoch93]
• simple patterns ⊕ stationary Markovian models [RegSzp98]
• primitive patterns ⊕ k-order Markovian models [NicSalFla02, Nic03]
• primitive patterns ⊕ nice dynamical sources [BouVal02, BouVal06]
• hidden patterns ⊕ i.i.d. models [FlaSpaVal06]
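The statistic $S_n^{\mathcal{L}}$ can be computed directly from its definition when the language is given by a regular expression. The following is only a minimal sketch: the function name `count_prefixes_in_language` is hypothetical, the pattern is the regular expression $\{a,b\}^*\{ba, abba\}$ from the example slide, and counting only non-empty prefixes is an assumption.

```python
import re


def count_prefixes_in_language(word: str, pattern: str) -> int:
    """Number of non-empty prefixes of `word` that fully match `pattern`."""
    compiled = re.compile(pattern)
    return sum(
        1
        for n in range(1, len(word) + 1)
        if compiled.fullmatch(word[:n]) is not None
    )


# Language {a,b}*{ba, abba}: words ending in "ba" or in "abba".
print(count_prefixes_in_language("abbab", r"[ab]*(ba|abba)"))
```

This brute-force count rescans every prefix, which is exactly the cost the automaton-based embedding avoids.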
THE MARKOV CHAIN EMBEDDING TECHNIQUE.
IF $X$ is a homogeneous Markov chain
IF $\mathcal{L}$ is a regular language
IF $G = (V, \mathcal{A}, f, q, T)$ is a DFA that recognizes $\mathcal{L}$
IF the embedding of $X$ into $G$, i.e. the stochastic process $X_n^G := f(q, X_1 \cdots X_n)$, is a first-order homogeneous Markov chain
THEN
$$S_n^{\mathcal{L}} = \left(\begin{array}{c} \text{number of visits the embedded process } X^G \\ \text{makes to } T \text{ in the first } n \text{ steps} \end{array}\right)$$
EXAMPLE. Consider a first-order Markov chain $X$ such that
$$P[X_1 = a] = \mu; \quad P[X_1 = b] = 1 - \mu;$$
$$P[X_{n+1} = a \mid X_n = a] = p; \quad P[X_{n+1} = b \mid X_n = a] = 1 - p;$$
$$P[X_{n+1} = a \mid X_n = b] = q; \quad P[X_{n+1} = b \mid X_n = b] = 1 - q.$$
Then the embedding of $X$ into the Aho-Corasick automaton
Figure. Aho-Corasick automaton with states ε, a, b, ab, ba, abb, abba and transitions labeled a and b
that recognizes matches with the regular expression $\{a, b\}^* \{ba, abba\}$, i.e. all words of the form $x = \ldots ba$ or $x = \ldots abba$, is a first-order Markov chain.
Figure. The same automaton (states ε, a, b, ab, ba, abb, abba) with each transition weighted by the corresponding probability of the embedded chain: $\mu$, $1 - \mu$, $p$, $1 - p$, $q$, $1 - q$
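The embedding in this example can be sketched in a few lines. The code below is an illustration under stated assumptions, not the construction used in the talk: the names `step`, `sample_chain` and `count_visits` are hypothetical, the automaton is rebuilt from the keywords `ba` and `abba` by the usual longest-suffix rule, and the parameter values are arbitrary. Every state other than ε ends with the last letter read, which is why the embedded process inherits the first-order Markov property of $X$.

```python
import random

KEYWORDS = ("ba", "abba")

# States of the Aho-Corasick automaton: all prefixes of the keywords.
STATES = {w[:i] for w in KEYWORDS for i in range(len(w) + 1)}
# Terminal states T: states that have some keyword as a suffix.
TERMINAL = {s for s in STATES if any(s.endswith(k) for k in KEYWORDS)}


def step(state, letter):
    """f(state, letter): longest suffix of state + letter that is a keyword prefix."""
    candidate = state + letter
    while candidate not in STATES:
        candidate = candidate[1:]  # drop the first letter; "" is always a state
    return candidate


def sample_chain(n, mu, p, q, rng):
    """Sample X_1..X_n from the two-letter chain with parameters mu, p, q."""
    letters = []
    prob_a = mu  # P[X_1 = a]
    for _ in range(n):
        letter = "a" if rng.random() < prob_a else "b"
        letters.append(letter)
        prob_a = p if letter == "a" else q  # P[X_{k+1} = a | X_k]
    return "".join(letters)


def count_visits(word):
    """Number of visits the embedded process X^G makes to T along `word`."""
    state, visits = "", 0
    for letter in word:
        state = step(state, letter)
        visits += state in TERMINAL
    return visits


rng = random.Random(0)  # arbitrary seed, for reproducibility only
word = sample_chain(20, mu=0.5, p=0.3, q=0.6, rng=rng)
print(word, count_visits(word))
```

Counting visits to `TERMINAL` reads each letter once, so the statistic is obtained in a single pass over the sequence.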
What about a completely general sequence $X$?
EXAMPLE. A seemingly unbiased coin. Let $0 < p < 1/2$. Consider the random binary sequence $X = (X_n)_{n \ge 1}$ such that
$$X_{n+1} \stackrel{d}{=} \begin{cases} \text{Bernoulli}(p), & \frac{1}{n} \sum_{i=1}^{n} X_i > \frac{1}{2} \\ \text{Bernoulli}(1/2), & \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{2} \\ \text{Bernoulli}(1 - p), & \frac{1}{n} \sum_{i=1}^{n} X_i < \frac{1}{2} \end{cases}$$
Question. Is there a Markovian structure into which $X$ can be embedded in order to analyze the asymptotic distribution of the frequency statistics of a given language?
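A short simulation makes the self-correcting behaviour of this coin concrete. This is a sketch only: the function names are hypothetical, and drawing $X_1$ from Bernoulli(1/2) is an assumption, since the rule above only specifies $X_{n+1}$ for $n \ge 1$.

```python
import random


def next_success_probability(ones, n, p):
    """P[X_{n+1} = 1] given that X_1..X_n contains `ones` ones."""
    if 2 * ones > n:   # empirical mean above 1/2
        return p
    if 2 * ones < n:   # empirical mean below 1/2
        return 1.0 - p
    return 0.5         # empirical mean equal to 1/2 (assumed also for n = 0)


def sample_coin(n, p, rng):
    """Sample X_1..X_n from the seemingly unbiased coin."""
    xs, ones = [], 0
    for k in range(n):
        x = 1 if rng.random() < next_success_probability(ones, k, p) else 0
        xs.append(x)
        ones += x
    return xs


rng = random.Random(1)  # arbitrary seed
xs = sample_coin(1000, p=0.25, rng=rng)
print(sum(xs) / len(xs))
```

The comparison `2 * ones > n` is the condition $\frac{1}{n}\sum X_i > \frac12$ written in integers, which avoids floating-point ties.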
GENERAL SETTING. Given
• a possibly non-Markovian sequence $X$
• a possibly non-regular language $\mathcal{L}$
• a transformation $R : \mathcal{A}^* \to \mathcal{S}$
define $X^R$ to be the stochastic process
$$X_n^R := R(X_1 \cdots X_n)$$
Question 1. What conditions are necessary and sufficient for $X^R$ to be Markovian?
Question 2. Given a pattern $\mathcal{L}$, is there a transformation $R$ such that $X^R$ is Markovian but also informative of the distribution of the frequency statistics of $\mathcal{L}$?
REMARK. The Markovianity or non-Markovianity of
$$X_n^R := R(X_1 \cdots X_n), \quad n \ge 1,$$
does not really depend on the range of $R$, only on which words $R$ identifies. This motivates thinking of $R : \mathcal{A}^* \to \mathcal{S}$ as an equivalence relation over $\mathcal{A}^*$:
$$u \, R \, v \iff R(u) = R(v)$$
• $R(u)$ is the unique equivalence class of $R$ that contains $u$
• $c \in R$ means that $c$ is an equivalence class of $R$
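DEFINITION. $X$ is embeddable w.r.t. $R$ provided that for all $u, v \in \mathcal{A}^*$ and $c \in R$, if $u \, R \, v$ then
$$\sum_{\alpha \in \mathcal{A} :\, R(u\alpha) = c} P[X = u\alpha\ldots \mid X = u\ldots] = \sum_{\alpha \in \mathcal{A} :\, R(v\alpha) = c} P[X = v\alpha\ldots \mid X = v\ldots]$$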
Figure. Schematic partition of $\{0, 1, 2\}^*$ into equivalence classes, showing two equivalent words $u$ and $v$, their one-letter extensions $u0, u1, u2$ and $v0, v1, v2$, and the weights .3, .4, .7
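The condition in the definition can be tested numerically on all words up to a bounded length. The sketch below is illustrative only: the helper names are hypothetical, the process is the two-letter first-order chain of the earlier example with arbitrary parameter values, and a `True` result means only that no violation was found up to `max_len`, not that $X$ is embeddable.

```python
from itertools import product


def words_up_to(alphabet, max_len):
    """Yield every word over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def class_profile(u, alphabet, next_prob, R):
    """Map each class c to the sum of P[X = u alpha... | X = u...] over alpha with R(u alpha) = c."""
    profile = {}
    for alpha in alphabet:
        c = R(u + alpha)
        profile[c] = profile.get(c, 0.0) + next_prob(u, alpha)
    return profile


def no_violation_up_to(alphabet, next_prob, R, max_len, tol=1e-12):
    """Return False as soon as two R-equivalent words have different class profiles."""
    reference = {}  # class -> profile of the first word seen in that class
    for u in words_up_to(alphabet, max_len):
        profile = class_profile(u, alphabet, next_prob, R)
        ref = reference.setdefault(R(u), profile)
        for c in set(ref) | set(profile):
            if abs(ref.get(c, 0.0) - profile.get(c, 0.0)) > tol:
                return False
    return True


def markov_next_prob(u, alpha, mu=0.5, p=0.3, q=0.6):
    """P[X = u alpha... | X = u...] for the two-letter first-order chain."""
    prob_a = mu if not u else (p if u[-1] == "a" else q)
    return prob_a if alpha == "a" else 1.0 - prob_a


def last_letter(w):
    """Class of a word = its last letter (the empty word is alone in its class)."""
    return w[-1:]


def parity_of_as(w):
    """Class of a word = parity of its number of a's."""
    return w.count("a") % 2


print(no_violation_up_to("ab", markov_next_prob, last_letter, 6))
print(no_violation_up_to("ab", markov_next_prob, parity_of_as, 6))
```

With these parameters the last-letter relation passes the check, while the parity relation fails already on the words ε and b, which lie in the same class but assign different probabilities to the next class.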
THEOREM A. $X$ is embeddable w.r.t. $R$ if and only if, for every $x \in \mathcal{A}^*$, conditioned on $X = x\ldots$, the stochastic process
$$X_n^R := R(X_1 \cdots X_n), \quad n \ge |x|,$$
is a first-order homogeneous Markov chain with transition probabilities that do not depend on $x$.
THEOREM B. For each equivalence relation $R$ on $\mathcal{A}^*$, there exists a unique coarsest refinement $R'$ of $R$ w.r.t. which $X$ is embeddable.
APPLICATION/QUESTION. What is the smallest state-space for studying the frequency statistics of a language $\mathcal{L}$ in $X$?
$X$ = a b b a b ... (original sequence)
$X^R$ = 1 0 0 1 0 ... (non-Markovian encoding)
$X^{R'}$ = 0 4 6 3 4 ... (optimal Markovian encoding)
$X^Q$ = 6 3 18 15 10 ... (any other Markovian encoding)
Figure. Partition $R = \{\mathcal{L}, \mathcal{A}^* \setminus \mathcal{L}\}$ s.t. $X^R$ is non-Markovian (words shown: a, ab, abb, abba, abbab)
Figure. Coarsest refinement $R'$ of $R$ w.r.t. which $X$ is embeddable (classes labeled (0) to (6); words shown: a, ab, abb, abba, abbab)
Figure. Arbitrary refinement $Q$ of $R$ w.r.t. which $X$ is embeddable (classes labeled (0) to (18); words shown: a, ab, abb, abba, abbab)
REMARK. The optimal refinement $R'$ of $R$ w.r.t. which $X$ is embeddable is obtained through a limiting process, which makes it almost impossible to characterize the equivalence classes of $R'$.
Motivated by this, we will introduce an embedding that, while not optimal, is analytically tractable.
DEFINITION. The Markov relation induced by $X$ on $\mathcal{A}^*$ is the equivalence relation defined as
$$u \, R_X \, v \iff (\forall w \in \mathcal{A}^*) :\ P[X = uw\ldots \mid X = u\ldots] = P[X = vw\ldots \mid X = v\ldots]$$
Figure. Weighted tree visualization of the definition with $\mathcal{A} = \{0, 1\}$: a binary tree rooted at ε with nodes 0, 1, 00, 01, 10, 11, edge weights .4, .6, .8, .2, .5, .5, and two marked words $u$ and $v$
An equivalence relation $R$ is said to be right-invariant if for all $u, v \in \mathcal{A}^*$ and $\alpha \in \mathcal{A}$:
$$R(u) = R(v) \implies R(u\alpha) = R(v\alpha)$$
THEOREM C. $X$ is embeddable w.r.t. any right-invariant equivalence relation that is a refinement of $R_X$; in particular, $X$ is embeddable w.r.t. $R_X$.
EXAMPLE. Back to the seemingly unbiased coin. For $0 < p < 1/2$, define
$$X_{n+1} \stackrel{d}{=} \begin{cases} \text{Bernoulli}(p), & \frac{1}{n} \sum_{i=1}^{n} X_i > \frac{1}{2} \\ \text{Bernoulli}(1/2), & \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{2} \\ \text{Bernoulli}(1 - p), & \frac{1}{n} \sum_{i=1}^{n} X_i < \frac{1}{2} \end{cases}$$
We aim to understand the frequency statistics of
$$\mathcal{L}_1 = \{0, 1\}^* \{1\}, \qquad \mathcal{L}_2 = \{0\}^* \{1\} \{0\}^* \left( \{1\} \{0\}^* \{1\} \{0\}^* \right)^*$$
within $X$.
PROPOSITION. $R : \{0, 1\}^* \to \mathbb{Z}$ defined as
$$R(x) = 2 \left\{ \sum_{i=1}^{|x|} x_i - \frac{|x|}{2} \right\} = \sum_{i=1}^{|x|} x_i - \sum_{i=1}^{|x|} (1 - x_i)$$
is a right-invariant refinement of $R_X$. In particular, $X_n^R := R(X_1 \cdots X_n)$ is a first-order homogeneous Markov chain.
Figure. Transition diagram of $X^R$ on $\mathbb{Z}$, with regions $n < 0$, $0$ and $n > 0$ and edge probabilities $p$, $1 - p$ and $1/2$
$X^R$ is recurrent, with period 2. Because $0 < p < 1/2$, $X^R$ is positive recurrent; in particular, there exists a stationary distribution $\pi$. Observe that
$$S_n^{\mathcal{L}_1} = \sum_{i=1}^{n} X_i$$
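The stationary distribution $\pi$ is not written out on the slide. Solving the balance equations of this nearest-neighbour walk suggests the closed form $\pi(0) = \frac{1 - 2p}{2(1 - p)}$ and $\pi(\pm n) = \frac{\pi(0)}{2(1 - p)} \left( \frac{p}{1 - p} \right)^{n - 1}$ for $n \ge 1$. This formula is a derivation added here, not a statement from the talk, so the sketch below checks it numerically; the function names are hypothetical, and the step probabilities are read off the coin's definition (a step up means the next symbol is 1).

```python
def up_probability(state, p):
    """P[X^R moves from `state` to `state` + 1]."""
    if state > 0:
        return p
    if state < 0:
        return 1.0 - p
    return 0.5


def stationary_pi(state, p):
    """Candidate closed form for pi(state), valid for 0 < p < 1/2 (assumption: derived here)."""
    pi0 = (1 - 2 * p) / (2 * (1 - p))
    if state == 0:
        return pi0
    return pi0 / (2 * (1 - p)) * (p / (1 - p)) ** (abs(state) - 1)


def balance_residual(state, p):
    """|pi(state) - inflow into state|; zero when pi is stationary."""
    inflow = (
        stationary_pi(state - 1, p) * up_probability(state - 1, p)
        + stationary_pi(state + 1, p) * (1.0 - up_probability(state + 1, p))
    )
    return abs(stationary_pi(state, p) - inflow)


p = 0.25  # arbitrary value in (0, 1/2)
print(max(balance_residual(k, p) for k in range(-20, 21)))
print(sum(stationary_pi(k, p) for k in range(-200, 201)))
```

Because the chain has period 2, the even and the odd states each carry half of the stationary mass, which is what makes the factor 2 in the corollary a proper normalization.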
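COROLLARY A. If $U$ and $V$ are $\mathbb{Z}$-valued random variables such that
$$P[U = n] = 2 \cdot \pi(n), \ n \equiv 0 \pmod 2; \qquad P[V = n] = 2 \cdot \pi(n), \ n \equiv 1 \pmod 2;$$
then for $\mathcal{L}_1 := \{0, 1\}^* \{1\}$ it holds that
$$\lim_{\substack{n \to \infty \\ n \equiv 0 \pmod 2}} 2n \cdot \left\{ \frac{S_n^{\mathcal{L}_1}}{n} - \frac{1}{2} \right\} \stackrel{d}{=} U;$$
$$\lim_{\substack{n \to \infty \\ n \equiv 1 \pmod 2}} 2n \cdot \left\{ \frac{S_n^{\mathcal{L}_1}}{n} - \frac{1}{2} \right\} \stackrel{d}{=} V.$$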
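Since $2n \cdot \{ S_n^{\mathcal{L}_1}/n - 1/2 \} = 2 S_n^{\mathcal{L}_1} - n = X_n^R$, the corollary can be checked numerically by propagating the exact law of $X_n^R$ and comparing it with $2\pi$ on the states of matching parity. This is a sketch under assumptions: the closed form used for $\pi$ is derived from the balance equations rather than taken from the slides, the walk is started at 0 (equivalently, $X_1$ is Bernoulli(1/2)), and all function names are hypothetical.

```python
def up_probability(state, p):
    """P[X^R moves from `state` to `state` + 1]."""
    if state > 0:
        return p
    if state < 0:
        return 1.0 - p
    return 0.5


def stationary_pi(state, p):
    """Candidate closed form for pi (assumption: derived from the balance equations)."""
    pi0 = (1 - 2 * p) / (2 * (1 - p))
    if state == 0:
        return pi0
    return pi0 / (2 * (1 - p)) * (p / (1 - p)) ** (abs(state) - 1)


def law_after(n, p):
    """Exact law of X^R_n = 2 * S_n - n, started from 0 (assumption)."""
    law = {0: 1.0}
    for _ in range(n):
        new_law = {}
        for state, mass in law.items():
            up = up_probability(state, p)
            new_law[state + 1] = new_law.get(state + 1, 0.0) + mass * up
            new_law[state - 1] = new_law.get(state - 1, 0.0) + mass * (1.0 - up)
        law = new_law
    return law


def max_gap(n, p, window=40):
    """Largest |P[X^R_n = k] - 2 * pi(k)| over k in the window with the parity of n."""
    law = law_after(n, p)
    return max(
        abs(law.get(k, 0.0) - 2 * stationary_pi(k, p))
        for k in range(-window, window + 1)
        if (k - n) % 2 == 0
    )


print(max_gap(200, 0.25))  # even n: compare with the law of U
print(max_gap(201, 0.25))  # odd n: compare with the law of V
```

Propagating the law exactly avoids Monte Carlo noise, so any remaining gap reflects only the finite value of $n$.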