The Digital Tree: Analysis and Applications, Philippe Flajolet

Séminaire de Probabilités, Paris, June 2010. The Digital Tree: Analysis and Applications. Philippe Flajolet, INRIA Rocquencourt. Tuesday, June 22, 2010.

A (finite) tree is associated with a (finite) set of words over an alphabet A. Equipped with a randomness model on words, we get a random tree, indexed by the number n of words. Goal: characterize its probabilistic properties, mostly with COMPLEX ANALYSIS.

1. Digital Trees & Algorithms

The infinite tree over A: set of words <--> partial tree; word <--> branch.

DIGITAL TREE, aka “TRIE”: STOP the descent by pruning long one-way branches. Only the nodes corresponding to 2+ words (and their immediate descendants) are kept. The digital tree is finite as soon as it is built out of distinct words. Example: E = {a..., bba..., bbb...}.

TOP-DOWN construction: the set E is separated into $E_a, \dots, E_z$ according to the initial letter; continue with the next letter... INCREMENTAL construction: start with the empty tree and insert the elements of E one after the other, splitting leaves as the need arises. Example: E = {a..., bba..., bbb...}.
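Both constructions can be sketched in Python. This is a minimal illustration, not the talk's code: it assumes finite strings over the alphabet {a, b} that are long enough to be told apart (standing in for infinite words), and the names `Node`, `top_down`, `insert`, `build_incremental` and `size` are hypothetical. A leaf holds one word; internal nodes exist only where 2+ words share a prefix, so both constructions should give the same tree whatever the insertion order.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import permutations
from typing import Optional


@dataclass
class Node:
    word: Optional[str] = None                  # set on leaves only
    children: dict[str, Node] = field(default_factory=dict)


def top_down(words: list[str], depth: int = 0) -> Optional[Node]:
    """Split the set by the letter at position `depth`, then recurse on each part."""
    if not words:
        return None
    if len(words) == 1:
        return Node(word=words[0])              # one-way branch pruned: stop here
    node = Node()
    for letter in sorted({w[depth] for w in words}):
        part = [w for w in words if w[depth] == letter]
        node.children[letter] = top_down(part, depth + 1)
    return node


def insert(node: Optional[Node], word: str, depth: int = 0) -> Node:
    """Insert one word, splitting an occupied leaf as the need arises."""
    if node is None:
        return Node(word=word)                  # empty slot: new leaf
    if node.word is not None:                   # occupied leaf
        if node.word == word:
            return node                         # already present
        split = Node(children={node.word[depth]: Node(word=node.word)})
        return insert(split, word, depth)       # retry on the new internal node
    letter = word[depth]
    node.children[letter] = insert(node.children.get(letter), word, depth + 1)
    return node


def build_incremental(words: list[str]) -> Optional[Node]:
    root: Optional[Node] = None
    for w in words:
        root = insert(root, w)
    return root


def size(node: Optional[Node]) -> int:
    """Number of internal nodes."""
    if node is None or node.word is not None:
        return 0
    return 1 + sum(size(child) for child in node.children.values())


E = ["abab", "bbaa", "bbba"]                     # stands for {a..., bba..., bbb...}
print(size(top_down(E)))                         # 3: the root, prefix "b", prefix "bb"
print(all(build_incremental(list(p)) == top_down(E) for p in permutations(E)))  # True
```

The node for prefix b has a single child but is kept, since it corresponds to two words. A word that is a prefix of another would make `w[depth]` fail, which is why the slides work with infinite words.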

SUMMARY of randomness models: memoryless (Bernoulli) sources with letter probabilities p, q; Markov sources; continued fractions (CF).

Algorithms: 1 - Dictionaries. Manage dictionaries dynamically; hope for O(log n) depth? Save space by “factoring” common prefixes; hope for O(n) size? However, the worst case is unbounded... “TRIE” = tree + retrieval (Fredkin, de la Briandais ~1960). Analysis?

[Figure] A random trie on n = 500 uniform binary sequences: size = 741 internal nodes; height = 18.
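A random trie like the one pictured can be simulated with a short sketch. Assumptions: each key is a 128-bit random integer read from its most significant bit (`BITS` is a hypothetical parameter), the height is the largest leaf depth with the root at depth 0, and the seed is arbitrary, so the exact numbers will differ from the 741 and 18 of the slide.

```python
import random

BITS = 128  # assumption: 128 random bits per key are plenty to separate 500 keys


def size_and_height(keys, depth=0):
    """Return (internal nodes, height) of the binary trie on integer keys,
    reading bits from the most significant one; height = largest leaf depth."""
    if len(keys) <= 1:
        return 0, (depth if keys else -1)       # -1 marks an empty subtree
    shift = BITS - 1 - depth                    # position of the bit used at this depth
    zeros = [k for k in keys if not (k >> shift) & 1]
    ones = [k for k in keys if (k >> shift) & 1]
    s0, h0 = size_and_height(zeros, depth + 1)
    s1, h1 = size_and_height(ones, depth + 1)
    return 1 + s0 + s1, max(h0, h1)


rng = random.Random(2010)                       # arbitrary seed
keys = set()
while len(keys) < 500:
    keys.add(rng.getrandbits(BITS))
size, height = size_and_height(sorted(keys))
print(size, height)
```

A binary trie on n ≥ 2 distinct words always has at least n - 1 internal nodes (one per branching point); the excess comes from one-way branches over shared prefixes.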

Algorithms: 2 - Hashing. Data may be highly structured and share long prefixes. Use a transformation h: W -> W’, called “hashing” (akin to random number generators). Uniform binary data are then meaningful! Analysis?
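The effect of hashing on a trie can be illustrated with keys that share long prefixes. This sketch assumes SHA-256 (from Python's `hashlib`) as a stand-in for the transformation h, and uses the fact that for distinct equal-length bit strings the trie height is one plus the longest common prefix of two of them, which is attained by neighbours in sorted order. The key format `user00000`, ... is made up for illustration.

```python
import hashlib


def bits(data):
    """Big-endian bit string of a byte string."""
    return "".join(f"{byte:08b}" for byte in data)


def common_prefix(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def trie_height(words):
    """Height of the trie on >= 2 distinct equal-length bit strings: 1 + the longest
    common prefix of two of them, attained by neighbours in sorted order."""
    s = sorted(words)
    return 1 + max(common_prefix(a, b) for a, b in zip(s, s[1:]))


keys = [f"user{i:05d}".encode() for i in range(500)]    # hypothetical structured keys
raw = trie_height([bits(k) for k in keys])
hashed = trie_height([bits(hashlib.sha256(k).digest()) for k in keys])
print(raw, hashed)
```

The raw keys give height 72 (user00000 and user00001 agree on their first 71 bits), while the hashed keys behave like uniform binary data and give a height close to that of a random trie.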

Algorithms: 3 - Paging. Data may be accessible by blocks, e.g., pages on disk. Stop the recursion as soon as at most b elements remain (standard trie: b = 1). Combine with hashing to get an index structure. [Figure: an index on top of pages.] Analysis?
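A paged trie (b-trie) only changes the stopping rule of the recursion. The sketch below assumes 64-bit keys that have already been hashed, simulated by random integers; `pages` is a hypothetical helper that returns the occupancy of each page, with empty subtrees allocating no page.

```python
import random

BITS = 64  # assumption: keys are 64-bit hash values


def pages(keys, b, depth=0):
    """Occupancies of the pages of a b-trie: the recursion stops as soon as
    at most b keys remain; empty subtrees get no page."""
    if len(keys) <= b:
        return [len(keys)] if keys else []
    shift = BITS - 1 - depth
    zeros = [k for k in keys if not (k >> shift) & 1]
    ones = [k for k in keys if (k >> shift) & 1]
    return pages(zeros, b, depth + 1) + pages(ones, b, depth + 1)


rng = random.Random(1)
keys = sorted({rng.getrandbits(BITS) for _ in range(10_000)})   # distinct simulated hashes
occupancy = pages(keys, b=20)
print(len(occupancy), len(keys) / (20 * len(occupancy)))        # pages, mean fill ratio
```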

Algorithms: 4 - MultiDim. Data may be multidimensional & numeric/geometric. [Figure: a quad-trie.] Analysis?
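For multidimensional data the same digit-by-digit splitting applies to several coordinates at once. A minimal quad-trie sketch, assuming distinct points with 32-bit integer coordinates: at depth d the next bit of x and the next bit of y choose one of four quadrants, and splitting stops once a quadrant holds at most one point. The representation (lists for leaves, dicts for internal nodes) is an arbitrary choice.

```python
import random

BITS = 32  # assumption: integer coordinates in [0, 2**32)


def quad_trie(points, depth=0):
    """Leaf: list of at most one point. Internal node: dict quadrant -> subtree,
    where the quadrant number is 2 * (next bit of x) + (next bit of y)."""
    if len(points) <= 1:
        return points
    shift = BITS - 1 - depth
    quadrants = {}
    for x, y in points:
        q = 2 * ((x >> shift) & 1) + ((y >> shift) & 1)
        quadrants.setdefault(q, []).append((x, y))
    return {q: quad_trie(pts, depth + 1) for q, pts in quadrants.items()}


def leaves(tree):
    """All leaf lists of the quad-trie."""
    if isinstance(tree, list):
        return [tree]
    return [leaf for sub in tree.values() for leaf in leaves(sub)]


rng = random.Random(7)
pts = sorted({(rng.getrandbits(BITS), rng.getrandbits(BITS)) for _ in range(1000)})
tree = quad_trie(pts)
print(len(leaves(tree)), len(pts))
```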

Algorithms: 5 - Communication. Data may be distributed and accessible only via a common channel (network). Everybody speaks at the same time; if there is noise (a collision), SPLIT according to individual coin flips. [Figure: the tree protocol on stations A, B, C, with nodes ABC, B, AC, - (empty slot), AC, A, C, and the leader marked.] Analysis?
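The splitting rule can be simulated to see why it is a trie. In this sketch (the function `tree_protocol` and its slot model are assumptions: one slot per group, collisions split by fair coin flips, groups served depth-first) every collision is an internal node of a random symmetric trie, so the number of slots is 1 + 2 × (number of collisions) and its mean is 1 + 2 S_n, roughly 2n/log 2 using S_n ≈ n/log 2.

```python
import random


def tree_protocol(n, rng):
    """Return (slots, collisions) used to resolve n stations that all speak at once.
    A slot with 0 speakers is idle, 1 is a success, 2+ is a collision: each involved
    station then flips a fair coin and the two groups are resolved one after the other."""
    slots = collisions = 0
    pending = [n]                               # stack of groups waiting for a slot
    while pending:
        k = pending.pop()
        slots += 1
        if k >= 2:
            collisions += 1
            heads = sum(rng.random() < 0.5 for _ in range(k))
            pending.extend([k - heads, heads])
    return slots, collisions


rng = random.Random(42)
runs = [tree_protocol(50, rng) for _ in range(500)]
print(sum(s for s, _ in runs) / len(runs))      # mean slots for n = 50
```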

2. Expectations: Bernoulli vs Poisson models; Mellin technology; fluctuations and error terms.

[Figure: $S_n$ against n.] (Proof in a “modernized” version follows...)

Algebra... [Figure with branch probabilities p and q.]

Algebra...

With $S_n$ the expected tree size when the tree contains n elements, and $S(x)$ the Poisson expectation: $S(x) = \sum_{n \ge 0} S_n \, e^{-x} \frac{x^n}{n!}$. The Poisson expectation S(x) is like a generating function of $\{S_n\}$. Go back (“depoissonize”) by Taylor expansion. E.g., for p = q = 1/2: $S_n = \sum_{k \ge 0} 2^k \left[ 1 - \left(1 - \frac{1}{2^k}\right)^n - \frac{n}{2^k}\left(1 - \frac{1}{2^k}\right)^{n-1} \right]$. Many variants are possible, and one can justify (elementarily) that $S_n = S(x) + \text{small}$ when $x = n$.
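The closed form above can be checked numerically against the recurrence obtained by splitting at the root, and against the Poisson expectation taken at x = n. A sketch under these assumptions: double-precision floats, 200 levels of the infinite sums (far more than needed for n ≤ 100), and a hypothetical helper `at_least_two` that evaluates the bracket, P(Binomial(n, 2^{-k}) ≥ 2), without the catastrophic cancellation the formula as written suffers for large k.

```python
from math import comb, exp, log


def at_least_two(n, u):
    """P(Binomial(n, u) >= 2), avoiding cancellation when n*u is small."""
    if n * u >= 1:
        return 1 - (1 - u) ** n - n * u * (1 - u) ** (n - 1)
    return sum(comb(n, j) * u**j * (1 - u) ** (n - j) for j in range(2, n + 1))


def size_closed_form(n, levels=200):
    """Slide formula: sum over k of 2^k [1 - (1-2^-k)^n - n 2^-k (1-2^-k)^(n-1)]."""
    if n < 2:
        return 0.0
    return sum(2.0**k * at_least_two(n, 2.0**-k) for k in range(levels))


def size_recurrence(nmax):
    """Split at the root: (1 - 2^(1-n)) S_n = 1 + 2^(1-n) sum_{k<n} C(n,k) S_k."""
    S = [0.0, 0.0]
    for n in range(2, nmax + 1):
        w = 2.0 ** (1 - n)
        acc = sum(comb(n, k) * S[k] for k in range(n))
        S.append((1 + w * acc) / (1 - w))
    return S


def g(y):
    """1 - (1 + y) e^(-y), using its Taylor series for tiny y."""
    if y < 1e-3:
        return y * y / 2 - y**3 / 3 + y**4 / 8
    return 1 - (1 + y) * exp(-y)


def size_poisson(x, levels=200):
    """Poisson expectation S(x) = sum_k 2^k g(x / 2^k)."""
    return sum(2.0**k * g(x / 2.0**k) for k in range(levels))


S = size_recurrence(100)
for n in (10, 50, 100):
    print(n, S[n], size_closed_form(n), size_poisson(n), n / log(2))
```

The printed rows show S_n, the closed form and S(n) agreeing closely, all near n/log 2 - 1.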

Analysis... The Mellin transform: $f^\star(s) := \mathcal{M}[f](s) := \int_0^\infty f(x)\, x^{s-1}\, dx$. (It exists in strips of $\mathbb{C}$ determined by the growth of f(x) at 0 and $+\infty$.) Property 1. Factors harmonic sums: $\mathcal{M}\left[\sum_{(\lambda,\mu)} \lambda f(\mu x)\right](s) = \left(\sum_{(\lambda,\mu)} \lambda \mu^{-s}\right) \cdot f^\star(s)$. Property 2. Maps asymptotics of f onto singularities of $f^\star$: $f^\star(s) \approx \frac{1}{(s - s_0)^m} \implies f(x) \approx x^{-s_0} (\log x)^{m-1}$. The proof of P2 is from Mellin inversion + residues: $f(x) = \frac{1}{2i\pi} \int_{c - i\infty}^{c + i\infty} f^\star(s)\, x^{-s}\, ds$.
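Property 1 can be checked numerically on the trie example. This sketch assumes real s and evaluates the Mellin integral after the change of variables x = e^t with the trapezoidal rule on a truncated range (`mellin`, `t_min`, `t_max` and `steps` are illustrative choices, not a library API). With λ = 2^k and μ = 2^{-k}, the harmonic sum S(x) = Σ_{k≥0} 2^k g(x/2^k), g(x) = 1 - (1 + x)e^{-x}, should have transform (Σ_k 2^{k(1+s)}) g⋆(s) = -(s+1)Γ(s)/(1 - 2^{1+s}) in the strip -2 < ℜ(s) < -1; the first check, ∫ e^{-x} x^{s-1} dx = Γ(s), validates the quadrature.

```python
from math import exp, gamma


def mellin(f, s, t_min=-60.0, t_max=60.0, steps=3000):
    """Approximate f*(s) = int_0^inf f(x) x^(s-1) dx for real s: with x = e^t it
    becomes int f(e^t) e^(s t) dt, computed by the trapezoidal rule on [t_min, t_max]."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(exp(t)) * exp(s * t)
    return total * h


def g(y):
    """1 - (1 + y) e^(-y), using its Taylor series for tiny y to avoid cancellation."""
    if y < 1e-3:
        return y * y / 2 - y**3 / 3 + y**4 / 8
    return 1 - (1 + y) * exp(-y)


def S(x):
    """Harmonic sum S(x) = sum_{k>=0} 2^k g(x / 2^k), truncated at 200 terms."""
    return sum(2.0**k * g(x / 2.0**k) for k in range(200))


s = -1.5                                         # inside the strip -2 < Re(s) < -1
predicted = -(s + 1) * gamma(s) / (1 - 2 ** (1 + s))
print(mellin(lambda x: exp(-x), 1.5), gamma(1.5))
print(mellin(S, s), predicted)
```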

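Mellin and Tries. For p = q = 1/2: $S(x) = \sum_{k \ge 0} 2^k g(x/2^k)$, with $g(x) = 1 - (1 + x)e^{-x}$. Harmonic sum property: $S^\star(s) = \left(\sum_{k \ge 0} 2^k 2^{ks}\right) g^\star(s) = -\frac{(s+1)\Gamma(s)}{1 - 2^{1+s}}$, since $g^\star(s) = -(s+1)\Gamma(s)$. Mapping properties: $S^\star$ exists in $-2 < \Re(s) < -1$. Poles at $s_k = -1 + 2ik\pi/\log 2$, for $k \in \mathbb{Z}$. Asymptotics: a pole at $s_0 = \sigma + i\tau$ gives a term $x^{-s_0} = x^{-\sigma} e^{-i\tau \log x}$; the pole at $s_0 = -1$ gives the main term $x/\log 2$, and the poles $s_k$ with $k \neq 0$ give oscillations periodic in $\log_2 x$.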
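The mapping property predicts S(x) = x/log 2 - 1 + x P(log₂ x) + (smaller terms), where the constant -1 comes from the pole of Γ(s) at s = 0 and P is periodic with period 1. This sketch, assuming the Poisson expectation is summed over 200 levels, evaluates P on one period near x = 2^30. It also checks the exact functional equation S(2x) = 2S(x) + g(2x), which is why S(x)/x is periodic in log₂ x up to O(1/x). The amplitude of P is of order 10^{-6}, so the oscillation is easy to miss numerically.

```python
from math import exp, log


def g(y):
    """1 - (1 + y) e^(-y), using its Taylor series for tiny y."""
    if y < 1e-3:
        return y * y / 2 - y**3 / 3 + y**4 / 8
    return 1 - (1 + y) * exp(-y)


def S(x):
    """Poisson expected trie size, p = q = 1/2."""
    return sum(2.0**k * g(x / 2.0**k) for k in range(200))


def fluctuation(x):
    """S(x)/x - 1/log 2 + 1/x: what remains after the poles at s = -1 and s = 0."""
    return S(x) / x - 1 / log(2) + 1 / x


grid = [2.0 ** (30 + j / 64) for j in range(64)]   # one period in log2 x
values = [fluctuation(x) for x in grid]
print(max(values), min(values))
```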


Memoryless sources (I). These correspond to $p \neq q$. The Dirichlet series is $\frac{1}{1 - p^{-s} - q^{-s}}$. Theorem (Knuth 1973; Fayolle, F., Hofri 1986, ...). Let $H := p \log p^{-1} + q \log q^{-1}$ be the entropy. • In the periodic case, $\frac{\log p}{\log q} \in \mathbb{Q}$, there are fluctuations in $S_n$. • In the aperiodic case, $\frac{\log p}{\log q} \notin \mathbb{Q}$: $S_n \sim \frac{n}{H}$, and the depth satisfies $D_n \sim \frac{1}{H} \log n$. Philippe Robert & Hanene Mohamed relate this to the periodic/aperiodic dichotomy of renewal theory (2005+).
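For a biased memoryless source the Poisson expectation is still a harmonic sum, now over all words w with weights p_w = p^{#0} q^{#1}. This sketch assumes x = 10^5 and truncates the depth at 150 (the neglected levels are negligible for the p tested); `poisson_size` and `entropy` are hypothetical helpers, with the entropy in natural logarithms to match H above. The ratio S(x)/(x/H) should be close to 1, both for p = 1/3 (aperiodic) and for p = 1/2, where H = log 2 recovers the symmetric case.

```python
from math import comb, exp, log


def g(y):
    """P(Poisson(y) >= 2) = 1 - (1 + y) e^(-y), with its Taylor series for tiny y."""
    if y < 1e-3:
        return y * y / 2 - y**3 / 3 + y**4 / 8
    return 1 - (1 + y) * exp(-y)


def poisson_size(x, p, depth=150):
    """Poisson expected trie size for a memoryless binary source with P(0) = p:
    each of the C(m, j) words of length m with j zeros reaches a node that receives
    Poisson(x p^j q^(m-j)) words, and the node is internal when it gets 2 or more."""
    q = 1 - p
    return sum(comb(m, j) * g(x * p**j * q ** (m - j))
               for m in range(depth) for j in range(m + 1))


def entropy(p):
    """H = p log(1/p) + q log(1/q), natural logarithms."""
    q = 1 - p
    return -p * log(p) - q * log(q)


x = 1e5
for p in (0.5, 1 / 3):
    print(p, poisson_size(x, p) * entropy(p) / x)   # ratio S(x) / (x / H)
```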

[Figure] ($\pi$, e, tan(1), log 2, $\zeta(3)$, ...) [Lapidus & van Frankenhuijsen 2006]

3. Distributions: analytic depoissonization & saddle points; Gaussian laws...

[Figure: $2^h$ buckets.] Height: $H_n \le h$ iff throwing n balls into $2^h$ buckets, each of capacity b, creates no overflow.
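The bucket reformulation gives the height distribution directly. A sketch under two assumptions: for capacity b = 1 and n independent uniform words, P(H_n ≤ h) is the birthday-problem probability that n balls land in distinct buckets among 2^h; for general b it uses the Poisson model, where the 2^h bucket counts are independent Poisson(x/2^h). The function names are hypothetical.

```python
from math import exp, factorial


def height_cdf(n, h):
    """P(H_n <= h) for a trie on n independent uniform binary words (b = 1):
    the n words must fall into n distinct buckets among 2^h (a birthday problem)."""
    buckets = 2**h
    prob = 1.0
    for i in range(n):
        prob *= max(0.0, 1 - i / buckets)
    return prob


def height_cdf_poisson(x, h, b=1):
    """Poisson model, bucket capacity b: each of the 2^h buckets independently
    receives Poisson(x / 2^h) words, and none may exceed b."""
    lam = x / 2**h
    no_overflow = exp(-lam) * sum(lam**j / factorial(j) for j in range(b + 1))
    return no_overflow ** (2**h)


def mean_height(n, hmax=200):
    """E[H_n] = sum over h >= 0 of P(H_n > h)."""
    return sum(1 - height_cdf(n, h) for h in range(hmax))


print(mean_height(500))
print(height_cdf(500, 18), height_cdf_poisson(500, 18))
```

For n = 500 the mean height comes out close to 18, in line with the sample trie pictured earlier, and the Bernoulli and Poisson values of P(H ≤ 18) nearly coincide.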

$E[2^H]$ -->

[2001]

DISTRIBUTIONS: size, depth, and path length

Case of size, p = q = 1/2: start with the bivariate generating function F(z, u). Analyse its logarithm. Analyse the perturbation near u = 1. Use analytic depoissonization. Conclude by the continuity theorem for characteristic functions.
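The exact size distribution can also be explored numerically: the coefficients of the probability generating function P_n(u) of the size obey the root-splitting recurrence written in the docstring. This sketch assumes p = q = 1/2 and truncates each P_n(u) at degree 100 (the neglected tail is exponentially small for n ≤ 20); it only illustrates the exact distribution, not the analytic proof outlined above.

```python
from math import comb


def size_distributions(nmax, degree=100):
    """P[n][j] = probability that a trie on n uniform words (p = q = 1/2) has j
    internal nodes, for j < degree. Splitting at the root gives, for n >= 2,
    P_n(u) = u * sum_k C(n,k) 2^-n P_k(u) P_{n-k}(u); the k = 0 and k = n terms
    (both u 2^-n P_n(u)) are moved to the left-hand side."""
    P = [[1.0] + [0.0] * (degree - 1) for _ in range(2)]   # n = 0, 1: no internal node
    for n in range(2, nmax + 1):
        A = [0.0] * degree                                   # coefficients of sum_{0<k<n}
        for k in range(1, n):
            w = comb(n, k) / 2.0**n
            left, right = P[k], P[n - k]
            for i, a in enumerate(left):
                if a:
                    for j in range(degree - i):
                        A[i + j] += w * a * right[j]
        # P_n(u) (1 - c u) = u A(u) with c = 2^(1-n), solved coefficient by coefficient.
        c = 2.0 ** (1 - n)
        B = [0.0] * degree
        for j in range(1, degree):
            B[j] = A[j - 1] + c * B[j - 1]
        P.append(B)
    return P


P = size_distributions(20)
dist = P[20]
mean = sum(j * pj for j, pj in enumerate(dist))
var = sum(j * j * pj for j, pj in enumerate(dist)) - mean**2
print(mean, var)
```

For two words the size is geometric, P(S_2 = j) = 2^{-j}, and the mean of P_3 is 10/3, matching the expectation formulas.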

Profile of tries, after Szpankowski et al. + Cesaratto-Vallée 2010+

4. General sources: comparing and sorting real numbers; continued fractions; fundamental intervals...

Comparing numbers & sorting by continued fractions: $\operatorname{sign}\left(\frac{a}{b} - \frac{c}{d}\right) = \operatorname{sign}(ad - bc)$ (for b, d > 0). Computing $ad - bc$ requires double precision and/or is unstable with floats (computational geometry, Knuth’s Metafont, ...). Hakmem algorithm (Gosper, 1972): compare continued fraction expansions, e.g., $\frac{113}{36} = 3 + \cfrac{1}{7 + \cfrac{1}{5}}$ and $\frac{355}{113} = 3 + \cfrac{1}{7 + \cfrac{1}{16}}$. Theorem (Clément, F., Vallée 2000+). Sorting with continued fractions: the mean path length of the trie is $K_0\, n \log n + K_1 n + Q(n) + K_2 + o(1)$, with $K_0 = \frac{6 \log 2}{\pi^2}$, $K_1 = \frac{18 \gamma \log 2}{\pi^2} + \frac{9 (\log 2)^2}{\pi^2} - \frac{72 \log 2\, \zeta'(2)}{\pi^4} - \frac{1}{2}$, and $Q(n) \approx n^{1/4}$ is equivalent to the Riemann Hypothesis.
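Comparison by continued fractions can be sketched with Euclid's algorithm. This is a minimal illustration in the spirit of the slide, not necessarily Gosper's HAKMEM procedure: it assumes b, d > 0, expands both fractions lazily, and stops at the first differing partial quotient, so the number of quotients examined plays the role of a depth in a trie built on continued fraction expansions. Euclid's algorithm yields the canonical expansion (last quotient at least 2 unless there is only one), which makes the rule "an exhausted expansion counts as +∞" valid.

```python
from itertools import zip_longest
from math import inf


def partial_quotients(a, b):
    """Continued fraction expansion of a/b (b > 0) by Euclid's algorithm, lazily."""
    while b:
        q, r = divmod(a, b)
        yield q
        a, b = b, r


def compare(a, b, c, d):
    """Sign of a/b - c/d (b, d > 0) without forming the products a*d and b*c.
    Quotients are compared one at a time; an exhausted expansion acts as +infinity.
    At even positions a larger quotient means a larger number, at odd ones a smaller one."""
    pairs = zip_longest(partial_quotients(a, b), partial_quotients(c, d), fillvalue=inf)
    for i, (x, y) in enumerate(pairs):
        if x != y:
            sign = 1 if x > y else -1
            return sign if i % 2 == 0 else -sign
    return 0


print(list(partial_quotients(113, 36)), list(partial_quotients(355, 113)))  # [3, 7, 5] [3, 7, 16]
print(compare(113, 36, 355, 113))                                           # -1
```

Here the expansions first differ at position 2 (5 versus 16), an even position, so 113/36 < 355/113.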

[Vallée 1997++] View the source model in terms of fundamental intervals: w -> $p_w$. [Figure: intervals labeled (0) and (1).] Revisit the analysis of tries (e.g., size). Mellinize:
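Fundamental intervals are easy to compute for the two simplest sources. This sketch assumes the uniform (Lebesgue) initial density on (0, 1), so that p_w is just the length of the interval; Vallée's framework allows other densities. `cf_interval` and `binary_interval` are hypothetical helpers using exact rationals from Python's `fractions` module.

```python
from fractions import Fraction


def cf_interval(word):
    """Fundamental interval of the continued-fraction source: the x in (0, 1) whose
    expansion [0; m1, m2, ...] starts with word = (m1, ..., mk). Its endpoints are
    h(0) and h(1), where h(t) = 1/(m1 + 1/(m2 + ... + 1/(mk + t)))."""
    def h(t):
        value = Fraction(t)
        for m in reversed(word):
            value = 1 / (m + value)
        return value
    return tuple(sorted((h(0), h(1))))


def binary_interval(word, p):
    """Memoryless binary source: [0, 1) is cut into [0, p) for '0' and [p, 1) for '1',
    recursively, so the interval of w has length p_w = p^(#0) q^(#1)."""
    lo, length = Fraction(0), Fraction(1)
    for letter in word:
        if letter == "1":
            lo += p * length
            length *= 1 - p
        else:
            length *= p
    return lo, lo + length


lo, hi = cf_interval((1, 1))
print(lo, hi, hi - lo)                                   # 1/2 2/3 1/6
print(binary_interval("011", Fraction(1, 3)))
```

For the continued fraction source the first-level intervals (1/(m+1), 1/m) have lengths 1/(m(m+1)), which sum to 1; for the binary source the lengths of the intervals of all words of a given length also sum to 1.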

Vallée 1997-2001, Baladi-Vallée 2005+, ... For expanding maps T, fundamental intervals are generated by a transfer operator. For the binary system (+ Markov) and for continued fractions, simplifications occur.

...and Nörlund integrals complete the job! Poisson + Mellin = Newton -> Nörlund: the fixed-n model. Q.E.D. Cf. [F., Sedgewick 1995].
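In the fixed-n model the expected size becomes an alternating binomial sum (the “Newton” form), obtained by expanding g(x) = 1 - (1 + x)e^{-x} = Σ_{k≥2} (-1)^k (k-1) x^k/k! inside the Poisson expectation and undoing the Poisson transform. This sketch, assuming p = q = 1/2 and exact rationals, checks that sum against the root-splitting recurrence, and shows why asymptotics go through Nörlund-Rice integrals rather than direct summation: in floating point the huge alternating terms cancel catastrophically. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def size_rice_sum(n):
    """Fixed-n expected trie size (p = q = 1/2) as an alternating binomial sum:
    S_n = sum_{k=2}^{n} C(n,k) (-1)^k (k - 1) / (1 - 2^(1-k)), in exact rationals."""
    return sum(Fraction(comb(n, k) * (-1) ** k * (k - 1)) / (1 - Fraction(2) ** (1 - k))
               for k in range(2, n + 1))


def size_recurrence_exact(nmax):
    """(1 - 2^(1-n)) S_n = 1 + 2^(1-n) sum_{k<n} C(n,k) S_k, in exact rationals."""
    S = [Fraction(0), Fraction(0)]
    for n in range(2, nmax + 1):
        w = Fraction(2) ** (1 - n)
        acc = sum(comb(n, k) * S[k] for k in range(n))
        S.append((1 + w * acc) / (1 - w))
    return S


def size_rice_sum_float(n):
    """The same alternating sum in floating point: its terms are huge and cancel."""
    return sum(comb(n, k) * (-1) ** k * (k - 1) / (1 - 2.0 ** (1 - k))
               for k in range(2, n + 1))


exact = float(size_rice_sum(60))
print(exact, size_rice_sum_float(60))
```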

5. Other trie algorithms: leader election; the tree communication protocol; “Patricia” trees; data compression (Lempel-Ziv...); probabilistic counting; Quicksort is $O(n (\log n)^2)$ in symbol comparisons...

[Figure: the tree on A, B, C with the leader marked.] Leader election = leftmost boundary of a random trie (1/2, 1/2). Proof: tree decompositions + Mellin...
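The leftmost-boundary description translates into a coin-flipping election. This sketch assumes fair coins and the rule that a round in which nobody flips 0 is simply repeated (the left subtree of that trie node is empty); `elect` is a hypothetical helper returning the number of rounds and the number of survivors.

```python
import random


def elect(n, rng):
    """Leader election by coin flips, following the leftmost branch of a random trie:
    each round the remaining candidates flip a coin; if somebody flips 0, those who
    flipped 1 withdraw, otherwise (empty left subtree) everybody stays.
    Returns (rounds, number of survivors)."""
    candidates, rounds = n, 0
    while candidates > 1:
        rounds += 1
        zeros = sum(rng.random() < 0.5 for _ in range(candidates))
        if zeros:
            candidates = zeros
    return rounds, candidates


rng = random.Random(3)
results = [elect(1000, rng) for _ in range(300)]
print(sum(r for r, _ in results) / len(results))    # mean number of rounds, n = 1000
```

The number of rounds is the length of the leftmost branch of the trie, of order log₂ n.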

[Figure: the tree on A, B, C.] Tree protocol = trie with arrivals (non-commutative iteration semigroup).

A curiosity (cf. Mellin): = !! - 0.249999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999999998211 (= $-1/2 + 10^{-211}$: there are 208 consecutive nines)
