Large fringe and non-fringe subtrees in conditional Galton-Watson trees Xing Shi Cai, Luc Devroye School of Computer Science McGill University Probabilistic Midwinter Meeting Umeå University Jan 18, 2017 1 / 47
Outline
1 Introduction
2 Large Fringe Subtrees
3 Large Fringe Subtrees: Applications
4 Large Non-Fringe Subtrees
2 / 47
What is a tree
A tree is a connected acyclic graph. In this talk, trees are unlabeled, rooted, and ordered (plane trees). 3 / 47
Galton-Watson trees
A Galton-Watson (GW) tree $\mathcal{T}^{gw}$ starts with a single node. Each node in $\mathcal{T}^{gw}$ chooses a random number of child nodes, independently and from the same distribution $\xi$. Introduced by Bienaymé, 1845.
[Figure: an example GW tree with numeric node labels.]
Note: we will always assume that $E\,\xi = 1$ and $\mathrm{Var}(\xi) \in (0, \infty)$. 4 / 47
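The construction above can be sketched in a few lines of Python. This is only an illustration: the names `sample_gw_degrees`, `geometric_half`, and the `max_nodes` cap are assumptions made here, not part of the talk. The tree is returned as the list of child counts in depth-first order, and the cap is needed because a critical GW tree is finite almost surely but has infinite expected size.

```python
import random

def geometric_half(rng):
    """One draw of xi with P{xi = i} = 2^-(i+1) (hypothetical example law)."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k

def sample_gw_degrees(offspring, max_nodes=10_000):
    """Sample the child counts of a GW tree in depth-first (preorder) order.

    `offspring` is a zero-argument callable returning one draw of xi.
    Returns None if the tree grows beyond `max_nodes` nodes.
    """
    degrees = []
    pending = 1  # nodes already created but not yet assigned children
    while pending > 0:
        if len(degrees) >= max_nodes:
            return None
        d = offspring()
        degrees.append(d)
        pending += d - 1  # this node is done, its d children are now pending
    return degrees

rng = random.Random(0)
tree = sample_gw_degrees(lambda: geometric_half(rng))
```

A finished tree with $m$ nodes always satisfies `sum(tree) == m - 1`, since every node except the root is somebody's child.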
Conditional Galton-Watson trees
A conditional GW tree $\mathcal{T}^{gw}_n$ is $\mathcal{T}^{gw}$ restricted to $|\mathcal{T}^{gw}| = n$. So
$$P\{\mathcal{T}^{gw}_n = T\} = P\{\mathcal{T}^{gw} = T \mid |\mathcal{T}^{gw}| = n\}.$$
Depending on the choice of $\xi$, it covers many uniform random tree models: full binary trees, binary trees, $d$-ary trees, Motzkin trees, plane trees, Cayley trees. 5 / 47
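Example of conditional Galton-Watson trees
Let $P\{\xi = i\} = 1/2^{i+1}$. In other words, $\xi \overset{\mathcal{L}}{=} \mathrm{Ge}(1/2)$. Then $\mathcal{T}^{gw}_n$ is uniformly distributed among all trees of size $n$.
$P\{\mathcal{T}^{gw} = T\} = 2^{-7}$ for every tree $T$ in the set shown. [Figure: the set of trees.] 6 / 47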
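The uniformity claim can be checked by brute force for small sizes. The sketch below assumes that trees are encoded by their depth-first degree sequences (the helper names are made up for this example). A tree with $n$ nodes has degrees summing to $n - 1$, so under $\mathrm{Ge}(1/2)$ its probability is $2^{-(2n-1)}$; the value $2^{-7}$ on the slide therefore corresponds to $n = 4$.

```python
from fractions import Fraction
from itertools import product

def is_tree_sequence(seq):
    """True if seq is the preorder degree sequence of a rooted ordered tree."""
    pending = 1
    for i, d in enumerate(seq):
        pending += d - 1
        if pending == 0:
            # the tree is complete here; valid only if nothing is left over
            return i == len(seq) - 1
    return False

def plane_trees(n):
    """All plane trees with n nodes, as degree sequences (brute force)."""
    return [s for s in product(range(n), repeat=n) if is_tree_sequence(s)]

def prob_geometric(seq):
    """P{T^gw = T} when P{xi = i} = 2^-(i+1), as an exact fraction."""
    p = Fraction(1)
    for d in seq:
        p *= Fraction(1, 2 ** (d + 1))
    return p

trees = plane_trees(4)
probs = {prob_geometric(t) for t in trees}
```

Every tree of the same size gets the same probability, so conditioning on the size leaves the uniform distribution.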
Fringe subtrees
For a node $v$ of a tree $T$, the fringe subtree $T_v$ contains $v$ and all its descendants. It is what is normally called a “subtree”. 7 / 47
Fringe subtree count
Let $N_T(\mathcal{T}^{gw}_n)$ be the number of fringe subtrees of shape $T$ in $\mathcal{T}^{gw}_n$. 8 / 47
Fringe subtree count: bigger example
In the next example,
$$\frac{N_T(\mathcal{T}^{gw}_n)}{n} = \frac{15}{120} = \frac{1}{8} = \pi(T) \equiv P\{\mathcal{T}^{gw} = T\}.$$
Is this just a coincidence? 9 / 47
What is known
For large $n$, fringe subtrees in $\mathcal{T}^{gw}_n$ behave like independent copies of $\mathcal{T}^{gw}$. If we take a uniformly random fringe subtree of $\mathcal{T}^{gw}_n$, the probability that it equals $T$ is about $\pi(T) \equiv P\{\mathcal{T}^{gw} = T\}$. So $N_T(\mathcal{T}^{gw}_n) \approx \mathrm{Bi}(n, \pi(T))$. 10 / 47
What is known cont.
Theorem, Aldous (1991) (Law of large numbers): As $n \to \infty$,
$$\frac{N_T(\mathcal{T}^{gw}_n)}{n} \xrightarrow{p} \pi(T).$$
Theorem, Janson (2016) (Central limit theorem): As $n \to \infty$,
$$\frac{N_T(\mathcal{T}^{gw}_n) - n\pi(T)}{\gamma\sqrt{n}} \xrightarrow{d} N(0, 1),$$
where $\gamma$ is a constant. 11 / 47
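The law of large numbers can be illustrated by simulation. The sketch below is not the method of the talk: it samples $\mathcal{T}^{gw}_n$ with the standard cycle-lemma trick (draw $n$ iid degrees until they sum to $n - 1$, then rotate to the unique valid depth-first order), takes $\xi \overset{\mathcal{L}}{=} \mathrm{Ge}(1/2)$ as an example law, and uses the single-node tree as $T$, so that $\pi(T) = P\{\xi = 0\} = 1/2$. All function names are hypothetical.

```python
import random

def geometric_half(rng):
    """One draw of xi with P{xi = i} = 2^-(i+1)."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k

def conditioned_gw_degrees(n, offspring, rng):
    """Preorder degree sequence of a GW tree conditioned to have n nodes."""
    # Rejection step: n iid degrees that sum to n - 1.
    while True:
        seq = [offspring(rng) for _ in range(n)]
        if sum(seq) == n - 1:
            break
    # Cycle lemma: the valid rotation starts right after the first index
    # at which the walk sum(d - 1) reaches its overall minimum.
    walk, best, start = 0, 0, 0
    for i, d in enumerate(seq):
        walk += d - 1
        if walk < best:
            best, start = walk, i + 1
    return seq[start:] + seq[:start]

def leaf_fraction(n, rng):
    """N_T / n for T = the single-node tree, in one sample of size n."""
    seq = conditioned_gw_degrees(n, geometric_half, rng)
    return seq.count(0) / n

rng = random.Random(1)
estimate = leaf_fraction(200, rng)  # should be near pi(T) = 1/2
```

Rejection sampling is acceptable here because the acceptance probability decays only like $1/\sqrt{n}$; for very large $n$ a more careful sampler would be preferable.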
What do we want to know
What if the $T$ in $N_T(\mathcal{T}^{gw}_n)$ changes with $n$?
The height of the largest complete $r$-ary fringe subtree.
The largest $k$ such that $\mathcal{T}^{gw}_n$ contains all trees of size $\le k$ as fringe subtrees.
What about non-fringe subtrees? 12 / 47
Large fringe subtrees
If $|T_n| \to \infty$, then $\pi(T_n) \equiv P\{\mathcal{T}^{gw} = T_n\} \to 0$. Then we should have
$$N_{T_n}(\mathcal{T}^{gw}_n) \approx \mathrm{Bi}(n, \pi(T_n)) \approx \mathrm{Po}(n\pi(T_n)).$$
Theorem 1.2: Let $k_n = o(n)$ and $k_n \to \infty$. Then
$$\lim_{n \to \infty} \sup_{T : |T| = k_n} d_{TV}\bigl(N_T(\mathcal{T}^{gw}_n), \mathrm{Po}(n\pi(T))\bigr) = 0.$$ 14 / 47
Large fringe subtrees cont.
Theorem 1.2 cont.: So letting $(T_n)_{n \ge 1}$ be a sequence of trees with $|T_n| = k_n$, we have:
1 If $n\pi(T_n) \to 0$, then $N_{T_n}(\mathcal{T}^{gw}_n) = 0$ whp.
2 If $n\pi(T_n) \to \mu \in (0, \infty)$, then $N_{T_n}(\mathcal{T}^{gw}_n) \xrightarrow{d} \mathrm{Po}(\mu)$.
3 If $n\pi(T_n) \to \infty$, then
$$\frac{N_{T_n}(\mathcal{T}^{gw}_n) - n\pi(T_n)}{\sqrt{n\pi(T_n)}} \xrightarrow{d} N(0, 1).$$ 15 / 47
The degree sequence
The degree of a node is the number of its children. The degree sequence of a tree is the list of the degrees of its nodes in depth-first-search order. We can count fringe subtrees through the degree sequence.
[Figure: tree $T_1$ with nodes numbered 1 to 7 in depth-first order, and tree $T_2$ with nodes 1 and 2.]
Degree sequences: $T_1$: $(2, 1, 0, 3, 0, 0, 0)$; $T_2$: $(1, 0)$. 16 / 47
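As a small illustration of the definition, the sketch below computes the degree sequence of a tree stored as nested lists, where each node is the list of its children. This encoding and the name `degree_sequence` are assumptions made for the example; the two trees are the ones whose degree sequences appear on the slide.

```python
def degree_sequence(tree):
    """Degrees of the nodes of `tree` in depth-first (preorder) order.

    A node is represented as the list of its children; a leaf is [].
    """
    seq = []
    stack = [tree]
    while stack:
        node = stack.pop()
        seq.append(len(node))
        # push children right-to-left so the leftmost child is visited next
        stack.extend(reversed(node))
    return tuple(seq)

# T1: the root has two children; the first has one leaf child,
# the second has three leaf children.
T1 = [[[]], [[], [], []]]
# T2: a root with a single leaf child.
T2 = [[]]

print(degree_sequence(T1))  # (2, 1, 0, 3, 0, 0, 0)
print(degree_sequence(T2))  # (1, 0)
```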
Count fringe subtrees through the degree sequence
Let $(\xi^n_1, \dots, \xi^n_n)$ be the degree sequence of $\mathcal{T}^{gw}_n$. Let $(d_1, \dots, d_{|T|})$ be the degree sequence of $T$. Then $N_T(\mathcal{T}^{gw}_n)$ can be written as
$$N_T(\mathcal{T}^{gw}_n) = \sum_{j=1}^{n} I_j \equiv \sum_{j=1}^{n} \mathbf{1}_{\left[(\xi^n_j, \dots, \xi^n_{j+|T|-1}) = (d_1, \dots, d_{|T|})\right]}.$$ 17 / 47
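The indicator sum translates directly into a sliding-window comparison. In this sketch (the function name is hypothetical), both arguments are depth-first degree sequences, and the pattern is assumed to be the degree sequence of a complete tree. Windows that would run past the end of the sequence are skipped, since the fringe subtree at a node always occupies a contiguous block that ends inside the sequence.

```python
def count_fringe_subtrees(tree_seq, pattern_seq):
    """Number of fringe subtrees of shape T, via degree-sequence windows.

    tree_seq:    preorder degree sequence of the big tree
    pattern_seq: preorder degree sequence of T (must describe a whole tree)
    """
    n, k = len(tree_seq), len(pattern_seq)
    pattern = tuple(pattern_seq)
    return sum(
        1
        for j in range(n - k + 1)
        if tuple(tree_seq[j:j + k]) == pattern  # this is the indicator I_j
    )

big = (2, 1, 0, 3, 0, 0, 0)
print(count_fringe_subtrees(big, (1, 0)))        # 1
print(count_fringe_subtrees(big, (0,)))          # 4
print(count_fringe_subtrees(big, (3, 0, 0, 0)))  # 1
```

A match at position $j$ really is a fringe subtree because no proper prefix of a tree's degree sequence is itself the degree sequence of a tree.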
Why fringe subtrees are like unconditional Galton-Watson trees
When $n$ is large, $\xi^n_1, \dots, \xi^n_n$ are close to $\xi_1, \dots, \xi_n$ ($n$ independent copies of $\xi$). Thus
$$P\{I_j = 1\} = P\Bigl\{\bigcap_{i=1}^{|T|} \bigl[\xi^n_{j+i-1} = d_i\bigr]\Bigr\} \approx \prod_{i=1}^{|T|} P\{\xi_i = d_i\} = P\{\mathcal{T}^{gw} = T\} \equiv \pi(T).$$
So $I_1, \dots, I_n$ are close to iid Bernoulli $\pi(T)$. This is why
$$N_T(\mathcal{T}^{gw}_n) = \sum_{j=1}^{n} I_j \approx \mathrm{Bi}(n, \pi(T)) \approx \mathrm{Po}(n\pi(T)).$$ 18 / 47
The exchangeable pair method
The proof of Theorem 1.2 uses the exchangeable pair method (Ross (2011, thm. 4.37)). It is a variation of Stein's method for the Poisson distribution.
Example: Let $X_1, \dots, X_n$ and $Y_1, \dots, Y_n$ be iid $\mathrm{Be}(p)$. Let $W = X_1 + \dots + X_n$. Let $W' = W - X_Z + Y_Z$, where $Z \overset{\mathcal{L}}{=} \mathrm{Unif}(\{1, \dots, n\})$ is independent of the $X_i$ and $Y_i$.
We have an exchangeable pair: $(W, W') \overset{\mathcal{L}}{=} (W', W)$.
Compute $P\{W' = W - 1 \mid X_1, \dots, X_n\}$ and $P\{W' = W + 1 \mid X_1, \dots, X_n\}$.
Then the method says $d_{TV}(W, \mathrm{Po}(E\,W)) \le p$. 19 / 47
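The conclusion of the example can be checked numerically, since $W \overset{\mathcal{L}}{=} \mathrm{Bi}(n, p)$. The sketch below computes the total variation distance to $\mathrm{Po}(np)$ directly from the two probability mass functions. It does not implement the exchangeable pair argument itself, the function name is made up, and it is meant for moderate $n$ only (the binomial coefficient is converted to a float).

```python
import math

def tv_binomial_poisson(n, p):
    """Total variation distance between Bi(n, p) and Po(n * p)."""
    lam = n * p
    abs_diff = 0.0
    pois_mass = 0.0
    pois = math.exp(-lam)  # Poisson pmf at 0
    for k in range(n + 1):
        binom = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        abs_diff += abs(binom - pois)
        pois_mass += pois
        pois *= lam / (k + 1)  # Poisson pmf at k + 1
    # Above n the binomial has no mass, so the whole Poisson tail counts.
    abs_diff += max(0.0, 1.0 - pois_mass)
    return abs_diff / 2

for n, p in [(100, 0.05), (20, 0.3), (500, 0.01)]:
    print(n, p, tv_binomial_poisson(n, p), "<=", p)
```

In each case the computed distance should stay below $p$, in line with the bound stated on the slide.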
Subtree replacement: the naive way
Recall $N_T(\mathcal{T}^{gw}_n) = \sum_{i=1}^{n} I_i$. What if we do the same thing for $N_T(\mathcal{T}^{gw}_n)$?
Let $\bar{N} = N_T(\mathcal{T}^{gw}_n) - I_Z + I'_Z$ with $I'_Z \overset{\mathcal{L}}{=} I_Z$.
Is $(\bar{N}, N_T(\mathcal{T}^{gw}_n))$ an exchangeable pair? 20 / 47
Subtree replacement: the proper way
Choose a fringe subtree of $\mathcal{T}^{gw}_n$ uniformly at random. If its size is not the same as that of $T$, do nothing. Otherwise, replace it with $\mathcal{T}^{gw}_{|T|}$. Let $\bar{N}$ be the number of fringe subtrees of shape $T$ in the new tree. Then $(N_T(\mathcal{T}^{gw}_n), \bar{N})$ is an exchangeable pair. 21 / 47
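Upper bound of the total variation distance
Let $\mathfrak{T}_k$ be the set of all trees of size $k$. Let $\mathcal{S} \subseteq \mathfrak{T}_k$. Let $N_{\mathcal{S}}(\mathcal{T}^{gw}_n)$ be the number of fringe subtrees that belong to $\mathcal{S}$. Let $\pi(\mathcal{S}) \equiv P\{\mathcal{T}^{gw} \in \mathcal{S}\}$. So $N_T(\mathcal{T}^{gw}_n) = N_{\{T\}}(\mathcal{T}^{gw}_n)$.
Lemma 4.1: Let $k = k_n = o(n)$ and $k \to \infty$. We have
$$\sup_{\mathcal{S} \subseteq \mathfrak{T}_k} \frac{d_{TV}\bigl(N_{\mathcal{S}}(\mathcal{T}^{gw}_n), \mathrm{Po}(n\pi(\mathcal{S}))\bigr)}{\sqrt{\pi(\mathcal{S})/\pi(\mathfrak{T}_k)} + \pi(\mathcal{S})/\pi(\mathfrak{T}_k)} \le 1 + o\bigl(k^{-3/2}\bigr) + O\Bigl(\frac{k^{1/4}}{\sqrt{n}}\Bigr).$$ 22 / 47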
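Large fringe subtree counts: set version
Theorem 1.3: Let $\mathfrak{T}_k$ be the set of trees of size $k$. Let $k_n = o(n)$ and $k_n \to \infty$. Let $(\mathcal{S}_n)_{n \ge 1}$ be a sequence with $\mathcal{S}_n \subseteq \mathfrak{T}_{k_n}$. We have:
1 If $n\pi(\mathcal{S}_n) \to 0$, then $N_{\mathcal{S}_n}(\mathcal{T}^{gw}_n) = 0$ whp.
2 If $n\pi(\mathcal{S}_n) \to \mu \in (0, \infty)$, then $N_{\mathcal{S}_n}(\mathcal{T}^{gw}_n) \xrightarrow{d} \mathrm{Po}(\mu)$.
3 If $n\pi(\mathcal{S}_n) \to \infty$, then
$$\frac{N_{\mathcal{S}_n}(\mathcal{T}^{gw}_n) - n\pi(\mathcal{S}_n)}{\sqrt{n\pi(\mathcal{S}_n)}} \xrightarrow{d} N(0, 1).$$
4 If $\pi(\mathcal{S}_n)/\pi(\mathfrak{T}_{k_n}) \to 0$, then
$$\lim_{n \to \infty} d_{TV}\bigl(N_{\mathcal{S}_n}(\mathcal{T}^{gw}_n), \mathrm{Po}(n\pi(\mathcal{S}_n))\bigr) = 0.$$ 23 / 47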
Application 1: largest complete $r$-ary fringe subtree
Let $T^{r\text{-ary}}_h$ be a complete $r$-ary tree of height $h$. 25 / 47
Application 1: largest complete $r$-ary fringe subtree
Lemma 5.2 & 5.3: Let $H_{n,r}$ be the height of the largest complete $r$-ary fringe subtree in $\mathcal{T}^{gw}_n$. Then for $r \ge 2$,
$$H_{n,r} - \log_r \log n \xrightarrow{p} -\alpha_r,$$
where $\alpha_r$ is a constant. And
$$\frac{H_{n,1} \log(1/P\{\xi = 1\})}{\log n} \xrightarrow{p} 1.$$
Method: Find the maximum $h$ such that $n\pi(T^{r\text{-ary}}_h) \to \infty$. Then apply Theorem 1.2. 26 / 47
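For $r = 1$ the method can be carried out by hand. The following is a sketch under two assumptions that are not stated on the slide: height counts edges, so a chain of height $h$ has $h$ nodes of degree 1 and one leaf, and $0 < p_1 < 1$, where $p_0$ and $p_1$ are shorthand introduced here for $P\{\xi = 0\}$ and $P\{\xi = 1\}$.

```latex
\pi\bigl(T^{1\text{-ary}}_h\bigr) = p_1^{h}\, p_0,
\qquad
n\,\pi\bigl(T^{1\text{-ary}}_h\bigr)
  = p_0 \exp\bigl(\log n - h \log(1/p_1)\bigr).
```

So $n\pi(T^{1\text{-ary}}_h) \to \infty$ exactly when $\log n - h \log(1/p_1) \to \infty$, which places the threshold height at about $\log n / \log(1/p_1)$, in agreement with the second limit of the lemma.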
Application 2: existence of all possible subtrees
Let $K_n$ be the maximum $k$ such that $\mathcal{T}^{gw}_n$ contains all trees of size $\le k$ as fringe subtrees. 27 / 47
The coupon collector problem
Original version: There are $n$ types of coupons. Each draw yields a coupon whose type is chosen uniformly at random. How many draws do we need to collect all $n$ types?
Generalized version: There are $n$ types of coupons. Each draw yields a coupon of type $i$ with probability $p_i$. How many draws do we need to collect all $n$ types? 28 / 47
The coupon collector problem: the answer
Lemma 5.1 (Generalized coupon collector): Assume $X$ takes values in $\{1, \dots, n\}$. Let $p_i \equiv P\{X = i\}$. Let $X_1, X_2, \dots$ be i.i.d. copies of $X$. Let $N \equiv \inf\{i \ge 1 : |\{X_1, X_2, \dots, X_i\}| = n\}$. Let $m$ be a positive integer. We have
$$1 - \sum_{i=1}^{n} (1 - p_i)^m \le P\{N \le m\} \le \frac{1}{\sum_{i=1}^{n} (1 - p_i)^m}.$$
If $p_i = 1/n$, then $N = n \log(n)\,(1 + o_p(1))$. 29 / 47
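The two bounds of the generalized coupon collector lemma, $1 - \sum_i (1 - p_i)^m$ from below and $1 / \sum_i (1 - p_i)^m$ from above, can be compared with the exact value of $P\{N \le m\}$ when $n$ is small. The sketch below obtains the exact value by inclusion-exclusion over the set of missing types, which takes $2^n$ terms; the function names are illustrative, and the bounds are clamped to $[0, 1]$ for readability.

```python
from itertools import combinations

def coupon_bounds(p, m):
    """Lower and upper bounds on P{N <= m}, clamped to [0, 1]."""
    s = sum((1 - pi) ** m for pi in p)  # expected number of missing types
    lower = max(0.0, 1 - s)
    upper = min(1.0, 1 / s) if s > 0 else 1.0
    return lower, upper

def exact_prob_complete(p, m):
    """P{all types seen within m draws}, by inclusion-exclusion."""
    n = len(p)
    total = 0.0
    for r in range(n + 1):
        for missing in combinations(range(n), r):
            # probability that none of the types in `missing` is drawn
            q = 1 - sum(p[i] for i in missing)
            total += (-1) ** r * q ** m
    return total

p = [0.4, 0.3, 0.2, 0.1]
for m in (1, 5, 10, 30, 60):
    lo, hi = coupon_bounds(p, m)
    print(m, lo, exact_prob_complete(p, m), hi)
```

The lower bound becomes sharp once $m$ is large enough for the rarest type to have been seen, while the upper bound is informative for small $m$.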
Connection to our problem
Draw independent copies of $\mathcal{T}^{gw}_k$ until every tree of size $k$ has appeared. Let $M_k$ be the number of draws.
$N_{\mathfrak{T}_k}(\mathcal{T}^{gw}_n) \approx n\pi(\mathfrak{T}_k)$, where $\mathfrak{T}_k$ is the set of all trees of size $k$.
So if $n\pi(\mathfrak{T}_k) > M_k$, then we probably have all trees of size $k$ as fringe subtrees; otherwise we do not.
This is a coupon collector problem! 30 / 47
The least possible tree
Among all coupons, there is one that is least likely to appear. If we get this one, we are likely to have all coupons.
Let $T^{\min}_k$ be the least possible (that is, least likely) fringe subtree of size $k$.
$M_k$ depends on
$$p^{\min}_k \equiv P\{\mathcal{T}^{gw} = T^{\min}_k\}.$$
Lemma: If $n p^{\min}_k \to 0$, then whp $T^{\min}_k$ does not appear. If $n p^{\min}_k / k \to \infty$, then whp all possible subtrees of size $k$ appear. 31 / 47
What can we say about the least possible subtree?
$p^{\min}_k$ certainly depends on $\xi$. But there is a small surprise.
Theorem 5.2: We have
$$\bigl(p^{\min}_k\bigr)^{1/k} \to L$$
as $k \to \infty$, where $0 \le L < 1$ is a constant defined as
$$L \equiv \inf_{i \ge 1} P\{\xi = 0\} \left(\frac{P\{\xi = i\}}{P\{\xi = 0\}}\right)^{1/i}.$$ 32 / 47
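The constant $L = \inf_{i \ge 1} P\{\xi = 0\}\,(P\{\xi = i\}/P\{\xi = 0\})^{1/i}$ is easy to evaluate for a concrete offspring law. The sketch below makes two assumptions of its own: the law is given as a dictionary with finitely many entries, and the infimum is taken only over the listed degrees $i \ge 1$ with positive probability. For an infinite support this amounts to a truncation. The function name is hypothetical.

```python
import math

def least_likely_rate(pmf):
    """Evaluate L for an offspring law given as {degree: probability}.

    Only listed degrees i >= 1 with positive probability are considered.
    """
    p0 = pmf[0]
    return min(
        p0 * (pi / p0) ** (1 / i)
        for i, pi in pmf.items()
        if i >= 1 and pi > 0
    )

# A critical law on {0, 1, 2}: the terms are 0.5 (i = 1) and 0.25 (i = 2).
binary_like = {0: 0.25, 1: 0.5, 2: 0.25}
print(least_likely_rate(binary_like))

# First 20 terms of Ge(1/2): every term is close to 1/4, matching the fact
# that each tree of size k has probability 2^-(2k-1) under this law.
geometric = {i: 2.0 ** -(i + 1) for i in range(20)}
print(math.isclose(least_likely_rate(geometric), 0.25))
```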