
Adaptive Histograms from a Randomized Queue that is Prioritized for Statistically Equivalent Blocks


  1. Adaptive Histograms from a Randomized Queue that is Prioritized for Statistically Equivalent Blocks. Gloria Teng, Jennifer Harlow and Raazesh Sainudiin, Department of Mathematics and Statistics, University of Canterbury, New Zealand. August 19, 2010. Outline: Statistical Regular Sub-pavings (SRSPs); Adaptive Histograms; Arithmetic on SRSPs; Application; Conclusion. Running footer: Teng, Harlow and Sainudiin, Adaptive Histograms from SEB-based PQ.

  2. Introduction. Present statistical regular sub-pavings as an efficient, data-driven, multi-dimensional data structure for non-parametric density estimation of massive data sets. Apply our methods to earthquakes in NZ, weather and aircraft trajectories over a busy US airport, and samples simulated from challenging multi-dimensional densities, including Levy and Rosenbrock. Figure: Shape of a Levy density with 700 modes.

  3. Intervals and Boxes in $\mathbb{R}^d$. Intervals and boxes as interval vectors: $\boldsymbol{x} = [\underline{x}_1, \overline{x}_1] \times [\underline{x}_2, \overline{x}_2] \times \cdots \times [\underline{x}_d, \overline{x}_d]$, with $\underline{x}_i \le \overline{x}_i$. Figure: Boxes in 1D, 2D, and 3D. Section outline: Intervals and Boxes; Regular Sub-pavings (RSPs); Statistical Regular Sub-pavings (SRSPs).
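The interval-vector view of a box can be illustrated with a minimal Python sketch. The names `make_box`, `volume` and `contains` are hypothetical helpers chosen for this illustration and do not come from the slides; a box is assumed to be stored as a tuple of `(lower, upper)` pairs, one per dimension.

```python
from math import prod


def make_box(intervals):
    """Return a box as a tuple of (lower, upper) pairs, one per dimension."""
    box = tuple((float(lo), float(hi)) for lo, hi in intervals)
    for lo, hi in box:
        if lo > hi:
            raise ValueError("each interval needs lower <= upper")
    return box


def volume(box):
    """Product of the side lengths of the box."""
    return prod(hi - lo for lo, hi in box)


def contains(box, point):
    """True if the point lies inside the (closed) box."""
    return len(point) == len(box) and all(
        lo <= p <= hi for (lo, hi), p in zip(box, point)
    )


# A 2-dimensional box [0, 2] x [0, 3] (illustrative values only).
example_box = make_box([(0, 2), (0, 3)])
```

Here `volume(example_box)` multiplies the two side lengths, and `contains(example_box, (1, 1))` checks each coordinate against its interval.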

  4. Binary Tree Representation. These boxes can also be represented by ordered binary trees. Bisecting a box is equivalent to bisecting its corresponding node in the tree: the root node ρ (box $X$) gains a left child L (box $X_L$) and a right child R (box $X_R$). Figure: Bisecting a box or its equivalent node.

  5. Regular Sub-pavings (RSPs) (Jaulin et al., 2001). A sequence of bisections of boxes; start from the root box; bisect along the first widest dimension. Figure: A sequence of bisections on root box $X$ to produce a 4-leafed RSP $s$; at this step, the unsplit root box $X$ (node ρ).

  6. Regular Sub-pavings (RSPs) (Jaulin et al., 2001), continued. Figure: A sequence of bisections on root box $X$ to produce a 4-leafed RSP $s$; at this step, $X$ is bisected into $X_L$ and $X_R$ (nodes L and R under the root ρ).

  7. Regular Sub-pavings (RSPs) (Jaulin et al., 2001), continued. Figure: A sequence of bisections on root box $X$ to produce a 4-leafed RSP $s$; at this step, $X_L$ is bisected into $X_{LL}$ and $X_{LR}$, giving the leaves $X_{LL}$, $X_{LR}$ and $X_R$.

  8. Regular Sub-pavings (RSPs) (Jaulin et al., 2001), continued. Figure: A sequence of bisections on root box $X$ to produce a 4-leafed RSP $s$; at the final step, $X_{LR}$ is bisected into $X_{LRL}$ and $X_{LRR}$, giving the four leaves $X_{LL}$, $X_{LRL}$, $X_{LRR}$ and $X_R$.
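The bisection sequence in the figure can be sketched as a small binary tree in Python. This is only an illustration under stated assumptions: the class `RSPNode`, its methods, and the unit-square root box are hypothetical and not taken from the slides; the only rule borrowed from the slides is that each bisection halves the box along its first widest dimension.

```python
class RSPNode:
    """A node of a regular sub-paving: a box plus optional left/right children."""

    def __init__(self, box, name="X"):
        self.box = list(box)  # list of (lower, upper) pairs
        self.name = name
        self.left = None
        self.right = None

    def is_leaf(self):
        return self.left is None

    def bisect(self):
        """Halve the box along its first widest dimension and create two children."""
        widths = [hi - lo for lo, hi in self.box]
        k = widths.index(max(widths))  # index() returns the FIRST widest dimension
        lo, hi = self.box[k]
        mid = (lo + hi) / 2
        left_box, right_box = list(self.box), list(self.box)
        left_box[k] = (lo, mid)
        right_box[k] = (mid, hi)
        self.left = RSPNode(left_box, self.name + "L")
        self.right = RSPNode(right_box, self.name + "R")
        return self.left, self.right

    def leaves(self):
        """Leaves in left-to-right order."""
        if self.is_leaf():
            return [self]
        return self.left.leaves() + self.right.leaves()


# Reproduce the three bisections from the figure on an assumed unit-square root box.
root = RSPNode([(0.0, 1.0), (0.0, 1.0)])
x_l, x_r = root.bisect()
x_ll, x_lr = x_l.bisect()
x_lr.bisect()
leaf_names = [leaf.name for leaf in root.leaves()]
```

After these three calls the tree has the four leaves named XLL, XLRL, XLRR and XR, matching the labels in the figure.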

  9. The Space of All Possible RSPs. The number of distinct RSPs with $i$ splits is equal to the Catalan number $C_i = \frac{1}{i+1}\binom{2i}{i} = \frac{(2i)!}{(i+1)!\,i!}$. Figure: RSPs with zero to three splits, labelled $s_0$; $s_{11}$; $s_{221}$, $s_{122}$; $s_{2222}$, $s_{3321}$, $s_{2331}$, $s_{1332}$, $s_{1233}$.
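As a quick check of the Catalan count, the following Python sketch evaluates both forms of the formula. The function names are hypothetical; `math.comb` and `math.factorial` are standard-library calls.

```python
from math import comb, factorial


def catalan(i):
    """Catalan number C_i = binom(2i, i) / (i + 1), computed with exact integers."""
    return comb(2 * i, i) // (i + 1)


def catalan_factorial_form(i):
    """The same number written as (2i)! / ((i + 1)! * i!)."""
    return factorial(2 * i) // (factorial(i + 1) * factorial(i))


# Number of distinct RSPs with 0, 1, 2 and 3 splits.
first_counts = [catalan(i) for i in range(4)]
```

The values for zero to three splits are 1, 1, 2 and 5, which agrees with the nine RSPs listed in the figure.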

  10. Statistical Regular Sub-pavings (SRSPs). Extended from the RSP; caches recursively computable statistics at each box or node as data falls through. These statistics include: the sample count; the sample mean vector; the sample variance-covariance matrix; and the volume of the box. Figure: Caching the sample count in each node (or box); at this step, the root node ρ holds a count of 10.

  11. Statistical Regular Sub-pavings (SRSPs), continued. Figure: Caching the sample count in each node (or box); at this step, the root ρ (count 10) is bisected into children L and R, each with a count of 5.

  12. Statistical Regular Sub-pavings (SRSPs), continued. Figure: Caching the sample count in each node (or box); at this step, the left child L (count 5) is bisected into leaves LL and LR, which hold its 5 points between them (3 in one, 2 in the other), while R remains a leaf with a count of 5. Leaf boxes: $X_{LL}$, $X_{LR}$, $X_R$.
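One way to keep such statistics "recursively computable" is a one-pass running update, sketched below in Python. This is an assumption about how the caching could be done, not the authors' implementation: the class `BoxStats` is hypothetical, and the update used is the standard Welford-style recurrence for the mean and the covariance.

```python
class BoxStats:
    """Running count, mean vector and covariance matrix for points entering a box."""

    def __init__(self, dim):
        self.dim = dim
        self.count = 0
        self.mean = [0.0] * dim
        # Running sum of products of deviations (the co-moment matrix).
        self._comoment = [[0.0] * dim for _ in range(dim)]

    def add(self, point):
        """Update all cached statistics with one new point, without storing it."""
        self.count += 1
        before = [p - m for p, m in zip(point, self.mean)]  # deviation from old mean
        self.mean = [m + d / self.count for m, d in zip(self.mean, before)]
        after = [p - m for p, m in zip(point, self.mean)]   # deviation from new mean
        for i in range(self.dim):
            for j in range(self.dim):
                self._comoment[i][j] += before[i] * after[j]

    def covariance(self):
        """Sample variance-covariance matrix, or None with fewer than two points."""
        if self.count < 2:
            return None
        return [[v / (self.count - 1) for v in row] for row in self._comoment]


# Illustrative data only.
stats = BoxStats(dim=2)
for pt in [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]:
    stats.add(pt)
```

Because each update needs only the previous count, mean and co-moment matrix, every node on a point's path down the tree can refresh its cache as the point passes through.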

  13. SRSPs as Adaptive Histograms. The histogram estimate for i.i.d. random variables $X_1, X_2, \ldots, X_n$ in $\mathbb{R}^d$ with density $f$ is given by $\hat{f}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbb{1}_{\{X_i \in \boldsymbol{x}(x)\}}}{\mathrm{vol}(\boldsymbol{x}(x))}$, where $\boldsymbol{x}(x)$ is the leaf box that contains $x$ and $\mathrm{vol}(\boldsymbol{x})$ is the volume of box $\boldsymbol{x}$. Figure: An SRSP as a histogram estimate, with root ρ (count 10), children L and R (count 5 each), and L bisected into leaves LL and LR (2 and 3 points between them); leaf boxes $X_{LL}$, $X_{LR}$, $X_R$. Section outline: S.E.B. Priority Queue.
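Since the indicator sum simply counts the sample points in the leaf containing $x$, the estimate reduces to count / (n × volume) for that leaf. The Python sketch below evaluates it for a three-leaf example; the function name, the unit-square root box, the half-open boundary convention, and the assignment of 2 points to the LL leaf and 3 to the LR leaf are all assumptions made for illustration.

```python
from math import prod


def histogram_density(leaves, x):
    """Evaluate the histogram estimate at x.

    leaves: list of (box, count) pairs; each box is a list of (lower, upper) pairs.
    Boxes are treated as half-open [lower, upper) so shared edges belong to one leaf
    (an assumed convention).
    """
    n = sum(count for _, count in leaves)
    for box, count in leaves:
        if all(lo <= xi < hi for (lo, hi), xi in zip(box, x)):
            return count / (n * prod(hi - lo for lo, hi in box))
    return 0.0  # x lies outside the root box


# Assumed example: unit square split into X_LL, X_LR and X_R with 2, 3 and 5 points.
example_leaves = [
    ([(0.0, 0.5), (0.0, 0.5)], 2),  # X_LL
    ([(0.0, 0.5), (0.5, 1.0)], 3),  # X_LR
    ([(0.5, 1.0), (0.0, 1.0)], 5),  # X_R
]
density_in_x_r = histogram_density(example_leaves, (0.75, 0.5))
```

Summing density × volume over the leaves gives 1, so the estimate is a proper density on the root box.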

  15. A Prioritized Queue based Algorithm. Algorithm SplitMostCounts: as data arrives, order the leaf boxes of the SRSP so that the leaf box with the most points is chosen for the next bisection. Figure: The root box $X$ (node ρ) holding 10 points.
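The idea of SplitMostCounts can be sketched with a max-priority queue built on Python's `heapq`. This is a rough illustration under several assumptions that the slide does not state: the function names are hypothetical, splitting stops at a fixed number of leaves (`max_leaves`), ties between equally full leaves are broken by insertion order rather than at random, and points on the cut go to the right child.

```python
import heapq
from itertools import count


def bisect_box(box):
    """Halve a box along its first widest dimension; return (dim, midpoint, left, right)."""
    widths = [hi - lo for lo, hi in box]
    k = widths.index(max(widths))
    lo, hi = box[k]
    mid = (lo + hi) / 2
    left, right = list(box), list(box)
    left[k] = (lo, mid)
    right[k] = (mid, hi)
    return k, mid, left, right


def split_most_counts(root_box, points, max_leaves):
    """Repeatedly bisect the leaf holding the most points until max_leaves is reached."""
    tie = count()  # unique tie-breaker so the heap never compares boxes
    # heapq is a min-heap, so counts are negated to pop the fullest leaf first.
    heap = [(-len(points), next(tie), list(root_box), list(points))]
    while len(heap) < max_leaves:
        neg_n, _, box, pts = heapq.heappop(heap)
        if -neg_n <= 1:  # the fullest leaf has at most one point: stop splitting
            heapq.heappush(heap, (neg_n, next(tie), box, pts))
            break
        k, mid, left, right = bisect_box(box)
        left_pts = [p for p in pts if p[k] < mid]
        right_pts = [p for p in pts if p[k] >= mid]
        heapq.heappush(heap, (-len(left_pts), next(tie), left, left_pts))
        heapq.heappush(heap, (-len(right_pts), next(tie), right, right_pts))
    return [(box, len(pts)) for _, _, box, pts in heap]


# Illustrative data: six points clustered in one corner of the unit square, two elsewhere.
sample = [(0.1, 0.1), (0.2, 0.2), (0.1, 0.3), (0.3, 0.1),
          (0.2, 0.4), (0.4, 0.2), (0.6, 0.7), (0.9, 0.9)]
result = split_most_counts([(0.0, 1.0), (0.0, 1.0)], sample, max_leaves=4)
leaf_counts = sorted(n for _, n in result)
```

With this data the queue keeps refining the crowded corner, so the partition becomes finer where the sample is denser, which is the adaptive behaviour the slide describes.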
