
Tutorial on Statistical N-Body Problems and Proximity Data Structures
Alexander Gray
School of Computer Science
Carnegie Mellon University

Outline:
1. Physics problems and methods
2. Generalized N-body problems
3. Proximity data structures


  1. The Proximity Project [Gray, Lee, Rotella, Moore 2005]
     Careful, agnostic empirical comparison; open source
     15 datasets, dimensions from 2 to 1M
     The most well-known methods from 1972-2004:
     • Exact NN: 15 methods
     • All-NN, mono- and bichromatic: 3 methods
     • Approximate NN: 10 methods
     • Point location: 3 methods
     • (NN classification: 3 methods)
     • (Radial range search: 3 methods)

  2. …and the overall winner is? (exact NN, high-D)
     Ball-trees, basically, though there is high variance and dataset dependence:
     • Auton ball-trees III [Omohundro 91], [Uhlmann 91], [Moore 99]
     • Cover-trees [Alina B., Kakade, Langford 04]
     • Crust-trees [Yianilos 95], [Gray, Lee, Rotella, Moore 2005]

  3. A ball-tree: level 1

  4. A ball-tree: level 2

  5. A ball-tree: level 3

  6. A ball-tree: level 4

  7. A ball-tree: level 5
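
The level-by-level slides above were figures showing a ball-tree refined one level at a time; the pictures do not survive in this transcript. As a minimal sketch of what such a tree stores, assuming 2-D points and a farthest-point split heuristic (one common choice; the slides do not say which split rule the Auton ball-trees use), each node keeps a centroid and a radius enclosing all of its points. `BallNode` and `nodes_at_level` are hypothetical names; the helper returns the balls one would draw at a given level.

```python
import math
import random

class BallNode:
    """One node of a minimal ball-tree: a bounding sphere plus two children."""

    def __init__(self, points, leaf_size=2):
        self.points = points
        dim = len(points[0])
        # Centroid of the node's points, and the radius that encloses them all.
        self.center = tuple(sum(p[k] for p in points) / len(points) for k in range(dim))
        self.radius = max(math.dist(self.center, p) for p in points)
        self.left = self.right = None
        if len(points) > leaf_size:
            # Assumed farthest-point split: pivot a is farthest from the centroid,
            # pivot b is farthest from a, and each point joins the nearer pivot.
            a = max(points, key=lambda p: math.dist(self.center, p))
            b = max(points, key=lambda p: math.dist(a, p))
            near_a = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
            near_b = [p for p in points if math.dist(p, a) > math.dist(p, b)]
            if near_a and near_b:  # identical points simply stay in one leaf
                self.left = BallNode(near_a, leaf_size)
                self.right = BallNode(near_b, leaf_size)

def nodes_at_level(node, level):
    """Return the balls at a given depth (level 1 = root)."""
    if node is None:
        return []
    if level == 1:
        return [node]
    return nodes_at_level(node.left, level - 1) + nodes_at_level(node.right, level - 1)

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(64)]
root = BallNode(pts)
for lvl in range(1, 6):
    balls = nodes_at_level(root, lvl)
    print(f"level {lvl}: {len(balls)} balls, "
          f"max radius {max(b.radius for b in balls):.3f}")
```

Deeper levels hold fewer points per ball, which is what lets the later slides bound distances between whole nodes instead of individual points.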

  8. Anchors Hierarchy [Moore 99]
     • ‘Middle-out’ construction
     • Uses the farthest-point method [Gonzalez 85] to find sqrt(N) clusters: this is the middle
     • Bottom-up construction to get the top
     • Top-down division to get the bottom
     • Smart pruning throughout to make it fast
     • O(N log N), very fast in practice
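
A sketch of the farthest-point step [Gonzalez 85] that seeds the middle of the hierarchy, assuming Euclidean distance and an arbitrary first anchor. It deliberately omits the triangle-inequality pruning the slide credits for the speed, so each new anchor costs O(N) here (about O(N^1.5) for sqrt(N) anchors); `farthest_point_anchors` is a hypothetical name.

```python
import math
import random

def farthest_point_anchors(points, k):
    """Greedy farthest-point traversal: choose up to k anchors.

    Returns (anchors, owner, radius): owner[t] indexes the anchor nearest to
    points[t], and radius is the largest point-to-owner distance.
    """
    anchors = [points[0]]                               # arbitrary first anchor
    near = [math.dist(p, anchors[0]) for p in points]   # distance to nearest anchor
    owner = [0] * len(points)
    while len(anchors) < k:
        far = max(range(len(points)), key=near.__getitem__)
        if near[far] == 0.0:                            # every point coincides with an anchor
            break
        anchors.append(points[far])
        new = len(anchors) - 1
        for t, p in enumerate(points):                  # O(N) update, no pruning
            d = math.dist(p, points[far])
            if d < near[t]:
                near[t], owner[t] = d, new
    return anchors, owner, max(near)

random.seed(1)
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(400)]
k = math.isqrt(len(pts))                                # sqrt(N) anchors: the "middle"
anchors, owner, radius = farthest_point_anchors(pts, k)
print(len(anchors), round(radius, 3))
```

Each anchor and the points it owns would become one middle-level node; the bottom-up and top-down phases then build on these clusters.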

  9. Outline:
     1. Physics problems and methods
     2. Generalized N-body problems
     3. Proximity data structures
     4. Dual-tree algorithms
     5. Comparison

  10. Questions
     • What’s the magic that allows O(N)? Is it really because of the expansions?
     • Can we obtain a method that’s:
       1. O(N)
       2. Lightweight:
          - works with or without expansions
          - simple, recursive

  11. New algorithm
     • Use an adaptive tree (kd-tree or ball-tree)
     • Dual-tree recursion
     • Finite-difference approximation

  12. Single-tree vs. dual-tree (symmetric)

  13. Simple recursive algorithm
     SingleTree(q, R) {
       if approximate(q, R), return.
       if leaf(R), SingleTreeBase(q, R).
       else,
         SingleTree(q, R.left).
         SingleTree(q, R.right).
     }
     (NN or range-search: recurse on the closer node first)
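
Specialized to exact nearest-neighbor search, the single-tree recursion might look like the following Python sketch. The `Ball` class (farthest-point split, leaf size 8) is an assumption rather than the tutorial's tree, and `approximate(q, R)` becomes an exact pruning test here: skip R when the lower bound |q - c_R| - r_R on any distance into R is already no better than the best distance found so far.

```python
import math
import random

class Ball:
    """Minimal ball-tree node (assumed farthest-point split)."""
    def __init__(self, pts, leaf_size=8):
        self.pts = pts
        dim = len(pts[0])
        self.center = tuple(sum(p[k] for p in pts) / len(pts) for k in range(dim))
        self.radius = max(math.dist(self.center, p) for p in pts)
        self.left = self.right = None
        if len(pts) > leaf_size:
            a = max(pts, key=lambda p: math.dist(self.center, p))
            b = max(pts, key=lambda p: math.dist(a, p))
            near_a = [p for p in pts if math.dist(p, a) <= math.dist(p, b)]
            near_b = [p for p in pts if math.dist(p, a) > math.dist(p, b)]
            if near_a and near_b:
                self.left, self.right = Ball(near_a, leaf_size), Ball(near_b, leaf_size)

def single_tree_nn(q, R, best=(math.inf, None)):
    """SingleTree(q, R) specialized to exact nearest-neighbor search."""
    # Prune: even the closest possible point of R cannot beat the current best.
    if math.dist(q, R.center) - R.radius >= best[0]:
        return best
    if R.left is None:                                  # SingleTreeBase(q, R)
        for p in R.pts:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Recurse on the closer child first so the pruning bound tightens early.
    for child in sorted((R.left, R.right), key=lambda c: math.dist(q, c.center)):
        best = single_tree_nn(q, child, best)
    return best

random.seed(2)
ref = [(random.random(), random.random(), random.random()) for _ in range(500)]
tree = Ball(ref)
q = (0.5, 0.5, 0.5)
dist_nn, nn = single_tree_nn(q, tree)
print(round(dist_nn, 4), nn)
```

For range search the same skeleton applies with the prune replaced by "lower bound greater than the search radius".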

  14. Simple recursive algorithm
     DualTree(Q, R) {
       if approximate(Q, R), return.
       if leaf(Q) and leaf(R), DualTreeBase(Q, R).
       else,
         DualTree(Q.left, R.left).
         DualTree(Q.left, R.right).
         DualTree(Q.right, R.left).
         DualTree(Q.right, R.right).
     }
     (NN or range-search: recurse on the closer node first)
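
As one concrete instance of the dual-tree recursion, here is a sketch that counts the pairs (q, r) within distance h. Both prunes use only node centers and radii: if the closest possible pair is farther than h the node pair contributes 0, and if the farthest possible pair is within h it contributes |Q|·|R|. The four-way recursion on the slide assumes both nodes are internal, so this sketch splits only the non-leaf node when exactly one of them is a leaf. `Ball` and `dual_tree_range_count` are hypothetical names.

```python
import math
import random

class Ball:
    """Minimal ball-tree node (assumed farthest-point split)."""
    def __init__(self, pts, leaf_size=8):
        self.pts = pts
        dim = len(pts[0])
        self.center = tuple(sum(p[k] for p in pts) / len(pts) for k in range(dim))
        self.radius = max(math.dist(self.center, p) for p in pts)
        self.left = self.right = None
        if len(pts) > leaf_size:
            a = max(pts, key=lambda p: math.dist(self.center, p))
            b = max(pts, key=lambda p: math.dist(a, p))
            near_a = [p for p in pts if math.dist(p, a) <= math.dist(p, b)]
            near_b = [p for p in pts if math.dist(p, a) > math.dist(p, b)]
            if near_a and near_b:
                self.left, self.right = Ball(near_a, leaf_size), Ball(near_b, leaf_size)

def dual_tree_range_count(Q, R, h):
    """DualTree(Q, R) specialized to counting pairs (q, r) with |q - r| <= h."""
    gap = math.dist(Q.center, R.center)
    d_min = max(0.0, gap - Q.radius - R.radius)   # closest any pair can be
    d_max = gap + Q.radius + R.radius             # farthest any pair can be
    if d_min > h:                                 # prune: no pair is in range
        return 0
    if d_max <= h:                                # prune: every pair is in range
        return len(Q.pts) * len(R.pts)
    if Q.left is None and R.left is None:         # DualTreeBase(Q, R)
        return sum(1 for q in Q.pts for r in R.pts if math.dist(q, r) <= h)
    qs = [Q] if Q.left is None else [Q.left, Q.right]
    rs = [R] if R.left is None else [R.left, R.right]
    return sum(dual_tree_range_count(a, b, h) for a in qs for b in rs)

random.seed(3)
pts = [(random.random(), random.random()) for _ in range(300)]
tree = Ball(pts)
h = 0.1
count = dual_tree_range_count(tree, tree, h)  # monochromatic: Q and R share one tree
print(count)
```

Passing the same tree as both Q and R gives the monochromatic case (self-pairs included); two different trees give the bichromatic case.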

  15-35. Dual-tree traversal (depth-first) [animated figure over slides 15-35; panels: Reference points, Query points]

  36. Finite-difference function approximation
     Taylor expansion:
     $$f(x) \approx f(a) + f'(a)\,(x - a)$$
     Gregory-Newton finite form:
     $$f(x) \approx f(x_i) + \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i}\,(x - x_i)$$
     $$K(\delta) \approx K(\delta^{\min}) + \frac{K(\delta^{\max}) - K(\delta^{\min})}{\delta^{\max} - \delta^{\min}}\,(\delta - \delta^{\min})$$
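
One way to connect this linear interpolant to the single-number estimate used for a node pair (our reading; the slide itself does not state it) is to evaluate it at the midpoint distance, assuming $\delta^{\max} > \delta^{\min}$:

```latex
\bar{\delta} = \tfrac{1}{2}\left(\delta^{\min} + \delta^{\max}\right)
\quad\Longrightarrow\quad
K(\bar{\delta}) \approx K(\delta^{\min})
  + \frac{K(\delta^{\max}) - K(\delta^{\min})}{\delta^{\max} - \delta^{\min}}
    \cdot \frac{\delta^{\max} - \delta^{\min}}{2}
  = \tfrac{1}{2}\left[K(\delta^{\min}) + K(\delta^{\max})\right]
```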

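  37. Finite-difference function approximation
     Assumes a monotonically decreasing kernel.
     $$K_{QR} = \tfrac{1}{2}\left[K(\delta^{\min}_{QR}) + K(\delta^{\max}_{QR})\right]$$
     $$\mathrm{err}_q = \Bigl|\sum_{r \in R} K(\delta_{qr}) - N_R K_{QR}\Bigr| \le \frac{N_R}{2}\left[K(\delta^{\min}_{QR}) - K(\delta^{\max}_{QR})\right]$$
     Could also use the center of mass.
     Stopping rule?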
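
A quick numeric check of this bound for a single query point q (the point-node special case of the Q, R pair), assuming a Gaussian kernel with an arbitrary bandwidth h = 0.3 and with δ_min, δ_max taken from the bounding sphere of the reference points. Because K is decreasing, every K(δ_qr) lies between K(δ_max) and K(δ_min), so each term is within half that gap of K_QR; summing N_R such terms gives the bound. `gaussian` and `midpoint_estimate` are hypothetical names.

```python
import math
import random

def gaussian(delta, h=0.3):
    """A monotonically decreasing kernel, as the slide assumes (h is arbitrary)."""
    return math.exp(-0.5 * (delta / h) ** 2)

def midpoint_estimate(q, refs, K=gaussian):
    """Estimate sum_r K(|q - r|) from node-level distance bounds only.

    Returns (N_R * K_QR, (N_R / 2) * [K(delta_min) - K(delta_max)]).
    """
    dim = len(q)
    center = tuple(sum(r[k] for r in refs) / len(refs) for k in range(dim))
    radius = max(math.dist(center, r) for r in refs)
    d_min = max(0.0, math.dist(q, center) - radius)
    d_max = math.dist(q, center) + radius
    n_r = len(refs)
    k_qr = 0.5 * (K(d_min) + K(d_max))          # midpoint kernel value K_QR
    return n_r * k_qr, 0.5 * n_r * (K(d_min) - K(d_max))

random.seed(4)
refs = [(1.0 + 0.1 * random.random(), 0.1 * random.random()) for _ in range(50)]
q = (0.0, 0.0)
exact = sum(gaussian(math.dist(q, r)) for r in refs)
estimate, bound = midpoint_estimate(q, refs)
print(exact, estimate, bound)
```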

  38. Simple approximation method
     approximate(Q, R) {
       dl = N_R K(δ_max), du = N_R K(δ_min).
       if δ_min ≥ τ · max(diam(Q), diam(R)),
         incorporate(dl, du). return.
     }
     • trivial to change kernel
     • hard error bounds
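
A sketch of dual-tree kernel summation with this τ rule, assuming a Gaussian kernel, ball-tree nodes whose diameter is bounded by twice their radius, and exact sums at leaf-leaf pairs. Here `incorporate(dl, du)` is realized by adding the midpoint (dl + du)/2 to each query's estimate and (du - dl)/2 to its error bound; `Ball`, `kernel` and `dual_tree_kde` are hypothetical names. The bound is hard, but its size is only known after the run.

```python
import math
import random

class Ball:
    """Minimal ball-tree node over indices into `pts` (assumed farthest-point split)."""
    def __init__(self, pts, idx, leaf_size=8):
        self.idx = idx
        dim = len(pts[0])
        self.center = tuple(sum(pts[i][k] for i in idx) / len(idx) for k in range(dim))
        self.radius = max(math.dist(self.center, pts[i]) for i in idx)
        self.left = self.right = None
        if len(idx) > leaf_size:
            a = max(idx, key=lambda i: math.dist(self.center, pts[i]))
            b = max(idx, key=lambda i: math.dist(pts[a], pts[i]))
            near_a = [i for i in idx if math.dist(pts[i], pts[a]) <= math.dist(pts[i], pts[b])]
            near_b = [i for i in idx if math.dist(pts[i], pts[a]) > math.dist(pts[i], pts[b])]
            if near_a and near_b:
                self.left = Ball(pts, near_a, leaf_size)
                self.right = Ball(pts, near_b, leaf_size)

def kernel(delta, h):
    return math.exp(-0.5 * (delta / h) ** 2)    # Gaussian: monotonically decreasing

def dual_tree_kde(qpts, rpts, h, tau):
    """Kernel sums phi(q) = sum_r K(|q - r|) using the tau stopping rule."""
    est = [0.0] * len(qpts)    # accumulated midpoint estimate per query
    err = [0.0] * len(qpts)    # accumulated hard bound on |est - exact|

    def approximate(Q, R):
        gap = math.dist(Q.center, R.center)
        d_min = max(0.0, gap - Q.radius - R.radius)
        d_max = gap + Q.radius + R.radius
        if d_min >= tau * max(2 * Q.radius, 2 * R.radius):  # 2*radius >= diam
            dl = len(R.idx) * kernel(d_max, h)
            du = len(R.idx) * kernel(d_min, h)
            for i in Q.idx:                                   # incorporate(dl, du)
                est[i] += 0.5 * (dl + du)
                err[i] += 0.5 * (du - dl)
            return True
        return False

    def dual_tree(Q, R):
        if approximate(Q, R):
            return
        if Q.left is None and R.left is None:                 # DualTreeBase: exact
            for i in Q.idx:
                est[i] += sum(kernel(math.dist(qpts[i], rpts[j]), h) for j in R.idx)
            return
        for a in ([Q] if Q.left is None else [Q.left, Q.right]):
            for b in ([R] if R.left is None else [R.left, R.right]):
                dual_tree(a, b)

    dual_tree(Ball(qpts, list(range(len(qpts)))), Ball(rpts, list(range(len(rpts)))))
    return est, err

random.seed(5)
pts = [(random.random(), random.random()) for _ in range(300)]
est, err = dual_tree_kde(pts, pts, h=0.2, tau=2.0)
print(max(err))
```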

  39. Big issue in practice… Tweak parameters
     Case 1: algorithm gives no error bounds
     Case 2: algorithm gives hard error bounds: must run it many times
     Case 3: algorithm automatically achieves your error tolerance

  40. Automatic approximation method
     approximate(Q, R) {
       dl = N_R K(δ_max), du = N_R K(δ_min).
       if K(δ_min) − K(δ_max) ≤ (2ε / N) · φ_min(Q),
         incorporate(dl, du). return.
     }
     (φ_min(Q): a lower bound on the kernel sum φ(q) for every query q in Q; ε: relative error tolerance)
     • just set error tolerance, no tweak parameters
     • hard error bounds
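
A sketch of the automatic rule with a plain ball-tree and a Gaussian kernel. The slide does not say how φ_min(Q) is maintained; as an assumption, this sketch takes the larger of two valid lower bounds on φ(q) = Σ_r K(δ_qr): N·K(δ_max(Q, root)), and the smallest total already finalized for any query in Q. Each pruned node pair then costs at most N_R·ε·φ(q)/N, and the pruned nodes R for one query hold at most N reference points in total, so every estimate ends within relative error ε. All names are hypothetical.

```python
import math
import random

class Ball:
    """Minimal ball-tree node over indices into `pts` (assumed farthest-point split)."""
    def __init__(self, pts, idx, leaf_size=8):
        self.idx = idx
        dim = len(pts[0])
        self.center = tuple(sum(pts[i][k] for i in idx) / len(idx) for k in range(dim))
        self.radius = max(math.dist(self.center, pts[i]) for i in idx)
        self.left = self.right = None
        if len(idx) > leaf_size:
            a = max(idx, key=lambda i: math.dist(self.center, pts[i]))
            b = max(idx, key=lambda i: math.dist(pts[a], pts[i]))
            near_a = [i for i in idx if math.dist(pts[i], pts[a]) <= math.dist(pts[i], pts[b])]
            near_b = [i for i in idx if math.dist(pts[i], pts[a]) > math.dist(pts[i], pts[b])]
            if near_a and near_b:
                self.left = Ball(pts, near_a, leaf_size)
                self.right = Ball(pts, near_b, leaf_size)

def kernel(delta, h):
    return math.exp(-0.5 * (delta / h) ** 2)    # Gaussian: monotonically decreasing

def dual_tree_kde_auto(qpts, rpts, h, eps):
    """Kernel sums with relative error <= eps via the automatic stopping rule."""
    n = len(rpts)
    est = [0.0] * len(qpts)    # accumulated midpoint estimate per query
    lo = [0.0] * len(qpts)     # finalized lower-bound contributions to phi(q)
    root_r = Ball(rpts, list(range(n)))

    def phi_min(Q):
        # (1) every reference point lies within delta_max(Q, root_r) of Q;
        crude = n * kernel(math.dist(Q.center, root_r.center) + Q.radius + root_r.radius, h)
        # (2) contributions already finalized are a lower bound as well.
        return max(crude, min(lo[i] for i in Q.idx))

    def approximate(Q, R):
        gap = math.dist(Q.center, R.center)
        d_min = max(0.0, gap - Q.radius - R.radius)
        d_max = gap + Q.radius + R.radius
        if kernel(d_min, h) - kernel(d_max, h) <= 2 * eps * phi_min(Q) / n:
            dl = len(R.idx) * kernel(d_max, h)
            du = len(R.idx) * kernel(d_min, h)
            for i in Q.idx:                                   # incorporate(dl, du)
                est[i] += 0.5 * (dl + du)
                lo[i] += dl
            return True
        return False

    def dual_tree(Q, R):
        if approximate(Q, R):
            return
        if Q.left is None and R.left is None:                 # DualTreeBase: exact
            for i in Q.idx:
                s = sum(kernel(math.dist(qpts[i], rpts[j]), h) for j in R.idx)
                est[i] += s
                lo[i] += s
            return
        for a in ([Q] if Q.left is None else [Q.left, Q.right]):
            for b in ([R] if R.left is None else [R.left, R.right]):
                dual_tree(a, b)

    dual_tree(Ball(qpts, list(range(len(qpts)))), root_r)
    return est

random.seed(6)
pts = [(random.random(), random.random()) for _ in range(300)]
est = dual_tree_kde_auto(pts, pts, h=0.2, eps=0.01)
print(est[:3])
```

The only knob is `eps`; there is no τ to tune, which is the point of Case 3 on the previous slide.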

  41. Runtime analysis
     THEOREM: The dual-tree algorithm is O(N).
     ASSUMPTION: N points from a density f with 0 < c ≤ f ≤ C.

  42. Recurrence for self-finding
     single-tree (point-node):
       $T(N) = T(N/2) + O(1)$, $T(1) = O(1)$ ⇒ $N \cdot O(\log N)$
     dual-tree (node-node):
       $T(N) = 2\,T(N/2) + O(1)$, $T(1) = O(1)$ ⇒ $O(N)$
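
To see the two growth rates concretely, the recurrences can be unrolled numerically with each O(1) term set to 1 (an arbitrary unit cost). For N a power of two, the single-tree recurrence gives log2(N) + 1 per query, hence N(log2 N + 1) for all N queries, while the dual-tree recurrence gives 2N - 1 node-node visits in total. `t_single` and `t_dual` are hypothetical names.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def t_single(n):
    """Single-tree, one query: T(N) = T(N/2) + 1, T(1) = 1."""
    return 1 if n <= 1 else t_single(n // 2) + 1

@lru_cache(maxsize=None)
def t_dual(n):
    """Dual-tree, all queries at once: T(N) = 2 T(N/2) + 1, T(1) = 1."""
    return 1 if n <= 1 else 2 * t_dual(n // 2) + 1

for n in (2 ** 10, 2 ** 15, 2 ** 20):
    # N separate single-tree searches vs. one dual-tree traversal
    print(n, n * t_single(n), t_dual(n))
# 1024 11264 2047
# 32768 524288 65535
# 1048576 22020096 2097151
```

The O(1) per step is exactly what the packing lemma on the next slide has to justify.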

  43. Packing bound
     LEMMA: The number of nodes that are well-separated from a query node Q is bounded by a constant, $\left(1 + g(s, c, C)\right)^{D}$.
     Thus the recurrence yields the entire runtime. Done. (cf. [Callahan-Kosaraju 95])
     On a manifold, use its dimension D’ (the data’s ‘intrinsic dimension’).
