
Dual-tree Algorithms in Statistics
Ryan Riegel (rriegel@cc.gatech.edu)
Computational Science and Engineering, College of Computing, Georgia Institute of Technology

Outline (relevant citations appear at the top of each slide)

[The outline items themselves are not captured in this excerpt.]


Monochromatic all-nearest-neighbors

[Slides 15 through 52 step through the dual-tree traversal on an example dataset as an animation; only the problem statement is recoverable here.]

$$\operatorname*{map}_{q \in X} \; \operatorname*{argmin}_{r \in X \setminus \{q\}} d(q, r)$$
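To make the problem concrete, here is a brute-force reference implementation (our illustration, not from the slides); the dual-tree algorithm shown later in the deck computes the same answer without forming all $N^2$ distances.

```python
import numpy as np

def all_nn_naive(X):
    """Brute-force monochromatic all-NN: for each q in X, the index of
    argmin over r in X minus {q} of d(q, r).  O(N^2) time and memory."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude the r = q self-pairs
    return D.argmin(axis=1)              # nearest-neighbor index per point
```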

Ex: Two-point Correlation (Gray and Moore, NIPS 2000)

$$\sum_{x_1 \in X} \sum_{x_2 \in X} I\big(d(x_1, x_2) \le h\big)$$

function tpc(X_1, X_2)
    if d_l(X_1, X_2) > h, return 0
    if d_u(X_1, X_2) <= h, return |X_1| * |X_2|
    return tpc(X_1^L, X_2^L) + tpc(X_1^L, X_2^R) + tpc(X_1^R, X_2^L) + tpc(X_1^R, X_2^R)
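A minimal runnable sketch of tpc, assuming axis-aligned bounding-box trees with singleton leaves. The Node, dmin, dmax, and children helpers are our own scaffolding (reused by the sketches that follow), not the authors' code.

```python
import numpy as np

class Node:
    """Toy bounding-box tree with singleton leaves (illustrative only)."""
    def __init__(self, pts, ids=None):
        self.pts = pts                                   # (m, d) point array
        self.ids = np.arange(len(pts)) if ids is None else ids
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)  # bounding box
        self.left = self.right = None
        if len(pts) > 1:                                 # split widest dimension
            order = pts[:, (self.hi - self.lo).argmax()].argsort()
            mid = len(pts) // 2
            self.left = Node(pts[order[:mid]], self.ids[order[:mid]])
            self.right = Node(pts[order[mid:]], self.ids[order[mid:]])

def dmin(A, B):
    """d_l(A, B): least possible distance between the two bounding boxes."""
    return np.linalg.norm(np.maximum(0.0, np.maximum(A.lo - B.hi, B.lo - A.hi)))

def dmax(A, B):
    """d_u(A, B): greatest possible distance between the two bounding boxes."""
    return np.linalg.norm(np.maximum(A.hi - B.lo, B.hi - A.lo))

def children(N):
    """A node that cannot split stands in for itself in the recursion."""
    return [N.left, N.right] if N.left is not None else [N]

def tpc(X1, X2, h):
    """Two-point correlation count, mirroring the slide's tpc(X1, X2)."""
    if dmin(X1, X2) > h:                  # exclusion prune: no pair in range
        return 0
    if dmax(X1, X2) <= h:                 # inclusion prune: all pairs in range
        return len(X1.pts) * len(X2.pts)
    return sum(tpc(a, b, h) for a in children(X1) for b in children(X2))

# Usage: root = Node(X); count = tpc(root, root, h)
```

With singleton leaves, any leaf-leaf pair has coinciding lower and upper bounds, so one of the two prunes always fires and the recursion terminates.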

Ex: Range Count (Gray and Moore, NIPS 2000)

$$\operatorname*{map}_{q \in Q} \; \sum_{r \in R} I\big(d(q, r) \le h\big)$$

init: for all q in Q_root, a(q) = 0
function rng(Q, R)
    if d_l(Q, R) > h, return
    if d_u(Q, R) <= h, for all q in Q: a(q) += |R|; return
    rng(Q^L, R^L); rng(Q^L, R^R)
    rng(Q^R, R^L); rng(Q^R, R^R)
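The same two prunes, but now per-query counts survive the recursion. A sketch reusing the Node, dmin, dmax, and children helpers from the two-point correlation example (our scaffolding, not the authors' code):

```python
import numpy as np

def rng(Q, R, h, a):
    """Dual-tree range count; a[i] accumulates the count for query point i."""
    if dmin(Q, R) > h:                   # exclusion: no (q, r) pair in range
        return
    if dmax(Q, R) <= h:                  # inclusion: every r in R is in range
        a[Q.ids] += len(R.pts)
        return
    for Qc in children(Q):
        for Rc in children(R):
            rng(Qc, Rc, h, a)

# Usage: root = Node(X); a = np.zeros(len(X), dtype=int); rng(root, root, h, a)
```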

Ex: All-nearest-neighbors (Gray and Moore, NIPS 2000)

$$\operatorname*{map}_{q \in Q} \; \operatorname*{argmin}_{r \in R} d(q, r)$$

init: for all q in Q_root, a(q) = ∞
function allnn(Q, R)
    if a^u(Q) <= d_l(Q, R), return
    if (Q, R) = ({q}, {r}), a(q) = min{a(q), d(q, r)}; return
    prioritize {R_1, R_2} = {R^L, R^R} by d_l(Q^L, ·)
    allnn(Q^L, R_1); allnn(Q^L, R_2)
    prioritize {R_1, R_2} = {R^L, R^R} by d_l(Q^R, ·)
    allnn(Q^R, R_1); allnn(Q^R, R_2)
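A sketch of allnn under the same scaffolding; we read a^u(Q) as the largest current neighbor-distance bound over the queries in Q, and add a q != r check so the same code serves the monochromatic case. Again our illustration, not the original implementation.

```python
import numpy as np

def allnn(Q, R, a, nn):
    """Dual-tree all-NN; a[i] is query i's best distance so far, nn[i] its
    current best neighbor id."""
    if a[Q.ids].max() <= dmin(Q, R):       # a^u(Q) <= d_l(Q, R): R cannot help
        return
    if Q.left is None and R.left is None:  # base case: singleton leaves
        qi, ri = Q.ids[0], R.ids[0]
        d = np.linalg.norm(Q.pts[0] - R.pts[0])
        if qi != ri and d < a[qi]:         # skip the q = r self-pair
            a[qi], nn[qi] = d, ri
        return
    for Qc in children(Q):
        # prioritize the nearer reference child, as on the slide, so bounds
        # tighten early and the farther child is often pruned
        for Rc in sorted(children(R), key=lambda N: dmin(Qc, N)):
            allnn(Qc, Rc, a, nn)

# Usage:
# root = Node(X)
# a, nn = np.full(len(X), np.inf), np.full(len(X), -1)
# allnn(root, root, a, nn)
```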

Ex: Kernel Density Estimation (Lee et al., NIPS 2005; Lee and Gray, UAI 2006)

$$\operatorname*{map}_{q \in Q} \; \sum_{r \in R} K_h(q, r)$$

init: for all q in Q_root, a(q) = 0; b = 0
function kde(Q, R, b)
    if K_h^u(Q, R) − K_h^l(Q, R) < (a^l(Q) + b) · |R| · ε / |R_root|,
        for all q in Q: a(q) += K_h^l(Q, R); return
    prioritize {R_1, R_2} = {R^L, R^R} by d_l(Q^L, ·)
    kde(Q^L, R_1, b + K_h^l(Q^L, R_2)); kde(Q^L, R_2, b)
    prioritize {R_1, R_2} = {R^L, R^R} by d_l(Q^R, ·)
    kde(Q^R, R_1, b + K_h^l(Q^R, R_2)); kde(Q^R, R_2, b)
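A simplified sketch: instead of the slide's relative-error criterion with the running lower bound b, it prunes whenever the kernel varies by less than an absolute tolerance over the node pair, then adds the midpoint value for all of R. The Gaussian kernel and the tolerance are our illustrative choices; helpers as before.

```python
import numpy as np

def K_gauss(d, h):
    """Gaussian kernel as a function of distance (illustrative choice)."""
    return np.exp(-d * d / (2.0 * h * h))

def kde(Q, R, h, a, tol):
    """Dual-tree KDE sketch with an absolute per-pair tolerance."""
    k_hi, k_lo = K_gauss(dmin(Q, R), h), K_gauss(dmax(Q, R), h)
    if k_hi - k_lo < tol:                 # kernel nearly constant over (Q, R)
        a[Q.ids] += len(R.pts) * 0.5 * (k_lo + k_hi)  # midpoint approximation
        return
    for Qc in children(Q):
        for Rc in sorted(children(R), key=lambda N: dmin(Qc, N)):
            kde(Qc, Rc, h, a, tol)

# Usage: root = Node(X); a = np.zeros(len(X)); kde(root, root, 0.1, a, 1e-3)
```

On a leaf-leaf pair the two kernel bounds coincide, so the prune always fires and the recursion terminates; each prune contributes at most |R| · tol / 2 error per query.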

Ex: Kernel Discriminant Analysis (Gray and Riegel, COMPSTAT 2006; Riegel et al., SIAM Data Mining 2008)

$$\operatorname*{map}_{q \in Q} \; \operatorname*{argmax}_{C \in \{C_1, C_2\}} \frac{P(C)}{|R_C|} \sum_{r \in R_C} K_{h_C}(q, r)$$

init: for all q in Q_root, a(q) = δ(Q_root, R_root)
enqueue(Q_root, R_root)
while dequeue(Q, R)  // main loop of kda
    if a^l(Q) > 0 or a^u(Q) < 0, continue
    for all q in Q: a(q) −= δ(Q, R)
    for all q in Q^L: a(q) += δ(Q^L, R^L) + δ(Q^L, R^R)
    for all q in Q^R: a(q) += δ(Q^R, R^L) + δ(Q^R, R^R)
    enqueue(Q^L, R^L); enqueue(Q^L, R^R)
    enqueue(Q^R, R^L); enqueue(Q^R, R^R)
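The dual-tree loop maintains signed bounds δ(Q, R) on the difference of the two class densities and stops expanding once the sign of a(q) is certain. As a reference point, here is the brute-force version of the decision rule being computed for a single query (our illustration; priors p1, p2 and bandwidths h1, h2 are inputs), reusing K_gauss from the KDE sketch:

```python
import numpy as np

def kda_classify_naive(q, R1, R2, h1, h2, p1, p2):
    """Brute-force KDA rule for one query: argmax over classes C of
    P(C) / |R_C| * sum over r in R_C of K_{h_C}(q, r)."""
    s1 = p1 / len(R1) * K_gauss(np.linalg.norm(R1 - q, axis=1), h1).sum()
    s2 = p2 / len(R2) * K_gauss(np.linalg.norm(R2 - q, axis=1), h2).sum()
    return 1 if s1 >= s2 else 2           # the sign of s1 - s2 decides
```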

Case Study: Quasar Identification (Riegel et al., SIAM Data Mining 2008, submitted; Richards et al., AAS 2008)

Mining for quasars in the Sloan Digital Sky Survey:
- Brightest objects in the universe
- Thus, the farthest/oldest objects we can see
- Believed to be active galactic nuclei: giant black holes
- Implications for dark matter, dark energy, etc.
- Peplow (Nature, 2005) uses one of our catalogs to verify the cosmic magnification effect predicted by relativity

Case Study: Quasar Identification (Riegel et al., SIAM Data Mining 2008, submitted; Richards et al., AAS 2008)

Trained a KDA classifier on 4D spectral data from about 80k known quasars and 400k non-quasars, then identified about 1m quasars among 40m unknown objects.
- Took 640 seconds in serial; half of that was tree-building.
- The naive computation takes 380 hours, excluding bandwidth learning.
- Algorithmic parameters are key to performance:
  - Hybrid breadth-depth-first expansion
  - Epanechnikov kernel (choice of f) to maximize pruning
  - Multi-bandwidth algorithm for faster bandwidth fitting

Case Study: Quasar Identification

[Figure: log-log plot of running time versus data set size for leave-one-out cross-validation on the 4D quasar data, comparing Naive, Heap, Heap with Epanechnikov, Hybrid, and Hybrid with Epanechnikov variants.]

GNPs, Formally Speaking (Boyer, Riegel, and Gray's THOR Project; planned: Riegel et al., NIPS 2008 or JMLR 2008)

Higher-order reduce problem $\Psi = g \circ \psi$, with

$$\psi(X_1, \ldots, X_n) = \bigotimes^{(1)}_{x_1 \in X_1} \cdots \bigotimes^{(n)}_{x_n \in X_n} f(x_1, \ldots, x_n),$$

subject to the decomposability requirement

$$\psi(\ldots, X_i, \ldots) = \psi(\ldots, X_i^L, \ldots) \otimes_i \psi(\ldots, X_i^R, \ldots)$$

for all $1 \le i \le n$ and partitions $X_i^L \cup X_i^R = X_i$. We'll also need some means of bounding the results of ψ.
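As a concrete instance (our example, not from the slides): with $n = 1$ and $\otimes_1 = \sum$, decomposability is simply the fact that a sum splits over any partition of its index set:

$$\sum_{x \in X} f(x) = \sum_{x \in X^L} f(x) + \sum_{x \in X^R} f(x), \qquad X^L \cup X^R = X, \; X^L \cap X^R = \emptyset.$$

The same holds for operators such as min, max, and map, which is what lets a tree over X drive the computation; a statistic like the median does not decompose this way.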

Decomposability (planned: Riegel et al., NIPS 2008 or JMLR 2008)

Decomposability is restrictive, but it always holds for problems formed by combinations of map and a single other operator ⊗. It is equivalent to

$$\bigotimes^{(1)}_{x_1 \in X_1} \cdots \bigotimes^{(n)}_{x_n \in X_n} f(x_1, \ldots, x_n) \;=\; \bigotimes^{(p_1)}_{x_{p_1} \in X_{p_1}} \cdots \bigotimes^{(p_n)}_{x_{p_n} \in X_{p_n}} f(x_1, \ldots, x_n)$$

for all permutations p of the set {1, ..., n}, and to the interchange law

$$\big(\psi(X_i^L, X_j^L) \otimes_j \psi(X_i^L, X_j^R)\big) \otimes_i \big(\psi(X_i^R, X_j^L) \otimes_j \psi(X_i^R, X_j^R)\big)$$
$$= \big(\psi(X_i^L, X_j^L) \otimes_i \psi(X_i^R, X_j^L)\big) \otimes_j \big(\psi(X_i^L, X_j^R) \otimes_i \psi(X_i^R, X_j^R)\big)$$

Decomposability

$$\psi(X, Y) = \bigodot_{x \in X} \bigotimes_{y \in Y} f(x, y)$$

expands to

(   f(x_1, y_1) ⊗ f(x_1, y_2) ⊗ ... ⊗ f(x_1, y_M) )
⊙ ( f(x_2, y_1) ⊗ f(x_2, y_2) ⊗ ... ⊗ f(x_2, y_M) )
⊙ ...
⊙ ( f(x_N, y_1) ⊗ f(x_N, y_2) ⊗ ... ⊗ f(x_N, y_M) )

Decomposability

$$\psi(X, Y) = \psi(X, Y^L) \otimes \psi(X, Y^R)$$

Splitting off the first column, with $Y^L = \{y_1\}$ and $Y^R = \{y_2, \ldots, y_M\}$:

(   f(x_1, y_1)          (  ( f(x_1, y_2) ⊗ ... ⊗ f(x_1, y_M) )
  ⊙ f(x_2, y_1)      ⊗     ⊙ ( f(x_2, y_2) ⊗ ... ⊗ f(x_2, y_M) )
  ⊙ ...                    ⊙ ...
  ⊙ f(x_N, y_1) )          ⊙ ( f(x_N, y_2) ⊗ ... ⊗ f(x_N, y_M) )  )

Transforming Problems into GNPs (planned: Riegel et al., NIPS 2008 or JMLR 2008)

("Serial" GNPs.) Decomposable or not,

$$g_1\bigg(\bigotimes^{(1)}_{x_1 \in X_1} g_2\bigg(\bigotimes^{(2)}_{x_2 \in X_2} \cdots\, g_n\bigg(\bigotimes^{(n)}_{x_n \in X_n} f(x_1, \ldots, x_n)\bigg) \cdots \bigg)\bigg)$$

may be transformed into nested GNPs by replacing every other operator with map and factoring the intermediate $g_i$ out.

("Parallel" GNPs.) Also GNP-able are problems such as

$$\operatorname*{map}_{i} \; \frac{\sum_j w_{ij}\, K(x_i, x_j)}{\sum_j K(x_i, x_j)}.$$

("Multi" GNPs.) Wrap the problem with map to vary a parameter.
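For example (our reading, not a slide): kernel density estimation is already in the serial pattern with $\otimes^{(1)} = \operatorname{map}$ and $\otimes^{(2)} = \sum$, with the normalization factored into an outer $g$:

$$\operatorname*{map}_{q \in Q} \; g\Big( \sum_{r \in R} K_h(q, r) \Big), \qquad g(s) = \frac{s}{|R|}.$$

All-nearest-neighbors fits the same pattern with $\otimes^{(2)} = \operatorname{argmin}$ and $g$ the identity.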

The Algorithm (Boyer, Riegel, and Gray's THOR Project; planned: Riegel et al., ICML 2008 or JMLR 2008)

"One algorithm to solve them all":

$$\psi(X_1, \ldots, X_n) = \begin{cases} a & \text{if bounds prove it is safe to prune to } a, \\ f(x_1, \ldots, x_n) & \text{if each } X_i = \{x_i\}\text{, i.e., is a leaf,} \\ \psi(\ldots, X_i^L, \ldots) \otimes_i \psi(\ldots, X_i^R, \ldots) & \text{otherwise.} \end{cases}$$
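A schematic rendering of this three-case recursion (our sketch under the toy Node scaffolding from earlier; the real THOR system would differ). Here try_prune returns an approximation when the bounds allow and None otherwise, and combine implements ⊗_i:

```python
def gnp(nodes, combine, f, try_prune):
    """Generic multi-tree recursion: prune, evaluate at a leaf tuple, or split."""
    a = try_prune(nodes)
    if a is not None:                          # case 1: safe to prune to a
        return a
    if all(n.left is None for n in nodes):     # case 2: every X_i is a leaf
        return f([n.pts[0] for n in nodes])
    # case 3: split some node; here, the largest one that can still split
    i = max((k for k, n in enumerate(nodes) if n.left is not None),
            key=lambda k: len(nodes[k].pts))
    lhs = gnp(nodes[:i] + (nodes[i].left,) + nodes[i+1:], combine, f, try_prune)
    rhs = gnp(nodes[:i] + (nodes[i].right,) + nodes[i+1:], combine, f, try_prune)
    return combine(i, lhs, rhs)                # psi(..., X_i^L, ...) combined with psi(..., X_i^R, ...)

# Two-point correlation as an instance (Node, dmin, dmax as defined earlier):
# count = gnp((Node(X), Node(X)),
#             combine=lambda i, l, r: l + r,
#             f=lambda xs: int(np.linalg.norm(xs[0] - xs[1]) <= h),
#             try_prune=lambda ns: 0 if dmin(*ns) > h else
#                       len(ns[0].pts) * len(ns[1].pts) if dmax(*ns) <= h else None)
```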
