
High-dimensional integration without Markov chains
Alexander Gray
Carnegie Mellon University, School of Computer Science

High-dimensional integration by: nonparametric statistics, computational geometry, computational physics


  1. Idea #9: Sample more where the error was larger
     • Choose new $x_i$ with probability $p_i = v_i / \sum_i v_i$, where
       $v_i = \left[ f(x_i) - \hat{I}_q \, q(x_i) \right]^2 / q(x_i)$
     • Draw from $N(x_i, h^*)$
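
A minimal sketch of this resampling step in Python/NumPy; the function and argument names (resample_by_error, f_vals, q_vals, I_hat, h_star) are illustrative, not from the talk:

```python
import numpy as np

def resample_by_error(xs, f_vals, q_vals, I_hat, h_star, rng):
    """Resample points in proportion to their squared integration error,
    then jitter each draw with Gaussian noise of bandwidth h_star."""
    v = (f_vals - I_hat * q_vals) ** 2 / q_vals   # v_i = [f(x_i) - I*q(x_i)]^2 / q(x_i)
    p = v / v.sum()                               # p_i = v_i / sum_j v_j
    idx = rng.choice(len(xs), size=len(xs), p=p)
    return xs[idx] + h_star * rng.standard_normal(xs[idx].shape)  # draw from N(x_i, h*)
```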

  2. Should we forget old points? I tried that. It doesn’t work. So I remember all the old samples.

  3. Idea #10: Incrementally update estimates
     $\hat{I}_q \leftarrow \frac{ \hat{I}_q N_{tot} + \sum_i f(x_i) / q(x_i) }{ N_{tot} + N }$
     $\hat{V}(\hat{I}_q) \leftarrow \frac{ \hat{V}(\hat{I}_q) N_{tot}^2 + \sum_i \left[ f(x_i) - \hat{I}_q \, q(x_i) \right]^2 / q(x_i)^2 }{ (N_{tot} + N)^2 }$
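
A small sketch of this incremental update, following the reconstructed formulas above (the names update_estimates and n_tot are illustrative):

```python
import numpy as np

def update_estimates(I_hat, V_hat, n_tot, xs, f, q):
    """Fold a new batch of samples into the running integral estimate and its
    variance estimate, without revisiting old samples."""
    n = len(xs)
    w = f(xs) / q(xs)                                  # importance weights f(x_i)/q(x_i)
    I_new = (I_hat * n_tot + w.sum()) / (n_tot + n)
    resid = (f(xs) - I_new * q(xs)) ** 2 / q(xs) ** 2  # per-sample squared error
    V_new = (V_hat * n_tot**2 + resid.sum()) / (n_tot + n) ** 2
    return I_new, V_new, n_tot + n
```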

  4. Overall method: FIRE
     Repeat:
     1. Resample $N$ points from $\{x_i\}$ with probability $p_i = v_i / \sum_i v_i$, where $v_i = \left[ f(x_i) - \hat{I}_q \, q(x_i) \right]^2 / q(x_i)$; add the jittered points $\{\tilde{x}_i\}$ to the training set and build/update the tree $T$ on it
     2. Compute $h^*_{new} = \arg\min_{h \in \{\frac{2}{3}h^*,\, h^*,\, \frac{3}{2}h^*\}} \sum_i \left[ f(\tilde{x}_i) - \hat{I}_q \, q_h(\tilde{x}_i) \right]^2 / q_h(\tilde{x}_i)$
     3. Sample $N$ points $\{x_i\}$ from $q(\cdot) = (1 - \alpha) \hat{f}(\cdot) + \alpha f_0(\cdot)$
     4. For each $x_i$ compute $\hat{f}(x_i)$ using $T$ and $h^*$
     5. Update $\hat{I}$ and $\hat{V}$
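
To make the loop concrete, here is a toy, end-to-end 1-D sketch reusing resample_by_error and update_estimates from the sketches above. A plain Gaussian KDE stands in for the tree-based estimate, the bandwidth search is omitted, and the integrand and all constants are illustrative, not the talk's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(-0.5 * x**2)          # toy integrand; true integral is sqrt(2*pi)
f0 = lambda x: np.exp(-0.5 * (x / 5.0)**2) / (5.0 * np.sqrt(2 * np.pi))  # defensive density

def kde(train, h):
    """Gaussian kernel density estimate on the training set (stands in for f-hat on T)."""
    def q(x):
        d = (x[:, None] - train[None, :]) / h
        return np.exp(-0.5 * d**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return q

alpha, h_star, N = 0.1, 0.5, 2000
I_hat, V_hat, n_tot = 0.0, 0.0, 0
train = rng.normal(0.0, 5.0, N)            # initial draw from the defensive component
for _ in range(5):
    f_hat = kde(train, h_star)
    q = lambda x: (1 - alpha) * f_hat(x) + alpha * f0(x)   # defensive mixture
    # Sample from the mixture: with probability alpha from f0, else from the KDE
    from_f0 = rng.random(N) < alpha
    xs = np.where(from_f0, rng.normal(0.0, 5.0, N),
                  rng.choice(train, N) + h_star * rng.standard_normal(N))
    I_hat, V_hat, n_tot = update_estimates(I_hat, V_hat, n_tot, xs, f, q)
    new_pts = resample_by_error(xs, f(xs), q(xs), I_hat, h_star, rng)
    train = np.concatenate([train, new_pts])               # remember all old samples
print(I_hat, np.sqrt(2 * np.pi))           # estimate vs. truth
```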

  5. Properties
     • Because FIRE is importance sampling, it is:
       – consistent
       – unbiased
     • The NWR (Nadaraya-Watson regression) estimate approaches $f(x)/I$
     • Somewhat reminiscent of particle filtering; EM-like; like $N$ interacting Metropolis samplers

  6. Test problems
     • Thin region: anisotropic Gaussian with $a\sigma^2$ in the off-diagonals of the covariance; $a \in \{0.99, 0.9, 0.5\}$, $D \in \{5, 10, 25, 100\}$
     • Isolated modes: mixture of two normals, $0.5\,N(4,1) + 0.5\,N(4+b,1)$; $b \in \{2, 4, 6, 8, 10\}$, $D \in \{5, 10, 25, 100\}$
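
A sketch of how these test densities could be set up; the exact $D$-dimensional form of the mixture (here applied per coordinate) is an assumption:

```python
import numpy as np

D, a, sigma2 = 10, 0.9, 1.0
# Thin region: anisotropic Gaussian, a*sigma^2 in every off-diagonal entry
cov = np.full((D, D), a * sigma2)
np.fill_diagonal(cov, sigma2)
prec = np.linalg.inv(cov)

def thin_region(x):
    """Unnormalized anisotropic Gaussian density."""
    return np.exp(-0.5 * x @ prec @ x)

b = 10  # mode separation

def isolated_modes(x):
    """Unnormalized equal mixture of N(4, I) and N(4 + b, I)."""
    return 0.5 * np.exp(-0.5 * np.sum((x - 4.0) ** 2)) + \
           0.5 * np.exp(-0.5 * np.sum((x - 4.0 - b) ** 2))
```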

  7. Competitors
     • Standard Monte Carlo
     • MCMC (Metropolis-Hastings):
       – starting point [Gelman-Roberts-Gilks 95]
       – adaptation phase [Gelman-Roberts-Gilks 95]
       – burn-in time [Geyer 92]
       – multiple chains [Geyer 92]
       – thinning [Gelman 95]

  8. How to compare
     Look at the relative error over many runs.
     When to stop each method?
     1. Use its own stopping criterion
     2. Use a fixed wall-clock time

  9. Anisotropic Gaussian (a=0.9, D=10)
     • MCMC
       – started at the center of mass
       – when it wants to stop: >2 hours
       – after 2 hours:
         • with the best σ: rel. err {24%, 11%, 3%, 62%}
         • small σ and large σ: >250% errors
         • automatic σ: {59%, 16%, 93%, 71%}
       – ~40M samples
     • FIRE
       – when it wants to stop: ~1 hour
       – after 2 hours: rel. err {1%, 2%, 1%, 1%}
       – ~1.5M samples

  10. Mixture of Gaussians (b=10, D=10)
      • MCMC
        – started at one mode
        – when it wants to stop: ~30 minutes
        – after 2 hours:
          • with the best σ: rel. err {54%, 42%, 58%, 47%}
          • small σ, large σ, automatic σ: similar
        – ~40M samples
      • FIRE
        – when it wants to stop: ~10 minutes
        – after 2 hours: rel. err {<1%, 1%, 32%, <1%}
        – ~1.2M samples

  11. Extension #1: non-positive functions
      Positivization [Owen-Zhou 1998]
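
The generic positivization scheme splits the integrand into its positive and negative parts and importance-samples each with its own proposal; a minimal sketch (sampler names are illustrative, and this is the textbook construction rather than necessarily the talk's exact variant):

```python
import numpy as np

def positivized_estimate(f, q_plus, q_minus, sample_plus, sample_minus, n, rng):
    """Estimate integral(f) = integral(f_plus) - integral(f_minus) for a
    sign-changing f, with a separate importance sampler for each part."""
    xp, xm = sample_plus(n, rng), sample_minus(n, rng)
    f_plus = np.maximum(f(xp), 0.0)      # positive part, evaluated at q_plus draws
    f_minus = np.maximum(-f(xm), 0.0)    # negative part, evaluated at q_minus draws
    return np.mean(f_plus / q_plus(xp)) - np.mean(f_minus / q_minus(xm))
```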

  12. Extension #2: more defensiveness, and accuracy
      Control variates [Veach 1997]
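
A standard control-variate correction on top of importance sampling; h is any function whose integral mu_h is known exactly (names illustrative):

```python
import numpy as np

def cv_estimate(f, h, mu_h, q, xs):
    """Importance-sampling estimate of integral(f), variance-reduced by the
    control variate h with known integral mu_h."""
    wf, wh = f(xs) / q(xs), h(xs) / q(xs)
    beta = np.cov(wf, wh, ddof=1)[0, 1] / np.var(wh, ddof=1)  # variance-minimizing coeff.
    return np.mean(wf - beta * (wh - mu_h))
```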

  13. Extension #3: more accurate regression
      Local linear regression
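
For reference, a 1-D local linear regression estimate at a point x0, i.e. kernel-weighted least squares with an intercept and a slope (a textbook construction, not the talk's exact implementation):

```python
import numpy as np

def local_linear(x0, xs, ys, h):
    """Local linear regression at x0 with Gaussian kernel weights."""
    w = np.exp(-0.5 * ((xs - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(xs), xs - x0])
    # Weighted normal equations: (X^T W X) beta = X^T W y
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * ys))
    return beta[0]   # the intercept is the estimate at x0
```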

  14. Extension #4 (maybe): fully automatic stopping
      Function-wide confidence bands: stitch together pointwise bands, control with FDR (false discovery rate)

  15. Summary
      • We can do high-dimensional integration without Markov chains, by statistical inference
      • Promising alternative to MCMC:
        – safer (e.g. with isolated modes)
        – not a black art
        – faster
      • Intrinsic dimension: multiple viewpoints
      • MUCH more work needed – please help me!

  16. One notion of intrinsic dimension
      [Figure: plot of log C(r) vs. log r; the slope is the 'correlation dimension']
      Similar: a notion in metric analysis
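
The correlation dimension can be estimated as the slope of log C(r) against log r, where C(r) is the fraction of point pairs within distance r; a small sketch (choose radii so that C(r) > 0):

```python
import numpy as np

def correlation_dimension(X, radii):
    """Slope of log C(r) vs log r, with C(r) the fraction of pairs within r."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    pair_d = d[np.triu_indices(len(X), k=1)]   # each pair counted once
    C = np.array([(pair_d <= r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return slope

# Points on a 2-D plane embedded in 10-D should give a slope near 2:
# rng = np.random.default_rng(0)
# X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
# print(correlation_dimension(X, np.logspace(-0.3, 0.7, 10)))
```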

  17–19. N-body problems
      • Coulombic: $K(x, x_i) = \dfrac{m \, m_i}{\| x - x_i \|^a}$
        (high accuracy required)
      • Kernel density estimation: $K(x, x_i) = e^{-\|x - x_i\|^2 / 2\sigma^2}$, or
        $K(x, x_i) = \begin{cases} 1 - t^2 & 0 \le t < 1 \\ 0 & t \ge 1 \end{cases}$ with $t = \|x - x_i\|^2 / \sigma^2$
        (only moderate accuracy required, often high-D)
      • SPH (smoothed particle hydrodynamics):
        $K(x, x_i) = \begin{cases} 4 - 6t^2 + 3t^3 & 0 \le t < 1 \\ (2 - t)^3 & 1 \le t < 2 \\ 0 & t \ge 2 \end{cases}$
        (only moderate accuracy required)
      Also: different for every point, non-isotropic, edge-dependent, …
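
The three kernel families, coded directly from the reconstructed formulas above (normalization constants omitted; the reconstruction of the piecewise forms is itself a best effort from the slide):

```python
import numpy as np

def coulomb_kernel(x, xi, m=1.0, mi=1.0, a=1.0):
    """Coulombic kernel m*mi / ||x - xi||^a."""
    return m * mi / np.linalg.norm(x - xi) ** a

def kde_kernel(x, xi, sigma=1.0):
    """Finite-support KDE kernel: 1 - t^2 for 0 <= t < 1, else 0."""
    t = np.sum((x - xi) ** 2) / sigma ** 2
    return 1.0 - t ** 2 if t < 1.0 else 0.0

def sph_kernel(x, xi, h=1.0):
    """Cubic-spline SPH kernel, piecewise in t = ||x - xi|| / h."""
    t = np.linalg.norm(x - xi) / h
    if t < 1.0:
        return 4.0 - 6.0 * t**2 + 3.0 * t**3
    if t < 2.0:
        return (2.0 - t) ** 3
    return 0.0
```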

  20–21. N-body methods: Approximation
      • Barnes-Hut: $\sum_{i \in R} K(x, x_i) \approx N_R \, K(x, \mu_R)$ if the node $R$ (size $s$, centroid $\mu_R$, at distance $r$ from $x$) is well-separated, i.e. $s/r < \theta$
      • FMM: $\forall x$, $\sum_i K(x, x_i) \approx$ a multipole/Taylor expansion of order $p$, if the nodes are well-separated
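
A sketch of the Barnes-Hut acceptance rule, assuming a tree node that exposes centroid, size, count, points, and children (these attribute names are hypothetical):

```python
import numpy as np

def bh_node_sum(x, node, K, theta=0.5):
    """Barnes-Hut style sum of K(x, x_i) over a node: use the centroid
    approximation when the node is well-separated from x, else recurse."""
    r = np.linalg.norm(x - node.centroid)
    if r > 0 and node.size / r < theta:        # well-separated: one kernel evaluation
        return node.count * K(x, node.centroid)
    if node.is_leaf:                           # base case: exact sum over the leaf
        return sum(K(x, xi) for xi in node.points)
    return bh_node_sum(x, node.left, K, theta) + bh_node_sum(x, node.right, K, theta)
```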

  22–23. N-body methods: Runtime
      • Barnes-Hut: ≈ O(N log N) (non-rigorous; assumes a uniform distribution)
      • FMM: ≈ O(N) (non-rigorous; assumes a uniform distribution)
      [Callahan-Kosaraju 95]: O(N) is impossible for a log-depth tree (in the worst case)

  24–26. Expansions
      • Constants matter! The $p^D$ factor is a slowdown
      • Large dimension is infeasible
      • Adds much complexity (software, human time)
      • Non-trivial to do new kernels (assuming they're even analytic), heterogeneous kernels
      • BUT, seemingly:
        – needed to achieve O(N) (?)
        – needed to achieve high accuracy (?)
        – needed to have hard error bounds (?)

  27. N-body methods: Adaptivity
      • Barnes-Hut: recursive → can use any kind of tree
      • FMM: hand-organized control flow → requires a grid structure
      quad-tree/oct-tree: not very adaptive
      kd-tree: adaptive
      ball-tree/metric tree: very adaptive

  28. kd-trees: the most widely-used space-partitioning tree [Friedman, Bentley & Finkel 1977]
      • Univariate axis-aligned splits
      • Split on the widest dimension
      • O(N log N) to build, O(N) space
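
A compact sketch of such a build, splitting on the widest dimension at the median (the leaf size and dict representation are arbitrary choices):

```python
import numpy as np

def build_kdtree(points, leaf_size=10):
    """Build a kd-tree with axis-aligned median splits on the widest dimension."""
    if len(points) <= leaf_size:
        return {"points": points}              # leaf node
    spread = points.max(axis=0) - points.min(axis=0)
    dim = int(np.argmax(spread))               # widest dimension
    order = np.argsort(points[:, dim])
    mid = len(points) // 2
    return {"dim": dim,
            "split": points[order[mid], dim],
            "left": build_kdtree(points[order[:mid]], leaf_size),
            "right": build_kdtree(points[order[mid:]], leaf_size)}

# tree = build_kdtree(np.random.default_rng(0).normal(size=(1000, 3)))
```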

  29–34. A kd-tree: levels 1–6 (figure sequence showing successive splits)

  35–39. A ball-tree: levels 1–5 [Uhlmann 1991], [Omohundro 1991] (figure sequence)

  40. N-body methods: Comparison

                              Barnes-Hut    FMM
      runtime                 O(N log N)    O(N)
      expansions              optional      required
      simple, recursive?      yes           no
      adaptive trees?         yes           no
      error bounds?           no            yes

  41. Questions
      • What's the magic that allows O(N)? Is it really because of the expansions?
      • Can we obtain a method that is:
        1. O(N)
        2. lightweight: works with or without expansions
        3. simple, recursive

  42. New algorithm
      • Use an adaptive tree (kd-tree or ball-tree)
      • Dual-tree recursion
      • Finite-difference approximation

  43. Single-tree vs. dual-tree (symmetric) (figure)

  44. Simple recursive algorithm
      SingleTree(q, R):
          if approximate(q, R), return
          if leaf(R), SingleTreeBase(q, R)
          else:
              SingleTree(q, R.left)
              SingleTree(q, R.right)
      (NN or range-search: recurse on the closer node first)
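
A runnable instantiation of the single-tree recursion for range counting, where the "approximate" step becomes exact pruning via distance bounds to a node's bounding box (the Node class here is a minimal stand-in, not the talk's data structure):

```python
import numpy as np

class Node:
    """Minimal kd-tree node with a bounding box, for illustration."""
    def __init__(self, pts, leaf_size=16):
        self.pts = pts
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)
        self.left = self.right = None
        if len(pts) > leaf_size:
            dim = int(np.argmax(self.hi - self.lo))     # widest dimension
            o = np.argsort(pts[:, dim]); m = len(pts) // 2
            self.left, self.right = Node(pts[o[:m]], leaf_size), Node(pts[o[m:]], leaf_size)

def range_count(q, R, radius):
    """Count reference points in R within `radius` of the query q."""
    d_min = np.linalg.norm(np.maximum(R.lo - q, np.maximum(q - R.hi, 0.0)))
    d_max = np.linalg.norm(np.maximum(np.abs(q - R.lo), np.abs(q - R.hi)))
    if d_min > radius:  return 0                        # prune: node entirely outside
    if d_max <= radius: return len(R.pts)               # node entirely inside
    if R.left is None:                                  # leaf: exact base case
        return int((np.linalg.norm(R.pts - q, axis=1) <= radius).sum())
    return range_count(q, R.left, radius) + range_count(q, R.right, radius)
```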

  45. Simple recursive algorithm
      DualTree(Q, R):
          if approximate(Q, R), return
          if leaf(Q) and leaf(R), DualTreeBase(Q, R)
          else:
              DualTree(Q.left, R.left)
              DualTree(Q.left, R.right)
              DualTree(Q.right, R.left)
              DualTree(Q.right, R.right)
      (NN or range-search: recurse on the closer node first)
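
A sketch of the dual-tree recursion for kernel summation with a finite-difference-style approximation: when the kernel's upper and lower bounds over a node pair are close, the whole pair is approximated by their midpoint. It extends the Node class above to carry point indices; the tolerance rule and all names are illustrative, not the talk's exact algorithm:

```python
import numpy as np

class INode:
    """kd-tree node that remembers the original indices of its points."""
    def __init__(self, pts, idx=None, leaf_size=16):
        if idx is None:
            idx = np.arange(len(pts))
        self.pts, self.idx = pts, idx
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)
        self.left = self.right = None
        if len(pts) > leaf_size:
            d = int(np.argmax(self.hi - self.lo))
            o = np.argsort(pts[:, d]); m = len(pts) // 2
            self.left = INode(pts[o[:m]], idx[o[:m]], leaf_size)
            self.right = INode(pts[o[m:]], idx[o[m:]], leaf_size)

def K(d, h):
    return np.exp(-0.5 * (d / h) ** 2)                   # Gaussian kernel

def dual_tree(Q, R, h, sums, eps=1e-2):
    """Accumulate sums[j] += sum_i K(||q_j - x_i||, h) for all queries in Q."""
    gap = np.maximum(R.lo - Q.hi, np.maximum(Q.lo - R.hi, 0.0))
    d_min = np.linalg.norm(gap)                          # closest the pair can be
    d_max = np.linalg.norm(np.maximum(Q.hi - R.lo, R.hi - Q.lo))  # farthest apart
    k_hi, k_lo = K(d_min, h), K(d_max, h)
    if k_hi - k_lo < eps:                                # kernel nearly constant here:
        sums[Q.idx] += len(R.pts) * 0.5 * (k_hi + k_lo)  # finite-difference approx.
        return
    if Q.left is None and R.left is None:                # both leaves: exact sum
        d = np.linalg.norm(Q.pts[:, None, :] - R.pts[None, :, :], axis=-1)
        sums[Q.idx] += K(d, h).sum(axis=1)
        return
    for q in ([Q] if Q.left is None else [Q.left, Q.right]):
        for r in ([R] if R.left is None else [R.left, R.right]):
            dual_tree(q, r, h, sums, eps)

# X = np.random.default_rng(0).normal(size=(2000, 3))
# sums = np.zeros(len(X)); dual_tree(INode(X), INode(X), 0.5, sums)
```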

  46–59. Dual-tree traversal (depth-first)
      (Animation frames: the recursion visits query-node/reference-node pairs in turn; panels show reference points and query points)
