
Principles of Database Systems - V. Megalooikonomou - Fractals and Databases


  1. Principles of Database Systems, V. Megalooikonomou: Fractals and Databases (based on notes by C. Faloutsos at CMU)

  2. Indexing - Detailed outline
     • fractals
       • intro
       • applications

  3. Intro to fractals - outline
     • Motivation: 3 problems / case studies
     • Definition of fractals and power laws
     • Solutions to posed problems
     • More examples and tools
     • Discussion: putting fractals to work!
     • Conclusions: practitioner’s guide
     • Appendix: gory details - boxcounting plots

  4. Problem #1: GIS - points. Road end-points of Montgomery county:
     • Q1: how many disk accesses (d.a.) for an R-tree?
     • Q2: distribution?
       • not uniform
       • not Gaussian
       • no rules??

  5. Problem #2: spatial data mining (d.m.). Galaxies (Sloan Digital Sky Survey - B. Nichol): ‘spiral’ and ‘elliptical’ galaxies (stores and households ...)
     • patterns?
     • attraction / repulsion?
     • how many ‘spi’ within distance r from an ‘ell’?

  6. Problem #3: traffic. Disk trace (from HP - J. Wilkes); Web traffic. Fit a model:
     • how many explosions to expect?
     • queue length distribution?
     (figure: # bytes vs. time; label: Poisson)

  7. Common answer:
     • Fractals / self-similarities / power laws
     • Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, ...)

  8. Road map (outline repeated from slide 3)

  9. What is a fractal? A self-similar point set, e.g., the Sierpinski triangle: zero area; infinite perimeter!

  10. Definitions (cont’d)
     • Paradox: infinite perimeter; zero area!
     • ‘dimensionality’: between 1 and 2
     • actually: log(3)/log(2) = 1.58...

  11. Dfn of fd, ONLY for a perfectly self-similar point set: a perfectly self-similar object with n similar pieces, each scaled down by a factor f, has fd = log(n)/log(f). Sierpinski triangle (zero area; infinite length!): fd = log(3)/log(2) = 1.58

  12. Intrinsic (‘fractal’) dimension
     • Q: fractal dimension of a line?
     • A: 1 (= log(2)/log(2)!)
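
The definition on slides 11 and 12 can be checked numerically. The following is a minimal sketch; the function name `similarity_dimension` and the square example are illustrative additions, not part of the original slides.

```python
import math

def similarity_dimension(n_pieces, scale_factor):
    """fd = log(n)/log(f) for a perfectly self-similar object made of
    n_pieces copies, each scaled down by scale_factor."""
    if n_pieces < 1 or scale_factor <= 1:
        raise ValueError("need n_pieces >= 1 and scale_factor > 1")
    return math.log(n_pieces) / math.log(scale_factor)

sierpinski_fd = similarity_dimension(3, 2)  # three half-size copies
line_fd = similarity_dimension(2, 2)        # a segment is two half-size segments
square_fd = similarity_dimension(4, 2)      # a square is four half-size squares (added example)
```

The Sierpinski value falls between the line (1) and the filled square (2), which is the "between 1 and 2" dimensionality of slide 10.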

  13. Intrinsic (‘fractal’) dimension
     • Q: dfn for a given set of points?
     • example points (x, y): (5, 1), (4, 2), (3, 3), (2, 4)

  14. Intrinsic (‘fractal’) dimension
     • Q: fractal dimension of a line? A: nn(<= r) ~ r^1
     • Q: fd of a plane? A: nn(<= r) ~ r^2
     • fd = slope of log(nn) vs. log(r) (‘power law’: y = x^a)
     where nn(<= r) is the number of neighbors within distance r

  15. Intrinsic (‘fractal’) dimension
     • Algorithm, to estimate it?
     • Notice: avg nn(<= r) is exactly tot # pairs(<= r) / N, counting ‘mirror’ pairs (each pair in both orders); a constant factor leaves the log-log slope unchanged

  16. Sierpinski triangle: ‘correlation integral’ = plot of log(# pairs within <= r) vs. log(r); slope = 1.58
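
One way to try the pair-count estimate of slides 14 to 16 is sketched below. It is an illustration under stated assumptions, not the authors' code: the points are sampled with the "chaos game" (a standard way to draw the Sierpinski triangle, not mentioned in the slides), the pair count is a brute-force O(N^2) loop, and all function names, the sample size and the radii are hypothetical choices.

```python
import bisect
import math
import random

def sierpinski_points(n, seed=0):
    """Sample n points of the Sierpinski triangle with the chaos game."""
    rng = random.Random(seed)
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.25, 0.25
    points = []
    for step in range(n + 20):
        cx, cy = rng.choice(corners)
        x, y = (x + cx) / 2, (y + cy) / 2  # jump halfway to a random corner
        if step >= 20:                      # skip the first steps (burn-in)
            points.append((x, y))
    return points

def pair_counts(points, radii):
    """For each r in the ascending list radii: # unordered pairs at distance <= r."""
    hist = [0] * (len(radii) + 1)           # last slot: pairs farther than max radius
    for i, (xi, yi) in enumerate(points):
        for xj, yj in points[i + 1:]:
            d = math.hypot(xi - xj, yi - yj)
            hist[bisect.bisect_left(radii, d)] += 1  # index of first radius >= d
    counts, running = [], 0
    for h in hist[:-1]:
        running += h
        counts.append(running)
    return counts

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

radii = [2.0 ** -k for k in range(6, 2, -1)]  # 1/64, 1/32, 1/16, 1/8
points = sierpinski_points(1200)
counts = pair_counts(points, radii)
fd_estimate = loglog_slope(radii, counts)     # expected to land near log(3)/log(2)
```

With a finite sample the slope is only approximate; counting ordered instead of unordered pairs would double every count and leave the slope unchanged.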

  17. Observations:
     • Euclidean objects have integer fractal dimensions
       • point: 0
       • lines and smooth curves: 1
       • smooth surfaces: 2
     • fractal dimension -> roughness of the periphery

  18. Important properties
     • fd = embedding dimension -> uniform point set
     • a point set may have several fd, depending on scale

  19. Road map (outline repeated from slide 3)

  20. Problem #1: GIS points. Cross-roads of Montgomery county:
     • any rules?

  21. Solution #1
     • A: self-similarity <=> fractals <=> scale-free <=> power laws (y = x^a, F = C * r^(-2))
     • avg # neighbors(<= r) ~ r^D
     (plot: log(# pairs(within <= r)) vs. log(r); slope 1.51)

  22. Solution #1
     • A: self-similarity
     • avg # neighbors(<= r) ~ r^(1.51)
     (plot: log(# pairs(within <= r)) vs. log(r); slope 1.51)
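
The power law avg # neighbors(<= r) ~ r^D lets a measured neighbor count be extrapolated to another radius. The sketch below is a small worked example; the function name and the starting count of 100 neighbors are made-up illustrations, and only the exponent 1.51 comes from the slides.

```python
def scale_neighbor_count(known_count, known_radius, new_radius, fractal_dim):
    """Extrapolate an average neighbor count, assuming count ~ r^fractal_dim."""
    return known_count * (new_radius / known_radius) ** fractal_dim

# Hypothetical measurement: 100 neighbors on average within radius 1.
with_fd = scale_neighbor_count(100, 1.0, 2.0, 1.51)     # about 285
if_uniform = scale_neighbor_count(100, 1.0, 2.0, 2.0)   # 400
```

Doubling the radius multiplies the count by 2^1.51, roughly 2.85, rather than the factor 4 that a uniform 2-d assumption would predict.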

  23. Examples: MG county
     • Montgomery County of MD (road end-points)

  24. Examples: LB county
     • Long Beach county of CA (road end-points)

  25. Solution #2: spatial d.m. Galaxies (‘BOPS’ plot - [sigmod2000]); plot: log(# pairs) vs. log(r)

  26. Solution #2: spatial d.m. Plot: log(# pairs within <= r) vs. log(r), with curves ell-ell, spi-spi, spi-ell
     • 1.8 slope
     • plateau!
     • repulsion!

  27. spatial d.m. (plot and annotations repeated from slide 26)

  28. spatial d.m. Heuristic on choosing # of clusters (figure labels: r1, r2)

  29. spatial d.m. (plot and annotations repeated from slide 26)

  30. spatial d.m. Plot: log(# pairs within <= r) vs. log(r), with curves ell-ell, spi-spi, spi-ell
     • 1.8 slope
     • plateau!
     • repulsion
     • duplicates!!

  31. Solution #3: traffic
     • disk traces: self-similar (figure: # bytes vs. time)

  32. Solution #3: traffic
     • disk traces (80-20 ‘law’ = ‘multifractal’) (figure: # bytes vs. time; labels: 20%, 80%)
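
A recursive reading of the 80-20 ‘law’ can be sketched as follows: split the total volume 80/20 between the two halves of the time interval, then repeat inside each half. This is an assumed construction for illustration; the function name, the random choice of which half receives the 80%, and the parameter values are not taken from the slides.

```python
import random

def eighty_twenty_trace(levels, total=1.0, bias=0.8, seed=0):
    """Bursty synthetic trace with 2**levels time slots.

    At every level each interval's volume is split into bias / (1 - bias)
    shares, and a coin flip decides which half gets the larger share.
    """
    rng = random.Random(seed)
    trace = [total]
    for _ in range(levels):
        finer = []
        for volume in trace:
            heavy, light = volume * bias, volume * (1 - bias)
            if rng.random() < 0.5:
                heavy, light = light, heavy
            finer.extend([heavy, light])
        trace = finer
    return trace

trace = eighty_twenty_trace(levels=10)  # 1024 time slots
peak_share = max(trace)                 # the slot that got the 80% share every time
```

After 10 levels a single slot out of 1024 holds about 10.7% of all bytes (0.8^10), whereas a uniform spread would give each slot under 0.1%.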

  33. Solution #3: traffic. Clarification:
     • fractal: a set of points that is self-similar
     • multifractal: a probability density function that is self-similar
     Many other time-sequences are bursty/clustered: (such as?)

  34. Tape accesses: # tapes needed, to retrieve n records? (# days down, due to failures / hurricanes / communication noise...) (figure labels: Tape #1 ... Tape #N, time)

  35. Tape accesses (figure labels: # tapes retrieved, # qual. records, Tape #1 ... Tape #N, time; curves: 50-50 = Poisson, real)

  36. Road map (outline repeated from slide 3)

  37. More tools
     • Zipf’s law
     • Korcak’s law / “fat fractals”

  38. A famous power law: Zipf’s law
     • Q: vocabulary word frequency in a document - any pattern? (figure: freq. per word, from ‘aaron’ to ‘zoo’)

  39. A famous power law: Zipf’s law
     • Bible - rank vs. frequency (log-log) (plot: log(freq) vs. log(rank); labeled words: “the”, “a”)

  40. A famous power law: Zipf’s law
     • Bible - rank vs. frequency (log-log)
     • similarly, in many other languages; for customers and sales volume; city populations, etc.

  41. A famous power law: Zipf’s law
     • Zipf distr: freq = 1 / rank
     • generalized Zipf: freq = 1 / (rank)^a
     (plot: log(freq) vs. log(rank))
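
The rank/frequency plot and the exponent a of the generalized Zipf law can be computed along these lines. It is a minimal sketch: the helper names are hypothetical, the fit is an ordinary least-squares line on the log-log points, and the frequencies used at the end are synthetic rather than taken from the Bible data on the slides.

```python
import math
from collections import Counter

def rank_frequency(words):
    """(word, count) pairs sorted from most to least frequent."""
    return Counter(words).most_common()

def zipf_exponent(frequencies):
    """Least-squares estimate of a in freq ~ 1 / rank^a (fit on log-log axes)."""
    freqs = sorted(frequencies, reverse=True)
    lx = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ly = [math.log(f) for f in freqs]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return -sxy / sxx   # the slope is negative; a is its magnitude

ranked = rank_frequency("a b a c a b".split())
exact_zipf = zipf_exponent([1000 / r for r in range(1, 51)])          # synthetic, a = 1
generalized = zipf_exponent([1000 / r ** 0.8 for r in range(1, 51)])  # synthetic, a = 0.8
```

On real text, `zipf_exponent(count for _, count in rank_frequency(words))` would give the fitted exponent; how well a straight line fits is a separate question that this sketch does not test.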

  42. Olympic medals (Sydney): plot of log(# medals) vs. rank; linear fit: y = -0.9676x + 2.3054, R^2 = 0.9458

  43. More power laws: areas - Korcak’s law. Scandinavian lakes: any pattern?

  44. More power laws: areas - Korcak’s law. Scandinavian lakes: area vs. complementary cumulative count (log-log axes); plot: log(count(>= area)) vs. log(area)
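
The complementary cumulative count used in the Korcak plot, count(>= area) for each distinct area, can be computed as below. The function name and the sample areas are hypothetical; no lake or island data from the slides is reproduced.

```python
import bisect

def ccdf_counts(areas):
    """List of (area, # regions with area >= that value), by ascending area."""
    ordered = sorted(areas)
    n = len(ordered)
    return [(a, n - bisect.bisect_left(ordered, a)) for a in sorted(set(ordered))]

# Made-up areas, only to show the shape of the output.
example = ccdf_counts([5, 1, 2, 2])
```

Plotting log(count) against log(area) for these pairs gives the Korcak plot; under Korcak's law the points fall close to a straight line.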

  45. More power laws: Korcak. Japan islands

  46. More power laws: Korcak. Japan islands: area vs. cumulative count (log-log axes); plot: log(count(>= area)) vs. log(area)

  47. (Korcak’s law: Aegean islands)

  48. Korcak’s law & “fat fractals”. How to generate such regions?

  49. Korcak’s law & “fat fractals”
     • Q: How to generate such regions?
     • A: recursively, from a single region

  50. So far we’ve seen:
     • concepts: fractals, multifractals and fat fractals
     • tools:
       • correlation integral (= pair-count plot)
       • rank/frequency plot (Zipf’s law)
       • CCDF (Korcak’s law)

  51. Road map (outline repeated from slide 3)

  52. Other applications: Internet
     • What does the Internet look like? (figure label: CMU)

  53. Other applications: Internet
     • What does the Internet look like?
     • Internet routers: how many neighbors within h hops? (figure label: CMU)

  54. (reminder: our tool-box:)
     • concepts: fractals, multifractals and fat fractals
     • tools:
       • correlation integral (= pair-count plot)
       • rank/frequency plot (Zipf’s law)
       • CCDF (Korcak’s law)

  55. Internet topology
     • Internet routers: how many neighbors within h hops?
     • Reachability function: number of neighbors within r hops, vs. r (log-log). Mbone routers, 1995 (plot: log(# pairs) vs. log(hops); slope 2.8)
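
The reachability function (number of pairs within h hops, for growing h) can be computed with one breadth-first search per node. The sketch below assumes an undirected graph given as an adjacency dictionary; the function name and the 4-node path graph are illustrative, not the Mbone data.

```python
from collections import deque

def hop_plot(adjacency, max_hops):
    """pairs_within[h - 1] = # ordered pairs (u, v), u != v, at most h hops apart."""
    pairs_within = [0] * max_hops
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue                     # do not expand beyond max_hops
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            for h in range(d, max_hops + 1):
                if d >= 1:                   # skip the source itself
                    pairs_within[h - 1] += 1
    return pairs_within

# Toy graph: the path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
reach = hop_plot(path, max_hops=3)
```

The slope of log(pairs) against log(hops), over the range before the counts saturate, plays the same role as the slope of the pair-count plot for point sets.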

  56. More power laws on the Internet: degree vs. rank, for Internet domains (log-log) [sigcomm99]; plot: log(degree) vs. log(rank), slope -0.82

  57. More power laws - Internet
     • pdf of degrees (slope: 2.2); plot: log(count) vs. log(degree), labeled -2.2

  58. Even more power laws on the Internet: scree plot for Internet domains (log-log) [sigcomm99]; plot: log(i-th eigenvalue) vs. log(i), labeled 0.47

  59. More apps: Brain scans
     • Oct-trees; brain-scans: plot of log(# octants) vs. octree levels, slope 2.63 = fd
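
Counting occupied octants per octree level is a box-counting estimate of the fractal dimension: at level L the unit cube is cut into cells of side 2^-L, and fd is the slope of log2(# occupied cells) against L. The sketch below works for points in the unit square or cube; the function names and the two test point sets are illustrative, not brain-scan data.

```python
import math

def occupied_cells(points, level):
    """# grid cells of side 2**-level holding at least one point (coordinates in [0, 1])."""
    side = 2 ** level
    cells = set()
    for p in points:
        cells.add(tuple(min(int(c * side), side - 1) for c in p))
    return len(cells)

def box_counting_dimension(points, levels):
    """Least-squares slope of log2(# occupied cells) against the grid level."""
    xs = list(levels)
    ys = [math.log2(occupied_cells(points, lv)) for lv in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

diagonal = [(i / 1024, i / 1024) for i in range(1024)]                   # a line segment
grid = [((i + 0.5) / 32, (j + 0.5) / 32) for i in range(32) for j in range(32)]  # a filled square
line_fd = box_counting_dimension(diagonal, range(1, 7))
plane_fd = box_counting_dimension(grid, range(1, 6))
```

The levels must stay coarse enough that cells still contain several points; once every point sits alone in its cell the count stops growing and the slope flattens.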

  60. More apps: Medical images [Burdett et al, SPIE ‘93]:
     • benign tumors: fd ~ 2.37
     • malignant: fd ~ 2.56

  61. More fractals:
     • cardiovascular system: 3 (!)
     • stock prices (LYCOS) - random walks: 1.5 (figure: 1 year, 2 years)
     • coastlines: 1.2-1.58 (Norway!)
