Principles of Database Systems V. Megalooikonomou Fractals and Databases (based on notes by C. Faloutsos at CMU)
Indexing - Detailed outline: fractals (intro, applications)
Intro to fractals - outline
• Motivation – 3 problems / case studies
• Definition of fractals and power laws
• Solutions to posed problems
• More examples and tools
• Discussion - putting fractals to work!
• Conclusions – practitioner’s guide
• Appendix: gory details - boxcounting plots
Problem # 1: GIS - points
Road end-points of Montgomery county:
• Q1: how many disk accesses (d.a.) for an R-tree?
• Q2: distribution?
  - not uniform
  - not Gaussian
  - no rules??
Problem # 2 - spatial data mining (d.m.)
Galaxies (Sloan Digital Sky Survey - B. Nichol): ‘spiral’ and ‘elliptical’ galaxies (stores and households ...)
- patterns?
- attraction / repulsion?
- how many ‘spi’ within r from an ‘ell’?
Problem # 3: traffic
disk trace (from HP - J. Wilkes); Web traffic - fit a model
- how many explosions to expect?
- queue length distr.?
[Plot: # bytes vs. time; label: Poisson]
Common answer: Fractals / self-similarities / power laws
Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)
What is a fractal?
= self-similar point set, e.g., Sierpinski triangle: zero area; infinite perimeter!
Definitions (cont’d)
Paradox: infinite perimeter; zero area!
‘dimensionality’: between 1 and 2; actually log(3)/log(2) = 1.58...
Dfn of fd: ONLY for a perfectly self-similar point set:
a perfectly self-similar object with n similar pieces, each scaled down by a factor f, has
fd = log(n) / log(f)
e.g., Sierpinski triangle (zero area; infinite length!): fd = log(3) / log(2) = 1.58
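A minimal sketch of the definition above, assuming a perfectly self-similar object. The function name `similarity_dimension` is hypothetical; the filled-square case (4 pieces, each scaled down by 2) is added only as a sanity check that Euclidean objects give integers.

```python
import math


def similarity_dimension(n_pieces: int, scale_factor: float) -> float:
    """fd = log(n) / log(f): n similar pieces, each scaled down by a factor f."""
    return math.log(n_pieces) / math.log(scale_factor)


if __name__ == "__main__":
    print(similarity_dimension(3, 2))  # Sierpinski triangle: 3 pieces, scaled by 2
    print(similarity_dimension(2, 2))  # line segment: 2 pieces, scaled by 2
    print(similarity_dimension(4, 2))  # filled square: 4 pieces, scaled by 2
```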
Intrinsic (‘fractal’) dimension
Q: fractal dimension of a line?
A: 1 (= log(2)/log(2)!)
Intrinsic (‘fractal’) dimension
Q: dfn for a given set of points?
x  y
5  1
4  2
3  3
2  4
Intrinsic (‘fractal’) dimension
(nn(<= r): number of neighbors within distance r)
Q: fractal dimension of a line? A: nn(<= r) ~ r^1
Q: fd of a plane? A: nn(<= r) ~ r^2
(‘power law’: y = x^a)
fd == slope of (log(nn) vs log(r))
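Intrinsic (‘fractal’) dimension
Algorithm, to estimate it?
Notice: avg nn(<= r) is exactly tot# pairs(<= r) / N, when ‘mirror’ pairs are included (each pair counted in both orders). The constant factor does not change the slope of the log-log plot.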
Sierpinski triangle
== ‘correlation integral’
[Plot: log(# pairs within <= r) vs. log(r); slope 1.58]
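The pair-count estimate can be sketched as follows. This is an illustration under stated assumptions, not the slides' own implementation: the points are sampled with the "chaos game" (an assumption about how to obtain Sierpinski points), pairs are counted by brute force in O(N^2), and the function names and the range of radii are hypothetical choices. Unordered pairs are counted; a constant factor does not affect the slope.

```python
import bisect
import math
import random


def sierpinski_points(n, seed=0):
    """Sample n points on the Sierpinski triangle with the 'chaos game'."""
    rng = random.Random(seed)
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.3, 0.3
    points = []
    for i in range(n + 50):  # the first 50 steps are discarded as burn-in
        cx, cy = rng.choice(corners)
        x, y = (x + cx) / 2, (y + cy) / 2  # jump halfway to a random corner
        if i >= 50:
            points.append((x, y))
    return points


def pair_counts(points, radii):
    """Number of unordered pairs within distance <= r, for each r in radii."""
    dists = sorted(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )
    return [bisect.bisect_right(dists, r) for r in radii]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def correlation_dimension(points, radii):
    """Slope of log(# pairs within <= r) versus log(r)."""
    return loglog_slope(radii, pair_counts(points, radii))


if __name__ == "__main__":
    radii = [0.02 * 1.5 ** k for k in range(6)]
    print(correlation_dimension(sierpinski_points(1000), radii))
```

With enough points the printed slope should land near log(3)/log(2); the exact value depends on the sample and on the radii chosen.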
Observations:
Euclidean objects have integer fractal dimensions
• point: 0
• lines and smooth curves: 1
• smooth surfaces: 2
fractal dimension -> roughness of the periphery
Important properties
• fd = embedding dimension -> uniform pointset
• a point set may have several fd, depending on scale
Problem # 1: GIS points
Cross-roads of Montgomery county:
• any rules?
Solution # 1
A: self-similarity ->
  <=> fractals
  <=> scale-free
  <=> power-laws (y = x^a, F = C * r^(-2))
avg# neighbors(<= r) = r^D
[Plot: log(# pairs(within <= r)) vs. log(r); slope 1.51]
Solution # 1
A: self-similarity: avg# neighbors(<= r) ~ r^(1.51)
[Plot: log(# pairs(within <= r)) vs. log(r); slope 1.51]
Examples: MG county
Montgomery County of MD (road end-points)
Examples: LB county
Long Beach county of CA (road end-points)
Solution # 2: spatial d.m.
Galaxies (‘BOPS’ plot - [sigmod2000])
[Plot: log(# pairs) vs. log(r)]
Solution # 2: spatial d.m.
[Plot: log(# pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell]
- 1.8 slope
- plateau!
- repulsion!
spatial d.m.
Heuristic on choosing # of clusters
[Figure: scales r1 and r2 marked]
spatial d.m.
[Plot: log(# pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell]
- 1.8 slope
- plateau!
- repulsion
- duplicates !!
Solution # 3: traffic
disk traces: self-similar
[Plot: # bytes vs. time]
Solution # 3: traffic
disk traces (80-20 ‘law’ = ‘multifractal’)
[Plot: # bytes vs. time, split 20% / 80%]
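One way to picture the 80-20 ‘law’ is a recursive split: the total volume of an interval is divided 80% / 20% between its two halves, and the same rule is applied to each half again. The sketch below is an assumption about how such a trace could be generated, not the model fitted in the slides; the function name, the random choice of which half gets the larger share, and the parameter names are all hypothetical.

```python
import random


def eighty_twenty_trace(total_bytes, levels, bias=0.8, seed=0):
    """Return 2**levels per-interval volumes built by repeated biased splits."""
    rng = random.Random(seed)
    trace = [float(total_bytes)]
    for _ in range(levels):
        next_trace = []
        for volume in trace:
            big, small = volume * bias, volume * (1 - bias)
            # Assumption: the busier half is picked at random at every split.
            halves = [big, small] if rng.random() < 0.5 else [small, big]
            next_trace.extend(halves)
        trace = next_trace
    return trace


if __name__ == "__main__":
    trace = eighty_twenty_trace(1_000_000, levels=10)
    print(len(trace), max(trace), min(trace))
```

The resulting series is bursty at every time scale: a few intervals carry most of the bytes, and zooming into any sub-interval shows the same imbalance.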
Solution # 3: traffic
Clarification:
• fractal: a set of points that is self-similar
• multifractal: a probability density function that is self-similar
Many other time-sequences are bursty/clustered: (such as?)
Tape accesses
# tapes needed, to retrieve n records?
(# days down, due to failures / hurricanes / communication noise...)
[Figure: Tape# 1 ... Tape# N, along a time axis]
Tape accesses
[Plot: # tapes retrieved vs. # qual. records; labels: 50-50 = Poisson, real; Tape# 1 ... Tape# N, time]
More tools
• Zipf’s law
• Korcak’s law / “fat fractals”
A famous power law: Zipf’s law
• Q: vocabulary word frequency in a document - any pattern?
[Plot: freq. vs. word, from ‘aaron’ to ‘zoo’]
A famous power law: Zipf’s law
• Bible - rank vs frequency (log-log)
[Plot: log(freq) vs. log(rank); labeled points: “a”, “the”]
A famous power law: Zipf’s law
• Bible - rank vs frequency (log-log)
• similarly, in many other languages; for customers and sales volume; city populations etc etc
[Plot: log(freq) vs. log(rank)]
A famous power law: Zipf’s law
• Zipf distr: freq = 1 / rank
• generalized Zipf: freq = 1 / (rank)^a
[Plot: log(freq) vs. log(rank)]
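A rank/frequency plot can be sketched as follows: count the words, sort the counts in decreasing order, and fit a line to log(freq) vs. log(rank). The function names are hypothetical, and the least-squares fit is only one simple way to estimate the exponent a of the generalized Zipf law.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return (rank, freq) pairs; rank 1 is the most frequent word."""
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))


def zipf_exponent(rank_freq):
    """Estimate a in freq ~ 1 / rank^a from the log-log least-squares slope."""
    lx = [math.log(rank) for rank, _ in rank_freq]
    ly = [math.log(freq) for _, freq in rank_freq]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return -num / den  # the slope is negative; a is its magnitude


if __name__ == "__main__":
    # Synthetic vocabulary whose counts follow freq = 60 / rank exactly.
    words = ["w1"] * 60 + ["w2"] * 30 + ["w3"] * 20 + ["w4"] * 15 + ["w5"] * 12
    print(rank_frequency(words))
    print(zipf_exponent(rank_frequency(words)))
```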
Olympic medals (Sydney):
[Plot: log(# medals) vs. rank, with linear fit y = -0.9676x + 2.3054, R^2 = 0.9458]
More power laws: areas – Korcak’s law
Scandinavian lakes
Any pattern?
More power laws: areas – Korcak’s law
Scandinavian lakes: area vs complementary cumulative count (log-log axes)
[Plot: log(count(>= area)) vs. log(area)]
More power laws: Korcak
Japan islands
More power laws: Korcak
Japan islands; area vs cumulative count (log-log axes)
[Plot: log(count(>= area)) vs. log(area)]
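The quantity on the vertical axis, count(>= area), can be computed as in this minimal sketch. The function name and the toy input are hypothetical; real lake or island areas would replace them, and the returned pairs would then be drawn on log-log axes.

```python
def ccdf_counts(areas):
    """For each distinct area a (ascending), count the items with area >= a."""
    ordered = sorted(areas)
    n = len(ordered)
    result = []
    for i, area in enumerate(ordered):
        # The first occurrence of a value sits at index i, so n - i items are >= it.
        if i == 0 or area != ordered[i - 1]:
            result.append((area, n - i))
    return result


if __name__ == "__main__":
    print(ccdf_counts([5, 1, 2, 2]))
```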
(Korcak’s law: Aegean islands)
Korcak’s law & “fat fractals”
Q: How to generate such regions?
A: recursively, from a single region
so far we’ve seen:
• concepts: fractals, multifractals and fat fractals
• tools:
  - correlation integral (= pair-count plot)
  - rank/frequency plot (Zipf’s law)
  - CCDF (Korcak’s law)
Other applications: Internet
What does the Internet look like?
Internet routers: how many neighbors within h hops?
[Figure label: CMU]
(reminder: our tool-box:)
• concepts: fractals, multifractals and fat fractals
• tools:
  - correlation integral (= pair-count plot)
  - rank/frequency plot (Zipf’s law)
  - CCDF (Korcak’s law)
Internet topology
Internet routers: how many neighbors within h hops?
Reachability function: number of neighbors within r hops, vs r (log-log). Mbone routers, 1995
[Plot: log(# pairs) vs. log(hops); slope 2.8]
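The reachability function can be sketched with a breadth-first search from every node. This is an illustration under assumptions: the graph is given as an adjacency dictionary in which every node appears as a key, and ordered pairs are counted (so ‘mirror’ pairs are included). The function name is hypothetical.

```python
from collections import deque


def pairs_within_hops(adjacency, max_hops):
    """counts[h] = # ordered pairs (u, v), u != v, at hop distance <= h."""
    counts = [0] * (max_hops + 1)
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue  # do not expand beyond the largest radius of interest
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, d in dist.items():
            if node != source:
                for h in range(d, max_hops + 1):
                    counts[h] += 1  # a node d hops away is within h for all h >= d
    return counts


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # toy graph: 0 - 1 - 2 - 3
    print(pairs_within_hops(path, 3))
```

Plotting log(counts[h]) against log(h) for h >= 1 gives the hop plot; on real router graphs the slope is read off the linear part.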
More power laws on the Internet
degree vs rank, for Internet domains (log-log) [sigcomm99]
[Plot: log(degree) vs. log(rank); slope -0.82]
More power laws - internet
pdf of degrees
[Plot: log(count) vs. log(degree); slope -2.2]
Even more power laws on the Internet
Scree plot for Internet domains (log-log) [sigcomm99]
[Plot: log(i-th eigenvalue) vs. log(i); slope 0.47]
More apps: Brain scans
Oct-trees; brain-scans
[Plot: log(# octants) vs. octree levels; slope 2.63 = fd]
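The octant count per level is a box-counting estimate of the fractal dimension. The sketch below assumes points with coordinates in [0, 1) and a regular grid whose cells halve in side length at each level (as oct-tree octants do in 3-d); the function names are hypothetical and no real brain-scan data is used.

```python
import math


def occupied_cells(points, level):
    """Number of non-empty grid cells of side 2**-level; coordinates in [0, 1)."""
    cells_per_side = 2 ** level
    return len({tuple(int(c * cells_per_side) for c in p) for p in points})


def box_counting_dimension(points, levels):
    """Least-squares slope of log2(# occupied cells) versus grid level."""
    xs = list(levels)
    ys = [math.log2(occupied_cells(points, k)) for k in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    # Points on the diagonal of the unit cube: a line embedded in 3-d.
    diagonal = [(i / 1024, i / 1024, i / 1024) for i in range(1024)]
    print(box_counting_dimension(diagonal, range(1, 8)))
```

The levels used for the fit should stay well above the resolution of the data; once every point sits in its own cell, the count stops growing and the slope flattens.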
More apps: Medical images
[Burdett et al, SPIE ‘93]:
• benign tumors: fd ~ 2.37
• malignant: fd ~ 2.56
More fractals:
• cardiovascular system: 3 (!)
• stock prices (LYCOS) - random walks: 1.5 [Plot: 1 year, 2 years]
• Coastlines: 1.2-1.58 (Norway!)