Power Law Size Distributions


  1. Power Law Size Distributions. Principles of Complex Systems, Course 300, Fall 2008. Prof. Peter Dodds, Department of Mathematics & Statistics, University of Vermont. Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

  2. Outline: Overview (Introduction; Examples; Zipf’s law; Wild vs. Mild; CCDFs); References.

  3. The Don: Extreme deviations in test cricket. [Figure: batting averages plotted on an axis from 0 to 100.] Don Bradman’s batting average is 166% of the next best.

  4. Size distributions: The sizes of many systems’ elements appear to obey an inverse power-law size distribution: $P(\text{size} = x) \sim c\,x^{-\gamma}$, where $x_{\min} < x < x_{\max}$ and $\gamma > 1$. ◮ Typically, $2 < \gamma < 3$. ◮ $x_{\min}$ = lower cutoff. ◮ $x_{\max}$ = upper cutoff.
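
As a worked step, the constant $c$ can be pinned down by normalization. This is a sketch under an assumption the slide does not make: that the pure power law holds over the whole range $x_{\min} < x < x_{\max}$, rather than only in the tail.

```latex
\begin{align*}
1 &= \int_{x_{\min}}^{x_{\max}} c\,x^{-\gamma}\,\mathrm{d}x
   = \frac{c}{1-\gamma}\left( x_{\max}^{1-\gamma} - x_{\min}^{1-\gamma} \right), \\
c &= \frac{\gamma - 1}{x_{\min}^{1-\gamma} - x_{\max}^{1-\gamma}}
   \;\longrightarrow\; (\gamma - 1)\,x_{\min}^{\gamma-1}
   \quad \text{as } x_{\max} \to \infty \quad (\gamma > 1).
\end{align*}
```

The limit shows why $\gamma > 1$ is required: without it the distribution cannot be normalized once the upper cutoff is removed.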

  5. Size distributions: ◮ Usually, only the tail of the distribution obeys a power law: $P(x) \sim c\,x^{-\gamma}$ as $x \to \infty$. ◮ We still use the term ‘power law distribution’.

  6. Size distributions: Many systems have discrete sizes $k$: ◮ Word frequency. ◮ Node degree (as we have seen): number of hyperlinks, etc. ◮ Number of citations for articles, court decisions, etc. In these cases $P(k) \sim c\,k^{-\gamma}$, where $k_{\min} \le k \le k_{\max}$.

  7. Size distributions: Power law size distributions are sometimes called Pareto distributions after Italian scholar Vilfredo Pareto. ◮ Pareto noted wealth in Italy was distributed unevenly (80–20 rule). ◮ The term is used especially by economists.

  8. Size distributions: ◮ Negative linear relationship in log-log space: $\log P(x) = \log c - \gamma \log x$.
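
The straight-line relationship can be checked numerically. The following is a minimal sketch, not a recommended fitting procedure: it builds exact points $P(x) = c\,x^{-\gamma}$ for illustrative values of `c` and `gamma` (both assumptions chosen here) and recovers them with an ordinary least-squares line in log-log space. The helper name `loglog_slope` is hypothetical.

```python
import math

def loglog_slope(xs, ps):
    """Least-squares slope and intercept of log10(p) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    lp = [math.log10(p) for p in ps]
    n = len(lx)
    mx = sum(lx) / n
    mp = sum(lp) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxp = sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
    slope = sxp / sxx
    intercept = mp - slope * mx
    return slope, intercept

# Illustrative values only; they do not come from the slides.
c, gamma = 3.0, 2.5
xs = [1, 2, 5, 10, 20, 50, 100]
ps = [c * x ** (-gamma) for x in xs]

slope, intercept = loglog_slope(xs, ps)
print(slope, 10 ** intercept)  # slope is close to -gamma, 10**intercept close to c
```

On noise-free points the slope is $-\gamma$ and the intercept is $\log c$. Real data are noisy and usually follow the power law only in the tail, so a raw log-log fit of this kind should be read with caution.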

  9. Size distributions: Examples: ◮ Earthquake magnitude (Gutenberg–Richter law): $P(M) \propto M^{-3}$. ◮ Number of war deaths: $P(d) \propto d^{-1.8}$. ◮ Sizes of forest fires. ◮ Sizes of cities: $P(n) \propto n^{-2.1}$. ◮ Number of links to and from websites.

  10. Size distributions: Examples: ◮ Number of citations to papers: $P(k) \propto k^{-3}$. ◮ Individual wealth (maybe): $P(W) \propto W^{-2}$. ◮ Distributions of tree trunk diameters: $P(d) \propto d^{-2}$. ◮ The gravitational force at a random point in the universe: $P(F) \propto F^{-5/2}$. ◮ Diameter of moon craters: $P(d) \propto d^{-3}$. ◮ Word frequency: e.g., $P(k) \propto k^{-2.2}$ (variable). (Note: exponents range in error; see M.E.J. Newman, arxiv.org/cond-mat/0412004v3.)

  11. Size distributions: Power-law distributions are... ◮ often called ‘heavy-tailed’ ◮ or said to have ‘fat tails’. Important!: ◮ Inverse power laws aren’t the only heavy-tailed distributions: there are also lognormals, stretched exponentials, ...

  12. Zipfian rank-frequency plots: George Kingsley Zipf: ◮ noted various rank distributions followed power laws, often with exponent $-1$ (word frequency, city sizes...); see “Human Behaviour and the Principle of Least-Effort” [2], Addison-Wesley, Cambridge MA, 1949. ◮ We’ll study Zipf’s law in depth...

  13. Zipfian rank-frequency plots: Zipf’s way: ◮ $s_i$ = the size of the $i$th ranked object. ◮ $i = 1$ corresponds to the largest size. ◮ $s_1$ could be the frequency of occurrence of the most common word in a text. ◮ Zipf’s observation: $s_i \propto i^{-\alpha}$.
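
Zipf’s ranking can be sketched in a few lines. This is only an illustration on a toy sentence (an assumption made here, not data from the slides); the function name `rank_frequency` is hypothetical.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank i, size s_i) pairs, with rank 1 for the most frequent word."""
    counts = Counter(words)
    sizes = sorted(counts.values(), reverse=True)
    return list(enumerate(sizes, start=1))

# Toy text for illustration; a real analysis would use a large corpus.
text = "the cat and the dog and the bird"
ranked = rank_frequency(text.split())
print(ranked)  # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
```

Plotting $\log s_i$ against $\log i$ for a large corpus is what produces a Zipfian rank-frequency plot. Ties in frequency are broken arbitrarily here.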

  14. Power law distributions: Gaussians versus power-law distributions: ◮ Example: Height versus wealth. ◮ Mild versus Wild (Mandelbrot). ◮ Mediocristan versus Extremistan (see “The Black Swan” by Nassim Taleb [1]).

  15. Turkeys... [Figure from “The Black Swan” [1].]

  16. Taleb’s table [1]: Mediocristan / Extremistan: ◮ Most typical member is mediocre / Most typical is either giant or tiny. ◮ Winners get a small segment / Winner-take-almost-all effects. ◮ When you observe for a while, you know what’s going on / It takes a very long time to figure out what’s going on. ◮ Prediction is easy / Prediction is hard. ◮ History crawls / History makes jumps. ◮ Tyranny of the collective / Tyranny of the accidental.

  17. Complementary Cumulative Distribution Function (CCDF): ◮ $P_{\ge}(x) = P(x' \ge x) = 1 - P(x' < x)$ ◮ $= \int_{x'=x}^{\infty} P(x')\,\mathrm{d}x'$ ◮ $\propto \int_{x'=x}^{\infty} (x')^{-\gamma}\,\mathrm{d}x'$ ◮ $= \left. \frac{1}{-\gamma+1}\,(x')^{-\gamma+1} \right|_{x'=x}^{\infty}$ ◮ $\propto x^{-\gamma+1}$.

  18. Complementary Cumulative Distribution Function (CCDF): ◮ $P_{\ge}(x) \propto x^{-\gamma+1}$. ◮ Use when the tail of $P$ follows a power law. ◮ Increases the exponent by one. ◮ Useful in cleaning up data.
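
An empirical CCDF is simple to compute and compare with $x^{-\gamma+1}$. The sketch below assumes a pure power law above $x_{\min} = 1$ and draws synthetic samples by inverse-transform sampling; the sampling method, the value `gamma = 2.5`, and the helper names are all assumptions made for illustration.

```python
import random

def sample_power_law(n, gamma, x_min=1.0, seed=0):
    """Draw n samples with density proportional to x**(-gamma) for x >= x_min."""
    rng = random.Random(seed)
    # Inverse-transform sampling: u = 1 - rng.random() is uniform on (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]

def empirical_ccdf(samples, x):
    """Fraction of samples with size at least x."""
    return sum(1 for s in samples if s >= x) / len(samples)

gamma = 2.5
samples = sample_power_law(100_000, gamma)
for x in (1.0, 10.0, 100.0):
    predicted = x ** (-gamma + 1)  # exact CCDF when x_min = 1
    print(x, empirical_ccdf(samples, x), predicted)
```

Because the CCDF counts everything at or above $x$, it needs no binning, which is one reason it helps in cleaning up noisy tails.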

  19. Complementary Cumulative Distribution Function (CCDF): ◮ Discrete variables: $P_{\ge}(k) = P(k' \ge k) = \sum_{k'=k}^{\infty} P(k') \propto k^{-\gamma+1}$. ◮ Use integrals to approximate sums.
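
How good the integral approximation is can be checked directly. In this sketch the infinite sum is truncated at a hypothetical `k_max`, and `gamma = 2.5` is an illustrative choice; neither comes from the slides.

```python
def tail_sum(k, gamma, k_max=200_000):
    """Sum of k'**(-gamma) for k' = k, ..., k_max (truncated infinite sum)."""
    return sum(kp ** (-gamma) for kp in range(k, k_max + 1))

def tail_integral(k, gamma):
    """Integral of x**(-gamma) from k to infinity, valid for gamma > 1."""
    return k ** (1.0 - gamma) / (gamma - 1.0)

gamma = 2.5
# Ratio of the sum to the integral at k = 100; close to 1.
ratio = tail_sum(100, gamma) / tail_integral(100, gamma)
# Doubling k should scale the tail by roughly 2**(1 - gamma).
scaling = tail_sum(200, gamma) / tail_sum(100, gamma)
print(ratio, scaling, 2.0 ** (1.0 - gamma))
```

For large $k$ the sum and the integral agree to within about a percent here, and the tail scales as $k^{-\gamma+1}$ as the slide states.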

  20. Size distributions: Brown Corpus (1,015,945 words). [Figure: two panels, ‘CCDF’ ($N_{>n}$ versus $n$) and ‘Zipf’ ($n_i$ versus rank $i$).] ◮ The, of, and, to, a, ... = ‘objects’. ◮ ‘Size’ = word frequency. ◮ Beep: CCDF and Zipf plots are related...

  21. Size distributions: Observe: ◮ $N P_{\ge}(x)$ = the number of objects with size at least $x$, where $N$ = total number of objects. ◮ If an object has size $x_i$, then $N P_{\ge}(x_i)$ is its rank $i$. ◮ So $x_i \propto i^{-\alpha} = \left( N P_{\ge}(x_i) \right)^{-\alpha} \propto x_i^{(-\gamma+1)(-\alpha)}$, since $P_{\ge}(x) \sim x^{-\gamma+1}$. Hence $\alpha = \frac{1}{\gamma - 1}$. A rank distribution exponent of $\alpha = 1$ corresponds to a size distribution exponent $\gamma = 2$.
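
The step from the proportionality to the exponent relation can be spelled out by matching powers of $x_i$ on both sides. The numerical case at the end ($\gamma = 5/2$) is an illustrative choice, not a value from the slides.

```latex
\begin{align*}
x_i &\propto \left( N P_{\ge}(x_i) \right)^{-\alpha}
     \propto \left( x_i^{-\gamma+1} \right)^{-\alpha}
     = x_i^{(\gamma-1)\alpha}, \\
(\gamma - 1)\,\alpha &= 1
\quad\Longrightarrow\quad
\alpha = \frac{1}{\gamma - 1},
\qquad
\gamma = 1 + \frac{1}{\alpha}, \\
\gamma = \tfrac{5}{2} &\quad\Longrightarrow\quad \alpha = \tfrac{2}{3}.
\end{align*}
```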

  22. Details on the lack of scale: Let’s find the mean: ◮ $\langle x \rangle = \int_{x=x_{\min}}^{x_{\max}} x\,P(x)\,\mathrm{d}x = c \int_{x=x_{\min}}^{x_{\max}} x\,x^{-\gamma}\,\mathrm{d}x = \frac{c}{2-\gamma}\left( x_{\max}^{2-\gamma} - x_{\min}^{2-\gamma} \right)$.

  23. The mean: $\langle x \rangle \sim \frac{c}{2-\gamma}\left( x_{\max}^{2-\gamma} - x_{\min}^{2-\gamma} \right)$. ◮ Mean blows up with upper cutoff if $\gamma < 2$. ◮ Mean depends on lower cutoff if $\gamma > 2$. ◮ $\gamma < 2$: Typical sample is large. ◮ $\gamma > 2$: Typical sample is small.
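
The two regimes can be seen by evaluating the closed form while the upper cutoff grows. This sketch assumes the power law holds on the whole interval, so that $c$ follows from normalization (an assumption beyond the slide), and uses illustrative exponents 1.5 and 2.5; `power_law_mean` is a hypothetical helper.

```python
def power_law_mean(gamma, x_min, x_max):
    """Mean of a density c * x**(-gamma) on [x_min, x_max]; gamma not 1 or 2."""
    # Normalization constant, assuming the power law holds on the whole range.
    c = (gamma - 1.0) / (x_min ** (1.0 - gamma) - x_max ** (1.0 - gamma))
    return c / (2.0 - gamma) * (x_max ** (2.0 - gamma) - x_min ** (2.0 - gamma))

for x_max in (1e3, 1e6, 1e12):
    # gamma = 1.5: mean keeps growing with x_max.
    # gamma = 2.5: mean settles near a value set by x_min.
    print(x_max, power_law_mean(1.5, 1.0, x_max), power_law_mean(2.5, 1.0, x_max))
```

With $x_{\min} = 1$, the $\gamma = 1.5$ mean equals $\sqrt{x_{\max}}$, while the $\gamma = 2.5$ mean approaches 3 regardless of the upper cutoff.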

  24. And in general... Moments: ◮ All moments depend only on cutoffs. ◮ No internal scale dominates (or even matters). ◮ Compare to a Gaussian, exponential, etc.

  25. Moments: For many real size distributions, $2 < \gamma < 3$: ◮ mean is finite (depends on lower cutoff) ◮ $\sigma^2$ = variance is ‘infinite’ (depends on upper cutoff) ◮ Width of distribution is ‘infinite’.
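
These statements follow from the same integral as the mean, taken for a general moment. The derivation below is a sketch that assumes the pure power-law form $P(x) = c\,x^{-\gamma}$ on the whole range, with $q$ denoting the order of the moment (a symbol introduced here).

```latex
\begin{align*}
\langle x^q \rangle
  &= \int_{x_{\min}}^{x_{\max}} x^q\, c\,x^{-\gamma}\,\mathrm{d}x
   = \frac{c}{q+1-\gamma}\left( x_{\max}^{q+1-\gamma} - x_{\min}^{q+1-\gamma} \right),
   \qquad q + 1 \neq \gamma. \\
\text{For } 2 < \gamma < 3:\quad
  & q = 1 \text{ gives exponent } 2 - \gamma < 0,
    \text{ so the } x_{\min} \text{ term dominates as } x_{\max} \to \infty; \\
  & q = 2 \text{ gives exponent } 3 - \gamma > 0,
    \text{ so } \langle x^2 \rangle \text{ grows like } x_{\max}^{3-\gamma}.
\end{align*}
```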

  26. Moments: Standard deviation is a mathematical convenience!: ◮ Variance is nice analytically... ◮ Another measure of distribution width: Mean absolute deviation (MAD) $= \langle\, | x - \langle x \rangle | \,\rangle$. ◮ MAD is unpleasant analytically...
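
Both width measures are easy to compute on data even though MAD is awkward analytically. The sketch below compares them on synthetic power-law samples; the sampler, the seed, and `gamma = 2.5` are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

def mean_absolute_deviation(xs):
    """MAD = <|x - <x>|>, the average distance from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

def standard_deviation(xs):
    """Population standard deviation, the square root of <(x - <x>)^2>."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

# Synthetic samples with density proportional to x**(-gamma) for x >= 1.
rng = random.Random(0)
gamma = 2.5
xs = [(1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(100_000)]

for n in (1_000, 10_000, 100_000):
    print(n, mean_absolute_deviation(xs[:n]), standard_deviation(xs[:n]))
```

For $2 < \gamma < 3$ the mean is finite, so MAD is finite as well, whereas the sample standard deviation is driven by the few largest values and need not settle as the sample grows.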
