Maximum entropy modeling of species geographic distributions
Steven Phillips, with Miro Dudik & Rob Schapire
Modeling species distributions
[Figure: Yellow-throated Vireo occurrence points + environmental variables → predicted distribution]
Estimating a probability distribution
Given:
• A map divided into cells
• Environmental variables, with values in each cell
• Occurrence points: samples from an unknown distribution
Our task is to estimate the unknown probability distribution.
Note:
• The distribution sums to 1 over the whole map
• This is different from estimating probability of presence: Pr(t | y=1) rather than Pr(y=1 | x), where t = cell, y = response, x = environment
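To make the distinction concrete, here is a minimal numeric sketch (the suitability scores and the five-cell map are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical relative suitability scores for 5 map cells.
scores = np.array([0.2, 0.5, 0.1, 0.9, 0.3])

# Pr(t | y=1): one distribution over cells, normalized over the map.
p_cell = scores / scores.sum()
print(p_cell.sum())  # 1.0 -- sums to 1 over the whole map

# Pr(y=1 | x) would instead be a value in [0, 1] for EACH cell,
# with no requirement that the values sum to 1.
```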
The Maximum Entropy Method
Origins: Jaynes 1957, statistical mechanics
Recent uses: machine learning, e.g. automatic language translation; macroecology: SAD, SAR (Harte et al. 2009)
To estimate an unknown distribution:
1. Determine what you know (constraints)
2. Among the distributions satisfying the constraints, output the one with maximum entropy
Entropy
More entropy: more spread out, closer to the uniform distribution.
2nd law of thermodynamics: without external influences, a system moves to increase its entropy.
Maximum entropy method: apply constraints to remove the external influences; the species then spreads out to fill areas with suitable conditions.
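A small sketch of the entropy calculation, showing that a spread-out distribution has higher entropy than a concentrated one (the four-cell distributions are hypothetical):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum p ln p (0 ln 0 taken as 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

uniform = np.full(4, 0.25)                    # maximally spread out
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # concentrated
print(entropy(uniform))  # ln(4) ~ 1.386, the maximum for 4 cells
print(entropy(peaked))   # ~ 0.168, much lower
```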
Using Maxent for Species Distributions
“Features” · “Constraints” · “Regularization”
Free software: www.cs.princeton.edu/~schapire/maxent/
Features impose constraints
Feature = environmental variable, or a function thereof.
[Figure: occurrence samples in precipitation–temperature space, with their sample average marked]
Find the distribution of maximum entropy such that for all features f: mean(f) = sample average of f.
Features
Environmental variables or simple functions thereof. The Maxent software has these classes of features (others are possible):
1. Linear … the variable itself
2. Quadratic … square of the variable
3. Product … product of two variables
4. Binary (indicator) … membership in a category
5. Threshold … step function of the variable (0 below the threshold, 1 above)
6. Hinge … 0 below a knot, then increasing linearly in the variable
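As a sketch of what these feature classes compute, assuming toy variable values and an arbitrary threshold/knot at 10.0 (none of this is taken from the Maxent code itself):

```python
import numpy as np

# Hypothetical environmental variables over 5 map cells.
temp = np.array([3.1, 7.4, 12.0, 18.5, 25.2])
precip = np.array([200.0, 450.0, 800.0, 300.0, 120.0])

linear = temp                                  # 1. the variable itself
quadratic = temp ** 2                          # 2. its square
product = temp * precip                        # 3. product of two variables
threshold = (temp > 10.0).astype(float)        # 5. 1 above the threshold, else 0
hinge = np.maximum(temp - 10.0, 0.0)           # 6. 0 below the knot, linear above

# 4. binary (indicator) features for a categorical variable, e.g. land cover:
landcover = np.array([0, 1, 1, 2, 0])
binary = (landcover[:, None] == np.arange(3)).astype(float)  # one column per category
```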
Constraints
Each feature type imposes a constraint on the output distribution:
Linear features … mean
Quadratic features … variance
Product features … covariance
Threshold features … proportion above the threshold
Hinge features … mean above the threshold
Binary (categorical) features … proportion in each category
Regularization
[Figure: confidence region around the sample average in precipitation–temperature space, containing the true mean]
Find the distribution of maximum entropy such that mean(f) lies in the confidence region of the sample average of f.
The Maxent distribution
… is always a Gibbs distribution:
q_λ(x) = exp(Σ_j λ_j f_j(x)) / Z
where Z is a scaling factor so the distribution sums to 1, f_j is the j-th feature, and λ_j is a coefficient calculated by the program.
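A minimal sketch of the Gibbs form, with a hypothetical feature matrix F and coefficients λ:

```python
import numpy as np

def gibbs(features, lam):
    """q_lambda(x) = exp(sum_j lam_j * f_j(x)) / Z over all cells.

    features: (n_cells, n_features) array; lam: (n_features,) coefficients.
    """
    raw = np.exp(features @ lam)
    return raw / raw.sum()   # Z normalizes so the map sums to 1

# Toy example: 4 cells, 2 features (hypothetical values).
F = np.array([[1.0, 0.2], [0.5, 0.9], [0.1, 0.1], [0.8, 0.7]])
lam = np.array([1.5, -0.5])
q = gibbs(F, lam)
print(q, q.sum())  # a probability distribution over the 4 cells
```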
Maxent is penalized maximum likelihood
Log likelihood: LogLikelihood(q_λ) = (1/m) Σ_i ln q_λ(x_i), where x_1 … x_m are the occurrence points.
Maxent maximizes the regularized likelihood:
LogLikelihood(q_λ) − Σ_j β_j |λ_j|
where β_j is the width of the confidence interval for f_j. Similar in spirit to the Akaike Information Criterion (AIC) and the lasso.
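Continuing the same toy setup, the penalized objective could be evaluated like this (the β values are hypothetical; in the real program the λ_j are the variables chosen to maximize this quantity):

```python
import numpy as np

def regularized_loglik(q, occ_idx, lam, beta):
    """Average log likelihood of the m occurrence cells,
    minus the L1 penalty sum_j beta_j * |lam_j|."""
    return np.mean(np.log(q[occ_idx])) - np.sum(beta * np.abs(lam))

# Toy Gibbs distribution over 4 cells (same values as the sketch above).
F = np.array([[1.0, 0.2], [0.5, 0.9], [0.1, 0.1], [0.8, 0.7]])
lam = np.array([1.5, -0.5])
q = np.exp(F @ lam)
q /= q.sum()

beta = np.array([0.1, 0.1])          # hypothetical interval widths
print(regularized_loglik(q, [0, 3], lam, beta))  # occurrences in cells 0 and 3
```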
Performance guarantees
If the true mean of each feature f_j lies within β_j of its sample average, then for any Gibbs distribution q_λ:
RE(π ‖ q̂) ≤ RE(π ‖ q_λ) + 2 Σ_j β_j |λ_j|
where π is the true distribution, q̂ is the Maxent solution, and RE is relative entropy.
Maxent software: β tuned on a reference data set.
Estimating probability of presence
• Prevalence: number of sites where the species is present, or sum of probability of presence
• Prevalence is not identifiable from occurrence data (Ward et al. 2009)
– Example: sparrow and sparrow-hawk
– Both have the same range map
– Both have the same geographic distribution of occurrences
– But the hawk is rarer within its range: lower prevalence
• Probability of presence and prevalence depend on the sampling design:
– Site size
– Observation time
Logistic output format
• Minimax: maximize performance for the worst-case prevalence
• Exponential → logistic model, with an offset term equal to the entropy of the raw distribution
• Scaled so that “typical” presences have value 0.5
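One way to read these bullets as code, assuming the offset in the exponent is exactly the entropy H of the raw distribution, so that a cell of typical suitability (ln q(x) = −H) maps to 0.5; the scaling in the released software may differ in detail:

```python
import numpy as np

def logistic_output(q):
    """Map raw output q (sums to 1 over the map) into (0, 1):
    r = exp(H) * q, then r / (1 + r). A cell of 'typical'
    suitability, where ln q(x) = -H, gets r = 1 and value 0.5."""
    H = -np.sum(q * np.log(q))   # entropy of the raw distribution
    r = np.exp(H) * q
    return r / (1.0 + r)

q = np.array([0.05, 0.15, 0.30, 0.50])   # toy raw distribution
print(logistic_output(q))
```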
Response curves
• How does probability of presence depend on each variable?
• Simple features → simpler model; complex features → more complex model
• [Plots: linear + quadratic features (top), threshold features (middle), all feature types (bottom)]
Effect of regularization: multiplier = 0.2
Smaller confidence intervals → lower entropy → less spread-out prediction
Effect of regularization: over-fitting
[Maps: regularization multiplier = 1.0 (not over-fit) vs. regularization multiplier = 0.2 (clearly over-fit)]
The dangers of bias
• Virtual species in Ontario, Canada: prefers the mid-range of all climatic variables
Boosted regression tree model: biased presence/absence data
The presence-absence model recovers the species distribution.
Model from biased occurrence data
The model recovers the sampling bias, not the species distribution.
Correcting bias: golden-crowned kinglet
Maxent model from biased occurrence data: AUC = 0.3
Correcting bias with target-group background
Target-group model: AUC = 0.8
Infer the sampling distribution from other species’ records: the “target group”, collected by the same methods.
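A hypothetical sketch of the target-group idea: draw background points from the cells holding records of any species in the target group, so the background reflects the same sampling bias as the occurrences (illustrative only, not the Maxent implementation):

```python
import numpy as np

def target_group_background(target_group_cells, n_background, rng=None):
    """Sample background cells from the set of cells where ANY species
    in the target group was recorded, so the background carries the
    same sampling bias as the occurrence data."""
    rng = rng or np.random.default_rng(0)
    cells = np.unique(target_group_cells)
    return rng.choice(cells, size=n_background, replace=True)

# e.g. all cells with any bird record collected by the same survey methods
bird_record_cells = np.array([12, 12, 40, 77, 77, 77, 103, 250])
background = target_group_background(bird_record_cells, n_background=5)
print(background)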
Aligning Conservation Priorities Across Taxa in Madagascar with High-Resolution Planning Tools
C. Kremen, A. Cameron et al., Science 320, 222 (2008)
Madagascar: Opportunity Knocks?
2002: 1.7 million ha = 2.9%
2003 Durban Vision: 6 million ha = 10%
2006: 3.88 million ha = 6.3%
Study outline
• Gather biodiversity data
– 2315 species: lemurs, frogs, geckos, ants, butterflies, plants
– Presences only, limited data, sampling biases
• Model species distributions with Maxent
• New reserve selection software: Zonation
– 1 km² resolution for the entire country
– > 700,000 planning units
Mystrium mysticum, dracula ant
Adansonia grandidieri, Grandidier’s baobab
Uroplatus fimbriatus, common leaf-tailed gecko
Indri indri
Propithecus diadema, diademed sifaka
[Maps: Grandidier’s baobab, Dracula ant, Indri]
Multi-taxon solutions
[Maps: ideal (unconstrained, optimized) vs. starting from the PA system (constrained, optimized); includes temporary protected areas through 2006. Legend: top 5%, 5 to 10%, 10 to 15%]
Spare slides
Maximum Entropy Principle
“The fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies the use of that distribution for inference; it agrees with everything that is known but carefully avoids assuming anything that is not known.” (Jaynes, 1990)
Maximizing “gain”
Unregularized gain: Gain(q_λ) = LogLikelihood(q_λ) − ln(1/n), where n is the number of background pixels.
E.g. if the unregularized gain is 1.5, then the average training sample is exp(1.5) (about 4.5) times more likely under the model than a random background pixel.
Maxent maximizes the regularized gain: Gain(q_λ) − Σ_j β_j |λ_j|
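The gain and its interpretation in code (toy setup; n_cells plays the role of n, the number of background pixels):

```python
import numpy as np

def gain(q, occ_idx, n_cells):
    """Unregularized gain: average log likelihood of the occurrence
    cells minus ln(1/n), the log likelihood under the uniform
    distribution on n background cells."""
    return np.mean(np.log(q[occ_idx])) - np.log(1.0 / n_cells)

q = np.array([0.05, 0.15, 0.30, 0.50])    # toy model over 4 cells
print(gain(q, [2, 3], n_cells=4))         # > 0: better than uniform

# A gain of 1.5 would mean the average occurrence cell is exp(1.5),
# about 4.5, times more likely under the model than a random cell.
print(np.exp(1.5))
```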
Maxent algorithms
Goal: maximize the regularized gain.
Algorithm: start with the uniform distribution (gain = 0), then iteratively update λ to increase the gain.
The optimization problem is convex, so a variety of algorithms apply: gradient descent, conjugate gradient, Newton, iterative scaling.
Our algorithm: coordinate descent (a sketch follows below).
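A minimal sketch of a coordinate-wise scheme for this objective, using proximal gradient steps on one coefficient at a time; the step size, iteration count, and gradient-step form are simplifications (the actual Maxent implementation uses closed-form coordinate updates):

```python
import numpy as np

def maxent_coordinate_descent(F, occ_idx, beta, n_iter=500, step=0.1):
    """F: (n_cells, n_features) feature matrix; occ_idx: occurrence cells;
    beta: per-feature L1 penalty weights. Returns coefficients lam."""
    n_cells, n_feat = F.shape
    lam = np.zeros(n_feat)               # start at uniform (gain = 0)
    for it in range(n_iter):
        j = it % n_feat                  # cycle through the coordinates
        q = np.exp(F @ lam)
        q /= q.sum()                     # current Gibbs distribution
        # Gradient of the log likelihood w.r.t. lam_j:
        # (sample average of f_j) minus (model expectation of f_j).
        g = F[occ_idx, j].mean() - q @ F[:, j]
        # Gradient step, then soft-threshold for the penalty beta_j * |lam_j|.
        z = lam[j] + step * g
        lam[j] = np.sign(z) * max(abs(z) - step * beta[j], 0.0)
    return lam

F = np.array([[1.0, 0.2], [0.5, 0.9], [0.1, 0.1], [0.8, 0.7]])
lam = maxent_coordinate_descent(F, occ_idx=[0, 3], beta=np.array([0.1, 0.1]))
print(lam)
```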
Interpretation of regularization
Conditional vs unconditional Maxent
• One class: distribution over sites p(x | y=1). Maximize entropy: −Σ_x p(x|y=1) ln p(x|y=1)
• Multiclass: conditional probability of presence Pr(y | z). Maximize conditional entropy: −Σ_z p′(z) Σ_y p(y|z) ln p(y|z)
• Notation: y ∈ {0, 1}, species presence; x, a site in our study region; z, a vector of environmental conditions; p′(z), the empirical probability of z
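Writing the two objectives out in the slide's notation (with the implicit sum over y made explicit):

```latex
% One-class (unconditional): a single distribution over sites
\max_p \; -\sum_{x} p(x \mid y{=}1)\,\ln p(x \mid y{=}1)

% Conditional: probability of presence given environment z,
% weighted by the empirical distribution p'(z)
\max_p \; -\sum_{z} p'(z) \sum_{y \in \{0,1\}} p(y \mid z)\,\ln p(y \mid z)
```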
Effect of regularization: multiplier = 5
Larger confidence intervals → higher entropy → more spread-out prediction
Sample selection bias in Ontario birds
Performance guarantees
The solution returned by Maxent is almost as good as the best q_λ, measured by relative entropy (KL divergence).
Guarantees should depend on:
• the number of samples m
• the number of features n (or the “complexity” of the features)
• the “complexity” of the best q_λ