Forest Monitoring for Action David Wheeler, Center for Global Development Robin Kraft and Dan Hammer, University of Maryland SciPy 2010 rkraft@umd.edu cgdev.org/forest
Forest Monitoring for Action (FORMA) • Set of algorithms to identify deforestation using satellite imagery quickly and cheaply Outline • Why deforestation? • What is FORMA? • Python and FORMA
Caveats • Economist/geographer, not a programmer • Parallel computing broadly speaking – not necessarily in robust compsci terms
Why do we care about deforestation? • Biodiversity • Local and regional environment • Climate change
How do we deal with deforestation? • Address economic drivers • Monitor outcomes
Monitoring: Issues with traditional methods • Slow turnaround • Relatively opaque methodologies • Data black box • Local accuracy vs. wide coverage • Needed software tools expensive and hard to use • Research aimed at scientists more than people on the ground. • BUT – it’s still really cool stuff!
Forest Monitoring for Action • Set of algorithms to identify deforestation using rapidly updated, free satellite imagery. • Prototype: monthly maps at 1km resolution • Early-warning system to complement hi-res approaches • Special features – Rapid (monthly) updates: 2006-present – Potential pan-tropical application – Indonesia-wide prototype is unique
The (very) basic intuition • Forested areas look green • Forest + fires + browning = deforestation? • FORMA algorithms search for telltale patterns of fires and “browning”
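The intuition above can be sketched in a few lines of Numpy. This is only an illustration, not the FORMA algorithm itself: the greenness values, fire counts, and the slope threshold of -0.05 are made-up assumptions, and "browning" is approximated here as a negative linear trend in a pixel's greenness series.

```python
import numpy as np

def greenness_slope(series):
    """Return the linear trend (slope per time step) for each pixel.

    `series` has shape (n_pixels, n_times). np.polyfit fits each column
    of its y argument separately, so pixels are passed as columns.
    """
    t = np.arange(series.shape[-1])
    slope, _intercept = np.polyfit(t, series.T, deg=1)
    return slope

# Hypothetical data: two pixels observed over four time steps.
greenness = np.array([
    [0.80, 0.80, 0.79, 0.81],  # stable forest
    [0.80, 0.70, 0.50, 0.30],  # steadily "browning"
])
fires = np.array([0, 3])  # hypothetical fire detections per pixel

slopes = greenness_slope(greenness)

# Assumed rule of thumb: browning plus at least one fire looks suspicious.
BROWNING_THRESHOLD = -0.05  # hypothetical value
suspicious = (slopes < BROWNING_THRESHOLD) & (fires > 0)
print(suspicious)
```

Only the second pixel combines a falling greenness trend with fire activity, so it is the only one flagged.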
FORMA Methodology • Identify relevant patterns and time trends in “greenness” and fires • Train algorithms using historical data (2000-5) • Apply parameters from training to subsequent data • Output: probability of forest clearing, by pixel
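A minimal sketch of the train-then-apply pattern described above. The slides do not say which statistical model FORMA uses, so logistic regression is an assumption made purely for illustration, as are the two features (greenness trend and fire count) and the synthetic training data.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so the model has an intercept term."""
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by plain gradient descent (illustrative only)."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(X, w):
    """Apply fitted parameters to new pixels; returns a probability per pixel."""
    return 1.0 / (1.0 + np.exp(-add_intercept(X) @ w))

# Synthetic "training period" data standing in for the historical record.
rng = np.random.default_rng(0)
n_pixels = 500
greenness_trend = rng.normal(0.0, 1.0, n_pixels)
fire_count = rng.poisson(1.0, n_pixels).astype(float)

# Synthetic labels: clearing is more likely when greenness falls and fires rise.
true_logit = -1.0 - 1.5 * greenness_trend + 0.8 * fire_count
cleared = (rng.random(n_pixels) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Train on the historical period.
w = fit_logistic(np.column_stack([greenness_trend, fire_count]), cleared)

# Apply the fitted parameters to pixels from a later period.
X_new = np.array([
    [-2.0, 3.0],  # strong browning, several fires
    [1.0, 0.0],   # greening, no fires
])
prob = predict_proba(X_new, w)
print(prob)
```

The output is one probability of clearing per pixel, which is the form of result the methodology slide describes.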
Original workflow for Indonesia: Doesn’t scale • Two dual-core Windows desktops • One 1 TB hard drive that got quite full • One ArcGIS license ($3000) for GIS data and imagery • One Stata license ($1500) for statistical modeling • Python 2.5 as glue • Start to finish: 4 weeks • Start to finish on 4 cores: ~1+ week
Pan-tropical application @ 250m resolution • ~130x more raw data • Data comes split into pieces – thanks NASA! • “Easy” to parallelize if you can get rid of pricey software and manage all the data …
New workflow: Homebrew “Map/Reduce” in the cloud • Amazon Web Services (AWS): – EC2: small Linux spot instances – approx. $0.06/hr – SQS: fault tolerant job management – S3: persistent storage for intermediate and output data • Boto: interact with AWS • GDAL/OGR: load images as Numpy arrays; re-project images; GIS data management • Numpy: – Slice images into pieces, stack slices as time series in tabular format – Statistical modeling by pixel, pixel neighborhood or ecoregion means plenty of room for parallelization
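The Numpy step above (slice images into pieces, stack slices as time series in tabular format) might look roughly like the sketch below. It uses small synthetic arrays so that it runs without GDAL; in the real workflow each array would be read from an image file. The function names, tile size, and table layout (row, column, then one value per time step) are illustrative assumptions.

```python
import numpy as np

def stack_time_series(images):
    """Stack one 2-D image per time step into a (T, H, W) cube."""
    return np.stack(images, axis=0)

def iter_tiles(cube, tile_size):
    """Yield ((row0, col0), tile) pieces that can be processed independently."""
    _, height, width = cube.shape
    for r in range(0, height, tile_size):
        for c in range(0, width, tile_size):
            yield (r, c), cube[:, r:r + tile_size, c:c + tile_size]

def tile_to_table(tile, origin):
    """Flatten a tile to one row per pixel: [row, col, v_t0, v_t1, ...]."""
    n_times, h, w = tile.shape
    r0, c0 = origin
    rows, cols = np.meshgrid(
        np.arange(r0, r0 + h), np.arange(c0, c0 + w), indexing="ij"
    )
    series = tile.reshape(n_times, h * w).T  # shape (pixels, T)
    return np.column_stack([rows.ravel(), cols.ravel(), series])

# Synthetic stand-ins for three monthly images of a 4 x 6 pixel scene.
base = np.arange(24).reshape(4, 6) / 100.0
images = [base + month for month in range(3)]

cube = stack_time_series(images)
tables = [tile_to_table(tile, origin) for origin, tile in iter_tiles(cube, tile_size=2)]
print(len(tables), tables[0].shape)
```

Each table is independent of the others, which is what makes per-pixel or per-neighborhood modeling straightforward to spread across instances.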
The cloud/FOSS difference for Indonesia prototype • 20 Linux instances: – 10x the compute power for $1.20/hour, with no software costs • Easily scalable with SQS – just re-run queue processing script on new instance • Per-image processing significantly faster – Numpy arrays are great! • Parallel pre-processing in the cloud – RUNNING FOR THE FIRST TIME THIS MORNING! – Preprocessing time drops from 3 days to <5 hours • Allows us to experiment with different algorithms and data much more easily
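The "re-run queue processing script on new instance" idea can be sketched as a worker loop. To keep the example runnable without AWS credentials, Python's standard-library queue.Queue stands in for SQS here; it is not the SQS API. The job fields, tile names, and the retry limit are hypothetical, and re-queuing a failed job only approximates the way an unacknowledged SQS message becomes visible again.

```python
import queue

def process_image(job):
    """Hypothetical per-image task; a real one would load and slice an image."""
    if job["fail_once"] and job["attempts"] == 0:
        raise RuntimeError("simulated transient failure")
    return "processed " + job["tile"]

def run_worker(jobs, max_attempts=3):
    """Pull jobs until the queue is empty, re-queuing any job that fails."""
    results = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break  # nothing left to do; the worker can shut down
        try:
            results.append(process_image(job))
        except RuntimeError:
            job["attempts"] += 1
            if job["attempts"] < max_attempts:
                jobs.put(job)  # put it back so this or another worker retries
    return results

# Fill the queue with three hypothetical image tiles; one fails on first try.
jobs = queue.Queue()
for tile in ["tile_a", "tile_b", "tile_c"]:
    jobs.put({"tile": tile, "fail_once": tile == "tile_b", "attempts": 0})

results = run_worker(jobs)
print(results)
```

Because every worker runs the same loop against the same queue, adding capacity amounts to starting the script on another machine.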
Future plans • Matplotlib + basemap for visualization • Numpy to mask out water and non-forest pixels • PiCloud? StarCluster?
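Masking out water and non-forest pixels with Numpy could be as simple as the sketch below. The land-cover codes, the tiny example arrays, and the 0.5 probability cutoff are assumptions chosen for illustration; they are not taken from the FORMA data.

```python
import numpy as np

# Hypothetical land-cover codes.
WATER, FOREST, NON_FOREST = 0, 1, 2

land_cover = np.array([
    [FOREST, FOREST, WATER],
    [NON_FOREST, FOREST, FOREST],
])
clearing_prob = np.array([
    [0.9, 0.2, 0.7],
    [0.8, 0.6, 0.1],
])

# Keep only pixels classified as forest; hide everything else.
is_forest = land_cover == FOREST
masked_prob = np.ma.masked_where(~is_forest, clearing_prob)

# Statistics on a masked array ignore the hidden pixels.
mean_forest_prob = masked_prob.mean()

# Row/column indices of forest pixels at or above the assumed cutoff.
flagged = np.argwhere(is_forest & (clearing_prob >= 0.5))
print(mean_forest_prob, flagged.tolist())
```

The water pixel and the non-forest pixel both have high values in this toy array, but neither contributes to the mean or to the flagged list.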
Maps of forest clearing: monthly time series for Riau, Indonesia
Forested in 2000 (map; scale bar: 270 km / 167 mi; legend: Forest in 2000)
Cleared by 2005 (map; scale bar: 270 km / 167 mi; legend: Forest in 2000; Cleared 2000 - 2005)
What happened after 2005?
Cleared by 10/2009 (map; scale bar: 270 km / 167 mi; legend: Forest in 2000; Cleared 2000 - 2005; Probability: 50 – 60%, 60 – 70%, 70 – 80%, 80 – 90%, > 90%)
Monthly map frames, 11/2005 through 10/2009 (legend: Forest in 2000; Cleared 2000 - 2005)