Big Bang, Big Iron: CMB Data Analysis at the Petascale and Beyond


  1. Big Bang, Big Iron: CMB Data Analysis at the Petascale and Beyond Julian Borrill Computational Cosmology Center, LBL & Space Sciences Laboratory, UCB with Christopher Cantalupo, Theodore Kisner, Radek Stompor et al and the BOOMERanG, EBEX, MAXIMA, Planck & PolarBear teams 3DAPAS – June 8th, 2011

  2. The Cosmic Microwave Background
About 400,000 years after the Big Bang, the expanding Universe cools through the ionization temperature of hydrogen: p⁺ + e⁻ → H. Without free electrons to scatter off, the photons free-stream to us today.
• COSMIC – filling all of space.
• MICROWAVE – redshifted by the expansion of the Universe from 3000 K to 3 K.
• BACKGROUND – primordial photons coming from “behind” all astrophysical sources.

  3. CMB Science
• Primordial photons give the earliest possible image of the Universe.
• The existence of the CMB supports a Big Bang over a Steady State cosmology (NP1).
• Tiny fluctuations in the CMB temperature (NP2) and polarization encode the fundamentals of
  – Cosmology
    • geometry, topology, composition, history, …
  – Highest energy physics
    • grand unified theories, the dark sector, inflation, …
• Current goals:
  – a definitive T measurement provides complementary constraints for all dark energy experiments.
  – detection of the cosmological B-mode gives the energy scale of inflation from primordial gravity waves (NP3).

  4. The Concordance Cosmology
• High-z Supernova Search Team & Supernova Cosmology Project (1998): Cosmic Dynamics (Ω_Λ − Ω_m)
• BOOMERanG & MAXIMA (2000): Cosmic Geometry (Ω_Λ + Ω_m)
• 70% Dark Energy + 25% Dark Matter + 5% Baryons ⇒ 95% Ignorance
• What (and why) is the Dark Universe?

  5. Observing the CMB
• With very sensitive, very cold detectors – coloured noise.
• Scanning all of the sky from space, or some of the sky from the stratosphere or high dry ground.

  6. Analysing The CMB

  7. CMB Satellite Evolution
Evolving science goals require (i) higher resolution & (ii) polarization sensitivity.

  8. CMB Data Analysis
• In principle very simple
  – Assume Gaussianity and maximize the likelihood
    1. of maps given the observations and their noise statistics (analytic).
    2. of power spectra given maps and their noise statistics (iterative; both steps are written out below).
• In practice very complex
  – Foregrounds, glitches, asymmetric beams, non-Gaussian noise, etc.
  – Algorithm & implementation scaling with evolution of
    • CMB data-set size
    • HPC architecture
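In standard CMB notation the two likelihood-maximization steps above take the familiar textbook form (pointing matrix P, time-domain noise covariance N, pixel-domain noise covariance N_p, signal covariance S(C_ℓ)); this is the generic formulation, not notation or code taken from the talk itself:

```latex
% Step 1 (analytic): maximum-likelihood map from the time-ordered data d
\[
  \hat{m} = \left(P^{T} N^{-1} P\right)^{-1} P^{T} N^{-1} d ,
  \qquad
  N_{p} \equiv \left(P^{T} N^{-1} P\right)^{-1}
\]
% Step 2 (iterative): maximize the map likelihood over the power spectrum C_\ell
\[
  -2 \ln \mathcal{L}\left(C_{\ell} \mid \hat{m}\right)
    = \hat{m}^{T} \left[ S(C_{\ell}) + N_{p} \right]^{-1} \hat{m}
    + \ln \det \left[ S(C_{\ell}) + N_{p} \right] + \mathrm{const}
\]
```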

  9. The CMB Data Challenge
• Extracting fainter signals (polarization, high resolution) from the data requires:
  – larger data volumes to provide higher signal-to-noise.
  – more complex analyses to control fainter systematic effects.

  Experiment   Start Date   Goals                        N_t     N_p
  COBE         1989         All-sky, low-res, T          10^9    10^4
  BOOMERanG    1997         Cut-sky, high-res, T         10^9    10^6
  WMAP         2001         All-sky, mid-res, T+E        10^10   10^7
  Planck       2009         All-sky, high-res, T+E(+B)   10^12   10^9
  PolarBear    2012         Cut-sky, high-res, T+E+B     10^13   10^7
  QUIET-II     2015         Cut-sky, high-res, T+E+B     10^14   10^7
  CMBpol       2020+        All-sky, high-res, T+E+B     10^15   10^10

• 1000x increase in data volume over the past & future 15 years – need linear analysis algorithms to scale through 10 + 10 M-foldings!

  10. CMB Data Analysis Evolution
Data volume & computational capability dictate the analysis approach.
• 1997 – 2000: B98 on the Cray T3E (x 700)
  – Map: Explicit Maximum Likelihood (Matrix Invert – N_p^3)
  – Power Spectrum: Explicit Maximum Likelihood (Matrix Cholesky + Tri-solve – N_p^3)
• 2000 – 2003: B2K2 on the IBM SP3 (x 3,000)
  – Map: Explicit Maximum Likelihood (Matrix Invert – N_p^3)
  – Power Spectrum: Explicit Maximum Likelihood (Matrix Invert + Multiply – N_p^3)
• 2003 – 2007: Planck SF on the IBM SP3 (x 6,000)
  – Map: PCG Maximum Likelihood (band-limited FFT – few N_t)
  – Power Spectrum: Monte Carlo (Sim + Map – many N_t)
• 2007 – 2010: Planck AF & EBEX on the Cray XT4 (x 40,000)
  – Map: PCG Maximum Likelihood (band-limited FFT – few N_t)
  – Power Spectrum: Monte Carlo (SimMap – many N_t)
• 2010 – 2014: Planck MC & PolarBear on the Cray XE6 (x 150,000)
  – Map: PCG Maximum Likelihood (band-limited FFT – few N_t)
  – Power Spectrum: Monte Carlo (Hybrid SimMap – many N_t)
The early transitions were changes of Algorithms (explicit maximum likelihood → PCG map-making + Monte Carlo spectra); the later ones were changes of Implementations of the same algorithms.

  11. Scaling In Practice
• 2000: BOOMERanG T-map
  – 10^8 samples => 10^5 pixels
  – 128 Cray T3E processors
• 2005: Planck T-map
  – 10^10 samples => 10^8 pixels
  – 6,000 IBM SP3 processors
• 2008: EBEX T/P-maps
  – 10^11 samples, 10^6 pixels
  – 15,360 Cray XT4 cores
• 2010: Planck Monte Carlo of 1,000 noise T-maps
  – 10^14 samples => 10^11 pixels
  – 32,000 Cray XT4 cores

  12. Simulation & Mapping: Calculations
Given the instrument noise statistics & beams, a scanning strategy, and a sky:
1) SIMULATION: d_t = n_t + s_t = n_t + P_tp s_p
  – A realization of the piecewise stationary noise time-stream:
    • pseudo-random number generation & FFT
  – A signal time-stream scanned & beam-smoothed from the sky map:
    • SHT
2) MAPPING: (P^T N^-1 P) d_p = P^T N^-1 d_t   (A x = b)
  – Build the RHS
    • FFT & sparse matrix-vector multiply
  – Solve for the map
    • PCG over FFT & sparse matrix-vector multiply
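To make the mapping step concrete, here is a minimal C sketch of the PCG solve for (P^T N^-1 P) d_p = P^T N^-1 d_t. It assumes white noise (a diagonal N expressed as per-sample weights) and a pointing matrix stored as one pixel index per sample; the real pipelines replace the plain weight multiply with the band-limited FFT noise filter, which is what makes the PCG iteration non-trivial. All names here are illustrative rather than taken from the actual codes.

```c
/* Minimal map-making sketch: solve (P^T N^-1 P) m = P^T N^-1 d by PCG.
 * Assumes white noise (per-sample weights w) and one pixel index per sample;
 * with a real (FFT-based) noise filter the structure is the same but the
 * system matrix is no longer diagonal. */
#include <stdlib.h>
#include <string.h>

/* P^T W : bin the weighted time-stream into the map (map must be zeroed). */
static void bin_tod(long nt, const long *pix, const double *w,
                    const double *tod, double *map)
{
    for (long t = 0; t < nt; t++) map[pix[t]] += w[t] * tod[t];
}

/* Apply A = P^T W P : scan the map to the time domain, weight, bin back. */
static void apply_A(long nt, long np, const long *pix, const double *w,
                    const double *in, double *out)
{
    memset(out, 0, np * sizeof(double));
    for (long t = 0; t < nt; t++) out[pix[t]] += w[t] * in[pix[t]];
}

static double dot(long n, const double *a, const double *b)
{
    double s = 0.0;
    for (long i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Jacobi-preconditioned CG; prec holds 1 / diag(A), i.e. inverse hit weights. */
void pcg_map(long nt, long np, const long *pix, const double *w,
             const double *tod, const double *prec, double *map, int iters)
{
    double *r  = calloc(np, sizeof(double));
    double *z  = malloc(np * sizeof(double));
    double *p  = malloc(np * sizeof(double));
    double *Ap = malloc(np * sizeof(double));

    memset(map, 0, np * sizeof(double));
    bin_tod(nt, pix, w, tod, r);                    /* r = b = P^T W d */
    for (long i = 0; i < np; i++) p[i] = z[i] = prec[i] * r[i];
    double rz = dot(np, r, z);

    for (int k = 0; k < iters && rz > 0.0; k++) {
        apply_A(nt, np, pix, w, p, Ap);
        double alpha = rz / dot(np, p, Ap);
        for (long i = 0; i < np; i++) { map[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (long i = 0; i < np; i++) z[i] = prec[i] * r[i];
        double rz_new = dot(np, r, z);
        for (long i = 0; i < np; i++) p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
    free(r); free(z); free(p); free(Ap);
}
```

With white noise the system matrix is diagonal and the solve converges immediately; the point of the sketch is the operation structure (bin, scan, weighted re-bin) that the FFT-filtered and parallel versions share.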

  13. Simulation & Mapping: Scaling
• In theory such analyses should scale
  – perfectly to arbitrary numbers of cores (strong scaling – within an experiment).
  – linearly with the number of observations (weak scaling – between experiments).
• In practice this does not happen, because of
  – IO (reading pointing, writing time-streams; reading time-streams, writing maps)
  – Communication (gathering maps from all processes)
  – Calculation inefficiency (linear operations only, minimal data re-use)
• Ultimately it all comes down to delivering data to the cores fast enough.
• Code development has been an ongoing history of addressing this challenge anew with each new CMB data volume and HPC system/scale.

  14. IO – Before
For each MC realization
  For each detector
    Read detector pointing
    Sim
    Write detector time-stream
  For all detectors
    Read detector time-stream & pointing
  Map
  Write map
⇒ Read: 56 x Realizations x Detectors x Observations bytes
  Write: 8 x Realizations x (Detectors x Observations + Pixels) bytes
E.g. for Planck, read 500 PB & write 70 PB.

  15. IO – Optimizations
• Read sparse telescope pointing instead of dense detector pointing
  – Calculate individual detector pointing on the fly.
• Remove the redundant write/read of time-streams between simulation & mapping
  – Generate simulations on the fly, only when the map-maker requests data.
• Put the MC loop inside the map-maker
  – Amortize common data reads over all realizations.

  16. IO – After
Read telescope pointing
For each detector
  Calculate detector pointing
For each MC realization
  SimMap
    For all detectors
      Simulate time-stream
    Write map
⇒ Read: 24 x Sparse Observations bytes
  Write: 8 x Realizations x Pixels bytes
E.g. for Planck, read 2 GB & write 70 TB (10^8 read & 10^3 write compression).
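The restructured control flow, with the MC loop inside the map-maker and time-streams generated on demand, might look schematically like the C skeleton below. Every type and function here (read_telescope_pointing, simulate_timestream, etc.) is a hypothetical placeholder standing in for the real pipeline routines, declared only so the loop structure is explicit.

```c
#include <stdlib.h>

/* Hypothetical pipeline interfaces: placeholders, not the real API. */
typedef struct pointing    pointing_t;
typedef struct detpointing detpointing_t;
typedef struct tod         tod_t;
typedef struct map         map_t;

pointing_t    *read_telescope_pointing(void);
detpointing_t *expand_detector_pointing(const pointing_t *bore, int det);
tod_t         *simulate_timestream(const detpointing_t *dp, int real, int det);
map_t         *new_map(void);
void           accumulate_rhs(map_t *m, const detpointing_t *dp, const tod_t *d);
void           solve_pcg(map_t *m);
void           write_map(const map_t *m, int real);
void           free_tod(tod_t *d);
void           free_map(map_t *m);

/* SimMap Monte Carlo: read sparse pointing once, simulate on the fly,
 * write only the final maps. */
void simmap_monte_carlo(int n_real, int n_det)
{
    pointing_t *bore = read_telescope_pointing();         /* one small read  */

    detpointing_t **dp = malloc(n_det * sizeof *dp);
    for (int d = 0; d < n_det; d++)
        dp[d] = expand_detector_pointing(bore, d);        /* on the fly      */

    for (int r = 0; r < n_real; r++) {
        map_t *m = new_map();
        for (int d = 0; d < n_det; d++) {
            /* simulation replaces the old read of a pre-written time-stream */
            tod_t *t = simulate_timestream(dp[d], r, d);
            accumulate_rhs(m, dp[d], t);                  /* P^T N^-1 d      */
            free_tod(t);
        }
        solve_pcg(m);                                     /* as on slide 12  */
        write_map(m, r);                                  /* the only output */
        free_map(m);
    }
    /* cleanup of dp and bore omitted for brevity */
}
```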

  17. Communication Details
• The time-ordered data from all the detectors are distributed over the processes subject to:
  – load-balance
  – common telescope pointing
• Each process therefore holds
  – some of the observations
  – for some of the pixels.
• In each PCG iteration, each process solves with its observations.
• At the end of each iteration, each process needs to gather the total result for all of the pixels it observed.

  18. Communication Implementation
• Initialize a process & MPI task on every core
• Distribute time-stream data & hence pixels
• After each PCG iteration
  – Each process creates a full map vector by zero-padding
  – Call MPI_Allreduce(map, world) (sketched in C below)
  – Each process extracts the pixels of interest to it & discards the rest
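A minimal MPI sketch of this baseline gather follows; the flat-array map layout and the names my_pix, my_map, etc. are illustrative, not the pipeline's own.

```c
/* Baseline gather: every rank zero-pads its partial map to the full sky,
 * all ranks sum it with MPI_Allreduce, and each rank then keeps only the
 * pixels it actually observes.  Simple, but the message volume is a full
 * map per rank regardless of sky coverage. */
#include <mpi.h>
#include <stdlib.h>

void gather_map_allreduce(long npix_full,      /* pixels in the full map        */
                          long my_npix,        /* pixels this rank observes     */
                          const long *my_pix,  /* their indices in the full map */
                          double *my_map)      /* in/out: local partial map     */
{
    double *full = calloc(npix_full, sizeof(double));

    /* zero-padded full map holding only this rank's contribution */
    for (long i = 0; i < my_npix; i++)
        full[my_pix[i]] = my_map[i];

    /* global sum: afterwards every rank holds the complete co-added map */
    MPI_Allreduce(MPI_IN_PLACE, full, (int)npix_full, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    /* extract the pixels of interest and discard the rest */
    for (long i = 0; i < my_npix; i++)
        my_map[i] = full[my_pix[i]];

    free(full);
}
```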

  19. Communication Challenge

  20. Communication Optimizations
• Reduce the number of MPI tasks (hybridize)
  – Use MPI only for off-node communication
  – Use threads for on-node communication
• Minimize the total message volume
  – Determine the pixel overlap for every process pair
    • one-time initialization cost
  – If the total overlap data volume is smaller than a full map, use Alltoallv in place of Allreduce (sketched below)
    • typically Alltoallv for all-sky surveys, Allreduce for part-sky surveys
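A hedged sketch of the Alltoallv variant is shown below, assuming the one-time setup has already produced, for each peer, the list of locally held pixels that the peer also observes (ordered identically on both sides, e.g. by global pixel index, so one count/displacement array can serve both send and receive) and that a rank exchanges nothing with itself. Names and buffer handling are illustrative only.

```c
/* Overlap-based gather: each PCG iteration exchanges only the pixels each
 * pair of ranks shares, via MPI_Alltoallv, instead of all-reducing a full
 * map.  Assumes cnt[myrank] == 0 and identical ordering of each shared
 * pixel set on both ranks. */
#include <mpi.h>

void gather_map_alltoallv(int nranks,
                          double *my_map,      /* local partial map, in/out    */
                          const long *ovl_idx, /* local indices of overlap     */
                                               /* pixels, grouped by peer      */
                          int *cnt,            /* overlap pixel count per peer */
                          int *displ,          /* offsets into the buffers     */
                          double *sendbuf, double *recvbuf)
{
    long total = (long)displ[nranks - 1] + cnt[nranks - 1];

    /* pack this rank's contribution for the pixels each peer also observes;
     * on-node parallelism comes from threads, not from extra MPI ranks */
    #pragma omp parallel for
    for (long i = 0; i < total; i++)
        sendbuf[i] = my_map[ovl_idx[i]];

    MPI_Alltoallv(sendbuf, cnt, displ, MPI_DOUBLE,
                  recvbuf, cnt, displ, MPI_DOUBLE, MPI_COMM_WORLD);

    /* co-add every peer's contribution into the local pixels (kept serial:
     * the same local pixel can appear in several peers' blocks) */
    for (long i = 0; i < total; i++)
        my_map[ovl_idx[i]] += recvbuf[i];
}
```

Whether this beats the plain Allreduce depends on how large the pairwise overlaps are relative to a full map, which is the system and survey dependence explored on the following slides.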

  21. Communication Improvement

  22. Communication Evaluation
• Which enhancement is more important? Is this system-dependent?
• Compare unthreaded/threaded and Allreduce/Alltoallv on the Cray XT4, XT5 & XE6, on 200 – 16,000 cores:
  – Alltoallv wins at the low end
  – Threads win at the high end

  23. Petascale Communication (I)

  24. HPC System Evolution
• Clock speed is no longer able to maintain Moore’s Law.
• Multi-core CPU and GPGPU are the two major approaches.
• Both of these will require performance modeling, experiment & auto-tuning.
• E.g. NERSC’s new XE6 system, Hopper:
  – 6,384 nodes
  – 2 Magny-Cours processors per node
  – 2 NUMA nodes per processor
  – 6 cores per NUMA node
• What is the best way to run hybrid code on such a system?
  – “wisdom” says 4 processes x 6 threads to avoid NUMA effects (see the sketch below).
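For illustration, here is a minimal C skeleton of the hybrid MPI + OpenMP setup such a layout implies (e.g. one MPI rank per NUMA node with six threads each); the launch mechanics are an assumption based on typical Cray XE6 usage, not something specified in the talk.

```c
/* Hybrid MPI + OpenMP skeleton.  The 4 ranks x 6 threads per node layout
 * for Hopper is the talk's "wisdom"; the launch details are assumed. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* MPI is called only from the master thread (off-node communication);
     * threads handle on-node work, so FUNNELED support is sufficient. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        #pragma omp master
        printf("rank %d of %d: %d threads\n",
               rank, nranks, omp_get_num_threads());
        /* ... threaded simulation / map accumulation work goes here ... */
    }

    MPI_Finalize();
    return 0;
}
```

On the XE6 this would typically be launched with something like aprun -n <ranks> -N 4 -S 1 -d 6 and OMP_NUM_THREADS=6, i.e. one MPI process pinned per NUMA node with its six cores given to OpenMP; the exact flags are a common convention rather than part of the presentation.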
