
Improving weather prediction via advancing model initialization. Brian Etherton, with Christopher W. Harrop, Lidia Trailovic, and Mark W. Govett. NOAA/ESRL/GSD, 15 November 2016.


  1. Improving weather prediction via advancing model initialization
     Brian Etherton, with Christopher W. Harrop, Lidia Trailovic, and Mark W. Govett
     NOAA/ESRL/GSD
     15 November 2016

  2. The HPC group at NOAA/ESRL/GSD
     • Strong track record in high performance computing
     • Massively Parallel Fine Grain (MPFG) Computing
     • Graphics Processing Units (GPUs)
     • Many Integrated Core (MIC)
     • Working to advance the state of the art in data assimilation, in particular via improved performance and design
     • NOAA/NCEP GSI has a core limit in the hundreds
     • 4D-Var approaches are time consuming
     • 4D-Ensemble approaches are memory and I/O intensive
     • Wish to use a ‘great’ solver with any model (atmosphere, ocean, …)
     • First steps into data assimilation (started this year)
     TP 4DDA - NOAA/ESRL/GSD WELCOME DATA-ASSIM TIME-PAR RESULTS COMPUTE SUMMARY

  3. Keys to accurate weather prediction models
     1. Intrinsic Predictability Limitations
        a) Is the system inherently chaotic?
     2. Errors in the Model
        a) Does the model represent the system correctly?
        b) Is model resolution sufficient?
        c) Are unresolved physical processes well parameterized?
     3. Errors in the Initial Conditions and Boundary Conditions

  4. Data Assimilation – What is it?
     • Consider two estimates of the temperature in this room
       – $T_F$ shall be what we set the thermostat to (a forecast)
       – $T_O$ shall be the value from my phone (an observation)
     • Use average squared errors (variance) to weight the two estimates
       – $\sigma_O^2$ = error variance associated with $T_O$
       – $\sigma_F^2$ = error variance associated with $T_F$
     • The optimal estimate (most likely value) of the temperature in the room, $T_A$, is:
       $T_A - T_F = \sigma_F^2 \, (\sigma_F^2 + \sigma_O^2)^{-1} \, [T_O - T_F]$

  5. Data Assimilation – What is it?
     • The estimate of the temperature with minimum error variance, the analysis value $T_A$, is:
       $T_A - T_F = \sigma_F^2 \, (\sigma_F^2 + \sigma_O^2)^{-1} \, [T_O - T_F]$
     • What if the thermostat is perfect?
       – Then $\sigma_F^2 = 0$
       – Then $T_A - T_F = 0$, so $T_A = T_F$
     • What if my phone is perfect?
       – Then $\sigma_O^2 = 0$
       – Then $T_A - T_F = T_O - T_F$, so $T_A = T_O$
     • $T_A$ is a weighted average of the observation and the first guess
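
The scalar update on slides 4 and 5 can be checked numerically. The following is a minimal sketch, not code from the presentation; the function name `analysis` and the temperature values are illustrative assumptions.

```python
def analysis(t_f, t_o, var_f, var_o):
    """Minimum-variance blend of a forecast t_f and an observation t_o.

    Implements T_A = T_F + var_f / (var_f + var_o) * (T_O - T_F).
    """
    weight = var_f / (var_f + var_o)  # fraction of the innovation that is applied
    return t_f + weight * (t_o - t_f)


# Hypothetical values: thermostat set to 20.0, phone reads 22.0.
equal_trust = analysis(20.0, 22.0, var_f=1.0, var_o=1.0)
perfect_thermostat = analysis(20.0, 22.0, var_f=0.0, var_o=1.0)
perfect_phone = analysis(20.0, 22.0, var_f=1.0, var_o=0.0)

print(equal_trust, perfect_thermostat, perfect_phone)
```

With equal error variances the analysis falls halfway between the two estimates; with a zero variance on either side it collapses onto that estimate, matching the two limiting cases on slide 5.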

  6. All the data we wish to incorporate
     (Image from the NASA hyperwall)

  7. Data Assimilation – Full Model – Challenges
     • Scalar equation: $T_A - T_F = \sigma_F^2 \, (\sigma_F^2 + \sigma_O^2)^{-1} \, [T_O - T_F]$
     • Assuming that observation and forecast errors are uncorrelated, the analysis increment ($\mathbf{x}_a - \mathbf{x}_f$) that minimizes analysis error variance is (Cohn, 1997):
       $\mathbf{x}_a - \mathbf{x}_f = \mathbf{B}\mathbf{H}^T (\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1} [\mathbf{y} - \mathbf{H}\mathbf{x}_f]$
     • The length of the vectors $\mathbf{x}_a$ and $\mathbf{x}_f$ equals the number of prediction points (gridpoints × vertical levels × variables) in the model. For the ECMWF global model, that is about 1 billion
     • The matrix $\mathbf{B}$ is, for the full model, 1 billion × 1 billion in size
     • The matrix $\mathbf{H}$ can involve compute-intensive processes
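
The vector form of the analysis increment can be illustrated on a toy problem. This is a sketch under stated assumptions: a 3-point state, a single observation of the first point, and made-up covariance values. It uses NumPy and solves the linear system rather than forming the inverse explicitly; a full-size system would need a very different approach.

```python
import numpy as np


def analysis_increment(x_f, y, B, H, R):
    """Return x_a - x_f = B H^T (H B H^T + R)^{-1} (y - H x_f)."""
    innovation = y - H @ x_f              # observation minus forecast, in observation space
    S = H @ B @ H.T + R                   # innovation covariance
    return B @ H.T @ np.linalg.solve(S, innovation)


# Toy setup (all values are illustrative assumptions).
x_f = np.array([20.0, 20.0, 20.0])        # forecast at three points
B = np.array([[1.0, 0.5, 0.0],            # forecast error covariance
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
H = np.array([[1.0, 0.0, 0.0]])           # observe the first point only
R = np.array([[1.0]])                     # observation error variance
y = np.array([22.0])                      # the observation

increment = analysis_increment(x_f, y, B, H, R)
x_a = x_f + increment
print(x_a)
```

The off-diagonal entries of `B` spread the single observation to the neighbouring point, which is why the size of `B` matters so much at full model scale.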

  8. Four Dimensional Data Assimilation (4D DA)
     • In the prior equations, all data are assumed to be at the analysis time.
     • All data in the time window are assumed to occur at the middle of that window.
     • This introduces some errors ⇒ weather systems move and develop!
     (Figure: 4D Data Assimilation, 12-hour window)

  9. Four Dimensional Data Assimilation (4D DA)
     • In some approaches, information at different times is incorporated by running a model forward (Tangent-Linear, TL) and backward (Adjoint, AD) in time
     • Results are optimal with a TL and AD that mimic the true model
     (Figure: 4D Data Assimilation, 12-hour window)

  10. Time Parallel 4D DA - Motivation
      • The time spent running the TL and AD is, roughly:
        LENGTH OF ASSIM WINDOW × 2 (TL & AD) × 1.5 (TL TAKES LONGER) × 1.5 (AD TAKES LONGER) × NUMBER OF ITERATIONS
      • For a 12-hour window and 40 iterations, this value is 54 × 40 = 2160 hours of model time, or 90 days
      • This is, perhaps, 6x longer than the forecast itself; this must be improved
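
The cost estimate on this slide is a simple product. The sketch below reproduces it; the function and parameter names are illustrative assumptions, and the factors are the ones quoted on the slide.

```python
def tl_ad_model_hours(window_hours, iterations, tl_factor=1.5, ad_factor=1.5):
    """Rough model time spent in the TL and AD runs of a serial 4DVAR.

    window * 2 (TL and AD) * tl_factor * ad_factor * iterations
    """
    per_iteration = window_hours * 2 * tl_factor * ad_factor
    return per_iteration * iterations


total_hours = tl_ad_model_hours(window_hours=12, iterations=40)
total_days = total_hours / 24
print(total_hours, total_days)
```

For a 12-hour window the per-iteration cost is 54 model hours, so 40 iterations give the 2160 hours (90 days) quoted above.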

  11. Time Parallel 4D DA - Motivation
      Traditional 4DVAR involves taking one state (bucket) and moving it all the way from the start of the window to the finish and back to the start.
      Time parallel 4DVAR sends a number of states (buckets) from one time to the adjacent time.
      (Figure panels: TRADITIONAL, TIME PARALLEL)

  12. Time Parallel 4D DA - Motivation
      We did not invent time-parallel 4DVAR; the ECMWF has done this sort of work, as have others (Virginia Tech).
      Our goal is not to develop a brand new DA system, but to explore promising existing approaches.
      (Figure panels: TRADITIONAL, TIME PARALLEL)

  13. Time Parallel 4D DA - Motivation
      • If the 12-hour assimilation window could be broken into 48 ¼-hour windows, then run time could be closer to 2 model days (rather than 90)
      • This would take ~27 minutes to compute for a model that runs in 1% of real time
      • Achieve scaling when your model is no longer scaling
      • If scaling is achieved, is the solution from this time-parallel version just as good?
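
The figures on this slide follow from dividing the serial cost across the sub-windows. The sketch below assumes ideal scaling (no communication or coupling overhead between sub-windows), which is an optimistic assumption rather than a measured result; the function name is hypothetical.

```python
def time_parallel_estimate(serial_model_hours, segments, realtime_fraction):
    """Return (model hours per segment chain, wall-clock minutes).

    Assumes the serial TL/AD cost divides evenly over `segments`
    concurrently processed sub-windows (ideal scaling).
    """
    model_hours = serial_model_hours / segments
    wall_minutes = model_hours * realtime_fraction * 60
    return model_hours, wall_minutes


subwindow_hours = 12 / 48  # 12-hour window split into 48 pieces
model_hours, wall_minutes = time_parallel_estimate(
    serial_model_hours=2160, segments=48, realtime_fraction=0.01
)
print(subwindow_hours, model_hours, round(wall_minutes, 1))
```

The 45 model hours per chain is the "closer to 2 model days" figure, and at 1% of real time that corresponds to roughly 27 minutes of wall-clock time.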

  14. Results – Assimilation Methods
      Test 1: The eastward propagation of a 1D sine wave
      • Timing results (seconds):
          3DVAR          35.0
          4DVAR         108.8
          4DVAR-TP-1    108.6
          4DVAR-TP-3     44.2
      • Results show that Time Parallel 4DVAR with 3 OpenMP threads gives a substantial reduction in run time
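
The size of the reduction can be read off the timings as a speedup and a parallel efficiency. The sketch below uses only the four numbers from the slide and assumes that 4DVAR-TP-3 denotes the 3-thread run.

```python
# Timings in seconds, copied from the slide.
timings = {
    "3DVAR": 35.0,
    "4DVAR": 108.8,
    "4DVAR-TP-1": 108.6,
    "4DVAR-TP-3": 44.2,
}
threads = 3  # assumption: the "-3" suffix is the OpenMP thread count

speedup = timings["4DVAR"] / timings["4DVAR-TP-3"]
efficiency = speedup / threads

print(f"speedup over serial 4DVAR: {speedup:.2f}x")
print(f"parallel efficiency on {threads} threads: {efficiency:.0%}")
```

Even with three threads, the time-parallel run remains slower than 3DVAR (44.2 s versus 35.0 s), so the gain is relative to serial 4DVAR.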

  15. Results – Assimilation Methods
      Test 2: The Lorenz96 Model
      • The time parallel 4DVAR (yellow line) performed better than 3DVAR, but not quite as well as 4DVAR
      • No great performance statistics here; the 40-point problem was not taxing
      • Nonetheless, the Time Parallel 4DVAR results encourage us to continue
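
For readers unfamiliar with the test problem, the Lorenz96 model is a cyclic chain of variables with dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F. The sketch below is a generic NumPy implementation, not the code used for these results; the forcing F = 8, the time step, the RK4 integrator, and the initial perturbation are common choices assumed here, since the slides do not state them.

```python
import numpy as np


def lorenz96_tendency(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


# 40-point state: rest state x_i = F plus a small perturbation at one point.
state = np.full(40, 8.0)
state[19] += 0.01
for _ in range(100):
    state = rk4_step(state)

print(state.shape, float(state.min()), float(state.max()))
```

The same functions work for the 4000-point configuration by changing the length of the initial state.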

  16. Results – Time to Completion
      (Figure: Performance of Procedural Implementation, Lorenz Model with 4000 points)

  17. 4D DA – Full Earth - Requirements
      • From Govett et al. (BAMS paper): G11 NIM (3.75 km resolution) using 64 × 20 = 1280 K80 GPUs runs in 1.6% of forecast time
      • A 12-hour forecast takes 12 minutes (could do only ONE iteration of 4DVAR)
      • Time-parallel could do 40 iterations in ~30 minutes if the iterations could be subdivided into ~48 sections (60,000 K80s, 30,000 Pascals)
      (Figure: Cray CS Storm Node Architecture)

  18. NOAA Fine Grain System
      • NOAA has received a large GPU cluster from Cray / NVIDIA
      • 760 Pascal GPUs – 3584 cores / GPU
      • Cray Storm, 8 GPUs / node
      • Mellanox InfiniBand – QDR (40 Gb/s)
      • Status
        – Delivered October 2016
        – Now in acceptance
      • Plans
        – Support development of FV3 and next-gen data assimilation
      (Figure: Cray CS Storm Node – Architecture)
      Parallelization of FV3 in progress

  19. Thoughts for the future
      • What would it take to produce a 3 km resolution global analysis of the atmosphere (10 billion prediction points)?

  20. Thoughts for the future
      • Total amount of memory required for the analysis: 40 GB (10 billion points, 4 bytes per value)
      • For time parallel with 48 intervals in a window: ~2 TB
      • Observational data could also be quite sizable (TBs)
      • Our issues are not just processing, but also the speed of memory and I/O
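
The state-size figure is a direct multiplication. The sketch below assumes single-precision (4-byte) values, decimal units (1 GB = 10^9 bytes), and exactly one copy of the state per time interval with nothing else stored; real systems would also hold covariances, trajectories, and observations.

```python
points = 10_000_000_000   # 10 billion prediction points
bytes_per_value = 4       # single precision
intervals = 48            # sub-windows in the time-parallel scheme

state_bytes = points * bytes_per_value
all_intervals_bytes = state_bytes * intervals  # assumption: one state copy per interval

print(state_bytes / 1e9, "GB for one state")
print(all_intervals_bytes / 1e12, "TB for one state per interval")
```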

  21. This Presentation is Now Complete
