Accelerating Science Platforms for Machine Learning, Big Data, and Earth System Science
John Taylor, John Zic, Jose Alvarez, Oliver Obst, George Opletal, Maciej Golebiewski, Amanda Barnard, Emlyn Jones, Josh Bowden
August 2015
www.csiro.au
About CSIRO
• People: 5000
• Locations: 58
• Flagships: 9
• Budget: $1.3B+
• 62% of our people hold university degrees: 2000 doctorates, 500 masters
• In partnership with universities, we develop 650 postgraduate research students
• Top 1% of global research institutions in 14 of 22 research fields; top 0.1% in 4 research fields
[Map: CSIRO sites across Australia, including Darwin, Cairns, Atherton, Townsville, Alice Springs, Rockhampton, Bribie Island, Murchison, Toowoomba, Brisbane, Gatton, Myall Vale, Armidale, Geraldton, Narrabri, Mopra, Newcastle, Perth, Parkes, Adelaide, Griffith, Sydney, Irymple, Canberra, Wodonga, Werribee, Belmont, Melbourne, Geelong, Hobart and Sandy Bay]
• 2009: CSIRO Bragg Cluster launch, the first of its kind in Australia
• 2013: Bragg upgrade to 384 Kepler K20M GPUs
• November 2014: #154 on the TOP500 list, #11 on the Green500 list
• November 2015: #298 on the TOP500 list, #24 on the Green500 list
CSIRO Computational and Simulation Sciences/IMT
Accelerators surge in the world's top supercomputers
• 100+ accelerated systems now on the Top500 list
• 1/3 of total FLOPS powered by accelerators
• NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers
• Tesla supercomputers growing at 50% CAGR over the past five years
[Chart: Top500, number of accelerated supercomputers, 2013 to 2015]
Source: NVIDIA, TOP500 List
[Chart: CSIRO Bragg GPU Cluster TOP500 and Green500 rankings, November 2010 to November 2015]
Section 1: ConvNets on Bragg Jose Alvarez
Simplifying ConvNets via Filter Compositions
• Key properties of the network:
  • Low-rank filter restrictions during training
  • Larger receptive fields
  • Deeper models (more non-linear layers)
  • Additional parameter sharing
  • Reduced parameter redundancy
• Overall, a significant reduction in the number of parameters
*Alvarez and Petersson, Simplifying ConvNets for End-to-End Learning. To appear.
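A minimal sketch of what a low-rank filter composition can look like. It assumes a k x k filter is replaced by a vertical k x 1 filter followed by a horizontal 1 x k filter. This is one common way to compose filters; the exact scheme in the cited paper may differ. The sketch checks that the two 1D filters equal a single rank-1 2D filter and compares weight counts for one layer. All helper names and layer shapes are hypothetical.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation with no padding and stride 1 (as in a conv layer)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def compose_1d(image, vertical, horizontal):
    """Apply a k x 1 (vertical) filter, then a 1 x k (horizontal) filter."""
    column = [[x] for x in vertical]
    row = [list(horizontal)]
    return conv2d_valid(conv2d_valid(image, column), row)


def outer(u, v):
    """The single rank-1 k x k filter equivalent to the two 1D filters."""
    return [[a * b for b in v] for a in u]


def layer_weights(k, c_in, c_out, composed):
    """Weights in one layer, biases ignored (hypothetical layer shapes)."""
    if composed:
        # k x 1 filters (c_in -> c_out), then 1 x k filters (c_out -> c_out)
        return k * c_in * c_out + k * c_out * c_out
    return k * k * c_in * c_out


image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
vert, horiz = [1, 2, 1], [1, 0, -1]
print(compose_1d(image, vert, horiz) == conv2d_valid(image, outer(vert, horiz)))  # True
print(layer_weights(3, 256, 256, composed=False))  # 589824
print(layer_weights(3, 256, 256, composed=True))   # 393216
```

With no non-linearity between them, the two 1D layers are exactly one rank-1 filter, which is the low-rank restriction. Inserting a non-linearity between them is what adds depth. For equal channel counts C, the weights fall from k²C² to 2kC², so they shrink by a factor of 2/k.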
Quantitative Results: ImageNet
• ImageNet dataset:
  • 1.2 million training images and 50,000 validation images, split across 1000 categories
  • Between 732 and 1300 training images per class
  • Accuracy reported as Top-1 using a single centered crop
  • No data augmentation for training
Network | Number of parameters | Number of conv. layers | Top-1 accuracy
AlexNet OWT Bn | 61M | 5 | 57.9%
B-Net (VGG-B) | 133M | 10 | 62.5%
Ours* | 15M | 16 | 66.6%
*Alvarez and Petersson, Simplifying ConvNets for End-to-End Learning. To appear.
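The evaluation protocol above (Top-1 accuracy on a single centered crop) can be sketched as follows. The network is replaced by precomputed class scores. The 341 x 256 image and 224-pixel crop in the example are common ImageNet-style sizes assumed for illustration, not values stated on the slide.

```python
def center_crop_box(width, height, crop):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    if crop > width or crop > height:
        raise ValueError("crop is larger than the image")
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("need one score vector per label")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)


print(center_crop_box(341, 256, 224))  # (58, 16, 282, 240)
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]
labels = [1, 2, 2, 1]
print(top1_accuracy(scores, labels))  # 0.75
```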
Quantitative Results: Places2
• Places2 dataset:
  • 10+ million images split across 401 unique scene categories
  • Between 5000 and 30000 training images per class, and 20000 validation images
  • Accuracy reported as Top-1 using a single centered crop
  • No data augmentation for training
Network | Number of parameters | Number of conv. layers | Top-1 accuracy
AlexNet OWT Bn | 58.6M | 5 | 44.5%
B-Net (VGG-B) | 130M | 10 | 44.0%
Ours* | 10.2M | 16 | 47.4%
*Alvarez and Petersson, Simplifying ConvNets for End-to-End Learning. To appear.
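The parameter savings in the ImageNet and Places2 results can be checked with a few lines of arithmetic. The numbers below are copied from the two results tables. The reduction factor is simply the baseline's parameter count divided by that of the proposed network.

```python
# (parameters in millions, Top-1 accuracy in %) copied from the two results tables
results = {
    "ImageNet": {"AlexNet OWT Bn": (61.0, 57.9), "B-Net (VGG-B)": (133.0, 62.5), "Ours": (15.0, 66.6)},
    "Places2": {"AlexNet OWT Bn": (58.6, 44.5), "B-Net (VGG-B)": (130.0, 44.0), "Ours": (10.2, 47.4)},
}


def compare(dataset):
    """Return {baseline: (parameter reduction factor, Top-1 gain in points)}."""
    ours_params, ours_acc = results[dataset]["Ours"]
    return {
        name: (round(params / ours_params, 1), round(ours_acc - acc, 1))
        for name, (params, acc) in results[dataset].items()
        if name != "Ours"
    }


for dataset in results:
    print(dataset, compare(dataset))
```

By this arithmetic, the proposed network uses roughly 4 to 13 times fewer parameters than the baselines. It also scores between 2.9 and 8.7 points higher on Top-1.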
Timings
Section 2: Simulated Nanostructure Assembly (SNAP) George Opletal, Maciej Golebiewski, Amanda Barnard
SNAP - Introduction
o Traditional atomistic molecular dynamics (MD) modelling of nanoparticle self-assembly is computationally prohibitive.
o However, in many cases the interactions between nanoparticles are dominated by surface electrostatic forces, so internal bonding can be neglected.
o Each many-atom nanoparticle is therefore approximated by a coarse-grained surface point mesh model.
o We developed the Simulated Nanostructure Assembly with Protoparticles (SNAP) package.
[Figure: atomistic nanoparticle and its surface mesh representation]
E. Osawa, D. Ho, Nanodiamond and its application to drug delivery, J. Med. Applied. Sci. 2(2) 2012, 31-40.
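A minimal sketch of the coarse-graining step. It assumes a cube whose six faces are sampled on a regular square grid of points; SNAP's own Generator may place points differently. Each point is tagged with the face it belongs to, so facet-specific parameters can be attached to it later. All six faces of a cube are {100}-type facets. The `cube_surface_mesh` helper, the 30 Å edge and the grid size are hypothetical.

```python
from itertools import product


def cube_surface_mesh(edge, n):
    """Points on the surface of an axis-aligned cube centred at the origin.

    edge: cube edge length; n: grid points along each edge (n >= 2).
    Returns a list of ((x, y, z), face_label) with no duplicated points:
    edge and corner points are assigned to the first face that claims them.
    """
    half = edge / 2.0
    coords = [-half + i * edge / (n - 1) for i in range(n)]
    seen, mesh = set(), []
    for axis, sign in product(range(3), (-1, 1)):
        label = ("+" if sign > 0 else "-") + "xyz"[axis]
        for u, v in product(coords, coords):
            p = [u, v]
            p.insert(axis, sign * half)  # fix one coordinate on this face
            point = tuple(round(c, 9) for c in p)
            if point not in seen:
                seen.add(point)
                mesh.append((point, label))
    return mesh


mesh = cube_surface_mesh(edge=30.0, n=10)
print(len(mesh))  # 488 = 10**3 - 8**3 surface lattice points
```

The point count grows with the square of n, not the cube. This is what makes a surface mesh so much cheaper than an atomistic model of the same particle.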
SNAP package
o Generator: designs particles, the initial configuration and potentials.
o Simulator: usually an NVT simulation, quenched to produce a particle aggregate.
o Analyser: interfacial probability analysis of the final configuration and of the dynamical evolution of particle assembly.
o SNAP is installed on the CSIRO Bragg GPU Cluster.
[Figure: interfacial probability (0.0 to 1.0) of (100)|(100) and (111)|(111) facet pairs vs. time (ps)]
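The Simulator's "NVT simulation quenched to produce a particle aggregate" can be illustrated with a deliberately simple thermostat. This sketch assumes plain velocity rescaling toward a target temperature that is ramped down linearly. It uses reduced units with k_B = 1. The slides do not say which thermostat or cooling schedule SNAP uses, and plain velocity rescaling does not sample the canonical ensemble exactly. All names are hypothetical.

```python
import math
import random


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from 3N degrees of freedom (k_B = 1)."""
    kinetic = sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
                  for m, (vx, vy, vz) in zip(masses, velocities))
    return 2.0 * kinetic / (3 * len(masses))


def rescale(masses, velocities, target):
    """Scale all velocities so the kinetic temperature equals `target`."""
    factor = math.sqrt(target / kinetic_temperature(masses, velocities))
    return [(vx * factor, vy * factor, vz * factor) for vx, vy, vz in velocities]


def quench_schedule(t_start, t_end, n_steps):
    """Linearly decreasing target temperatures, one per step (n_steps >= 2)."""
    return [t_start + (t_end - t_start) * i / (n_steps - 1) for i in range(n_steps)]


rng = random.Random(0)
masses = [1.0] * 100
velocities = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in masses]
for target in quench_schedule(1.0, 0.1, 5):
    # A real simulator would integrate the equations of motion here.
    velocities = rescale(masses, velocities, target)
    print(round(kinetic_temperature(masses, velocities), 6))
```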
SNAP – Simulator: Modelling Interactions
• Interactions between pairs of nanoparticle facets in different orientations are calculated via ab initio methods.
• The binding energy curves are then fitted to Morse potentials, with parameters for each facet-pair combination. The parameters are then distributed over a facet's points.
• Morse parameters can incorporate functionalised surfaces (hydroxylation, hydrogenation, etc.).
• User-defined nanoparticles are held together by a harmonic potential.
[Figure: clean, hydrogen-passivated and hydroxyl-functionalised facets]
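The interaction model above can be sketched as follows. The Morse form used here is V(r) = D[(1 - e^{-a(r - r0)})^2 - 1], which has its minimum of -D at r = r0 and tends to zero at large r. Two details are assumptions for illustration only. First, the facet-level well depth is split evenly over all point pairs of the two facets, since the slides say only that parameters are "distributed over a facet's points". Second, the parameter values are made up. The harmonic term stands in for the potential that holds each particle's mesh together.

```python
import math


def morse(r, depth, a, r0):
    """Morse pair energy with minimum -depth at r = r0 and zero at infinity."""
    x = 1.0 - math.exp(-a * (r - r0))
    return depth * (x * x - 1.0)


def per_point_depth(facet_depth, n_points_i, n_points_j):
    """Hypothetical scheme: spread a facet-facet well depth over all point pairs,
    so n_i * n_j point pairs sitting at r0 reproduce the facet binding energy."""
    return facet_depth / (n_points_i * n_points_j)


def harmonic(r, k, r_eq):
    """Harmonic spring keeping two mesh points of one particle near r_eq."""
    return 0.5 * k * (r - r_eq) ** 2


# Hypothetical fitted parameters for one facet-pair combination, e.g. (100)|(100).
facet_params = {("100", "100"): {"depth": 2.0, "a": 1.5, "r0": 3.0}}
p = facet_params[("100", "100")]
d_point = per_point_depth(p["depth"], 25, 25)
energy = sum(morse(p["r0"], d_point, p["a"], p["r0"]) for _ in range(25 * 25))
print(round(energy, 9))  # -2.0: point pairs at r0 sum to the facet binding energy
```

Storing parameters per facet pair is what lets functionalised surfaces be modelled: swapping in a different fitted (D, a, r0) triple changes the chemistry without changing the mesh.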
2. SNAP – Simulator: Acceleration via parallelization
[Chart: simulation speed (steps/sec) vs. number of nanodiamonds (1,000 to 100,000), CUDA-MPI on 9 GPUs versus serial CPU code]
2. SNAP – Analyser
Reads in output from the Simulator and performs a variety of analyses, including:
• Interfacial probabilities (which facets align, and which are free, pointing into voids)
• Pore size distributions (the range of void sizes in the aggregate)
• Particle distribution functions (information on short-, medium- and long-range ordering)
• Fractal dimension (probes self-similarity at different size scales; useful for characterization of aggregates)
• Visualization via POV-Ray or VMD
Analysis is often dynamical (performed as a function of time).
[Figures: interfacial probability of (100)|(100) and (110)|(110) facet pairs vs. time (ps); void locations where a 3.2 nm particle could fit]
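One Analyser quantity, a particle (pair) distribution function g(r), can be sketched for particle centres in a cubic periodic box. The sketch assumes centre-of-mass positions and uses the minimum-image convention. SNAP's own definition, for example whether it uses centres or mesh points, is not stated in the slides, and `pair_distribution` is a hypothetical helper. For uncorrelated points, g(r) stays close to 1. Peaks above 1 signal preferred separations in an aggregate.

```python
import math
import random


def pair_distribution(positions, box, r_max, n_bins):
    """Radial distribution function g(r) for points in a cubic periodic box.

    Returns (bin_centres, g). Requires r_max <= box / 2 for the minimum image.
    """
    if r_max > box / 2:
        raise ValueError("r_max must not exceed half the box length")
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for k in range(3):
                delta = positions[i][k] - positions[j][k]
                delta -= box * round(delta / box)  # minimum-image convention
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 2  # pair counts for both particles
    density = n / box ** 3
    centres, g = [], []
    for b, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * ((b + 1) ** 3 - b ** 3) * dr ** 3
        centres.append((b + 0.5) * dr)
        g.append(c / (n * density * shell))
    return centres, g


rng = random.Random(1)
box = 10.0
ideal_gas = [tuple(rng.uniform(0, box) for _ in range(3)) for _ in range(400)]
r, g = pair_distribution(ideal_gas, box, r_max=4.0, n_bins=8)
print([round(x, 2) for x in g])  # values scatter around 1 for uncorrelated points
```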
3. Vast experimental parameter space
• Particle geometry (cube, octahedron, rhombic dodecahedron)
• Composition
• Particle size
• (100)|(100) facet binding energy
• Surface functionalization
• Particle density
3. A few points in parameter space...
CUBE, (100) facets
CSIRO Bragg GPU Cluster (6 GPUs used over 15 hours): 5832 particles × 570 points, 664 Å, 150,000 × 1 fs steps
4. A few points in parameter space...
OCTAHEDRON, (111) facets
CSIRO Bragg GPU Cluster (6 GPUs used over 15 hours): 5832 particles × 544 points, 664 Å, 150,000 × 1 fs steps
4. Larger, more complex simulations using the Bragg GPU cluster
• Size distribution: 22 Å (20%), 27 Å (50%), 32 Å (30%)
• Experimental density: 2×10^19 particles/cm³
• Facet interaction energies from DFT
• Clean facets
• 46,656 particles (about 25 million surface interaction points)
• 0.132 µm cell length
• 0.15 ns simulation time (150,000 steps at 1 fs)
• 6 GPUs over 130 hours
Colour key: red (100), blue (111), green (110)
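The system sizes quoted in these simulations are mutually consistent, and a short calculation confirms it. It assumes the quoted lengths (0.132 µm here, 664 Å for the cube and octahedron runs) are the edges of a cubic simulation cell. The slides give these as plain lengths. Under that assumption, both runs land close to the stated experimental density of 2×10^19 particles/cm³, and 150,000 steps of 1 fs give 0.15 ns.

```python
ANGSTROM_CM = 1e-8   # 1 Å = 1e-8 cm
MICRON_CM = 1e-4     # 1 µm = 1e-4 cm


def number_density(n_particles, edge_cm):
    """Particles per cm^3, assuming a cubic cell with the given edge length."""
    return n_particles / edge_cm ** 3


large = number_density(46656, 0.132 * MICRON_CM)   # 46,656 = 36**3 particles
small = number_density(5832, 664 * ANGSTROM_CM)    # 5,832 = 18**3 particles
sim_time_ns = 150_000 * 1e-15 / 1e-9                # steps x 1 fs, in ns
cube_points = 5832 * 570                            # total surface points, cube run

print(f"{large:.2e} {small:.2e} {sim_time_ns:.2f} ns {cube_points:,} points")
```

Doubling the particle count per edge (18³ to 36³) while doubling the cell edge (664 Å to about 1320 Å) keeps the density fixed. This is why the two runs agree.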
Applications – Nanodiamonds
Polydisperse aggregate
• Pore size distribution: mixtures produce larger pore sizes.
• Interfacial probability by nanoparticle facet: the largest '111' facets dominate the interaction.
• q6·q6 interactions: mixtures are more "random".
[Charts: pore size distribution (cm³/g·Å) vs. pore diameter (Å); interfacial probability for each nanoparticle facet ((100), (111) and (110) facets of 22 Å, 27 Å and 32 Å particles); number of nanoparticles (out of 5000) vs. number of q6·q6 interactions. Series: All 32Å, Type 1, Mixed Sizes]
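The q6·q6 histogram is based on Steinhardt bond-orientational order. The addition theorem, Σ_m Y_6m(u) Y*_6m(v) = (13/4π) P_6(u·v), gives a compact way to compute the normalised product q6(i)·q6(j). With it, only the Legendre polynomial P_6 is needed rather than explicit spherical harmonics. This sketch assumes each particle's neighbour bond directions are already known. It also assumes a "q6·q6 interaction" is counted when the product exceeds 0.7, a common choice in crystallinity analysis. Neither the neighbour definition nor the threshold comes from the slides.

```python
import math


def legendre_p6(x):
    """Legendre polynomial P_6(x)."""
    return (231 * x**6 - 315 * x**4 + 105 * x**2 - 5) / 16


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def _pair_sum(bonds_a, bonds_b):
    """Sum of P_6(cos angle) over all bond pairs; proportional to sum_m q6m(a) q6m*(b)."""
    total = 0.0
    for u in map(_unit, bonds_a):
        for v in map(_unit, bonds_b):
            total += legendre_p6(sum(p * q for p, q in zip(u, v)))
    return total


def q6_dot(bonds_i, bonds_j):
    """Normalised q6(i).q6(j) from each particle's neighbour bond vectors."""
    return _pair_sum(bonds_i, bonds_j) / math.sqrt(
        _pair_sum(bonds_i, bonds_i) * _pair_sum(bonds_j, bonds_j))


def count_q6_links(bonds, neighbours, threshold=0.7):
    """Per-particle count of neighbours whose q6 product exceeds the threshold."""
    return [sum(q6_dot(bonds[i], bonds[j]) > threshold for j in neighbours[i])
            for i in range(len(bonds))]


cubic = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
rotated = [(1, 1, 0), (-1, -1, 0), (1, -1, 0), (-1, 1, 0), (0, 0, 1), (0, 0, -1)]  # turned 45° about z
print(round(q6_dot(cubic, cubic), 6), round(q6_dot(cubic, rotated), 6))  # 1.0 -0.75
print(count_q6_links([cubic, cubic, rotated], [[1, 2], [0, 2], [0, 1]]))  # [1, 1, 0]
```

Aligned neighbourhoods score close to 1 and misaligned ones score low. A more disordered aggregate therefore shifts the histogram toward fewer links per particle.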
GPU-Accelerated Molecular Dynamics
“We performed the largest self-assembly simulation of organic cages”
[Figure: porous cages and mega-clusters]
Wall time reduced from 100 to 15 hours using GPUs.
• 424,000 atoms
• 47,000 bonds
• 786,000 angles
• 126,000 dihedrals
• 2 million molecular dynamics steps
• Pairwise interactions
• Long-range Coulombic interactions
• Periodic boundary conditions
Evans et al., Journal of Physical Chemistry C, 2015, DOI: 10.1021/jp512944r
Section 3: Big Data Analytics John Zic, Emlyn Jones, Josh Bowden
Pulsar data from CSIRO's Parkes telescope
PPTA-HPC progress to date
• Opportunity: provide external collaborators with access to internationally significant science data, plus the compute to process it (“Science as a Service”)
[Diagram: DAP pulsar repository; compute on Bragg Cluster]
Eigenvalue decomposition using MAGMA
[Chart: speedup of MAGMA magma_2stage_syevdx() (1, 2 and 3 K20 GPUs) and MAGMAMIC magma_dsyevd() (MIC 7120) over the R function eigen() using MKL dsyevr() on a 16-core Sandy Bridge, vs. problem size (N = M = K)]
The functionality is being incorporated into an R package used for predictive genomic modelling from large sequencing datasets.
More information: josh.bowden@csiro.au
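All the routines compared in the MAGMA benchmark compute the eigenvalues and eigenvectors of a dense real symmetric matrix: MAGMA's two-stage and divide-and-conquer solvers, and MKL's dsyevr() behind R's eigen(). A small cyclic Jacobi sketch shows what is being computed. Jacobi rotation is a different and much slower algorithm than the tridiagonal-reduction methods being benchmarked. It is used here only because it fits in a few self-contained lines.

```python
import math


def jacobi_eigh(matrix, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors): eigenvalues ascending and
    eigenvectors[k] the unit-length eigenvector paired with eigenvalues[k].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]              # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # product of rotations
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that annihilates a[p][q]: tan(2θ) = 2 a_pq / (a_qq - a_pp), |θ| <= π/4
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J and V <- V J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    order = sorted(range(n), key=lambda k: a[k][k])
    values = [a[k][k] for k in order]
    vectors = [[v[i][k] for i in range(n)] for k in order]
    return values, vectors


values, vectors = jacobi_eigh([[2.0, 1.0], [1.0, 2.0]])
print([round(x, 9) for x in values])  # [1.0, 3.0]
```

Each sweep costs O(n³) operations, and a full decomposition needs several sweeps. At the problem sizes in the chart (tens of thousands), this is why blocked, GPU-accelerated tridiagonal-reduction solvers matter.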
Recommendations
More recommendations