Machine Learning Applications – Ben Chandler, Hewlett Packard Labs

  1. A Platform for Accelerating Machine Learning Applications – Ben Chandler, Hewlett Packard Labs – April 6th, 2016

  2. HPE Big Data and HPC portfolio strategy: design and deliver comprehensive solutions with purpose-built platforms
     1) Optimized HW/SW platforms: innovate, design and deliver best-in-class hardware and software to support the foundational infrastructure needs of Big Data customers
     2) Provide vertical solutions by building a software stack and partner ecosystem
     3) Enable Advisory Services to help manage the customer's technology journey
     Drive HPC and Big Data across all enterprises

  3. Modernize your datacenter for massive parallel processing innovation: deliver automated intelligence, real-time insights and optimized performance. Navigate the data-driven transformation journey across all enterprises with new HPC and Big Data capabilities that accelerate time-to-value for increased competitive differentiation.
     – Automated intelligence; real-time insights; optimized performance
     – Solutions: Deep Learning; HPC Compute & Storage; HPE Vertica for SQL on Hadoop; Integrity MC990 X for Database; Trade & Match Server Solution; HPC for Trader Workstation Solution; Risk Compliant Archive Solution; Innovation & Processing
     – Platforms: Apollo 6500, Apollo 4520, Apollo 2000, Apollo 4000 Series, Apollo 4510, HPE Moonshot
     Extreme performance capabilities to process, manage and analyze data-, I/O- and storage-intensive application workloads with high speed, scale and efficiency, and to enable high flexibility for open infrastructure innovation.

  4. Deliver automated intelligence in real time for Deep Learning: unprecedented performance and scale with the HPE Apollo 6500 high-density GPU solution
     – Automated intelligence delivered by HPE Apollo 6500 and Deep Learning software solutions
     – Use cases: video, image, text, audio and time-series pattern recognition; large, highly complex, unstructured simulation and modeling; real-time and near-real-time analytics
     – Transform to a hybrid infrastructure; protect your digital enterprise; enable workplace productivity; empower a data-driven organization
     – Faster model training time, better fusion of data*
     Customer benefits: HPE Apollo 6500 is an ideal HPC and Deep Learning platform, providing unprecedented performance with 8 GPUs, a high-bandwidth fabric and a configurable GPU topology to match deep learning workloads
     – Up to 8 high-powered GPUs per tray (node), 2P Intel E5-2600 v4 support
     – Choice of high-speed, low-latency fabrics with 2x IO expansion
     – Workload optimized using flexible configuration capabilities
     * Benchmarking results provided at or shortly after announcement

  5. HPE Apollo 6500 solution innovation: system design innovation to maximize GPU capacity and performance with lower TCO
     – New technologies and products: HPE Apollo 6500, a dense GPU server optimized for Deep Learning and HPC workloads (density optimization, high-performance fabrics)
     – Deep Learning and HPC software platform enablement (HPE CCTK, Caffe, CUDA, Google TensorFlow, HPE IDOL)
     – Cluster management enhancements (massive scaling, open APIs, tight integration, multiple user interfaces)
     – Unique solution differentiators: GPU density; configurable GPU topologies; more network bandwidth; power and cooling optimization; manageability; better productivity

  6. Roadmap – Motivating evidence – The CogX project and vision – Open-source availability

  7. A simple data-intensive program:
         val movie1 = ...
         val movie2 = ...
         val average = (movie1 + movie2) / 2
     (diagram: movie1 + movie2 is computed first, then divided by 2 to give average)

  8. Simplified architecture diagram: a CPU with its own memory (CPU Mem) and a GPU with its own memory (GPU Mem)

  9. Naïve data flow in practice for val average = (movie1 + movie2) / 2 (diagram: data moving among CPU Mem, CPU, GPU Mem and GPU)
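As a rough CPU-side analogy (plain Scala on arrays, not CogX and not a GPU measurement), the naïve flow can be sketched by running each operator as its own full pass that materializes its result. The object `NaiveAverage`, its `add`/`divide` helpers and the `passes` counter are hypothetical names chosen for illustration.

```scala
// Hypothetical CPU analogy: each operator runs as its own "kernel"
// (a full pass over the data) and writes a temporary array.
object NaiveAverage {
  var passes = 0 // counts full passes over the data, for illustration only

  // One "kernel": element-wise addition into a new buffer.
  def add(a: Array[Float], b: Array[Float]): Array[Float] = {
    require(a.length == b.length, "fields must have the same shape")
    passes += 1
    Array.tabulate(a.length)(i => a(i) + b(i))
  }

  // Another "kernel": element-wise division by a scalar into a new buffer.
  def divide(a: Array[Float], d: Float): Array[Float] = {
    passes += 1
    Array.tabulate(a.length)(i => a(i) / d)
  }

  // val average = (movie1 + movie2) / 2, evaluated one operator at a time.
  def average(movie1: Array[Float], movie2: Array[Float]): Array[Float] = {
    val sum = add(movie1, movie2) // the intermediate result is materialized
    divide(sum, 2f)
  }
}
```

Here `average` makes two passes and allocates a temporary for `movie1 + movie2`; on a GPU the rough equivalent is one kernel launch and one round trip through device memory per operator.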

  10. Optimized data flow in practice for val average = fusedOp(movie1, movie2, 2) (same CPU/GPU diagram)
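A minimal sketch of what a fused operator computes, assuming the slide's `fusedOp(movie1, movie2, 2)` means "add element-wise, then divide by the scalar": one pass and no buffer for the intermediate sum. The argument order and types are assumptions, and `FusedAverage` is a hypothetical name; this is plain Scala, not code generated by CogX.

```scala
// Hypothetical sketch of a fused operator: one pass over the inputs,
// no intermediate buffer for movie1 + movie2.
object FusedAverage {
  // Illustrative stand-in for the slide's fusedOp(movie1, movie2, 2).
  def fusedOp(a: Array[Float], b: Array[Float], d: Float): Array[Float] = {
    require(a.length == b.length, "fields must have the same shape")
    val out = new Array[Float](a.length)
    var i = 0
    while (i < a.length) {
      out(i) = (a(i) + b(i)) / d // add and divide in the same loop body
      i += 1
    }
    out
  }
}
```

On a GPU, each loop iteration would correspond roughly to one work item of a single kernel.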

  11. Performance portability on GPUs

  12. Roadmap – Motivating evidence – The CogX project and vision – Open-source availability

  13. Vision: performance-portable, high-productivity programming for accelerators

  14. CogX: what is CogX?
      • Domain-specific embedded language with an associated optimizing compiler and runtime
      • Array programming language embedded in a state-machine execution model
      • Targets advanced analytics workloads on massively parallel distributed systems
      • Design goals:
        – Optimal deployment on parallel hardware
        – Fast design iterations
        – Enforce scalability
        – Broad COTS hardware support
        – Compatible with shared infrastructure
        – High productivity for analysts and algorithm engineers

  15. CogX compute model
      • Compute graphs
        – Fields
        – Operators
        – Sensors/actuators
        – Feedback/time
      (diagram: compute graph)

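  16. CogX compute model:
          val movie = ColorMovie("courtyard.mp4")                    // input video, one frame per time step
          val background = VectorField(movie.fieldShape, Shape(3))   // state: a 3-component vector per pixel
          val nextBackground = 0.999f * background + 0.001f * movie  // running-average update
          background <== nextBackground                              // feedback: becomes background at t+1
          val suspicious = reduceSum(abs(movie - background))        // total per-frame difference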
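For readers without CogX, the slide's program can be mirrored in plain Scala as a reference. This sketch assumes frames are already decoded into arrays of RGB pixels (video decoding is out of scope), that the background starts at zero, and that `suspicious` uses the background from before the update, as the compute graph on the following slides shows. `BackgroundSubtraction`, `Frame` and `run` are hypothetical names.

```scala
// Hypothetical plain-Scala reference for the slide's program (not the CogX API).
object BackgroundSubtraction {
  type Frame = Array[Array[Float]] // numPixels x 3 color components

  // Returns one "suspicious" score per frame, in frame order.
  def run(frames: Seq[Frame], numPixels: Int): List[Float] = {
    // Assumed initial state: background is all zeros.
    var background: Frame = Array.fill(numPixels, 3)(0f)
    frames.toList.map { movie =>
      // suspicious_t = reduceSum(abs(movie_t - background_t))
      var suspicious = 0f
      for (p <- 0 until numPixels; c <- 0 until 3)
        suspicious += math.abs(movie(p)(c) - background(p)(c))
      // nextBackground_t = 0.999f * background_t + 0.001f * movie_t
      val next = Array.tabulate(numPixels, 3) { (p, c) =>
        0.999f * background(p)(c) + 0.001f * movie(p)(c)
      }
      background = next // background <== nextBackground (visible at t+1)
      suspicious
    }
  }
}
```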

  17. Demo: Hello World application

  18. CogX compute model: val movie = ColorMovie("courtyard.mp4") (compute graph: a ColorMovie node producing movie_t)

  19. CogX compute model: val background = VectorField(movie.fieldShape, Shape(3)) (compute graph: adds a background_t node next to movie_t)

  20. CogX compute model: val nextBackground = 0.999f * background + 0.001f * movie (compute graph: nextBackground_t = 0.999f * background_t + 0.001f * movie_t)
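The `val` statements on these slides build graph nodes rather than computing values. A toy embedded DSL shows how Scala operator overloading can achieve that; `MiniGraph`, `Expr`, `Field`, `Const`, `Add`, `Mul`, `FloatOps` and `show` are hypothetical names, and this is not the CogX API.

```scala
// Hypothetical miniature of an embedded DSL: arithmetic on Expr builds
// compute-graph nodes instead of computing numbers.
object MiniGraph {
  sealed trait Expr {
    def +(that: Expr): Expr = Add(this, that)
    def *(that: Expr): Expr = Mul(this, that)
  }
  final case class Field(name: String) extends Expr  // a named field
  final case class Const(value: Float) extends Expr  // a scalar constant
  final case class Add(a: Expr, b: Expr) extends Expr
  final case class Mul(a: Expr, b: Expr) extends Expr

  // Lets a Float literal appear on the left, as in 0.999f * background.
  implicit class FloatOps(val x: Float) extends AnyVal {
    def *(e: Expr): Expr = Mul(Const(x), e)
  }

  // Render the graph as a fully parenthesized string.
  def show(e: Expr): String = e match {
    case Field(n)  => n
    case Const(v)  => s"${v}f"
    case Add(a, b) => s"(${show(a)} + ${show(b)})"
    case Mul(a, b) => s"(${show(a)} * ${show(b)})"
  }
}
```

After `import MiniGraph._`, the expression `0.999f * background + 0.001f * movie` evaluates to a small tree of `Add` and `Mul` nodes that a compiler could analyze, not to an array of numbers.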

  21. CogX compute model: background <== nextBackground (compute graph: a feedback edge makes nextBackground_t the value of background_(t+1))
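One way to read the feedback operator is as a register: the value assigned during step t becomes visible only at step t+1. The class below is a hypothetical sketch of that reading (double buffering with an explicit `tick()`), not CogX's implementation; `FeedbackField`, `read` and `tick` are invented names. Scala accepts `<==` as a method name, which is presumably how the DSL can spell it.

```scala
// Hypothetical model of feedback (background <== nextBackground): the value
// written during step t becomes visible only at step t+1, like a register.
final class FeedbackField(init: Array[Float]) {
  private var current: Array[Float] = init.clone()   // background_t
  private var pending: Option[Array[Float]] = None   // nextBackground_t

  // Reads during a step always see background_t.
  def read: Array[Float] = current.clone()

  // Record the value to feed back; it is not visible until tick().
  def <==(next: Array[Float]): Unit = {
    require(next.length == current.length, "shape mismatch")
    pending = Some(next.clone())
  }

  // Advance the clock: background_(t+1) = nextBackground_t.
  def tick(): Unit = {
    pending.foreach(p => current = p)
    pending = None
  }
}
```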

  22. CogX compute model: val suspicious = reduceSum(abs(movie - background)) (compute graph: suspicious_t = reduceSum(abs(movie_t - background_t)), alongside the background update)

  23. CogX compute model unrolled over time: starting from background_0 = 0, each step t = 0, 1, 2, ... computes background_(t+1) = 0.999f * background_t + 0.001f * movie_t and suspicious_t = reduceSum(abs(movie_t - background_t))
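Unrolling the feedback loop gives a closed form for the background. This derivation rests on two assumptions: the state starts at zero (reading the slide's "= 0" as background_0 = 0), and each pixel and color channel is updated independently. Writing b_t for background_t and m_t for movie_t:

```latex
b_{t+1} = 0.999\,b_t + 0.001\,m_t,\quad b_0 = 0
\;\Longrightarrow\;
b_t = 0.001\sum_{k=0}^{t-1} 0.999^{\,t-1-k}\,m_k .

\text{For a constant input } m_k = m:\quad
b_t = 0.001\,m\,\frac{1 - 0.999^{t}}{1 - 0.999} = m\left(1 - 0.999^{t}\right).
```

The background is therefore an exponential moving average with a time constant of roughly 1/0.001 = 1000 frames: with a constant input, after 1000 frames it has covered about 1 - 0.999^1000 ≈ 63% of the distance to m.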

  24. Opportunities for optimization (the compute graph from slide 22)

  25. Opportunities for optimization: initially, the graph runs as 6 separate device kernels, one per operator (* 0.001f, * 0.999f, +, -, abs, reduceSum)
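The unfused schedule can be mimicked on the CPU by making every operator its own pass with its own temporary buffer. This is an analogy under a stated assumption (one "kernel" equals one full pass over a flat `Array[Float]`); `Unfused`, `map1`, `map2`, `sum`, `step` and the `kernels` counter are hypothetical names.

```scala
// Hypothetical CPU analogy of the unfused graph: each of the 6 operators is a
// separate pass ("kernel") that writes its own temporary buffer.
object Unfused {
  var kernels = 0 // number of passes launched, for illustration only

  private def map1(a: Array[Float])(f: Float => Float): Array[Float] = {
    kernels += 1; a.map(f)
  }
  private def map2(a: Array[Float], b: Array[Float])(f: (Float, Float) => Float): Array[Float] = {
    kernels += 1; Array.tabulate(a.length)(i => f(a(i), b(i)))
  }
  private def sum(a: Array[Float]): Float = { kernels += 1; a.sum }

  // Returns (nextBackground, suspicious) for one time step.
  def step(movie: Array[Float], background: Array[Float]): (Array[Float], Float) = {
    val t1   = map1(movie)(_ * 0.001f)             // kernel 1: * 0.001f
    val t2   = map1(background)(_ * 0.999f)        // kernel 2: * 0.999f
    val next = map2(t2, t1)(_ + _)                 // kernel 3: +
    val diff = map2(movie, background)(_ - _)      // kernel 4: -
    val absd = map1(diff)(x => math.abs(x))        // kernel 5: abs
    (next, sum(absd))                              // kernel 6: reduceSum
  }
}
```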

  26. Opportunities for optimization: after a “single-output” kernel fuser pass, 2 device kernels remain
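After single-output fusion, each chain that ends in one output can collapse into one pass. A plausible split, assumed here rather than stated on the slide, is one kernel for the `nextBackground` chain (`* 0.001f`, `* 0.999f`, `+`) and one for the `suspicious` chain (`-`, `abs`, `reduceSum`). `SingleOutputFused` and its methods are hypothetical names.

```scala
// Hypothetical sketch of the graph after single-output fusion:
// two passes ("kernels"), each producing exactly one output.
object SingleOutputFused {
  // Kernel A: * 0.001f, * 0.999f and + fused; produces nextBackground.
  def nextBackground(movie: Array[Float], background: Array[Float]): Array[Float] =
    Array.tabulate(movie.length)(i => 0.999f * background(i) + 0.001f * movie(i))

  // Kernel B: -, abs and reduceSum fused; produces suspicious.
  def suspicious(movie: Array[Float], background: Array[Float]): Float = {
    var acc = 0f
    var i = 0
    while (i < movie.length) {
      acc += math.abs(movie(i) - background(i))
      i += 1
    }
    acc
  }
}
```

Both functions still read `movie` and `background` separately, so the inputs are traversed twice per time step.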

  27. Opportunities for optimization: after a “multi-output” kernel fuser pass, only a single device kernel remains
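With multi-output fusion, a single pass can load each `movie` and `background` element once and produce both outputs. The sketch below is a sequential stand-in built on that assumption; `MultiOutputFused` is a hypothetical name.

```scala
// Hypothetical sketch of multi-output fusion: one pass reads each input
// element once and produces both nextBackground and suspicious.
object MultiOutputFused {
  def step(movie: Array[Float], background: Array[Float]): (Array[Float], Float) = {
    require(movie.length == background.length, "fields must have the same shape")
    val next = new Array[Float](movie.length)
    var acc = 0f
    var i = 0
    while (i < movie.length) {
      val m = movie(i); val b = background(i) // each input element loaded once
      next(i) = 0.999f * b + 0.001f * m       // output 1: nextBackground
      acc += math.abs(m - b)                  // output 2: running reduceSum
      i += 1
    }
    (next, acc)
  }
}
```

A real GPU kernel would compute the `reduceSum` part as per-work-group partial sums followed by a final combine step; that part is not shown.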

  28. CogX compiler: translating CogX to OpenCL with kernel fusion
      Pipeline: user CogX model (Scala) → syntax tree (ops, fields) → CogX circuit (ops, fields) → kernel circuit (kernels, field bufs) → optimized kernel circuit (merged kernels) → OpenCL code. The stages cover parsing, optimizations including kernel generation and kernel fusion, and OpenCL code generation.
      CogX code snippet:
          val A = ScalarField(10,10)
          val B = ScalarField(10,10)
          val C = A * B
          val D = ScalarField(10,10)
          val E = C + D
      The multiply (C = A * B) and the add (E = C + D) start as two separate OpenCL kernels and are fused into a single multiply/add OpenCL kernel.
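A toy code generator illustrates the fusion step in the snippet: an element-wise expression tree is emitted as one OpenCL C kernel rather than one kernel per operator. `ToyFusion`, `In`, `Mul`, `Add`, `inputs`, `body`, `fusedKernel` and the kernel name `fused` are hypothetical; only `__kernel`, `__global` and `get_global_id` are standard OpenCL C.

```scala
// Hypothetical toy code generator: an element-wise expression over fields is
// emitted as ONE OpenCL C kernel instead of one kernel per operator.
object ToyFusion {
  sealed trait Expr
  final case class In(name: String) extends Expr
  final case class Mul(a: Expr, b: Expr) extends Expr
  final case class Add(a: Expr, b: Expr) extends Expr

  // Collect input field names in first-seen order, without duplicates.
  def inputs(e: Expr): List[String] = e match {
    case In(n)     => List(n)
    case Mul(a, b) => (inputs(a) ++ inputs(b)).distinct
    case Add(a, b) => (inputs(a) ++ inputs(b)).distinct
  }

  // Expression body indexed by the work-item id i.
  def body(e: Expr): String = e match {
    case In(n)     => s"$n[i]"
    case Mul(a, b) => s"(${body(a)} * ${body(b)})"
    case Add(a, b) => s"(${body(a)} + ${body(b)})"
  }

  // Emit a single kernel that evaluates the whole expression per element.
  def fusedKernel(out: String, e: Expr): String = {
    val params = inputs(e).map(n => s"__global const float* $n") :+ s"__global float* $out"
    s"""__kernel void fused(${params.mkString(", ")}) {
       |  int i = get_global_id(0);
       |  $out[i] = ${body(e)};
       |}""".stripMargin
  }
}
```

After `import ToyFusion._`, calling `fusedKernel("E", Add(Mul(In("A"), In("B")), In("D")))` yields a single kernel whose body is `E[i] = ((A[i] * B[i]) + D[i]);`, so C never needs a buffer of its own.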

  29. CogX core functions and operators
      • Basic operators: +, -, *, /, %
      • Logical operators: >, >=, <, <=, ===, !===
      • Pointwise functions: cos, cosh, acos, sin, sinh, asin, tan, tanh, atan2, sq, sqrt, log, signum, pow, reciprocal, exp, abs, floor
      • Comparison functions: max, min
      • Shape manipulation: flip, shift, shiftCyclic, transpose, subfield, expand, select, stack, matrixRow, reshape, subfields, trim, vectorElement, vectorElements, transposeMatrices, transposeVectors, replicate, slice
      • FFT/DCT: fft, fftInverse, fftRI, fftInverseRI, fftRows, fftInverseRows, fftColumns, fftInverseColumns, dct, dctInverse, dctTransposed, dctInverseTransposed
      • Complex numbers: phase, magnitude, conjugate, realPart, imaginaryPart
      • Convolution-like: crossCorrelate, crossCorrelateSeparable, convolve, convolveSeparable, projectFrame, backProjectFrame, crossCorrelateFilterAdjoint, convolveFilterAdjoint
      • Gradient/divergence: backwardDivergence, backwardGradient, centralGradient, forwardGradient
      • Linear algebra: dot, crossDot, reverseCrossDot
      • Debugging: probe
      • Type coercion: toScalarField, toVectorField, toMatrixField, toComplexField, toComplexVectorField, toColorField, toGenericComplexField
      • Type construction: complex, polarComplex, vectorField, complexVectorField, matrixField, colorField
      • Reductions: reduceSum, blockReduceSum, reduceMin, blockReduceMin, reduceMax, blockReduceMax, fieldReduceMax, fieldReduceMin, fieldReduceSum, fieldReduceMedian
      • Normalizations: normalizeL1, normalizeL2
      • Resampling: supersample, downsample, upsample
      • Special operators: winnerTakeAll, random, solve, transform, warp, <==
