

  1. Julia: A Fresh Approach to GPU Computing

  2. What is Julia?
     - Technical computing language
     - High-level like Python
     - Performance of C

     function mandel(z)
         c = z
         maxiter = 80
         for n = 1:maxiter
             if abs(z) > 2
                 return n - 1
             end
             z = z^2 + c
         end
         return maxiter
     end
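
A quick usage sketch for the mandel function above: broadcast it over a grid of complex starting points to get escape counts for the Mandelbrot set (the grid bounds and resolution here are illustrative assumptions).

     xs = LinRange(-2.0, 0.5, 200)
     ys = LinRange(-1.1, 1.1, 200)
     counts = mandel.(complex.(xs', ys))   # 200×200 matrix of escape counts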

  3. Julia for GPU programming

  4. What is GPU programming?
     From high-level to low-level, with the Julia package at each layer:
     - High-level: TensorFlow, Keras  ↔  Flux.jl, Knet.jl
     - ArrayFire, Thrust              ↔  GPUArrays.jl
     - cuBLAS, cuDNN                  ↔  CuArrays.jl
     - CUB, MGPU                      ↔  CUDAnative.jl
     - Low-level: CUDA C
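
A minimal sketch of the two ends of that spectrum in Julia, assuming the CuArrays and CUDAnative packages from the JuliaGPU stack; array sizes and launch configuration are illustrative.

     using CuArrays, CUDAnative

     # High-level: array abstraction, the GPU kernel is generated for you.
     x = cu(rand(Float32, 1024))
     y = cu(rand(Float32, 1024))
     z = x .+ y                         # one fused kernel launch

     # Low-level: write the kernel yourself and launch it with @cuda.
     function add_kernel(a, b, c)
         i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
         if i <= length(c)
             c[i] = a[i] + b[i]
         end
         return
     end
     c = similar(x)
     @cuda threads=256 blocks=4 add_kernel(x, y, c)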

  5. Programming with libraries
     a = CuArray(Float32, 2)
     b = curand(Float32, 2)
     a * a
     fft(a)
     qrfact(a)
     softmax(a)

     CuArrays.jl builds on the vendor library wrappers:
     cuBLAS.jl, cuFFT.jl, cuRAND.jl, cuSPARSE.jl, cuDNN.jl, cuSOLVER.jl
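
A hedged usage sketch of that library route, assuming CuArrays: ordinary Julia operations on CuArray arguments dispatch to the vendor libraries behind the scenes.

     using CuArrays, LinearAlgebra

     A = CuArray(rand(Float32, 256, 256))
     B = CuArray(rand(Float32, 256, 256))

     C = A * B     # matrix multiply on the GPU, backed by cuBLAS
     s = sum(A)    # reduction executed on the GPU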

  6. Programming with kernels: much harder!

  7. Designed for performance: multiple dispatch
     Instead of one function that branches on types ...
     function foo(x)
         if isa(x, Int64)
             …
         elseif isa(x, Float64)
             …
         end
     end

     ... Julia selects a method per argument type:
     foo(x::Int64) = …
     foo(x::Float64) = …
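
A concrete (hypothetical) pair of methods showing that dispatch in action; the method bodies are made up for illustration.

     foo(x::Int64)   = x + 1     # picked for Int64 arguments
     foo(x::Float64) = 2x        # picked for Float64 arguments

     foo(3)      # -> 4   (Int64 method)
     foo(3.0)    # -> 6.0 (Float64 method)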

  8. Designed for performance: type inference
     function sigmoid(x)
         temp = exp(-x)
         return (1 / (1 + temp))
     end

  9. Designed for performance: type inference
     function sigmoid(x::Int)
         temp = exp(-x)::Float64
         return (1 / (1 + temp))::Float64
     end

  10. Designed for performance: type inference
     function sigmoid(x::Float32)
         temp = exp(-x)::Float32
         return (1 / (1 + temp))::Float32
     end
     Machine-native types
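
One way to watch that inference yourself in the REPL, using the sigmoid definition above (the 1.0f0 literal is just a sample Float32 input):

     # Shows the type inferred for each variable and for the return value;
     # for a Float32 argument everything comes out as Float32.
     @code_warntype sigmoid(1.0f0)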

  11. Designed for performance
     Multiple dispatch + type inference + machine-native types,
     driven by a specializing JIT compiler, yield high-quality,
     stand-alone machine code.

  12. Extensible language
     Source → AST → Julia IR → LLVM IR → Machine code
     Inspect & inject at each stage; configure the pipeline.
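
The standard reflection macros expose most of those stages; a quick sketch with a throwaway function f (the function itself is just an example):

     f(x) = 2x + 1

     @code_lowered f(1)    # lowered AST / Julia IR
     @code_typed   f(1)    # Julia IR after type inference
     @code_llvm    f(1)    # LLVM IR
     @code_native  f(1)    # machine code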

  13. How does it look?
     - No DSL, no subset, just Julia
     - CUDA abstraction level
     - Performance parity

     function vadd(a, b, c)
         i = threadIdx().x
         c[i] = a[i] + b[i]
         return
     end

     a = CuArray(randn(2,2))
     b = CuArray(randn(2,2))
     c = similar(a)
     @cuda threads=4 vadd(a, b, c)
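
A small follow-up check, assuming the snippet above has run: copy the result back to the host and compare against the CPU computation.

     Array(c) ≈ Array(a) + Array(b)    # should be true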

  14. How does it run?

  15. How does it work?
     function vadd(a, b, c)
         i = threadIdx().x
         c[i] = a[i] + b[i]
         return
     end

     a = CuArray(randn(2,2))
     b = CuArray(randn(2,2))
     c = similar(a)
     @cuda threads=4 vadd(a, b, c)

     The @cuda call specializes vadd on the concrete argument types
     (here CuArray{Float64,2}) and compiles it down: Julia → LLVM IR → PTX.

  16. How does it work?
     At run time, the JIT compiler specializes the same vadd kernel for each
     combination of argument types, e.g. CuArray{Float64,2} and CuArray{Int32,3},
     each lowered to its own LLVM IR and PTX.
     Fully transparent. No overhead!
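
A sketch of that run-time specialization, reusing the vadd kernel from slide 13 with a different element type (Int32 arrays, sized to match the launch, chosen purely for illustration):

     ai = CuArray(rand(Int32, 4))
     bi = CuArray(rand(Int32, 4))
     ci = similar(ai)
     @cuda threads=4 vadd(ai, bi, ci)   # triggers a second specialization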

  17. High-level GPU programming
     - Great performance
     - Clean & concise
     - Generic code

     ((a .+ b) ./ d) .- e
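
A minimal sketch of that expression on the GPU, assuming CuArrays; the dotted operators fuse into a single kernel launch.

     using CuArrays
     a = cu(rand(Float32, 1024))
     b = cu(rand(Float32, 1024))
     d = cu(rand(Float32, 1024))
     e = cu(rand(Float32, 1024))
     ((a .+ b) ./ d) .- e     # one fused broadcast kernel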

  18. From GPU kernels, to differentiable algorithms, to high-level layer
     stacking. All on one platform.

     function vadd(a, b, c)
         i = threadIdx().x
         c[i] = a[i] + b[i]
         return
     end

     W = randn(2, 10)
     b = randn(2)
     f(x) = softmax(W * x .+ b)

     model = Chain(
         Dense(10, 5, σ),
         Dense(5, 2),
         softmax)
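
A minimal end-to-end sketch of the layer-stacking end, assuming Flux; the input is random and purely illustrative.

     using Flux

     model = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
     x = rand(Float32, 10)
     model(x)                 # forward pass: a 2-element probability vector
     gpu_model = gpu(model)   # move parameters to the GPU (no-op without CUDA)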

  19. Pkg.add("Flux")

  20. The Julia Magic
     Everything just works with everything else: differential equations,
     machine learning, automatic differentiation, CUDA, and everything you build.
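
One small illustration of that composability, assuming a Zygote-based Flux together with CuArrays: a plain Julia function is differentiated by Flux's AD and runs on the GPU simply because its argument is a CuArray.

     using Flux, CuArrays

     g(x) = sum((x .+ 1) .^ 2)      # plain Julia code, nothing GPU-specific
     x = cu(rand(Float32, 16))
     gradient(g, x)                 # AD and execution both happen on the GPU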

  21. All the HPC Tooling
     Differential Equations, Operations Research, Deep Learning:
     JuliaDB.jl & StructOfArrays.jl, DistributedArrays.jl, DataFrames.jl
     Generic programming is extremely powerful.

  22. function model(tree)
         if isleaf(tree)
             tree.value
         else
             model(tree.left) + model(tree.right)
         end
     end
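
To make that snippet runnable, a hypothetical tree type it could operate on; the struct and isleaf definitions are assumptions, only the field names come from the slide.

     struct Leaf
         value::Float64
     end
     struct Node
         left
         right
     end
     isleaf(t) = t isa Leaf

     model(Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0))))   # -> 6.0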

  23. Case Studies

  24. JuliaCon – 300 Attendees, 150 Talks

  25. Julia
     https://github.com/JuliaGPU/
     https://github.com/FluxML/
     NVIDIA Parallel Forall blog
