Julia: A Fresh Approach to GPU Computing
What is Julia?

- Technical computing language
- High-level like Python
- Performance of C

function mandel(z)
    c = z
    maxiter = 80
    for n = 1:maxiter
        if abs(z) > 2
            return n - 1
        end
        z = z^2 + c
    end
    return maxiter
end
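A minimal usage sketch for the mandel function above (the grid bounds and resolution are illustrative, not from the talk):

# Evaluate the escape-time function over a small grid of complex points.
xs = range(-2.0, stop=0.5, length=80)
ys = range(-1.2, stop=1.2, length=40)
counts = [mandel(complex(x, y)) for y in ys, x in xs]   # 40×80 matrix of iteration counts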
Julia for GPU programming
What is GPU programming?

From high-level to low-level, with their Julia counterparts:

- TensorFlow, Keras   ↔  Flux.jl, Knet.jl
- ArrayFire, Thrust   ↔  GPUArrays.jl
- cuBLAS, cuDNN       ↔  CuArrays.jl
- CUB, MGPU           ↔  CUDAnative.jl
- CUDA C (low-level)
Programming with libraries

a = CuArray(Float32, 2)   # CuArrays.jl
b = curand(Float32, 2)    # cuRAND.jl
a * a                     # cuBLAS.jl
fft(a)                    # cuFFT.jl
qrfact(a)                 # cuSOLVER.jl
softmax(a)                # cuDNN.jl

Wrapped libraries: CuArrays.jl, cuBLAS.jl, cuFFT.jl, cuRAND.jl, cuSPARSE.jl, cuSOLVER.jl, cuDNN.jl
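A hedged, self-contained sketch of the same library-backed style against the current CUDA.jl package (the talk used CuArrays.jl, whose functionality later merged into CUDA.jl); assumes a CUDA-capable GPU:

using CUDA, LinearAlgebra

a = CUDA.rand(Float32, 4, 4)   # random numbers generated by cuRAND
b = a * a                      # matrix multiply dispatches to cuBLAS
F = qr(a)                      # QR factorization runs through cuSOLVER
s = sum(a)                     # reductions execute on the device as well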
Programming with kernels Much harder!
Designed for performance: multiple dispatch

Manual type checks vs. multiple dispatch:

function foo(x)
    if isa(x, Int64)
        …
    elseif isa(x, Float64)
        …
    end
end

foo(x::Int64)   = …
foo(x::Float64) = …
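A complete, runnable sketch of the dispatch style on the right (the describe function is hypothetical, not from the talk):

describe(x::Int64)   = "a 64-bit integer: $x"
describe(x::Float64) = "a 64-bit float: $x"
describe(x)          = "something else: $x"

describe(1)       # "a 64-bit integer: 1"
describe(1.0)     # "a 64-bit float: 1.0"
describe("hi")    # "something else: hi"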
Designed for performance: type inference

function sigmoid(x)
    temp = exp(-x)
    return (1 / (1 + temp))
end
Designed for performance: type inference

function sigmoid(x::Int)
    temp = exp(-x)::Float64
    return (1 / (1 + temp))::Float64
end
Designed for performance: type inference

function sigmoid(x::Float32)
    temp = exp(-x)::Float32
    return (1 / (1 + temp))::Float32
end

Machine-native types
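The specializations above can be checked with Julia's standard reflection macros; a minimal sketch (output omitted):

sigmoid(x) = 1 / (1 + exp(-x))

@code_typed sigmoid(1)       # inferred body uses Float64 arithmetic
@code_typed sigmoid(1.0f0)   # inferred body stays in Float32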
Designed for performance

Multiple dispatch + type inference + machine-native types
→ specializing JIT compiler
→ high-quality stand-alone machine code
Extensible language

Source → AST → Julia IR → LLVM IR → Machine code

Inspect & inject code, and configure the pipeline, at every stage.
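Each stage of that pipeline can be inspected with standard reflection macros; a minimal sketch on a toy function:

f(x) = 2x + 1

@code_lowered f(3)    # lowered AST (Julia IR before inference)
@code_typed f(3)      # type-inferred Julia IR
@code_llvm f(3)       # LLVM IR
@code_native f(3)     # machine code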
How does it look?

function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

a = CuArray(randn(2,2))
b = CuArray(randn(2,2))
c = similar(a)

@cuda threads=4 vadd(a, b, c)

- No DSL, no subset, just Julia
- CUDA abstraction level
- Performance parity
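For reference, a self-contained version written against today's CUDA.jl package (an assumption; the talk used CUDAnative.jl and CuArrays.jl, which have since been folded into CUDA.jl), with a host-side check of the result:

using CUDA

function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

a = CuArray(randn(2, 2))
b = CuArray(randn(2, 2))
c = similar(a)

@cuda threads=4 vadd(a, b, c)
Array(c) ≈ Array(a) + Array(b)   # copy back to the host; should be true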
How does it run?
How does it work?

function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

a = CuArray(randn(2,2))
b = CuArray(randn(2,2))
c = similar(a)

@cuda threads=4 vadd(a, b, c)

vadd + CuArray{Float64,2} → LLVM IR → PTX
How does it work?

Run-time JIT compiler: each combination of argument types gets its own specialization of vadd.

vadd + CuArray{Float64,2} → LLVM IR → PTX
vadd + CuArray{Int32,3}   → LLVM IR → PTX

- Fully transparent
- No overhead!
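A sketch of what triggers a new specialization: launching the same generic vadd kernel with arrays of a different element type (reuses the CUDA.jl setup assumed on the earlier slide):

xs = CuArray(rand(Int32, 4))
ys = CuArray(rand(Int32, 4))
zs = similar(xs)

@cuda threads=4 vadd(xs, ys, zs)   # compiles a fresh vadd specialization
                                   # for the CuArray{Int32,1} arguments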
High-level GPU programming

((a .+ b) ./ d) .- e

- Great performance
- Clean & concise
- Generic code
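A minimal sketch of that fused broadcast on the GPU (CUDA.jl assumed); the whole expression lowers to a single GPU kernel, and the identical line also runs on ordinary CPU Arrays:

using CUDA

a = CUDA.rand(Float32, 1024)
b = CUDA.rand(Float32, 1024)
d = CUDA.rand(Float32, 1024)
e = CUDA.rand(Float32, 1024)

result = ((a .+ b) ./ d) .- e   # one fused element-wise kernel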
From GPU kernels, to differentiable algorithms, to high-level layer stacking. All on one platform.

function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

W = randn(2, 10)
b = randn(2)
f(x) = softmax(W * x .+ b)

model = Chain(
    Dense(10, 5, σ),
    Dense(5, 2),
    softmax)
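A self-contained sketch of the layer-stacking snippet, moved to the GPU with Flux.jl (constructor syntax as on the slide; newer Flux releases prefer the Dense(10 => 5, σ) form):

using Flux

model = Chain(
    Dense(10, 5, σ),
    Dense(5, 2),
    softmax) |> gpu

x = gpu(rand(Float32, 10))
model(x)   # a length-2 vector of probabilities summing to 1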
Pkg.add("Flux")
The Julia Magic

Everything just works with everything else!

- Differential Equations
- Machine Learning
- Automatic Differentiation
- CUDA
- Everything You Build
All the HPC Tooling

- Differential Equations
- Operations Research
- Deep Learning
- JuliaDB.jl & StructOfArrays.jl
- DistributedArrays.jl
- DataFrames.jl

Generic programming is extremely powerful.
function model(tree)
    if isleaf(tree)
        tree.value
    else
        model(tree.left) + model(tree.right)
    end
end
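A runnable sketch around the recursive model above: the Tree/Leaf/Node types and isleaf are hypothetical scaffolding, not shown in the talk.

abstract type Tree end

struct Leaf <: Tree
    value::Float64
end

struct Node <: Tree
    left::Tree
    right::Tree
end

isleaf(t::Tree) = t isa Leaf

# Together with the model definition above:
model(Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0))))   # 6.0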
Case Studies
JuliaCon – 300 Attendees, 150 Talks
Julia
- https://github.com/JuliaGPU/
- https://github.com/FluxML/
- NVIDIA Parallel Forall blog