
DiffTaichi: Differentiable Programming for Physical Simulation - PowerPoint PPT Presentation



  1. 1 Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, Fredo Durand (ICLR 2020) DiffTaichi: Differentiable Programming for Physical Simulation End2end optimization of neural network controllers with gradient descent Yuanming Hu MIT CSAIL

  2. 2 Agenda ✦ Introduction to the Taichi project (10 min) ✦ DiffTaichi: how differentiable programming works (ICLR 2020, 20 min) ✦ Getting started with Taichi and DiffTaichi (5 min) ✦ Q&A (10 min)

  3. 3 Two Missions of the Taichi Project ✦ Explore novel language abstractions and compilation approaches for visual computing ✦ Practically simplify the process of computer graphics development/deployment

  4. 4 The Life of a Taichi Kernel. [Compilation pipeline diagram.] Python side: kernel registration (@ti.kernel) → Python AST transform → Taichi AST generation → template instantiation (with an instantiation cache) → kernel launch, using compile-time data structure info. C++ side: Taichi frontend AST → AST lowering → type checking → hierarchical SSA IR → simplifications → reverse-mode autodiff → (sparse) access lowering → simplifications → loop vectorization, bound inference & compile-time computation (static if, loop unroll, const fold…), scratch pad insertion → backend compiler: LLVM (x64/NVPTX) → x86_64 / GPU.

  5. 5 Moving Least Squares Material Point Method Hu, Fang, Ge, Qu, Zhu, Pradhana, Jiang (SIGGRAPH 2018)

  6. 6 Moving Least Squares Material Point Method Hu, Fang, Ge, Qu, Zhu, Pradhana, Jiang (SIGGRAPH 2018)

  7. 7 Moving Least Squares Material Point Method Hu, Fang, Ge, Qu, Zhu, Pradhana, Jiang (SIGGRAPH 2018)

  8. 8 Top view Side view Back view Sparse Topology Optimization Liu, Hu, Zhu, Matusik, Sifakis (SIGGRAPH Asia 2018)

  9. 9 #voxels= 1,040,875,347 Grid resolution= 3000 × 2400 × 1600 Sparse Topology Optimization Liu, Hu, Zhu, Matusik, Sifakis (SIGGRAPH Asia 2018)

  10. 10 Want High-Resolution?

  11. 11 Want High-Resolution?

  12. 12 Want Performance?

  13. 13 [Diagram: productivity vs. performance. High-level programming offers productivity; low-level programming offers performance.]

  14. 14 How do we get both productivity and performance? Abstractions that exploit domain-specific knowledge! [Same productivity vs. performance axes as slide 13.]

  15. 15 3 million particles simulated with MLS-MPM; rendered with path tracing. Using programs written in Taichi.

  16. 16 Spatial sparsity: regions of interest only occupy a small fraction of the bounding volume. [Figure: a bounding volume containing a small region of interest.]

  17. 17 [Figure: particles and hierarchical grid blocks of size 1x1x1, 4x4x4, and 16x16x16.]

  18. 18 [Pie chart, in reality: essential computation ~1% of the work, data structure overhead ~99%.] Sources of data structure overhead: hash table lookups (tens of clock cycles), indirection (cache/TLB misses), node allocation (locks, atomics, barriers), branching (misprediction / warp divergence), … Low-level engineering reduces data structure overhead, but harms productivity and couples algorithms and data structures, making it difficult to explore different data structure designs and find the optimal one.

  19. 19 Our Solution: The Taichi Programming Language. 1) Decouple computation from data structures; 2) an imperative computation language; 3) a hierarchical data structure description language; 4) an intermediate representation (IR) with data structure access optimizations; 5) auto parallelization, memory management, … System components: computational kernels, (sparse) data structures, compiler, runtime system. Running example: optimizing a 2D Laplace operator on a 1024² sparse grid with 8² blocks. High-performance CPU/GPU kernels, ours vs. state of the art: 10x shorter code, 4.55x faster overall; MLS-MPM 13x shorter code, 1.2x faster; FEM kernel 13x shorter code, 14.5x faster; MGPCG 7x shorter code, 1.9x faster; Sparse CNN 9x shorter code, 13x faster.
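As a concrete illustration of points 1) and 3), here is a minimal sketch, not taken from the slides, of how a Taichi program declares a hierarchical sparse layout separately from the kernels that use it; the field names, sizes, and block shapes are made up:

```python
import taichi as ti

ti.init(arch=ti.cpu)  # or ti.gpu

# Fields are declared without committing to a memory layout...
x = ti.field(dtype=ti.f32)
y = ti.field(dtype=ti.f32)

# ...and the hierarchical (sparse) data structure is described separately:
# a 64x64 grid of pointers, each pointing to a dense 16x16 block,
# giving a 1024x1024 domain where only touched blocks are materialized.
block = ti.root.pointer(ti.ij, 64)
block.dense(ti.ij, 16).place(x, y)
```

Swapping the pointer level for a dense or bitmasked one changes the layout without touching any kernel, which is the decoupling referred to above.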

  20. 20 Defining Computation: a finite difference stencil written as a Taichi kernel. • Program on sparse data structures as if they are dense; • parallel for-loops (single-program-multiple-data, like CUDA/ispc); • loop over only the active elements of the sparse data structure; • complex control flow (e.g., if, while) is supported.
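Continuing the hypothetical x and y fields from the sketch above, a Laplacian-style finite-difference stencil kernel in this style could look roughly as follows; this is an illustrative sketch, not the exact kernel shown on the slide:

```python
@ti.kernel
def laplace():
    # Struct-for loop: auto-parallelized, and it visits only the active
    # cells of the sparse field x, although it is written as if x were dense.
    for i, j in x:
        y[i, j] = 4.0 * x[i, j] \
                  - x[i - 1, j] - x[i + 1, j] \
                  - x[i, j - 1] - x[i, j + 1]
```

Writing to x (for example x[512, 512] = 1.0) activates the enclosing block, and the struct-for then covers only those active regions; boundary handling is omitted in this sketch.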

  21. 21

  22. 22 Results: 10.0x shorter code, 4.55x higher performance. High-performance CPU/GPU kernels, ours vs. state of the art: MLS-MPM 13x shorter code, 1.2x faster; FEM kernel 13x shorter code, 14.5x faster; MGPCG 7x shorter code, 1.9x faster; Sparse CNN 9x shorter code, 13x faster.

  23. 23 The Life of a Taichi Kernel. [Repeats the compilation pipeline diagram from slide 4.]

  24. 24 Taichi's Intermediate Representation (IR): CHI (气), Hierarchical Instructions. 「阴阳，气之大者也。」 ("Yin and yang are the greatest of qi.") From Zhuangzi, chapter Zeyang, ~300 B.C.

  25. 25 Optimization-Oriented Intermediate Representation Design ✦ Hierarchical IR ๏ Keeps loop information ๏ Static scoping ๏ Strictly (strongly) & statically typed ✦ Static Single Assignment (SSA) ✦ Progressive lowering. ~70 Instructions in total.

  26. 26 Why can't traditional compilers do these optimizations? 1) Index analysis 2) Instruction granularity 3) Data access semantics

  27. 27 The Granularity Spectrum. [Diagram, coarser → finer: end-to-end access x[i, j]; level-wise accesses access1(i, j), access2(i, j); machine code. Taichi IR (CHI) sits at a coarser level than LLVM IR.]

  28. 28 [Diagram: the same granularity spectrum (end-to-end access, Taichi IR (CHI), level-wise access, LLVM IR, machine code; coarser → finer), annotated with hidden optimization opportunities and analysis difficulty.]

  29. 29 [Productivity vs. performance diagram.] Taichi (10.0x shorter code, 4.55x higher performance) reaches both high productivity and high performance through 1) data structure abstraction, 2) abstraction-specific compiler optimization, and 3) algorithm / data structure decoupling, compared with a high-level interface (a data structure library) on top of a low-level, general-purpose compiler interface.

  30. 30 Hu, Anderson, Li, Sun, Carr, Ragan-Kelley, Durand (ICLR 2020) DiffTaichi: Differentiable Programming on Taichi (for physical simulation and many other apps) End2end optimization of neural network controllers with gradient descent

  31. 31 Exposure: A White-Box Photo Post-Processing Framework (TOG 2018). Yuanming Hu 1,2, Hao He 1,2, Chenxi Xu 1,3, Baoyuan Wang 1, Stephen Lin 1. (1 Microsoft Research, 2 MIT CSAIL, 3 Peking University)

  32. 32 Exposure: learn image operations, instead of pixels. Modelling: a differentiable photo postprocessing model (resolution independent, content preserving, human-understandable). Optimization: deep reinforcement learning + generative adversarial networks (training without paired data).

  33. 33 ChainQueen: Differentiable MLS-MPM. Hu, Liu, Spielberg, Tenenbaum, Freeman, Wu, Rus, Matusik (ICRA 2019). Hand-written CUDA, 132x faster than TensorFlow. [Video frames: iteration 0 and iteration 58.]

  34. 34 The Life of a Taichi Kernel. [Repeats the compilation pipeline diagram from slide 4.]

  35. 35 Differentiable Programming vs. Deep Learning: What are they? Both compute a loss L(x) and its gradient ∂L/∂x, for optimization/learning via gradient descent!

  36. 36 Differentiable Programming vs. Deep Learning: What are the differences? ✦ Deep learning operations: ๏ convolution, batch normalization, pooling… ✦ Differentiable programming further enables ๏ stencils, gathering/scattering, fine-grained branching and loops… ๏ more expressiveness & higher performance for irregular operations ✦ Granularity ๏ Why not TensorFlow/PyTorch? ‣ A physical simulator written in TF is 132x slower than CUDA [Hu et al. 2019, ChainQueen] ✦ Reverse-mode automatic differentiation is the key component of differentiable programming

  37. 37 The DiffTaichi Programming Language & Compiler: Automatic Differentiation for Physical Simulation. Key language designs: • imperative • parallel • differentiable • megakernels. 4.2x shorter code compared to hand-engineered CUDA; 188x faster than TensorFlow. Please check out our paper for more details.

  38. 38 [Diagram: controller parameterization. Input (goal, state/phase) → FC, tanh (weights/biases 1) → hidden layer → FC, tanh (weights/biases 2) → controller output. The NN controller drives a differentiable simulation at every time step: initial state (state 0) → state 1 → … → state 2047, over time steps 0 through 2047, with a loss function evaluated on the final state.] Our language allows programmers to easily build differentiable physical modules that work in deep neural networks. The whole program is end-to-end differentiable. (A simplified training-loop sketch follows below.)
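Below is a heavily simplified, hypothetical sketch of this pattern, not the paper's actual code: a single learnable parameter stands in for the NN controller, it applies a force to a 1D point mass over many differentiable simulation steps, and gradient descent on a final-state loss tunes it end to end. Recent Taichi versions spell the tape context ti.ad.Tape; versions contemporary with this talk use ti.Tape. All names and sizes here are illustrative.

```python
import taichi as ti

ti.init(arch=ti.cpu)

steps, dt, goal = 256, 0.01, 1.0
theta = ti.field(ti.f32, shape=(), needs_grad=True)       # stand-in "controller" parameter
x = ti.field(ti.f32, shape=steps + 1, needs_grad=True)    # position at every time step
v = ti.field(ti.f32, shape=steps + 1, needs_grad=True)    # velocity at every time step
loss = ti.field(ti.f32, shape=(), needs_grad=True)

@ti.kernel
def step(t: ti.i32):
    # One differentiable simulation step; a real NN controller would
    # compute the force from the current state instead of a constant theta.
    v[t + 1] = v[t] + dt * theta[None]
    x[t + 1] = x[t] + dt * v[t + 1]

@ti.kernel
def compute_loss():
    loss[None] = (x[steps] - goal) ** 2    # distance of the final state to the goal

for it in range(50):                        # gradient-descent iterations
    with ti.ad.Tape(loss=loss):             # records the forward pass, replays adjoints in reverse
        for t in range(steps):
            step(t)
        compute_loss()
    theta[None] -= 0.02 * theta.grad[None]  # gradient descent on the controller parameter
    if it % 10 == 0:
        print(it, loss[None])
```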

  39. 39

  40. 40

  41. 41 Reverse-Mode Auto Differentiation ✦ Example:
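A tiny hedged example of reverse-mode autodiff in (Diff)Taichi, not the one on the slide: compute L = Σᵢ xᵢ² and read back ∂L/∂xᵢ = 2xᵢ; the field names and sizes are arbitrary.

```python
import taichi as ti

ti.init(arch=ti.cpu)

n = 8
x = ti.field(ti.f32, shape=n, needs_grad=True)
L = ti.field(ti.f32, shape=(), needs_grad=True)

@ti.kernel
def compute_L():
    for i in x:
        L[None] += x[i] ** 2   # accumulation (+=) is allowed under the autodiff rules

for i in range(n):
    x[i] = i

with ti.ad.Tape(loss=L):       # forward pass is recorded; adjoint kernels run in reverse order
    compute_L()

print(x.grad.to_numpy())       # expected: [0, 2, 4, ..., 14], i.e. 2 * x
```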

  42. 42 Two-Scale AutoDiff

  43. 43 Related Work (DiffSim=DiffTaichi)
