Customizable Domain-Specific Computing

Jason Cong
Center for Domain-Specific Computing
UCLA Computer Science Department
cong@cs.ucla.edu
http://cadlab.cs.ucla.edu/~cong

The Power Barrier …
[Figure. Source: Shekhar Borkar, Intel]

Focus: New Transformative Approach to Power/Energy Efficient Computing
• Current solution: parallelization
[Figure: Parallelization. Source: Shekhar Borkar, Intel]

Cost and Energy are Still a Big Issue …
Cost of computing
• HW acquisition
• Energy bill
• Heat removal
• Space
• …

Next Significant Opportunity: Customization
• Parallelization
• Customization: adapt the architecture to the application domain
[Figure. Source: Shekhar Borkar, Intel]

Motivation
• A few facts
  – We have sufficient computing power for most applications
  – Each user/enterprise needs high computing power for only selected tasks in its domain
  – Application-specific integrated circuits (ASICs) can lead to 10,000X+ better power-performance efficiency, but are too expensive to design and manufacture
• Our proposal
  – A general, customizable platform for the given domain(s)
    • Can be customized to a wide range of applications in the domain
    • Can be mass-produced with cost efficiency
    • Can be programmed efficiently with novel compilation and runtime systems
• Goal
  – A "supercomputer-in-a-box" with 100X performance/power improvement via customization for the intended domain(s)
• Analogy
  – Advance of civilization via specialization/customization

Example Application Domain: Healthcare
• Medical imaging has transformed healthcare
  – An in vivo method for understanding disease development and patient condition
  – Estimated to be $100 billion/year
  – More powerful & efficient computation can help
    • Fewer exposures using compressive sensing
    • Better clinical assessment (e.g., for cancer) using improved registration and segmentation algorithms
• Hemodynamic simulation
  – Very useful for surgical procedures involving blood flow and vasculature
• Both may take hours to days to construct
  – Clinical requirement: 1-2 min
  – Cloud computing won’t work
    • Communication, real-time requirement, privacy
  – A megawatt datacenter for each hospital?
[Figure: Magnetic resonance (MR) angiograph of an aneurysm]
[Figure: Intracranial aneurysm reconstruction with hemodynamics]
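
Medical Image Processing Pipeline
Medical images exhibit sparsity, and can be sampled at a rate << that of classical Shannon-Nyquist theory.

• Reconstruction: compressive sensing
\[
\min_{u} \; \sum_{\forall\,\text{sampled points}} \lvert A R u - S \rvert^{2} \;+\; \lambda \sum_{\forall\,\text{voxels}} \lvert \operatorname{grad}(u) \rvert
\]

• Denoising: total variational algorithm
\[
\forall\,\text{voxel } i:\quad u(i) = \sum_{\text{voxel } j \,\in\, \text{volume}} w_{i,j}\, f(j),
\qquad
w_{i,j} = \frac{1}{Z(i)} \exp\!\left( -\,\frac{\frac{1}{S}\sum_{k=1}^{S} (y_k - z_k)^{2} - 2\sigma^{2}}{h^{2}} \right)
\]

• Registration: fluid registration
\[
v = \frac{\partial u}{\partial t} + v \cdot \nabla u
\]
\[
\mu \Delta v + (\mu + \eta)\, \nabla(\nabla \cdot v) = -\,\bigl[\, T(x - u) - R(x) \,\bigr]\, \nabla T(x - u)
\]

• Segmentation: level set methods
\[
\frac{\partial \varphi}{\partial t} = \lvert \nabla \varphi \rvert \left[ F(\text{data}, \varphi) + \lambda \operatorname{div}\!\left( \frac{\nabla \varphi}{\lvert \nabla \varphi \rvert} \right) \right],
\qquad
\text{surface}(t) = \{\, \text{voxels } x : \varphi(x, t) = 0 \,\}
\]

• Analysis: Navier-Stokes equations
\[
\frac{\partial v}{\partial t} + (v \cdot \nabla)\, v = -\nabla p + \upsilon \Delta v + f(x, t)
\]
\[
\frac{\partial v_i}{\partial t} + \sum_{j=1}^{3} v_j \frac{\partial v_i}{\partial x_j} = -\frac{\partial p}{\partial x_i} + \upsilon \sum_{j=1}^{3} \frac{\partial^{2} v_i}{\partial x_j^{2}} + f_i(x, t)
\]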
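
The compressive-sensing reconstruction step on this slide minimizes a data-fidelity term over the sampled points plus λ times a gradient term over all voxels. A minimal sketch of evaluating that objective follows, under stated assumptions: the slide does not define the operators A and R, so the hypothetical `measure` callable stands in for them, and the gradient uses forward differences with a zero difference at the far border. The names `total_variation`, `cs_objective`, `measure`, `samples`, and `lam` are illustrative and do not come from the slides.

```python
import numpy as np

def total_variation(u):
    """Sum over voxels of |grad(u)|, using forward differences.

    Assumption: the difference at the far border of each axis is zero.
    """
    u = np.asarray(u, dtype=float)
    gz = np.diff(u, axis=0, append=u[-1:, :, :])
    gy = np.diff(u, axis=1, append=u[:, -1:, :])
    gx = np.diff(u, axis=2, append=u[:, :, -1:])
    return float(np.sum(np.sqrt(gx**2 + gy**2 + gz**2)))

def cs_objective(u, measure, samples, lam):
    """Data fidelity over the sampled points plus lam * total variation.

    `measure` is a hypothetical stand-in for the slide's combined operator
    (A R): it maps a volume to the values at the sampled points.
    """
    residual = measure(u) - samples
    return float(np.sum(np.abs(residual) ** 2)) + lam * total_variation(u)

if __name__ == "__main__":
    # Toy volume with a single step along the first axis.
    u = np.zeros((2, 2, 2))
    u[1] = 1.0
    # Assumed measurement: keep a subset of Fourier coefficients.
    mask = np.zeros(u.shape, dtype=bool)
    mask[0, 0, 0] = True
    measure = lambda v: np.fft.fftn(v)[mask]
    print(cs_objective(u, measure, measure(u), lam=0.5))
```

An iterative solver would call such an objective (or its gradient) many times, which matches the "iterative" character the deck attributes to this stage.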

Application Domains: Medical Image Processing Pipeline
Stage | Algorithm | Computation & communication pattern | Kernels
reconstruction | compressive sensing | iterative, local or global communication | dense and sparse linear algebra, optimization methods
denoising | total variational algorithm | non-iterative, highly parallel, local & global communication | sparse linear algebra, structured grid, optimization methods
registration | fluid registration | parallel, global communication | dense linear algebra, optimization methods
segmentation | level set methods | local communication | dense linear algebra, spectral methods, MapReduce
analysis | Navier-Stokes equations | local communication | sparse linear algebra, n-body methods, graphical models
• These algorithms have diverse computation & communication patterns
• A single homogeneous system cannot perform very well on all of these algorithms

Need of Customization for Medical Image Processing Pipeline
(Same pipeline table as on the previous slide.)
• These algorithms have diverse computation & communication patterns
• A single, homogeneous system cannot perform very well on all of these algorithms
• Need architecture customization and hardware-software co-optimization
• Include many common computation kernels ("motifs")
• Applicable to other domains

Bi-harmonic registration (using the same algorithm on all platforms)
Platform | Speedup | Power
CPU (Xeon 2.0 GHz) | 1x | ~100 W
GPU (Tesla C1060) | 93x | ~150 W
FPGA (xc4vlx100) | 11x | ~5 W

3D median filter: for each voxel, compute the median of the 3 x 3 x 3 neighboring voxels
Platform | Algorithm | Speedup | Power
CPU (Xeon 2.0 GHz) | Quick select | 1x | ~100 W
GPU (Tesla C1060) | Median of medians | 70x | ~140 W
FPGA (xc4vlx100) | Bit-by-bit majority voting | 1200x | ~3 W
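
The 3D median filter slide names bit-by-bit majority voting as the algorithm used on the FPGA. The following is a software sketch of that idea, not the hardware design from the talk: the median of an odd number of unsigned values is resolved from the most significant bit down, each output bit being the majority vote of the inputs, and an input that disagrees with the majority keeps voting with its disagreeing bit from then on. The function names, the 8-bit voxel width, and copying border voxels unchanged are assumptions made for illustration.

```python
import numpy as np

def median_by_majority(values, bits=8):
    """Median of an odd number of unsigned ints, one bit at a time."""
    n = len(values)
    if n % 2 == 0:
        raise ValueError("need an odd number of values")
    # stuck[i] is None while value i still matches the median's prefix;
    # otherwise it holds the bit (0 or 1) that value votes with from now on.
    stuck = [None] * n
    median = 0
    for b in range(bits - 1, -1, -1):
        votes = [((v >> b) & 1) if s is None else s
                 for v, s in zip(values, stuck)]
        bit = 1 if 2 * sum(votes) > n else 0      # majority vote
        median = (median << 1) | bit
        for i, vote in enumerate(votes):
            if stuck[i] is None and vote != bit:
                # The value is now known to lie above (vote 1) or
                # below (vote 0) the median for all remaining bits.
                stuck[i] = vote
    return median

def median_filter_3d(volume, bits=8):
    """3 x 3 x 3 median filter; border voxels are copied unchanged (assumption)."""
    out = volume.copy()
    nz, ny, nx = volume.shape
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                window = volume[z-1:z+2, y-1:y+2, x-1:x+2].ravel()
                out[z, y, x] = median_by_majority([int(v) for v in window], bits)
    return out

if __name__ == "__main__":
    vol = np.random.default_rng(0).integers(0, 256, size=(4, 4, 4), dtype=np.uint8)
    print(median_filter_3d(vol)[1, 1, 1])
```

Each bit step needs only a population count and a comparison across the 27 inputs, with no data-dependent branching or sorting, which is one plausible reason the approach maps well onto FPGA logic.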

Center for Domain-Specific Computing (CDSC) Organization
• A diversified & highly accomplished faculty team: 8 in CS&E; 1 in EE; 2 in medical school; 1 in applied math
• 15-20 postdocs and graduate students in four universities – UCLA, Rice, Ohio-State, and UC Santa Barbara

Faculty:
• Aberle (UCLA)
• Baraniuk (Rice)
• Bui (UCLA)
• Chang (UCLA)
• Cheng (UCSB)
• Cong (Director) (UCLA)
• Palsberg (UCLA)
• Potkonjak (UCLA)
• Reinman (UCLA)
• Sadayappan (Ohio-State)
• Sarkar (Associate Dir) (Rice)
• Vese (UCLA)