Architecture-aware Automatic Computation Offload for Native Applications
Gwangmu Lee, Hyunjoon Park, Seonyeong Heo, Kyung-Ah Chang*, Hyogun Lee*, and Hanjun Kim
POSTECH / Samsung Electronics*

Mobile devices are slow

    void runGame () {
      while (!gameover) {
        Move mv;
        /* User Inputs */
        mv = getPlayerTurn ();
        pieces[mv.tar] = mv.to;
        /* Heavy Computation */
        mv = getAITurn ();
        pieces[mv.tar] = mv.to;
      }
    }

[Figure: Performance gap between Mobile and Desktop for chess movement computation, 5 ~ 6x.]

Offloading can boost your mobile device!
[Diagram: The Mobile Device sends the computation ("Which piece to move?") to a High-performance Server, which returns the result ("Move the knight to A6!").]

Most offloading systems are based on VMs.
[Diagram: Mobile runs the Application on a VM over Android / ARM. Server runs the (Offloaded) Application on a VM over Linux / x86.]
CloneCloud (EuroSys'11), MAUI (MobiSys'10), CMCloud (CCGrid'14)

Native Code Execution Time of Top 20 Android Applications
[Figure: Bar chart, 0% to 100% per application. Labels: AdAway, Web Browser, Orbot, Firefox, VLC Player, Video Player, Open Camera, osmAnd, Navigation, Syncthing, AFWall+, 2048, K-9 Mail, PDF Reader, PDF Reader, ownCloud, DAVdroid, Barcode Scanner, E-book Reader, SatStat, Cool Reader, OS Monitor, Console Emulator, Orweb, PPSSPP, Adblock Plus.]

Challenges in offloading native workloads
[Diagram: Mobile runs an ARM Application over Android / ARM. Server would run the same ARM Application over Linux / x86.]
Challenges: Different Architecture (highlighted), Distinct Memory, Different Memory Layouts

Challenges in offloading native workloads
[Diagram: Mobile and Server each have their own Stack, Heap, and Text areas.]
Challenges: Different Architecture, Distinct Memory (highlighted), Different Memory Layouts

Challenges in offloading native workloads
[Diagram: The same data (int, double, ptr = 0x1234) is laid out differently on the Mobile and Server stacks.]
Challenges: Different Architecture, Distinct Memory, Different Memory Layouts (highlighted)

Our Strategy 1: Compile Both Binaries!
[Diagram: Mobile runs the ARM Application over Android / ARM. Server runs the x86 Offloaded Application over Linux / x86.]
Addresses: Different Architecture

Our Strategy 2: Unified Virtual Address
[Diagram: The ARM Application on Mobile and the x86 Offloaded Application on Server share the same Stack, Heap, and Text addresses.]
Addresses: Distinct Memory

Our Strategy 3: Unified Memory Layout
[Diagram: The ARM Application and the x86 Offloaded Application lay out int, double, and ptr = 0x1234 identically.]
Addresses: Different Memory Layouts

Native Offloader: Structure Overview
[Diagram: Sources and a Profile (runGame: 70%, getAITurn: 68%) feed the Compiler, which runs Target Selection, Virtual Address Unification, Offload Partition, and Server Specific Opt., and emits a Mobile Binary and a Server Binary.]

Selecting Profitable Targets

Profile:
    Candidate       Exec. Time   Call Count   Mem. Size
    runGame         27 s         1            20 MB
    getAITurn       26 s         3            12 MB
    getPlayerTurn   1.5 s        3            10 MB

Estimation:
    Candidate       Ideal Gain   Comm.    Estimated Gain
    runGame         21.6 s       4 s      17.6 s
    getAITurn       20.8 s       7.2 s    13.6 s
    getPlayerTurn   1.2 s        6 s      -4.8 s

Selecting Profitable Targets

Profile:
    Candidate       Exec. Time   Call Count   Mem. Size
    runGame         27 s         1            20 MB
    getAITurn       26 s         3            12 MB
    getPlayerTurn   1.5 s        3            10 MB

[Diagram: Call graph in which runGame calls getPlayerTurn and getAITurn. Annotation: "This calls input operations."]

Estimation:
    Candidate       Ideal Gain   Comm.    Estimated Gain
    runGame         21.6 s       4 s      17.6 s
    getAITurn       20.8 s       7.2 s    13.6 s   (Selected)
    getPlayerTurn   1.2 s        6 s      -4.8 s

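The estimation table can be reproduced with a small cost model. The following is only a sketch under stated assumptions: a 5x server speedup and a transfer cost of 0.2 s per MB are inferred from the numbers on the slide rather than stated on it, and the rule that a candidate reaching input operations is skipped is one reading of the slide's annotation. The names Candidate, calls_input, and select_target are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: both constants are inferred from the slide's numbers,
 * not stated on it. */
#define ASSUMED_SPEEDUP    5.0  /* server-to-mobile speed ratio */
#define ASSUMED_SEC_PER_MB 0.2  /* transfer cost per MB of memory */

/* One offloading candidate, mirroring the profile columns. */
typedef struct {
    const char *name;
    double exec_time_s;  /* profiled execution time on the mobile device */
    int    call_count;   /* number of invocations in the profile */
    double mem_size_mb;  /* memory size per invocation */
    int    calls_input;  /* hypothetical flag: reaches input operations */
} Candidate;

/* Time saved if communication were free. */
double ideal_gain(const Candidate *c) {
    return c->exec_time_s * (1.0 - 1.0 / ASSUMED_SPEEDUP);
}

/* Time spent moving data for all invocations. */
double comm_cost(const Candidate *c) {
    return c->call_count * c->mem_size_mb * ASSUMED_SEC_PER_MB;
}

double estimated_gain(const Candidate *c) {
    return ideal_gain(c) - comm_cost(c);
}

/* Returns the index of the most profitable eligible candidate, or -1. */
int select_target(const Candidate *cands, size_t n) {
    assert(cands != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (cands[i].calls_input) continue;      /* must stay on the device */
        double g = estimated_gain(&cands[i]);
        if (g <= 0.0) continue;                  /* offloading would lose time */
        if (best < 0 || g > estimated_gain(&cands[best])) best = (int)i;
    }
    return best;
}
```

With the three profiled candidates, and runGame and getPlayerTurn flagged as reaching input operations, select_target picks getAITurn, which matches the slide.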
Unifying Structure Layouts

    typedef struct {
      char tar, to;
      <pad>;
      double score;
    } Move;

[Diagram: x86 and ARM each place tar, to, and score at different offsets. The unified layout is tar, to, padding, score on both.]

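One way to picture the unified layout is a struct with explicit padding, so that no field offset depends on the target ABI. This is a sketch, not the compiler's actual transformation. It assumes an ABI pair whose alignment of double differs (for example, 4 bytes inside structs on 32-bit x86 System V versus 8 bytes on ARM EABI), and the names UnifiedMove and pad are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Explicit padding pins score at offset 8 under either alignment rule,
 * so both binaries read and write the same bytes. */
typedef struct {
    char   tar, to;
    char   pad[6];   /* hypothetical filler the compiler would insert */
    double score;
} UnifiedMove;

/* Compile-time checks: the build fails if a target disagrees. */
_Static_assert(offsetof(UnifiedMove, score) == 8, "score must sit at offset 8");
_Static_assert(sizeof(UnifiedMove) == 16, "UnifiedMove must be 16 bytes");

/* Runtime helper that reports the offset, useful when comparing two builds. */
size_t unified_score_offset(void) {
    assert(sizeof(double) == 8);
    return offsetof(UnifiedMove, score);
}
```

Building the same file for both targets and comparing unified_score_offset() is a quick check that the layouts agree.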
Unifying Heap Areas

    board = malloc (sizeof(char) * 32);    ->  u_malloc
    free (board);                          ->  u_free

[Diagram: Mobile and Server share one unified Heap area.]

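The slide only names u_malloc and u_free, so the following is a hypothetical sketch of what such a pair might look like, not the system's allocator. It assumes a POSIX system with mmap, a made-up base address UHEAP_BASE that both sides would agree on, and a simple bump allocator in which u_free releases nothing.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define UHEAP_BASE ((void *)0x40000000)  /* hypothetical agreed base address */
#define UHEAP_SIZE ((size_t)1 << 20)     /* 1 MiB region for the sketch */

unsigned char *uheap_start = NULL;
size_t uheap_used = 0;

/* Maps the region once. The base is passed as a hint only, so the caller
 * should check u_heap_at_agreed_base() before relying on the address. */
int u_heap_init(void) {
    if (uheap_start != NULL) return 0;
    void *p = mmap(UHEAP_BASE, UHEAP_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    uheap_start = p;
    uheap_used = 0;
    return 0;
}

int u_heap_at_agreed_base(void) {
    return uheap_start == (unsigned char *)UHEAP_BASE;
}

/* Bump allocation with 16-byte granularity; returns NULL when full. */
void *u_malloc(size_t size) {
    if (u_heap_init() != 0) return NULL;
    size_t aligned = (size + 15) & ~(size_t)15;
    if (aligned < size || aligned > UHEAP_SIZE - uheap_used) return NULL;
    void *p = uheap_start + uheap_used;
    uheap_used += aligned;
    return p;
}

/* A bump allocator cannot release single blocks; this only validates. */
void u_free(void *ptr) {
    if (ptr == NULL) return;
    assert((unsigned char *)ptr >= uheap_start &&
           (unsigned char *)ptr < uheap_start + UHEAP_SIZE);
}
```

The call from the slide then becomes board = u_malloc (sizeof(char) * 32); followed by u_free (board);. A pointer into this region would only mean the same thing on both machines if both processes obtained the region at the same address.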
Aligning Two Stack Areas

Mobile:
    int foo () {
      int i = 30;
      bar (&i);
    }

Server:
    int bar (int *i) {
      int j = 60;
      if (*i!=30) crash;    /* *i==60: Crash! */
    }

[Diagram: The Mobile Stack holds i = 30. On the Server Stack, the address pointed to by int *i holds j = 60.]

Partitioning for the Separate Binaries

Mobile Source:
    void runGame () {
      while (!gameover) {
        Move mv;
        mv = getPlayerTurn ();
        pieces[mv.tar] = mv.to;
        /* mv = getAITurn (); is replaced by: */
        requestOffload (getAITurn);    /* Offload */
        sendData ();
        receiveData ();                /* Return */
        pieces[mv.tar] = mv.to;
      }
    }

Server Source:
    void listenClient () {
      FunctionID offID;
      while (true) {
        offID = acceptOffload ();      /* Offload */
        receiveData ();
        executeFunction (offID);       /* Execute */
        sendData ();                   /* Return */
      }
    }

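The server loop hands a FunctionID to executeFunction, and the slide does not show how that call is resolved. One plausible shape is a table indexed by the ID, sketched below. The table, the stand-in body of getAITurn_offloaded, and the counter ai_turn_calls are all hypothetical, and networking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

typedef int FunctionID;
typedef void (*OffloadFn)(void);

/* Stand-in for the offloaded body: it only counts its invocations. */
int ai_turn_calls = 0;
void getAITurn_offloaded(void) { ai_turn_calls++; }

/* Hypothetical table: the ID sent by the mobile side indexes into it. */
const OffloadFn offload_table[] = { getAITurn_offloaded };
#define NUM_OFFLOADED (sizeof offload_table / sizeof offload_table[0])

/* Runs the offloaded function for offID; returns -1 for an unknown ID. */
int executeFunction(FunctionID offID) {
    if (offID < 0 || (size_t)offID >= NUM_OFFLOADED) return -1;
    assert(offload_table[offID] != NULL);
    offload_table[offID]();
    return 0;
}
```

Rejecting unknown IDs before the indirect call keeps a malformed request from jumping to an arbitrary address.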
Server Specific Optimizations
Remote I/O :: The server requests I/O operations remotely.
Function Pointer Mapper :: Maps a function address on the mobile device to the corresponding address on the server.
Please refer to our paper for more details!

Remaining Challenges for Mobile Apps
Multi-threaded Applications :: Emerging mobile applications are multi-threaded.
Multi-language Support :: Mobile apps are written in multiple languages, e.g., Android apps with the NDK (Java + C/C++).
Therefore, this work uses the SPEC benchmark suites.

Evaluation
Galaxy S5 as the Mobile Device, Dell XPS8700 as the Server :: Mobile (2.5 GHz quad-core Krait 400), Server (3.60 GHz quad-core Intel i7-4790)
17 SPEC 2000/2006 C Benchmarks :: LLVM Compile Error: 400.perlbench, 403.gcc. Non-profitable target: 197.parser, 254.gap, 255.vortex
2 Different Network Bandwidths :: 802.11n (maximum 144 Mbps), 802.11ac (maximum 844 Mbps)
LLVM Compiler Framework

Normalized Execution Time
[Figure: Normalized execution time (0.0 to 1.0) for Native Offloader (144 Mbps), Native Offloader (844 Mbps), and Ideal Offloading. Callout: 6.42x Speed-up!]

Normalized Battery Consumption
[Figure: Normalized battery consumption (0.0 to 1.0) for Native Offloader (144 Mbps), Native Offloader (844 Mbps), and Ideal Offloading. Callout: 82% Saving!]

Conclusion
Native Offloader :: Compiler/runtime cooperative offloading system for general-purpose native applications
:: To minimize offloading overheads, this work unifies virtual address spaces.
Fast & Battery-friendly :: 6.42x speed-up / 82% battery saving for 17 SPEC 2000/2006 C benchmarks. Woo-hoo!
The paper also includes remote I/O, function pointer mapping, pointer size / endianness translation, the runtime system, and communication optimizations.
Image: http://www.worth1000.com/contests/27097/interspecies-teamwork-2

Native Offloader: Architecture-aware Automatic Computation Offload for Native Applications
Backup Slides

How It Works at Runtime
Stage: Local Execution, then Stage: Initialization
[Diagram: Mobile to Server. Page Table: Synchronize. Physical Pages: Prefetch.]

How It Works at Runtime
Stage: Offloading Execution
[Diagram: Mobile to Server. Page Table: Page Fault. Physical Pages: On-demand Copy. The server accumulates Dirty Pages.]

How It Works at Runtime
Stage: Finalization
[Diagram: Server to Mobile. Page Table: Synchronize. Physical Pages: Write-back of Dirty Pages.]

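The three runtime stages can be pictured with a toy simulation. This is a sketch under loud assumptions: explicit accessor functions stand in for hardware page faults, two in-process arrays stand in for the two machines, and every name here (OffloadSession, session_init, server_read, server_write, session_finalize) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NUM_PAGES 4
#define PAGE_SIZE 16   /* tiny pages keep the sketch readable */

typedef struct {
    unsigned char mobile[NUM_PAGES][PAGE_SIZE];  /* mobile physical pages */
    unsigned char server[NUM_PAGES][PAGE_SIZE];  /* server copies */
    int present[NUM_PAGES];  /* server already holds this page */
    int dirty[NUM_PAGES];    /* server modified this page */
    int copies;              /* pages transferred mobile -> server */
} OffloadSession;

/* Copies one page from the mobile side unless it is already present. */
static void fetch_page(OffloadSession *s, int page) {
    assert(page >= 0 && page < NUM_PAGES);
    if (s->present[page]) return;
    memcpy(s->server[page], s->mobile[page], PAGE_SIZE);
    s->present[page] = 1;
    s->copies++;
}

/* Initialization: reset the server's view and prefetch the listed pages. */
void session_init(OffloadSession *s, const int *prefetch, int n) {
    memset(s->present, 0, sizeof s->present);
    memset(s->dirty, 0, sizeof s->dirty);
    s->copies = 0;
    for (int i = 0; i < n; i++) fetch_page(s, prefetch[i]);
}

/* Offloading execution: a missing page is copied on demand. */
unsigned char server_read(OffloadSession *s, int page, int off) {
    assert(off >= 0 && off < PAGE_SIZE);
    fetch_page(s, page);
    return s->server[page][off];
}

void server_write(OffloadSession *s, int page, int off, unsigned char v) {
    assert(off >= 0 && off < PAGE_SIZE);
    fetch_page(s, page);
    s->server[page][off] = v;
    s->dirty[page] = 1;
}

/* Finalization: write dirty pages back; returns how many were written. */
int session_finalize(OffloadSession *s) {
    int written = 0;
    for (int p = 0; p < NUM_PAGES; p++) {
        if (!s->dirty[p]) continue;
        memcpy(s->mobile[p], s->server[p], PAGE_SIZE);
        s->dirty[p] = 0;
        written++;
    }
    return written;
}
```

Only the pages the offloaded code touches are transferred, and only the pages it modifies travel back, which is the point of the on-demand copy and write-back stages.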
Several offloading systems are already proposed.
[Diagram: Pointer Analysis-based System, which relates instructions (Instr.) to the objects (Obj.) they access.]
Li et al. (CASES'01), Wang and Li (PLDI'04)

Function Pointer Mapping

Mobile:
    void (* fptr ) ();
    fptr = foo;
    fptr ("ARGS");

Server:
    fptr = toServer (fptr);
    fptr ("ARGS");

[Diagram: fptr lives in the Stack or Heap. foo sits at different Text addresses on Mobile and Server.]

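A lookup table is one simple way to realize toServer. The sketch below is hypothetical in every detail the slide leaves open: the mobile-side address MOBILE_FOO_ADDR is invented, the mobile pointer is modeled as an integer because it is not a valid function pointer on the server, and foo merely records its argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*Fn)(const char *);

/* Stand-in for the real function: it only records its argument. */
const char *last_args = NULL;
void foo(const char *args) { last_args = args; }

/* Hypothetical address of foo in the mobile binary's Text area. */
#define MOBILE_FOO_ADDR ((uintptr_t)0x00010400)

typedef struct {
    uintptr_t mobile_addr;  /* function address in the mobile binary */
    Fn        server_fn;    /* same function in the server binary */
} FnMapEntry;

static const FnMapEntry fn_map[] = {
    { MOBILE_FOO_ADDR, foo },
};

/* Translates a mobile function address; returns NULL if it is unknown. */
Fn toServer(uintptr_t mobile_addr) {
    for (size_t i = 0; i < sizeof fn_map / sizeof fn_map[0]; i++) {
        if (fn_map[i].mobile_addr == mobile_addr) return fn_map[i].server_fn;
    }
    return NULL;
}

/* Calls through a pointer value that originated on the mobile device. */
void call_mobile_fptr(uintptr_t mobile_addr, const char *args) {
    Fn f = toServer(mobile_addr);
    assert(f != NULL && "no server function for this mobile address");
    f(args);
}
```

A linear scan is enough for a sketch. A real mapper would more likely use a sorted table or a hash, since indirect calls can be frequent.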