
Trading Communication and Computing for Distributed Matrix - PowerPoint PPT Presentation

Multi-Cell Mobile Edge Coded Computing: Trading Communication and Computing for Distributed Matrix Multiplication. ISIT, June 21-26, 2020.


  1. Multi-Cell Mobile Edge Coded Computing: Trading Communication and Computing for Distributed Matrix Multiplication ISIT, June 21-26, 2020

  2. Emerging Mobile Applications
     ◼ Computation-intensive
     ◼ Delay-sensitive

  3. Mobile Edge Computing (MEC)
     ◼ Provides IT and cloud-computing capabilities within the Radio Access Network (RAN), in close proximity to mobile subscribers [ETSI'14]
     (Figure labels: data center, web, gateway, CDN.)
     ◼ Promotes user experience:
       ◆ Saves energy
       ◆ Reduces latency
     [ETSI'14] "Mobile-edge computing - Introductory technical white paper," White Paper, ETSI, Sophia Antipolis, France, Sep. 2014.

  4. Challenge
     ◼ Task offloading procedure (computation timeline): uplink, computation, downlink
       ◆ Uplink: input data uploading from users 1, ..., i, ..., M to ENs 1, ..., k, ..., K
       ◆ Downlink: output data downloading from the ENs to the users
     ◼ Challenges:
       ◆ Input data uploading and output data downloading: severe interference or deep fading
       ◆ Distributed edge computing: random server computing times, i.e., stragglers
     ◼ End-to-end times are significantly prolonged

  5. Our Approach
     ◼ Exploit computation replication and coded computing
       ◆ Consider matrix multiplication in a linear inference task:
         - U: users' input vectors, A: network-stored model, V: desired output vectors
       ◆ Assign the input vectors U from users to multiple ENs
       ◆ Encode model A by hybrid MDS and repetition codes
     (Diagram labels: repeated assignment of U; MDS-repetition coding for A; create spatial redundancy; overcome stragglers; reduce recovery threshold; transmission cooperation; mitigate interference.)
     ◼ Investing more time in any one of the three task offloading steps can reduce the time needed for subsequent steps
     ◼ Tradeoffs among upload, computing, and download latencies
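The coded-computing idea on this slide can be illustrated with a small sketch. It assumes the linear inference task has the form v = A u (the slide names U, A, and V, but the formula itself is not in the transcript), and it uses a Vandermonde generator over exact rationals as a stand-in MDS code. The function names and the toy sizes are hypothetical and are not taken from the talk.

```python
from fractions import Fraction
from itertools import combinations


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(A, n_coded):
    """Encode the m rows of A into n_coded rows with a Vandermonde generator.

    Assumption: the nodes 1, 2, ..., n_coded are distinct, so any m rows of
    the generator G are linearly independent (the MDS property).
    """
    m = len(A)
    G = [[Fraction(k + 1) ** j for j in range(m)] for k in range(n_coded)]
    A_c = [[sum(G[k][j] * A[j][c] for j in range(m)) for c in range(len(A[0]))]
           for k in range(n_coded)]
    return G, A_c


def solve(G_sub, y_sub):
    """Solve G_sub x = y_sub by Gauss-Jordan elimination on exact rationals."""
    m = len(G_sub)
    aug = [list(row) + [y] for row, y in zip(G_sub, y_sub)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]


A = [[1, 2], [3, 4], [5, 6]]   # toy model with m = 3 rows
u = [7, 8]                     # one input vector
v = matvec(A, u)               # uncoded result

G, A_c = encode(A, 5)          # 5 coded rows, so 2 may be lost to stragglers
y_c = matvec(A_c, u)           # coded row-vector products, y_c = G v

# Any 3 of the 5 coded products are enough to recover v.
recovered = [solve([G[k] for k in S], [y_c[k] for k in S])
             for S in combinations(range(5), 3)]
```

Every entry of `recovered` equals `v`, which is the sense in which MDS coding lets the system ignore the slowest ENs.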

  6. Related Works
     ◼ [Zhang'19] utilizes MDS-repetition codes
       ◆ Assumes input vectors from all users are available at all ENs
       ◆ Proposes a computing-downloading strategy
     ◼ [Li'20] exploits computation replication
       ◆ Assumes computing times of ENs are deterministic; adopts a general task model
       ◆ Characterizes an upload-download latency tradeoff
     ◼ Our work
       ◆ Proposes a joint task assignment, upload, computing, and download policy
       ◆ Studies tradeoffs among upload, computing, and download latencies
       ◆ Converse: our policy is approximately optimal for sufficiently large upload times
     [Zhang'19] J. Zhang and O. Simeone, "On model coding for distributed inference and transmission in mobile edge computing systems," IEEE Commun. Letters, vol. 23, no. 6, pp. 1065-1068, Jun. 2019.
     [Li'20] K. Li, M. Tao, and Z. Chen, "Exploiting computation replication for mobile edge computing: A fundamental computation-communication tradeoff study," IEEE Trans. Wireless Commun., pp. 1-1, 2020.

  7. MEC Network Model
     (Figure: users 1, ..., i, ..., M upload input data over the uplink to ENs that store the encoded model, and receive the desired outputs over the downlink.)
     ◼ Each user i has N input vectors and desires N output vectors
     ◼ Each EN stores […] rows of the m × n matrix A
     ◼ […] is the time to compute a row-vector product
     ◼ […] is the set of input vectors from all users assigned to EN k
     ◼ The computing time of EN k is […]

  8. Performance Metric
     ◼ Repetition order r: the average number of ENs that are assigned the same input vector
     ◼ Recovery order q: the number of non-straggling ENs that return outputs; the remaining K-q ENs are stragglers
     ◼ Feasible region: […] (to store enough information of A for computing outputs)
     ◼ Normalized uploading time (NULT): […], normalized by a reference time
     ◼ Normalized computing time (NCT): […], normalized by a reference time
     ◼ Normalized downloading time (NDLT): […], normalized by a reference time
     ◼ […] to avoid rounding complications
     ◼ For an NULT […], the compute-download latency region: […]

  9. Fundamental Question
     ◼ Given an upload latency […], what is the optimal trade-off region between computing and download latencies?
     (Figure: compute-download latency region at […] for M=K=10, showing the inner bound and the outer bound; x-axis NCT, y-axis NDLT.)
       ◆ Characterize the inner bound and outer bound on the compute-download latency region at any given upload latency
       ◆ Present tradeoffs among upload, computing, and download latencies

  10. Example: Task Assignment & Upload
     ◼ M=5 users, K=5 ENs, N=5 input vectors, m=40 row vectors, μ=3/5, (r, q)=(4, 3)
     ◼ Each user divides its input vectors into (5 choose 4) = 5 subsets; each subset has 1 input vector and is assigned to a distinct subset of 4 ENs
       Input assignment for user i = 1, ..., 5:
         EN 1: u_{i,2}, u_{i,3}, u_{i,4}, u_{i,5}
         EN 2: u_{i,1}, u_{i,3}, u_{i,4}, u_{i,5}
         EN 3: u_{i,1}, u_{i,2}, u_{i,4}, u_{i,5}
         EN 4: u_{i,1}, u_{i,2}, u_{i,3}, u_{i,5}
         EN 5: u_{i,1}, u_{i,2}, u_{i,3}, u_{i,4}
     ◼ Uplink: 5-transmitter 5-receiver X-multicast channel with multicast group size 4
       ◆ Optimal per-receiver DoF: […] (interference alignment)
       ◆ Approximated transmission rate: […]
       ◆ Upload time: […]
     ◼ The NULT: […]
     ◼ Any 2 ENs may be stragglers at the edge computing phase
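The task assignment in this example can be sketched in a few lines. The sketch below enumerates the 4-EN subsets and maps each user's j-th input vector to one of them; the reversed ordering is chosen only so that the result matches the assignment shown on the slide, and the variable names are hypothetical. ENs, users, and inputs are 0-indexed in the code.

```python
from itertools import combinations

M, K, N, r = 5, 5, 5, 4   # users, ENs, inputs per user, repetition order

# All subsets of r ENs; reversed so that input j is sent to every EN except EN j,
# which is the labeling used on the slide.
en_subsets = list(combinations(range(K), r))[::-1]

# assignment[(i, j)] is the set of ENs that receive input j of user i.
assignment = {(i, j): en_subsets[j] for i in range(M) for j in range(N)}

# Input vectors (user, index) received by each EN.
inputs_at_en = {
    k: sorted(key for key, ens in assignment.items() if k in ens)
    for k in range(K)
}

for k in range(K):
    indices = sorted({j + 1 for (_, j) in inputs_at_en[k]})
    print(f"EN {k + 1}: {len(inputs_at_en[k])} input vectors, indices {indices}")
```

Each EN ends up with 20 input vectors (4 per user), which is the factor 20 that appears in the row-vector product count of the next slide.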

  11. Example: Coding & Edge Computing
     ◼ Hybrid MDS-repetition codes
       ◆ Coding rates: MDS code rate […], repetition code rate […]
       ◆ EN storage constraint: […]
       ◆ Recovery condition: […]; choose the maximum […]
       ◆ Encode A into A_c = [a_1, ..., a_60] with 60 rows using a (60, 40) MDS code, then split A_c into 10 submatrices, each with 6 rows and replicated at 2 ENs
       ◆ Select […] to store at each EN
       Rows stored at each EN, with the input assignment at r = 4 (i = 1, ..., 5):
         EN 1: a_1 to a_24; inputs u_{i,2}, u_{i,3}, u_{i,4}, u_{i,5}
         EN 2: a_1 to a_6, a_25 to a_42; inputs u_{i,1}, u_{i,3}, u_{i,4}, u_{i,5}
         EN 3: a_7 to a_12, a_25 to a_30, a_43 to a_54; inputs u_{i,1}, u_{i,2}, u_{i,4}, u_{i,5}
         EN 4: a_13 to a_18, a_31 to a_36, a_43 to a_48, a_55 to a_60; inputs u_{i,1}, u_{i,2}, u_{i,3}, u_{i,5}
         EN 5: a_19 to a_24, a_37 to a_42, a_49 to a_60; inputs u_{i,1}, u_{i,2}, u_{i,3}, u_{i,4}
     ◼ Edge computing (q = 3)
       ◆ Each EN computes 24 × 20 row-vector products
       ◆ Waiting for the fastest 3 ENs, the NCT is […]
     ◼ Any 2 ENs may be stragglers
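The storage layout in this example can be checked numerically. The sketch below places the 10 six-row submatrices of A_c on the 10 distinct pairs of ENs, as read off the slide's figure, and then tests a recovery condition that is an assumption of this sketch rather than a formula from the transcript: for every input vector and every choice of q = 3 non-straggling ENs, the fast ENs that hold that input must jointly store at least m = 40 distinct coded rows. Rows and ENs are 0-indexed.

```python
from itertools import combinations

K, m = 5, 40                  # ENs, rows of the uncoded model A
n_coded, rows_per_sub = 60, 6  # rows of A_c and rows per submatrix
q = 3                          # recovery order: wait for the fastest 3 ENs

# Submatrix j (rows 6j .. 6j+5) is replicated at the j-th pair of ENs.
pairs = list(combinations(range(K), 2))
storage = {k: set() for k in range(K)}
for j, pair in enumerate(pairs):
    rows = set(range(j * rows_per_sub, (j + 1) * rows_per_sub))
    for k in pair:
        storage[k] |= rows

# Input index j is assigned to every EN except EN j (repetition order r = 4).
input_ens = {j: {k for k in range(K) if k != j} for j in range(K)}

# Worst case over all straggler patterns and all inputs.
worst = n_coded
for fast in combinations(range(K), q):
    for j, ens in input_ens.items():
        usable = set()
        for k in ens & set(fast):
            usable |= storage[k]
        worst = min(worst, len(usable))

print("rows stored per EN:", [len(storage[k]) for k in range(K)])
print("fewest distinct coded rows available for any input:", worst)
```

Under these assumptions each EN stores 24 rows, and the worst case leaves two ENs that share one submatrix, so 24 + 24 - 6 = 42 distinct coded rows remain, which is enough for a (60, 40) MDS code.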

  12. Example: Output Data Download
     ◼ Divide needed outputs into multiple groups
       ◆ Different groups are transmitted using TDMA
       ◆ The downlink channel for transmitting the outputs in each group is a cooperative X channel
     ◼ Computation results of […]:
       ◆ 30 outputs:
         - 2-transmitter 5-receiver MISO broadcast channel
         - Optimal per-receiver DoF: 2/5 by zero-forcing (ZF) precoding
     (Figure: A is encoded by a (60, 40) MDS code into A_c = [a_1, ..., a_60] and stored at the ENs, shown with the input assignment at r = 4; the MISO broadcast channel is highlighted.)

  13. Example: Output Data Download
     ◼ Divide needed outputs into multiple groups
       ◆ Different groups are transmitted using TDMA
       ◆ The downlink channel for transmitting the outputs in each group is a cooperative X channel
     ◼ Computation results of […]:
       ◆ 30 outputs:
         - 2-transmitter 5-receiver MISO broadcast channel
         - Optimal per-receiver DoF: 2/5 by zero-forcing (ZF) precoding
       ◆ The remaining 34×5=170 needed outputs:
         - 2-transmitter 5-receiver X channel
         - Optimal per-receiver DoF: 1/3 by asymptotic interference alignment (IA)
       ◆ NDLT: 3/40+51/100 = 117/200
     (Figure: A is encoded by a (60, 40) MDS code into A_c = [a_1, ..., a_60] and stored at the ENs, shown with the input assignment at r = 4; the MISO broadcast channel and the X channel are highlighted.)
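The two NDLT terms on this slide can be reproduced with exact fractions. The sketch assumes that each term equals (number of outputs) / (per-receiver DoF) divided by a reference of 1000; that reference value is inferred from the slide's numbers and is not stated in the transcript (it happens to equal M·N·m = 5·5·40). The per-receiver DoF of an X channel with M transmitters and N receivers is taken as M/(M+N-1), the known interference-alignment result, which gives the 1/3 quoted on the slide. Function names are hypothetical.

```python
from fractions import Fraction

REFERENCE = 1000  # assumption: inferred normalization, not given in the transcript


def x_channel_per_receiver_dof(num_tx, num_rx):
    """Per-receiver DoF of an X channel: num_tx / (num_tx + num_rx - 1)."""
    return Fraction(num_tx, num_tx + num_rx - 1)


def ndlt(num_outputs, per_receiver_dof):
    """Normalized download time for one TDMA group under the assumed reference."""
    return Fraction(num_outputs) / per_receiver_dof / REFERENCE


miso_part = ndlt(30, Fraction(2, 5))                     # ZF on the MISO broadcast channel
x_part = ndlt(170, x_channel_per_receiver_dof(2, 5))     # IA on the 2 x 5 X channel
group_ndlt = miso_part + x_part

print(miso_part, x_part, group_ndlt)
```

With these assumptions the two terms come out as 3/40 and 51/100, and their sum is the 117/200 shown on the slide.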

  14. Example: Output Data Download
     ◼ Computation results of […] (MISO broadcast channel and X channel):
       ◆ NDLT: 117/200×2 = 117/100
     (Figure: A is encoded by a (60, 40) MDS code into A_c = [a_1, ..., a_60] and stored at the ENs, shown with the input assignment at r = 4.)

  15. Example: Output Data Download
     ◼ Computation results of […] (cooperative X channel and X channel):
       ◆ 3-transmitter 5-receiver cooperative X channel with cooperation group size 2
       ◆ 3-transmitter 5-receiver X channel
       ◆ NDLT: (21/100+77/300)×2 = 14/15
     ◼ Note that the remaining number of outputs in the last round can be regarded as an integer divided evenly by […] when […]
     ◼ Total NDLT: 14/15+(117/200)×3 = 1613/600
     (Figure: A is encoded by a (60, 40) MDS code into A_c = [a_1, ..., a_60] and stored at the ENs, shown with the input assignment at r = 4.)
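The totals quoted across the download slides can be checked with exact arithmetic. The sketch below only re-adds the fractions printed on the slides; the variable names are hypothetical, and no channel model is implied.

```python
from fractions import Fraction as F

# Per-group NDLT using the MISO broadcast channel plus the 2 x 5 X channel.
miso_x_group = F(3, 40) + F(51, 100)

# NDLT of the groups served by the 3-transmitter channels, as printed on the slide.
coop_x_groups = (F(21, 100) + F(77, 300)) * 2

# Total: the 3-transmitter groups plus three groups of the first kind.
total_ndlt = coop_x_groups + miso_x_group * 3

print(miso_x_group, coop_x_groups, total_ndlt)
```

The three printed values agree with the slides: 117/200, 14/15, and 1613/600.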

  16. Achievable Results
     ◼ At a pair (r, q) in […]:
       ◆ NULT: […]
       ◆ NCT: […]
       ◆ NDLT: […], where […] and […] is determined by […]
     ◼ For an NULT […], the inner bound of the compute-download latency region: […] (considering all feasible values of q, with time- and memory-sharing)
