
Towards In-network Acceleration of Erasure Coding - PowerPoint PPT Presentation



  1. Towards In-network Acceleration of Erasure Coding
     Yi Qiao, Xiao Kong, Menghao Zhang, Yu Zhou, Mingwei Xu, Jun Bi
     Tsinghua University

  2. Erasure Coding (EC)
     • In data centers, machine failures happen very frequently. Facebook reports up to 50 machine failures per day in its data warehouses.
     • EC provides data fault tolerance with much lower storage overhead (~1.4x) than replication (3x), with a similar degree of availability.
     • EC reconstructs missing data from the remaining data and pre-calculated parities.
     • For example:
       • XOR (RAID 5)
       • Reed-Solomon codes

  3. EC Examples
     • XOR (RAID 5)
       • Data symbols $a$, $b$, $c$; parity $p = a \oplus b \oplus c$
       • Reconstruct $b$ with $b = a \oplus c \oplus p$
     • Reed-Solomon Code (Conceptual)
       • Data symbols $a$, $b$, $c$; parities $p_1 = a + b + c$ and $p_2 = a + 2b + 2c$ (linear combinations)
       • These are Galois Field arithmetic operations. For simplicity, just read them as integer arithmetic.
       • Reconstruct $a$ with $a = 2p_1 - p_2$
       • Reconstruct $c$ with $c = p_2 - p_1 - b$
     • Conclusion: EC reconstruction can be modelled with $m = \sum_{i=1}^{k} a_i x_i$
       • $m$: reconstructed symbol
       • $x_i$: symbols from remaining machines
       • $a_i$: pre-computed coefficients
       • Addition refers to XOR
       • Multiplication is on the Galois Field
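The XOR (RAID 5) case above can be sketched in a few lines of Python. This is only an illustration: the helper name `xor_blocks` and the sample byte strings are made up for this example and do not come from the slides.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks and their parity, p = a XOR b XOR c (sample data is illustrative).
a, b, c = b"disk", b"data", b"lost"
p = xor_blocks(a, b, c)

# If the machine holding b fails, rebuild it from the survivors: b = a XOR c XOR p.
recovered_b = xor_blocks(a, c, p)
```

Because XOR is its own inverse, the same helper serves for both encoding and reconstruction.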

  4. EC Problems
     • Low reconstruction rate
       • Several hours to reconstruct a disk
       • Several seconds for degraded reads
     • EC is mostly used for storing "cold" data in data warehouses.
     • Why so slow?

  5. Motivation
     [Figure: two data paths; line width represents throughput.
      Without NetEC: traffic from B1, B2, B3 is forwarded by the ToR and multiplexed onto A's NIC (NIC, CPU, disk). Disk reconstruction rate = 1/3 of available NIC capacity.
      With NetEC: no NIC sharing/multiplexing. Near 100% of available NIC capacity.]

  6. NetEC
     • We present NetEC, which offloads EC reconstruction to programmable switches.
     • It improves the reconstruction rate by a factor of k, where k is the number of machines to download from.
     • It also entirely removes CPU usage.

  7. Brief Overview of NetEC Data Plane
     [Figure: on-switch decoding, steps ① to ⑧. Packets P1, P2, P3 from B1, B2, B3 carry symbols $x_i$, $y_i$, ..., which are extracted into PHVs. A GF multiplication stage produces $a_i x_i$, $a_i y_i$, ... A progress tracker (stateful registers) moves 000 → 100 → 110 → 111 as P1, P2, P3 arrive. The partial XOR sum buffer holds $a_1 x_1$, then $a_1 x_1 + a_2 x_2$, then $a_1 x_1 + a_2 x_2 + a_3 x_3$. Two packets are dropped; P3 carries the finished sums ($a_1 x_1 + a_2 x_2 + a_3 x_3$, $a_1 y_1 + a_2 y_2 + a_3 y_3$, ...) to A.]
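The progress tracker and partial XOR sum buffer in this figure can be mimicked in software. The sketch below is a loose model only, not the switch program: the class name `SwitchSlot`, the least-significant-bit-first bitmap layout, and the handling of duplicate packets are assumptions made for illustration, and the symbols are assumed to be already multiplied by their coefficients.

```python
from typing import Optional


class SwitchSlot:
    """Toy model of one buffer slot: a progress bitmap plus a partial XOR sum."""

    def __init__(self, k: int) -> None:
        self.k = k          # number of source machines
        self.seen = 0       # progress tracker: bit i set once source i has arrived
        self.partial = 0    # partial XOR sum of the scaled symbols seen so far

    def on_packet(self, source: int, scaled_symbol: int) -> Optional[int]:
        """Return the finished sum on the last arrival, else None (packet dropped)."""
        bit = 1 << source
        if self.seen & bit:
            return None     # assumption: a duplicate is simply ignored
        self.seen |= bit
        self.partial ^= scaled_symbol
        if self.seen == (1 << self.k) - 1:
            result = self.partial
            self.seen, self.partial = 0, 0   # free the slot for the next symbol
            return result
        return None


slot = SwitchSlot(k=3)
first = slot.on_packet(0, 0x0F)    # tracker 001, packet dropped
second = slot.on_packet(1, 0xF0)   # tracker 011, packet dropped
third = slot.on_packet(2, 0x3C)    # tracker 111, sum forwarded
```

Only the last packet of each group leaves the slot, which matches the figure's two "Drop" labels.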

  8. Challenges and Design
     • Galois Field Multiplication Offloading
     • Rate Synchronization
     • Deep Payload Inspection/assembly

  9. Challenges and Design (1)
     • Galois Field Multiplication Offloading
       • We convert it to addition, logarithms and exponents.
       • To calculate $a_1 x_1$:
         • Look up $\log(x_1)$ in the logarithm table.
         • Add it to a pre-known $\log(a_1)$: $\log(a_1 x_1) = \log(a_1) + \log(x_1)$
         • Look up $a_1 x_1$ in the exponent table: $a_1 x_1 = e^{\log(a_1 x_1)}$
       • Note that the logarithms and exponents are also on the Galois Field, where this method is valid.
     • Rate Synchronization
     • Deep Payload Inspection/assembly
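The table-lookup multiplication described above can be sketched in Python for GF(2^8). The slides do not state the field size or polynomial, so the primitive polynomial 0x11D with generator 2 (a common Reed-Solomon choice) is an assumption here, as are the names `EXP`, `LOG` and `gf_mul`.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed; not given in the slides)

# EXP[i] = 2^i in the field; the table is doubled so LOG[a] + LOG[x] needs no modulo.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:          # reduce when the degree reaches 8
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a: int, x: int) -> int:
    """Multiply in GF(2^8) with two log lookups, one addition and one exp lookup."""
    if a == 0 or x == 0:
        return 0               # log(0) is undefined, so zero is handled separately
    return EXP[LOG[a] + LOG[x]]
```

For a fixed coefficient, `LOG[a]` can be computed once in advance, leaving one logarithm lookup, one addition and one exponent lookup per symbol.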

  10. Challenges and Design (2)
     • Computation Offloading
     • Rate Synchronization
       • The switch has to temporarily buffer partial XOR sums from the time the first packet arrives until the last packet leaves.
       • One-to-many TCP
       • The switch only needs to buffer partial XOR sums whose size is equal to that of the in-flight packets, bounded by the BDP (bandwidth-delay product).
         • SSD peak write speed: 1 GB/s
         • DC RTT: 250 us
         • BDP = 250 KB
     • Deep Payload Inspection/assembly
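The buffer bound follows directly from multiplying the rate by the round-trip time. A minimal check of the arithmetic, using integer units to avoid rounding (the function name `bdp_bytes` is hypothetical, and 1 GB is taken as 10^9 bytes):

```python
def bdp_bytes(rate_bytes_per_s: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes for a rate in bytes/s and an RTT in microseconds."""
    return rate_bytes_per_s * rtt_us // 1_000_000


# 1 GB/s SSD peak write speed and a 250 us data-center RTT, as on the slide.
buffer_bytes = bdp_bytes(1_000_000_000, 250)   # 250_000 bytes, i.e. 250 KB
```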

  11. Challenges and Design (3)
     • Computation Offloading
     • Rate Synchronization
     • Deep Payload Inspection/assembly
       • Many switch constraints limit the number of bytes that can be processed, while small packets reduce throughput.
       • Use recirculation, inspired by PPS (SOSR '19).
       • Redesign L4 checksum updates.

  12. Discussions and Limitations
     • Will NetEC cause incast?
       • NetEC actually prevents incast.
       • Most incoming packets are dropped in the ingress pipeline.
       • Outbound PPS ≈ Inbound PPS
     • Is NetEC scalable?
       • The number of machines to download from: 3, 6, 10
       • The number of concurrent tasks
     • Problem:
       • Currently, a table or register can only be accessed once per packet, so we need multiple logarithm/exponent tables.
       • Limited number of registers per stage.

  13. Implementation and Evaluation
     • We implement a prototype of NetEC on commodity switches and integrate it with HDFS-EC.

  14. Conclusion
     • EC's low reconstruction rate is due to multiplexed NIC capacity.
     • In-network computation resolves this problem, leading to a large performance improvement.
     • We design and implement NetEC, addressing three challenges, and conduct preliminary evaluations to show its effectiveness.

  15. Thank you!
