Hardware-accelerated Galois Field Arithmetic on the ARMv8 Architecture
Markus Ongyerth
Chair for Network Architectures and Services
Department for Computer Science
Technische Universität München
September 30, 2014
Outline
1. Motivation
   - Network Coding
   - ARMv8
2. Algorithms
3. Results
4. Contributions
Network Coding
[Figure: Node composition of a butterfly network. Nodes: A, B, C, D, E, F.]
Network Coding
[Figure: Message passing on normal routed network. Nodes: A, B, C, D, E, F.]
Network Coding
[Figure: Message passing with network coding. Nodes: A, B, C, D, E, F.]
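The idea behind the last figure can be sketched in a few lines. This is a minimal illustration, assuming the usual butterfly setup in which two equal-length messages must reach two sinks over one shared bottleneck link; the function names and the mapping of roles to nodes are hypothetical and not taken from the slides.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Add two equal-length messages in GF(2), i.e. bytewise XOR."""
    return bytes(a ^ b for a, b in zip(x, y))


def butterfly_with_coding(msg_a: bytes, msg_b: bytes):
    """Return what each of the two sinks can decode.

    Assumption: the left sink receives msg_a directly, the right sink
    receives msg_b directly, and both receive the coded packet that
    crosses the bottleneck link.
    """
    coded = xor_bytes(msg_a, msg_b)  # single transmission on the bottleneck

    # Each sink recovers the missing message by XORing the coded packet
    # with the message it already holds.
    left_sink = (msg_a, xor_bytes(msg_a, coded))
    right_sink = (xor_bytes(msg_b, coded), msg_b)
    return left_sink, right_sink


if __name__ == "__main__":
    left, right = butterfly_with_coding(b"ab", b"xy")
    print(left, right)
```

With plain routing the bottleneck link would have to carry both messages one after the other; with coding it carries a single packet of the same size.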
ARMv8: Apple A7
- 64-bit
- 1.3 GHz
- 64/64 KiB L1 cache
- 1 MiB L2 cache
imul and shuffle

imul:
- Possible on GPRs
- Benefits naturally from bigger registers
- Can benefit from SIMD

shuffle:
- Uses precomputed values
- Only usable with SIMD extensions (table-lookup)
- Currently not supported by Apple-LLVM
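The difference between the two approaches can be illustrated with a scalar sketch. It is not the libmoepgf implementation: the function names are hypothetical, the reduction polynomial 0x11B is an assumption chosen for the example, and the per-byte table lookups below only emulate what a SIMD shuffle (table-lookup) instruction would do for a whole register at once.

```python
GF256_POLY = 0x11B  # assumed reduction polynomial x^8 + x^4 + x^3 + x + 1


def gf256_mul_iterative(a: int, b: int) -> int:
    """Shift-and-XOR ("imul"-style) multiplication in GF(2^8)."""
    result = 0
    for _ in range(8):
        if b & 1:            # add a if the current bit of b is set
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # reduce when the degree reaches 8
            a ^= GF256_POLY
    return result


def build_nibble_tables(c: int):
    """Precompute c*i for every low nibble and every high nibble."""
    low = [gf256_mul_iterative(c, i) for i in range(16)]
    high = [gf256_mul_iterative(c, i << 4) for i in range(16)]
    return low, high


def gf256_mul_region_shuffle(c: int, data: bytes) -> bytes:
    """Multiply every byte of data by c using two 16-entry tables.

    Multiplication is linear over GF(2), so c*x = c*(x & 0x0F) ^ c*(x & 0xF0).
    """
    low, high = build_nibble_tables(c)
    return bytes(low[x & 0x0F] ^ high[x >> 4] for x in data)


if __name__ == "__main__":
    data = bytes(range(256))
    via_tables = gf256_mul_region_shuffle(0x57, data)
    via_loop = bytes(gf256_mul_iterative(0x57, x) for x in data)
    print(via_tables == via_loop)
```

The iterative variant needs only shifts and XORs, which is why it also works on general-purpose registers; the table variant depends on a fast parallel lookup, which is why it requires SIMD support.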
Benchmark
- Generation of 16
- 128 B to 8 KiB
- Results in Gbps
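A rough sketch of how a throughput figure in Gbps could be obtained for the GF(2) case. It assumes that "generation of 16" means 16 packets are combined into one coded packet; the slides do not spell out the exact procedure, and the function names and default parameters here are hypothetical.

```python
import os
import time


def xor_region(dst: bytearray, src: bytes) -> None:
    """Add src to dst in GF(2): XOR the two equal-length regions in place."""
    n = len(dst)
    x = int.from_bytes(dst, "little") ^ int.from_bytes(src, "little")
    dst[:] = x.to_bytes(n, "little")


def throughput_gbps(num_bytes: int, seconds: float) -> float:
    """Convert processed bytes and elapsed time into gigabits per second."""
    return num_bytes * 8 / seconds / 1e9


def benchmark_gf2(generation_size: int = 16,
                  packet_bytes: int = 1024,
                  rounds: int = 100) -> float:
    """Time repeated XOR-combining of one generation of random packets."""
    packets = [os.urandom(packet_bytes) for _ in range(generation_size)]
    coded = bytearray(packet_bytes)

    start = time.perf_counter()
    for _ in range(rounds):
        for packet in packets:
            xor_region(coded, packet)
    elapsed = time.perf_counter() - start

    processed = rounds * generation_size * packet_bytes
    return throughput_gbps(processed, elapsed)


if __name__ == "__main__":
    for size in (128, 1024, 8192):
        print(size, "B:", round(benchmark_gf2(packet_bytes=size), 2), "Gbps")
```

Absolute numbers from an interpreted sketch like this are not comparable to the native results on the following slides; it only shows how bytes processed and elapsed time turn into a Gbps value.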
Result for iPad
[Figure: GF(2) - base performance. Series: XOR NEON, XOR 32 bit, XOR 64 bit. Markers: L1, L2. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 11.0.]
Results in GF(2)
[Figure: GF(2) - base performance. Series: XOR 32 bit A7, XOR 32 bit Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 11.0.]
Results in GF(2)
[Figure: GF(2) - 64bit GPR. Series: XOR 32 bit A7, XOR 32 bit Exynos5, XOR 64 bit A7, XOR 64 bit Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 11.0.]
Results in GF(2)
[Figure: GF(2) - NEON. Series: XOR 32 bit A7, XOR 128 bit A7, XOR 128 bit Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 11.0.]
Results in GF(4)
[Figure: GF(4) - base performance. Series: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 4.0.]
Results in GF(4)
[Figure: GF(4) - performance with shuffle. Series: shuffle, imul NEON iPad, imul NEON Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 4.0.]
Results in GF(16)
[Figure: GF(16) - base performance. Series: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 2.0.]
Results in GF(16)
[Figure: GF(16) - performance with shuffle. Series: shuffle, imul NEON iPad, imul NEON Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 2.0.]
Results in GF(256)
[Figure: GF(256) - base performance. Series: imul NEON iPad, imul NEON Exynos5, table lookup iPad, table lookup Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 2.0.]
Results in GF(256)
[Figure: GF(256) - performance with shuffle. Series: shuffle, imul NEON iPad, imul NEON Exynos5. x-axis: 0.25 KiB to 4096 KiB; y-axis: 0 to 2.0.]
What I did
- Deactivated unsupported parts
- GUI for the libmoepgf benchmark on iOS
- Benchmark on ARMv8
  - Expected results
  - Shuffle results are still to be seen
Selected references
- The libmoepgf library is available for download at: http://moep80211.net/plink/netcod2014
- Stephan Günther: Efficient GF Arithmetic for Linear Network Coding using Hardware SIMD Extensions (2014)
- Shuo-Yen Robert Li, Raymond W. Yeung, and Ning Cai: Linear Network Coding, IEEE Transactions on Information Theory, vol. 49, no. 2, February 2003