Swapping and embedded: compression relieves the pressure? Vitaly Wool Embedded Linux Conference 2016
Intro> Swapping (Paging)
● Paging: [OS capability of] using secondary storage to store and retrieve data
  – With RAM being primary
  – Storing and retrieving happens on a per-page basis
● Page
  – Uniform-size storage block, usually of size 2^n
  – Corresponds to a single record in the page table
● Paging is only possible with virtual memory (VM) enabled
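Because the page size is a power of two, splitting an address into a page number and an in-page offset is a shift and a mask. A minimal sketch, assuming 4 KiB pages (n = 12); the function names are illustrative and not taken from any kernel API:

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB pages, i.e. 2**12 bytes
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = PAGE_SIZE - 1


def split_address(addr):
    """Return (page number, offset within the page) for an address."""
    return addr >> PAGE_SHIFT, addr & PAGE_MASK


def pages_needed(nbytes):
    """Number of whole pages needed to hold nbytes, rounding up."""
    return (nbytes + PAGE_SIZE - 1) >> PAGE_SHIFT


if __name__ == "__main__":
    page, offset = split_address(0x12345)
    print(hex(page), hex(offset))
    print(pages_needed(4097))
```

The rounding in `pages_needed` is why swap accounting is always done in whole pages: even one byte past a page boundary costs a full extra page.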
Intro> Swapping
Intro> Embedded device objectives
● [very] limited RAM
● [relatively] slow storage
  – Using swap will hurt performance
● [relatively] small storage
  – There is hardly room for a big swap area
● Flash chip used as storage
  – Swap on flash wears it out fast
Intro> Swapping in Embedded
● Should be applicable
  – Constrained RAM
● But sometimes it isn't
  – Constrained storage
● May have adverse effects
  – Faster wear-out of flash storage
  – Longer delays if the storage device is slow
● There has to be a way out...
Smarter swapping> Swapping optimization: zswap
● zswap: compressed write-back cache for swapped pages
  – Write operation completion is signaled on write-to-cache completion
● Compresses swapped-out pages and moves them into a pool
  – This pool is dynamically allocated in RAM
● Configurable parameters
  – Pool size
  – Compression algorithm
Smarter swapping> zswap backend: zbud
● zbud: special purpose memory allocator
  – Allocation is always per-page
● Stores up to 2 compressed pages per page
  – One bound to the beginning, one to the end
  – The in-page pages are called “buddies”
● Key characteristics
  – Simplicity and stability
● zbud is the allocator backend for zswap
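The two-buddies-per-page rule can be illustrated with a toy model. This is a sketch of the idea only, not the kernel's zbud code: it assumes 4 KiB pages, ignores the in-page header and chunk rounding that the real allocator uses, and every class and method name here is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class ToyZbudPage:
    """One backing page holding at most two compressed objects (buddies)."""

    def __init__(self):
        self.first = 0  # bytes used by the buddy bound to the page start
        self.last = 0   # bytes used by the buddy bound to the page end

    def free_bytes(self):
        return PAGE_SIZE - self.first - self.last


class ToyZbudPool:
    def __init__(self):
        self.pages = []

    def alloc(self, size):
        """Store an object of `size` bytes; return (page index, slot name)."""
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("object does not fit into a page")
        # Reuse a page that has a free slot and enough room left.
        for idx, page in enumerate(self.pages):
            if page.free_bytes() < size:
                continue
            if page.first == 0:
                page.first = size
                return idx, "first"
            if page.last == 0:
                page.last = size
                return idx, "last"
        # No suitable page: take a fresh one and bind the object to its start.
        page = ToyZbudPage()
        page.first = size
        self.pages.append(page)
        return len(self.pages) - 1, "first"

    def free(self, handle):
        idx, slot = handle
        setattr(self.pages[idx], slot, 0)


if __name__ == "__main__":
    pool = ToyZbudPool()
    for size in (1500, 2000, 3000, 1000, 100):
        print(size, pool.alloc(size))
    print("pages used:", len(pool.pages))
```

Note that the last 100-byte object gets a page of its own even though the first page still has room: both of its slots are taken. That cap is what bounds the achievable compression ratio at 2x, in exchange for very simple bookkeeping.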
Smarter swapping> RAM as a swap storage
● Compression required
  – No gain otherwise
  – But increases CPU load
● Implementation of a [virtual] block device required
● Careful memory management is required
  – Should not use high-order page allocations
Smarter swapping> ZRAM
● Block device for compressed data storage in RAM
  – Compression algorithm is configurable
  – Default algorithm is LZO
  – LZ4 is the one mostly used in practice
● Usually deployed as a self-contained swap device
  – The size is specified at runtime (via sysfs)
  – Configuration is the same otherwise
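The runtime configuration via sysfs can be sketched as a short list of writes. The `comp_algorithm` and `disksize` attributes are the ones described in the kernel's zram documentation, which also requires the algorithm to be set before the size; the device name `zram0`, the `lz4` choice and the `32M` size are illustrative values, and both helper functions are hypothetical. By default nothing is written and the equivalent shell commands are printed instead.

```python
def zram_setup_plan(device="zram0", algorithm="lz4", disksize="32M"):
    """Return the ordered (sysfs path, value) writes for one zram device."""
    base = f"/sys/block/{device}"
    return [
        # The algorithm has to be chosen before the device is sized.
        (f"{base}/comp_algorithm", algorithm),
        (f"{base}/disksize", disksize),
    ]


def apply_plan(plan, dry_run=True):
    """Print the writes (dry run) or perform them (needs root and zram)."""
    for path, value in plan:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as attr:
                attr.write(value)


if __name__ == "__main__":
    apply_plan(zram_setup_plan())
```

After these writes the device still has to be initialized and enabled as swap (typically with `mkswap` and `swapon` on the corresponding `/dev/zram0` node), which is left out of the sketch.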
Smarter swapping> ZRAM vs Flash swap
● Compared on Carambola (MIPS 24Kc)
  – Details on the configuration will follow
● Standard I/O measurement tools
  – 'fio' with 'tiobench' script
● Results (ZRAM vs flash)
  – Average read speed: 730 vs 699 (kb/s)
  – Average write speed: 180.5 vs 172 (kb/s)
● Difference is larger where RAM is faster
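To put the two results in relative terms, the gap can be computed directly from the averages quoted on the slide. A small sketch; it assumes the first number in each pair is ZRAM and the second is flash, following the order in the slide title:

```python
def relative_gain(zram, flash):
    """Speed advantage of ZRAM over flash, in percent of the flash figure."""
    return (zram - flash) / flash * 100.0


if __name__ == "__main__":
    print(f"read:  {relative_gain(730, 699):.1f}%")
    print(f"write: {relative_gain(180.5, 172):.1f}%")
```

On this board both gaps come out below 5%, which fits the remark that the difference grows where RAM is faster.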
Smarter swapping> zsmalloc: ZRAM backend
● Special purpose pool-based memory allocator
● Packs objects into a set of non-contiguous pages
  – ZRAM calls into zsmalloc to allocate space for compressed data
  – Compressed data is stored in scattered pages within the pool
z--- in detail> zsmalloc and zbud compared
                         zsmalloc          zbud
Compression ratio        High (3x – 4x)    Medium/Low (1.8x – 2x)
CPU utilization          Medium/High       Medium
Internal fragmentation   yes               no
Latencies                Medium/Low        Low
z--- in detail> zpool: a unified API
● Common API for compressed memory storage
● Any memory allocator can implement zpool API
  – And register in zpool
● 2 main allocators implementing zpool
  – zbud
  – zsmalloc
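The register-and-look-up pattern behind a unified allocator API can be sketched as follows. This is an analogy only: the real zpool is a C interface inside the kernel, and the class names, the registry and the trivial `page-per-object` backend below are all hypothetical.

```python
class ToyPoolDriver:
    """Interface every backend has to provide (analogy to a zpool driver)."""

    name = None

    def malloc(self, size):
        raise NotImplementedError

    def free(self, handle):
        raise NotImplementedError

    def total_size(self):
        raise NotImplementedError


_DRIVERS = {}


def register_driver(cls):
    """Make a backend available under its name."""
    _DRIVERS[cls.name] = cls
    return cls


def create_pool(backend):
    """Instantiate a pool by backend name; the caller never sees the class."""
    try:
        return _DRIVERS[backend]()
    except KeyError:
        raise ValueError(f"no such backend: {backend}") from None


@register_driver
class PagePerObject(ToyPoolDriver):
    """Deliberately naive backend: every object occupies a whole page."""

    name = "page-per-object"
    PAGE_SIZE = 4096  # assumption: 4 KiB pages

    def __init__(self):
        self._objects = {}
        self._next_handle = 0

    def malloc(self, size):
        if not 0 < size <= self.PAGE_SIZE:
            raise ValueError("object does not fit into a page")
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = size
        return handle

    def free(self, handle):
        del self._objects[handle]

    def total_size(self):
        return len(self._objects) * self.PAGE_SIZE


if __name__ == "__main__":
    pool = create_pool("page-per-object")
    handle = pool.malloc(1200)
    print(pool.total_size())
    pool.free(handle)
```

A caller written against `create_pool`, `malloc`, `free` and `total_size` does not change when a different backend name is passed in, which is the property that makes a user of such an API backend-independent.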
z--- in detail> zswap uses zpool API!
● zswap is now backend-independent
  – As long as the backend implements zpool API
● zswap can use zsmalloc
  – Better compression ratio
  – Less disk/flash utilization
ZRAM moving forward> What if ZRAM used zbud?
● Persistent storage is not used anyway
  – Compression ratio may not be the key
● No performance degradation over time
● Less dependency on memory subsystem
● CPU utilization may get lower
● Throughput may get higher
● Latencies may get lower
ZRAM moving forward> Why can't ZRAM use zbud?
● zbud can't handle PAGE_SIZE allocations
  – Uses small part of the page for internal structure
    ● Called struct zbud_header
  – Easy to fix: it can go to struct page
● ZRAM doesn't use zpool API
  – zsmalloc API fits zpool API nicely
  – Easy to fix: just implement it
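The PAGE_SIZE limitation is simple arithmetic: once part of each page is reserved for the header, a full-page object (for example an incompressible page) no longer fits. A minimal sketch; the 4 KiB page, the 64-byte chunk and the header taking exactly one chunk are illustrative assumptions, not figures from the slides:

```python
PAGE_SIZE = 4096    # assumption: 4 KiB pages
CHUNK_SIZE = 64     # assumption: allocation granularity inside a page
HEADER_CHUNKS = 1   # assumption: the in-page header is rounded to one chunk


def usable_bytes(header_in_page=True):
    """Bytes of a page left for data, with or without an in-page header."""
    reserved = HEADER_CHUNKS * CHUNK_SIZE if header_in_page else 0
    return PAGE_SIZE - reserved


def fits(size, header_in_page=True):
    """Can an object of `size` bytes be stored in a single page?"""
    return 0 < size <= usable_bytes(header_in_page)


if __name__ == "__main__":
    print(usable_bytes(), fits(PAGE_SIZE))
    print(usable_bytes(header_in_page=False),
          fits(PAGE_SIZE, header_in_page=False))
```

With the metadata kept outside the page (the slide suggests struct page), the whole page becomes usable and a PAGE_SIZE object fits.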
ZRAM moving forward> Allow ZRAM to use zbud
● An initiative taken by the author
  – Allow PAGE_SIZE allocations in zbud
  – Make ZRAM use zpool
● Two mainlining attempts
  – https://lkml.org/lkml/2015/9/14/356 [1]
  – https://lkml.org/lkml/2015/9/22/220 [2]
  – Faced strong opposition from ZRAM authors
  – Vendor neutrality questionable
● More attempts to come
Measurements> Prerequisites
● Use fio for performance measurement
  – Written by Jens Axboe
  – Flexible and versatile
● EXT4 file system on /dev/zram0
  – 50% full
● A flavor of fio 'enospc' script
  – Adapted for smaller block device (zram)
● 40 iterations per z--- backend (zbud/zsmalloc)
Measurements> Test device 1
● Sony Xperia Z2
  – MSM8974 CPU
    ● 2.3 GHz Quad-Core Krait™
  – 3 GB RAM
● CyanogenMod build as of Jan 15, 2016 (12.1)
  – A flavor of Android 5.1.1
  – Custom 3.10-based kernel
Measurements> ZRAM performance: Android
[Plot: zsmalloc vs zbud; x-axis 1 to 38, y-axis 0 to 200000; axis labels and units not preserved]
Outcome: zbud clearly outperforms
Measurements> ZRAM latency: Android
[Plot: zsmalloc vs zbud; x-axis 1 to 76, y-axis 0 to 80000; axis labels and units not preserved]
Outcome: zbud outperforms again
Measurements> ZRAM performance: Android
What happens in the long run: does zbud remain superior to zsmalloc?
Measurements> ZRAM performance: Android
[Plot: zsmalloc vs zbud over a longer run; x-axis 1 to 97, y-axis 0 to 200000; axis labels and units not preserved]
Outcome: yes it does.
Measurements> Test device 2
● Intel Minnowboard Max EVB
  – 64-bit Atom™ CPU E3815 @ 1.46 GHz
  – DDR3 2 GB RAM
  – Storage 4 GB eMMC
● Debian 8.4 64-bit
  – Custom 4.3-based kernel
Measurements> ZRAM performance: x86_64
[Plot: zsmalloc vs zbud; x-axis 1 to 39, y-axis 0 to 500000; axis labels and units not preserved]
Outcome: obvious.
Measurements> ZRAM latency: x86_64
[Plot: zsmalloc vs zbud; y-axis 0 to 20000; axis labels and units not preserved]
Outcome: zbud is better again.
Measurements> Test device 3
● Carambola 2
  – MIPS32 24Kc
  – Qualcomm/Atheros AR9331 SoC
  – 400 MHz CPU
  – 64 MB DDR2 RAM
  – Storage 512 MB NAND flash
● OpenWRT
  – Git as of Jan 15, 2016
  – Custom 4.3-based kernel
Measurements> ZRAM performance: MIPS32
[Plot: series labeled zsmalloc and zram in the legend; x-axis 1 to 40, y-axis 0 to 30000; axis labels and units not preserved]
Outcome: roughly equal.
Measurements> ZRAM latency: MIPS32
[Plot: zsmalloc vs zbud; x-axis 1 to 75, y-axis 0 to 50000; axis labels and units not preserved]
Outcome: more stability with zbud.
Wrap-Up
● Compressed RAM swap is a generous idea
  – Many systems can benefit from it
● Two implementations mainlined
  – Zswap: mostly targeting big systems
  – ZRAM: mostly for embedded / small systems
● Each has its own backend
  – Zswap uses zbud
  – ZRAM uses zsmalloc
Conclusions
● Compressed RAM swap is the way out for embedded systems
● ZRAM over zbud is a good match for non-compression-ratio-demanding cases
  – Lower latencies
  – Higher throughput
  – Minimal aging
● Having options is good
swapping completed. Questions? mailto: vitalywool@gmail.com