Compressing Strings of the Kernel

Wolfram Sang (wsa@the-dreams.de), Consultant
21.8.2014, LinuxCon14

The origin: CEWG project [1]

From the proposal:
  Attempts have been made in the past to compress printk messages to save kernel runtime footprint. There is an option to disable all printks, but many embedded developers do not use it, even when they find the space savings attractive, because they still would like to see kernel debug messages.
  … Timothy Miller did some work on this in 2003 …

[1] http://elinux.org/Compressed_printk_messages

Timothy's approach [2]

1. Compile kernel and keep .i-files
2. Filter them for printk strings
3. Compress those strings using tokenization
4. Create copies of the source files
5. There, replace strings with tokens
6. Compile again

[2] http://lwn.net/Articles/28935/

Further notes

- no code was made public, only the description, a codebook and some results
- not even clear whether unpacking at printk time was ever implemented
- allyesconfig was used for the tests
- based on 2.4.20 and 2.5.68

Author asks the golden question, too

Timothy Miller [3]:
  "So, I ask... is this a useful savings? Is there any chance anyone would bother to increase their compile time by a factor of 5 in order to shave off 4% or 100k bytes?"

[3] https://lkml.org/lkml/2003/6/6/207

The Graph!

[Figure: compile time and size, before vs. after; axis from 0 to 5]

Three problems identified

1. Extract printk format strings
2. Compress printk format strings
3. Replace printk format strings

Extracting

Problem: Find all printk-strings
- There are lots of functions/defines embedding printk/vprintk_emit
- They are nested in all ways you can think of
- Moving target, there will be more <new_subsys>_dev_err, …

Extracting: Options I

Scan the source files
- needs to know all printk-emitting functions
- misses merging of literals
- must handle all ways of string concatenation

Extracting: Options II

printk strings to own section
- scales a bit better (only base functions need to be converted)
- no knowledge where strings came from
- needs changes to core functions

Extracting: own section

    +#define __printk(fmt, args...) \
    +do { \
    +	if (__builtin_constant_p(fmt)) { \
    +		static const __attribute__((section("__printk"))) \
    +		char __f[] = fmt; \
    +		printk(__f, ##args); \
    +	} else \
    +		printk(fmt, ##args); \
    +} while (0)

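The own-section idea can be tried in userspace. The following is a minimal sketch, not the patch from the slides: it assumes GCC or Clang targeting ELF (for example Linux), where the linker provides `__start_<name>` and `__stop_<name>` symbols for a section whose name is a valid C identifier. The names `logstrs`, `log_msg`, `log_count_strings` and `log_demo` are hypothetical, and only literal format strings are handled.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical section name. It must be a valid C identifier so that
 * GNU ld or lld generate the __start_/__stop_ boundary symbols.
 */
#define LOG_SECTION "logstrs"

/* Place each literal format string in its own section, then print it. */
#define log_msg(fmt, ...)						\
do {									\
	static const char f_[]						\
		__attribute__((section(LOG_SECTION), used)) = fmt;	\
	printf(f_, ##__VA_ARGS__);					\
} while (0)

/* Section boundaries, provided by the linker (ELF assumption). */
extern const char __start_logstrs[];
extern const char __stop_logstrs[];

/* Walk the section and count the strings collected at link time. */
static size_t log_count_strings(void)
{
	const char *p = __start_logstrs;
	size_t n = 0;

	while (p < __stop_logstrs) {
		if (*p == '\0') {	/* terminator or alignment padding */
			p++;
			continue;
		}
		n++;
		while (p < __stop_logstrs && *p != '\0')
			p++;
	}
	return n;
}

/* Two call sites, so two strings end up in the section. */
static void log_demo(void)
{
	log_msg("probe failed: %d\n", -22);
	log_msg("device %s ready\n", "eth0");
}
```

Because all format strings sit between the two boundary symbols, a post-processing step could in principle operate on that range alone, without knowing which function emitted which string.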
Extracting: own section II

     #define pr_emerg(fmt, ...) \
    -	printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
    +	__printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
     #define pr_alert(fmt, ...) \
    -	printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
    +	__printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
     ...

BTW don't redefine printk. Really, don't!

Extracting: own section III

    Author: Wolfram Sang <wsa@the-dreams.de>

        WIP: move printk strings to a special section

        Only pr_*, dev_*, BUG, and WARN are supported.
        Kernel doesn't fully build yet.

        Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

     drivers/base/core.c               | 36 ++++-------------------------
     include/asm-generic/bug.h         | 19 ++++++++++++----
     include/asm-generic/vmlinux.lds.h |  5 ++++
     include/linux/device.h            | 48 ++++++++++++++++++++++++++-------------
     include/linux/printk.h            | 27 +++++++++++++++-------
     5 files changed, 75 insertions(+), 60 deletions(-)

Upstreaming forecast

Compressing

Problem: Algorithm
- lots of small strings
- should be instantly available,
  not somewhere in the middle of packed data
- no significant overhead

Conclusion
- no sliding window algos (LZ and friends)
- no variable length encoding (Huffman and friends)
- no frequency based compression (stats3)

Compressing II

- tokenization is actually a good option
- BytePairEncoding works, too
- both achieve ≈ 50% compression
- still ≈ 4% of the kernel size gained

Problem: UTF8
Both approaches need 'empty' symbols which might collide with UTF8 encoding.

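To make the 'empty' symbols concrete, here is a minimal byte pair encoding sketch. It is an illustration under stated assumptions, not the code used for the measurements above: the input is assumed to be 7-bit ASCII, so the byte values 0x80..0xff are free to serve as pair codes. That assumption is exactly what breaks once UTF8 text appears in the strings. All names (`bpe_book`, `bpe_compress`, `bpe_decompress`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define BPE_FIRST_CODE	0x80	/* 0x80..0xff: the 'empty' symbols */
#define BPE_MAX_CODES	128

/* Codebook: code (BPE_FIRST_CODE + i) expands to pair[i][0], pair[i][1]. */
struct bpe_book {
	unsigned char pair[BPE_MAX_CODES][2];
	int ncodes;
};

/* Find the most frequent adjacent byte pair and return its count. */
static int bpe_best_pair(const unsigned char *buf, size_t len,
			 unsigned char best[2])
{
	static int count[256][256];
	int max = 0;
	size_t i;

	memset(count, 0, sizeof(count));
	for (i = 0; i + 1 < len; i++) {
		int c = ++count[buf[i]][buf[i + 1]];

		if (c > max) {
			max = c;
			best[0] = buf[i];
			best[1] = buf[i + 1];
		}
	}
	return max;
}

/* Compress buf in place (input bytes must be < 0x80); returns new length. */
static size_t bpe_compress(unsigned char *buf, size_t len,
			   struct bpe_book *book)
{
	book->ncodes = 0;
	while (book->ncodes < BPE_MAX_CODES) {
		unsigned char best[2] = { 0, 0 };
		unsigned char code = BPE_FIRST_CODE + book->ncodes;
		size_t in, out;

		/* A codebook entry costs 2 bytes, so rare pairs do not pay off. */
		if (bpe_best_pair(buf, len, best) < 3)
			break;
		book->pair[book->ncodes][0] = best[0];
		book->pair[book->ncodes][1] = best[1];
		book->ncodes++;
		for (in = 0, out = 0; in < len; out++) {
			if (in + 1 < len && buf[in] == best[0] &&
			    buf[in + 1] == best[1]) {
				buf[out] = code;
				in += 2;
			} else {
				buf[out] = buf[in++];
			}
		}
		len = out;
	}
	return len;
}

/* Expand one symbol; a code only refers to literals or earlier codes. */
static size_t bpe_expand(const struct bpe_book *book, unsigned char sym,
			 char *out)
{
	size_t n;

	if (sym < BPE_FIRST_CODE) {
		*out = (char)sym;
		return 1;
	}
	n = bpe_expand(book, book->pair[sym - BPE_FIRST_CODE][0], out);
	return n + bpe_expand(book, book->pair[sym - BPE_FIRST_CODE][1], out + n);
}

/* Decompress len bytes from in; out must hold the original length. */
static size_t bpe_decompress(const unsigned char *in, size_t len,
			     const struct bpe_book *book, char *out)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++)
		n += bpe_expand(book, in[i], out + n);
	return n;
}
```

Decoding needs only the codebook and the packed bytes of the one string being printed, which fits the requirement that a string be instantly available rather than buried in the middle of packed data.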
Upstreaming forecast

Compressing III

Problem: Codebook
- allyesconfig is unrealistic for tiny systems
- smaller kernels, smaller pool for codes
- what about modules?
  - share codebook from kernel → tied to that build
  - own codebook → overhead eats gain

Brainstorming
- predefined codebook?
- could save second kernel compile, too

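The predefined-codebook idea could look roughly like the sketch below. The codebook entries are invented for illustration; a real one would have to be derived from string statistics. As before, the input is assumed to be 7-bit ASCII so that bytes from 0x80 upward can act as tokens. `tok_book`, `tok_encode` and `tok_decode` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed codebook, identical for every build and every module. */
static const char *const tok_book[] = {
	"failed to ", "unable to ", "error", "device",
};
#define TOK_COUNT	(sizeof(tok_book) / sizeof(tok_book[0]))
#define TOK_FIRST	0x80	/* token i is stored as byte TOK_FIRST + i */

/* Encode a NUL-terminated ASCII string; returns the encoded length. */
static size_t tok_encode(const char *in, unsigned char *out)
{
	size_t n = 0;

	while (*in) {
		size_t i, best = TOK_COUNT, best_len = 0;

		/* Pick the longest codebook entry matching at this position. */
		for (i = 0; i < TOK_COUNT; i++) {
			size_t l = strlen(tok_book[i]);

			if (l > best_len && strncmp(in, tok_book[i], l) == 0) {
				best = i;
				best_len = l;
			}
		}
		if (best < TOK_COUNT) {
			out[n++] = (unsigned char)(TOK_FIRST + best);
			in += best_len;
		} else {
			out[n++] = (unsigned char)*in++;
		}
	}
	out[n] = '\0';
	return n;
}

/* Decode into out, which must be large enough; returns decoded length. */
static size_t tok_decode(const unsigned char *in, char *out)
{
	size_t n = 0;

	for (; *in; in++) {
		if (*in >= TOK_FIRST && (size_t)(*in - TOK_FIRST) < TOK_COUNT) {
			const char *word = tok_book[*in - TOK_FIRST];
			size_t l = strlen(word);

			memcpy(out + n, word, l);
			n += l;
		} else {
			out[n++] = (char)*in;
		}
	}
	out[n] = '\0';
	return n;
}
```

Since the codebook does not depend on the strings of a particular build, no first compile is needed to generate it, and modules could share it without being tied to one kernel build. The trade-off is that a fixed codebook will fit any given configuration less well than a generated one.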
Replacing

Scanning
- run source files through filter before compiling

Own section
- Work on the section directly?

Upstreaming forecast

Further issues

- printk strings are only a subset
- devicetree uses a lot of strings!
  - which should be easier to tackle since they are accessed via of_* functions
- address this problem at a higher level
  - all strings?
  - all .rodata?

Summary

This quote still makes sense

From: Managing Gigabytes [4]
  "We find ourselves in the midst of a practically important and intellectually fascinating convergence between the desire for more and better compression and the need to learn about what 'structure' there is in data."

[4] Witten/Moffat/Bell, 1st edition, p. 385

Analyzing the data

observations from 3.16-rc5:
- x86-64 allyesconfig
- arm-cortexa8 customer kernel
- maybe a bit biased for device drivers

Print from central locations

Proposal
- strings should be emitted from as centralized locations as possible
- bonus: consistent messages

Examples
- devm_ioremap_resource()
- OOM error message removal

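A userspace sketch of the proposal, with invented names: `map_resource()` stands in for a central helper in the spirit of devm_ioremap_resource() and reports its own failure, so the two hypothetical drivers carry no error strings of their own. The counter `central_msgs` exists only to make the behaviour observable.

```c
#include <stddef.h>
#include <stdio.h>

/* Number of messages emitted by the central helper (illustration only). */
static int central_msgs;

/*
 * Hypothetical central helper: the single place that holds the error
 * string, so every caller reports the failure in the same wording.
 */
static void *map_resource(const char *dev, void *res)
{
	if (!res) {
		fprintf(stderr, "%s: invalid resource\n", dev);
		central_msgs++;
		return NULL;
	}
	return res;
}

/* Callers only propagate the error; they add no message of their own. */
static int drv_a_probe(void *res)
{
	return map_resource("drv-a", res) ? 0 : -1;
}

static int drv_b_probe(void *res)
{
	return map_resource("drv-b", res) ? 0 : -1;
}
```

The format string exists once regardless of how many drivers call the helper, which is where both the size saving and the consistent messages come from.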