JSZap: Compressing JavaScript Code
Martin Burtscher, UT Austin
Ben Livshits & Ben Zorn, Microsoft Research
Gaurav Sinha, IIT Kanpur
A Web 2.0 Application Dissected
• Talks to 14 backend services (traffic, images, directions, ads, …)
• 1+ MB code
• 70,000+ lines of JavaScript code
• 2,855 functions downloaded
Lots of JavaScript being Transmitted
[Figure: fraction of download that is JavaScript (0% to 100%) for www.live.com, spreadsheets.google, maps.live, chi.lexigame, hotmail, gmail, dropthings, maps.google, pageflakes, and bunny hunt]
• Up to 85% of a Web 2.0 app is JavaScript code!
AJAX: Tension Headaches
• Move code to client for responsiveness
• Execution can’t start without the code
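JavaScript on the Wire
[Figure: pipeline diagram. Plain path: JavaScript → gzip → gzip -d → parser → AST. JSZap path: JavaScript → crunch → gzip → gzip -d → AST, skipping the parser on the client]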
JSZap Approach
• Represent JavaScript as an AST instead of source
• Serialize the compressed AST
• Decompress directly into an AST on the client
• Use gzip as a second-level (de-)compressor
Benefits of AST-based Compression
Reduced Latency
• Compression: less to transmit
• ASTs are blasted directly into the browser
Reduced Network Bandwidth
• Reduces mobile charges
• Reduces operator network costs: better for servers
Correctness, Security, and Other Benefits
• Ensures well-formedness of code
• Makes it easy to check language subsets (AdSafe)
• Caching and incremental updates
• Unblocking the HTML parser
JSZap Compression
[Figure: JavaScript → JSZap → gzip]
JSZap Compression
[Figure: JSZap splits JavaScript into three streams, (1) productions, (2) identifiers, and (3) literals, which are then compressed with gzip]
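A minimal Python sketch of the second-level gzip pass, assuming each of the three streams has already been serialized to bytes and is gzipped independently. The stream contents and the helper names gzip_streams and gunzip_streams are hypothetical placeholders; the slides do not describe JSZap's actual byte formats.

```python
import gzip
from typing import Dict


def gzip_streams(streams: Dict[str, bytes]) -> Dict[str, bytes]:
    """Server side: gzip each serialized stream on its own (mtime=0 keeps output reproducible)."""
    return {name: gzip.compress(data, mtime=0) for name, data in streams.items()}


def gunzip_streams(packed: Dict[str, bytes]) -> Dict[str, bytes]:
    """Client side: undo the second-level gzip pass for every stream."""
    return {name: gzip.decompress(blob) for name, blob in packed.items()}


# Hypothetical serialized streams for a tiny program (not JSZap's real encoding).
streams = {
    "productions": bytes([1, 3, 4, 1, 3, 4]),
    "identifiers": b"y foo x z z y y x",
    "literals": b'2 "jscrunch" 3 "jszap"',
}
packed = gzip_streams(streams)
for name, blob in packed.items():
    print(name, len(streams[name]), "->", len(blob), "bytes")
```

On inputs this small the gzip header outweighs any savings; the point is only that each stream gets its own gzip pass, so similar data (for example, all identifiers) sits together.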
GZIP is a formidable opponent
JSZap vs. GZIP
[Figure: stacked bars of compressed size in KB (0 to 40), split into literals, identifiers, and productions, for gzip and JSZap; segment labels 11.5 vs. 8.4, 19.0 vs. 18.4, and 5.4 vs. 5.4 KB]
Talk Outline
• Compressing (1) productions, (2) identifiers, and (3) literals
• Evaluation on real code
Background: ASTs
Expression grammar:
1) E → E + T
2) E → T
3) T → T * F
4) T → F
5) F → id
[Figure: parse tree for a * b + c, each interior node labeled with the number of the production that expands it]
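To make the link between the grammar and the production numbers concrete, here is a small Python sketch (an illustration, not JSZap code) that parses a * b + c with the five productions above and reads the production numbers off the tree in pre-order. The class and function names are hypothetical, and tokens are assumed to be space-separated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    production: int                          # grammar rule (1-5) that expands this node
    children: List["Node"] = field(default_factory=list)
    name: Optional[str] = None               # identifier text, only for F -> id leaves


class Parser:
    """Recursive descent for E -> E + T | T,  T -> T * F | F,  F -> id (left recursion as loops)."""

    def __init__(self, text: str):
        self.tokens = text.split()           # assumes space-separated tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_e(self) -> Node:
        node = Node(2, [self.parse_t()])             # 2) E -> T
        while self.peek() == "+":
            self.take()
            node = Node(1, [node, self.parse_t()])   # 1) E -> E + T
        return node

    def parse_t(self) -> Node:
        node = Node(4, [self.parse_f()])             # 4) T -> F
        while self.peek() == "*":
            self.take()
            node = Node(3, [node, self.parse_f()])   # 3) T -> T * F
        return node

    def parse_f(self) -> Node:
        return Node(5, name=self.take())             # 5) F -> id


def production_stream(node: Node, out: List[int]) -> List[int]:
    """Pre-order walk that records each node's production number."""
    out.append(node.production)
    for child in node.children:
        production_stream(child, out)
    return out


tree = Parser("a * b + c").parse_e()
print(production_stream(tree, []))  # [1, 2, 3, 4, 5, 5, 4, 5]
```

Given the grammar, this number sequence alone determines the tree shape; only the identifier names (a, b, c) must be sent separately, which is why JSZap can split them into their own stream.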
A Simple JavaScript Example
    var y = 2;
    function foo () {
        var x = "jscrunch";
        var z = 3;
        z = y + y;
    }
    x = "jszap";
Production stream: 1 3 4 ... 1 3 4 ...
Identifier stream: y foo x z z y y x
Literal stream: 2 "jscrunch" 3 "jszap"
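A minimal Python sketch of how one pre-order walk over an AST can route each part of the example program into its own stream. The tuple-based AST and its node kinds (Program, Var, Function, Assign, Add, Id, Lit) are hypothetical stand-ins for a real JavaScript parser's output, and the production stream holds kind names rather than JSZap's numeric production IDs.

```python
from typing import Any, List, Tuple

# Hypothetical toy AST: (kind, payload). Leaves are ("Id", name) or ("Lit", value);
# inner nodes carry a list of children. Node kinds are illustrative, not JSZap's grammar.
Node = Tuple[str, Any]


def split_streams(node: Node, prods: List[str], ids: List[str], lits: List[Any]) -> None:
    """Pre-order walk that routes each part of the tree to its own stream."""
    kind, payload = node
    prods.append(kind)              # every node adds one entry to the production stream
    if kind == "Id":
        ids.append(payload)         # identifier names go to the identifier stream
    elif kind == "Lit":
        lits.append(payload)        # literal values go to the literal stream
    else:
        for child in payload:
            split_streams(child, prods, ids, lits)


program = ("Program", [
    ("Var", [("Id", "y"), ("Lit", 2)]),                                # var y = 2;
    ("Function", [
        ("Id", "foo"),
        ("Var", [("Id", "x"), ("Lit", "jscrunch")]),                   # var x = "jscrunch";
        ("Var", [("Id", "z"), ("Lit", 3)]),                            # var z = 3;
        ("Assign", [("Id", "z"), ("Add", [("Id", "y"), ("Id", "y")])]),  # z = y + y;
    ]),
    ("Assign", [("Id", "x"), ("Lit", "jszap")]),                       # x = "jszap";
])

prods, ids, lits = [], [], []
split_streams(program, prods, ids, lits)
print(ids)   # ['y', 'foo', 'x', 'z', 'z', 'y', 'y', 'x']
print(lits)  # [2, 'jscrunch', 3, 'jszap']
```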
Benchmarking JSZap
• JavaScript files up to 22K LOC
• Variety of app types
• Both hand-generated and machine-generated
• gzipped everything

Benchmark name   Source lines   Source bytes
gmonkey                   922         17,382
getDOMHash              1,136         25,467
bing1                   3,758         77,891
bingmap1                3,473         80,066
livemsg1                5,307         93,982
bingmap2                9,726        113,393
facebook1               5,886        141,469
livemsg2                7,139        156,282
officelive1            22,016        668,051
Components of JavaScript Source
[Figure: stacked bars (0% to 100%) showing the share of productions, identifiers, and literals in each benchmark, gmonkey through officelive1]
• None of the categories can be ignored
• Identifiers become more prominent as code grows
Compressing the Production Stream
• Frequency-based production renaming
• Differential encoding: 26 and 57 => 2 and 3
• Chain rule: eliminate predictable productions
• Tree-based prediction by partial match
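Frequency-based renaming can be sketched in a few lines of Python: count how often each production ID occurs and give the most frequent ones the smallest new IDs. This is a generic illustration under the assumption that the decoder receives the mapping (or agrees on it in advance); the function names and the sample stream are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def frequency_renaming(stream: List[int]) -> Dict[int, int]:
    """Map each production ID to a new ID by descending frequency (ties: smaller old ID first)."""
    counts = Counter(stream)
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return {old: new for new, old in enumerate(ranked)}


def rename(stream: List[int], mapping: Dict[int, int]) -> List[int]:
    """Rewrite the production stream using the new IDs."""
    return [mapping[p] for p in stream]


stream = [12, 40, 12, 12, 7, 40, 12]           # hypothetical production IDs
mapping = frequency_renaming(stream)
print(mapping)                  # {12: 0, 40: 1, 7: 2}
print(rename(stream, mapping))  # [0, 1, 0, 0, 2, 1, 0]
```

After renaming, the stream is dominated by a few small values, which a later entropy coder or gzip can represent more compactly than scattered large IDs.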
PPMC
• Tree context is used to build a predictor
• Provides the likely next child node given context C and child position p
• Arithmetic coding: more likely = shorter IDs
• See paper for details
Example: consider compressing if (P) then X else X
• Should be very compressible
• if (P) then ...abc... else ...abc...
[Figure: if node with children P, X, X]
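The core idea of a tree context can be shown with a much simpler model than PPMC: predict each child production from (parent production, child position), using adaptive counts with add-one smoothing, and charge the ideal arithmetic-coding cost of -log2 p bits. This Python sketch is that simplified single-order model, not PPMC itself (it has no escape mechanism and no longer contexts); production numbers and class names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# A node is (production, [children]); production numbers here are hypothetical.
Node = Tuple[int, list]
Context = Tuple[int, int]  # (parent production, child position)


class TreeContextModel:
    """Simplified order-1 tree-context model; not full PPMC (no escapes, no longer contexts)."""

    def __init__(self, alphabet_size: int):
        self.k = alphabet_size
        self.counts: Dict[Context, Dict[int, int]] = defaultdict(lambda: defaultdict(int))

    def cost_bits(self, ctx: Context, production: int) -> float:
        """Ideal arithmetic-coding cost of `production` in `ctx`; then learn from it."""
        table = self.counts[ctx]
        total = sum(table.values())
        p = (table[production] + 1) / (total + self.k)   # add-one (Laplace) estimate
        table[production] += 1
        return -math.log2(p)

    def encode(self, node: Node, ctx: Context) -> float:
        """Total bits for a pre-order walk of the subtree rooted at `node`."""
        production, children = node
        bits = self.cost_bits(ctx, production)
        for position, child in enumerate(children):
            bits += self.encode(child, (production, position))
        return bits


IF = 1                                   # hypothetical "if (P) then X else X" production
x_subtree = (7, [(8, []), (9, [])])      # the repeated X, standing in for ...abc...
model = TreeContextModel(alphabet_size=10)
then_bits = model.encode(x_subtree, (IF, 1))   # X as child 1 of the if node
else_bits = model.encode(x_subtree, (IF, 2))   # the same X as child 2
print(f"then: {then_bits:.2f} bits, else: {else_bits:.2f} bits")
```

The "else" copy of X is cheaper because its children are predicted from contexts the "then" copy already trained. A real PPMC coder would also consult longer contexts, so it could predict the root of the second X well too.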
Production Compression (gzip = 1)
[Figure: size of the production stream compressed with PPMC, relative to gzip (gzip = 1, y-axis 50% to 100%), for each benchmark, gmonkey through officelive1; value shown: 0.6772]
Compressing the Identifier Stream
• Symbol tables instead of an identifier stream:
  – Remove redundancy: each use becomes an offset into the table
  – Global or local symbol tables
  – Use variable-length encoding
• Other techniques:
  – Sort symbols by frequency
  – Rename local variables
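A minimal Python sketch of the symbol-table idea with frequency sorting: each distinct identifier is stored once, most frequent first, and every use becomes an offset into that table. It assumes a single global table and plain integer offsets; the local tables, renaming, and variable-length encoding from the slide are not modeled, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def build_symbol_table(identifiers: List[str]) -> List[str]:
    """Distinct identifiers, most frequent first.

    Counter keeps first-occurrence order and sorted() is stable, so ties stay in source order.
    """
    counts = Counter(identifiers)
    return sorted(counts, key=lambda name: -counts[name])


def encode_identifiers(identifiers: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each identifier use with its offset into the symbol table."""
    table = build_symbol_table(identifiers)
    index = {name: i for i, name in enumerate(table)}
    return table, [index[name] for name in identifiers]


def decode_identifiers(table: List[str], offsets: List[int]) -> List[str]:
    """Client side: look every offset back up in the table."""
    return [table[i] for i in offsets]


ids = ["y", "foo", "x", "z", "z", "y", "y", "x"]
table, offsets = encode_identifiers(ids)
print(table)    # ['y', 'x', 'z', 'foo']
print(offsets)  # [0, 3, 1, 2, 2, 0, 0, 1]
```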
Variable-length Encoding for Identifiers
[Figure: decision tree (is global? is renamed local? fits in 1 byte?) that selects a 2-bit prefix for each identifier reference: 00…, 01…, 10…, or 11…]
Variable-Length Identifier Encoding
[Figure: stacked bars (0% to 100%) per benchmark, gmonkey through officelive1, breaking identifier references into global 1byte, global 2byte, local builtin, local 1byte, local 2byte, and parent]
Symbol Tables: Effectiveness
[Figure: identifier-stream size relative to no symbol table (NoST = 1, y-axis 80% to 100%) per benchmark, gmonkey through officelive1, for Global ST and VarEnc; values shown: 0.943 and 89%]
Compressing Literals
• Symbol tables
• Grouping literals by type
• Prefixes and suffixes
• Together, these techniques save 5-10% compared to gzip
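Grouping by type and symbol tables combine naturally: keep one table of distinct values per literal type and record, per type, the offsets in source order. This Python sketch assumes the decoder learns each literal's type from the production stream (so it knows which group to read next); prefix and suffix sharing is not shown, and group_literals is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple


def group_literals(literals: List[Any]) -> Dict[str, Tuple[List[Any], List[int]]]:
    """Per literal type: (symbol table of distinct values, offsets into it in source order)."""
    groups: Dict[str, Tuple[List[Any], List[int]]] = {}
    for value in literals:
        kind = type(value).__name__                  # e.g. 'int', 'str'
        table, offsets = groups.setdefault(kind, ([], []))
        if value not in table:
            table.append(value)                      # first occurrence enters the table
        offsets.append(table.index(value))           # every use becomes an offset
    return groups


groups = group_literals([2, "jscrunch", 3, "jszap", 2, "jszap"])
print(groups)  # {'int': ([2, 3], [0, 1, 0]), 'str': (['jscrunch', 'jszap'], [0, 1, 1])}
```

Keeping numbers and strings apart means each table holds values of a single kind, so repeated values cost only a small offset.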
Average JSZap Compression: 10%
[Figure: JSZap compressed size relative to gzip (gzip = 1, y-axis 80% to 100%) per benchmark, gmonkey through officelive1; labels shown: 0.8792 and "13% savings"; inset pie chart: Identifiers 57%, Productions 26%, Literals 17%]
Summary and Conclusions
• JSZap: AST-based compression for JavaScript
• Propose a range of techniques for compressing:
  – Productions
  – Identifiers
  – Literals
• Preliminary results are encouraging: 10% savings over gzip
• Future focus:
  – Latency measurements
  – Browser integration
[Figure: AST representation at the center, linked to Compression with JSZap; Security (AdSafe); Well-formedness; Unblocking HTML parser; Caching and incremental updates; and ?]
Questions?