Parallelizing the Web Browser
Chris Jones, Rose Liu, Leo Meyerovich, Krste Asanovic, and Rastislav Bodik
ParLab, UC Berkeley
The Transition to Handhelds
Power wall: previous generations reused the software of their ancestors. Mobiles will need parallel software.
[Figure: log price vs. time for successive computing generations: Mainframe, Mini, WS, PC, Laptop, Handset, Ubiquitous]
Soon on mobile: 4 cores x 2 threads x 8-wide SIMD = 64-way parallelism
Why Parallelize a Browser?
• Dominant application platform
  – easy deployment: apps downloaded, JS portable
  – productive programming: scripting, layout
• … but not on handhelds
  – native frameworks for: iPhone, Google Android
  – slow: for Slashdot, Laptop: 3s => iPhone: 21s
• Parallel browser may need new architecture
  – ex: JavaScript relies on "gotos", is too serial
Anatomy of a Browser
[Diagram: browser pipeline in three groups. Frontend: page request to web servers, decompress, lex, parse + build DOM. Scripting: script, plugin (decode image, …). Layout: layout, render.]
Project Status
1. Developed work-efficient algorithms
   work-efficient: no more work than the sequential algorithm
   – layout: parallel map with a tiling optimization
   – layout: break up tree traversal into five parallel ones
   – lexing: speculation to break sequential dependencies
2. Reexamining the scripting programming model
   – programmer productivity: from callbacks to actors
   – performance: adding structure to detect dependences
Frontend: Lexing
[Diagram: browser pipeline: page request, web servers, decompress, lex, parse + build DOM, script, plugin (decode image, …), layout, render]
Lexing, from 10,000 feet
Goal: given a lexical spec and an input, find the lexemes
  STag    ::= <[^>]*>
  Content ::= [^<]+
  ETag    ::= </[^>]*>
[Diagram: DFA for the spec, with transitions labeled '<', '/', Σ – {'>'}, Σ – {'/'}, and Σ – {'<'}]
Example input: <b>Berkeley!</b> (label each character with its state)
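A minimal sequential sketch of "label each character with its state" for the three-rule spec on this slide. The state names (Start, TagOpen, STag, ETag, Content) and the exact transition table are assumptions made for illustration; the slide shows the DFA only as a diagram.

```python
# Hypothetical state names; the slide's DFA diagram does not name them.
START, CONTENT, TAG_OPEN, STAG, ETAG = "Start", "Content", "TagOpen", "STag", "ETag"


def step(state, ch):
    """One DFA transition for STag ::= <[^>]*>, ETag ::= </[^>]*>, Content ::= [^<]+."""
    if state in (START, CONTENT):
        # Outside a tag: '<' opens a tag, anything else is content.
        return TAG_OPEN if ch == "<" else CONTENT
    if state == TAG_OPEN:
        # Just saw '<': '/' makes it an end tag, '>' closes it at once.
        if ch == "/":
            return ETAG
        return START if ch == ">" else STAG
    # Inside STag or ETag: stay there until the closing '>'.
    return START if ch == ">" else state


def label(text, state=START):
    """Return the state reached after each character of text."""
    labels = []
    for ch in text:
        state = step(state, ch)
        labels.append(state)
    return labels


if __name__ == "__main__":
    for ch, st in zip("<b>Berkeley!</b>", label("<b>Berkeley!</b>")):
        print(repr(ch), st)
```

Each label depends on the previous state, which is the sequential dependence the following slides set out to break.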
Inherently Sequential?
  STag    ::= <[^>]*>
  Content ::= [^<]+
  ETag    ::= </[^>]*>
[Diagram: the same DFA; the input <b>Berkeley!</b> is split between Processor 1 and Processor 2, and the start state for Processor 2 is unknown (?)]
An observation
In lexing, irrespective of where the DFA starts, it converges to a stable, recurring state.
[Diagram: lexing <b>Berkeley!</b> from the start states "in ETag" and "in Content"]
Parallel scans thus need not scan from all possible states, just one, yielding a work-efficient algorithm.
Our solution (1/2): Partition
• split input into blocks with k-character overlap
• scan in parallel; start block from a tolerant state
[Diagram: input split into blocks for Processor 1, Processor 2, …, with adjacent blocks overlapping by k characters]
Our solution (2/2): Speculate
• split input into blocks with k-character overlap
• scan in parallel; start block from a tolerant state
• check if blocks converge: expected in k-overlap
• speculation may fail; if so, block is rescanned
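The partition-and-speculate steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' Cell implementation: the DFA states, the choice of Content as the tolerant state, and the helper names (scan_block, speculative_lex) are all hypothetical, and Python threads show only the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical state names for the STag / ETag / Content spec.
START, CONTENT, TAG_OPEN, STAG, ETAG = "Start", "Content", "TagOpen", "STag", "ETag"


def step(state, ch):
    if state in (START, CONTENT):
        return TAG_OPEN if ch == "<" else CONTENT
    if state == TAG_OPEN:
        if ch == "/":
            return ETAG
        return START if ch == ">" else STAG
    return START if ch == ">" else state


def scan(text, state):
    """Sequential scan: the state reached after each character."""
    labels = []
    for ch in text:
        state = step(state, ch)
        labels.append(state)
    return labels


def scan_block(text, begin, end, k):
    """Scan text[begin:end] speculatively; return (guessed entry state, labels)."""
    if begin == 0:
        return START, scan(text[:end], START)  # first block: no speculation needed
    pre = max(0, begin - k)                    # start k characters early
    labels = scan(text[pre:end], CONTENT)      # assumed tolerant start state
    cut = begin - pre
    return labels[cut - 1], labels[cut:]       # state at the end of the overlap


def speculative_lex(text, block_size, k, workers=4):
    assert block_size >= 1 and k >= 1
    bounds = [(b, min(b + block_size, len(text)))
              for b in range(0, len(text), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda be: scan_block(text, be[0], be[1], k), bounds))
    labels, rescans, entry = [], 0, START
    for (begin, end), (guess, block) in zip(bounds, results):
        if guess != entry:                     # speculation failed: rescan this block
            block = scan(text[begin:end], entry)
            rescans += 1
        labels.extend(block)
        entry = block[-1]                      # verified exit state of this block
    return labels, rescans
```

The convergence check compares only one state per block boundary, so when speculation succeeds the total work stays close to a single sequential scan plus the k-character overlaps.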
Speedup: Flex vs Cell
today's page sizes: 5 cores are 4.5x faster than flex
baseline: (sequential) flex on the CELL main CPU
Layout Solving (1/2)
[Diagram: browser pipeline: page request, web servers, decompress, lex, parse + build DOM, script, plugin (decode image, …), layout, render]
Rule Matching
Goal: match rules with nodes
– a rule: p img { fontsize: 7px }
– match tag path
– path-rule matching: the rule and the path
  • end with the same node
  • and are a substring
[Diagram: DOM tree (<body>, two <p>, <img>, <b>, and text nodes) next to a table that maps selectors built from p and img to the properties height=83%, width=100px, fontsize=7px, float=left]
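A small sketch of path-rule matching as this slide describes it. It reads "end with the same node and are a substring" as "the selector is a contiguous suffix of the node's tag path", which is an interpretation; real CSS descendant selectors also allow gaps between ancestors. The Node class and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag and a parent pointer."""
    tag: str
    parent: Optional["Node"] = None


def tag_path(node):
    """Tags from the root down to node, e.g. ['body', 'p', 'img']."""
    path = []
    while node is not None:
        path.append(node.tag)
        node = node.parent
    return path[::-1]


def matches(selector, path):
    """True if selector (list of tags) is a contiguous suffix of path."""
    n = len(selector)
    return 0 < n <= len(path) and path[-n:] == selector


def matched_properties(node, rules):
    """Merge the properties of every rule whose selector matches node."""
    path = tag_path(node)
    props = {}
    for selector, rule_props in rules:
        if matches(selector.split(), path):
            props.update(rule_props)
    return props


if __name__ == "__main__":
    body = Node("body")
    p = Node("p", body)
    img = Node("img", p)
    rules = [("p img", {"fontsize": "7px"})]  # the rule shown on the slide
    print(matched_properties(img, rules))
```

Matching one node needs only that node's path and the rule set, so nodes can be matched independently of each other.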
Parallelization
• 1000s nodes, 1000s rules
• Assign nodes to cores
[Diagram: the same DOM tree and selector/property table, with nodes assigned to different cores]
Tiling for Caches
Problem: all the nodes + selectors might not fit in cache!
[Diagram: the same DOM tree and selector/property table, divided into tiles]
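One way to picture the combination of "assign nodes to cores" and tiling is a blocked loop: each worker takes a tile of nodes and walks the rules one tile at a time, so the working set at any moment is one node tile plus one rule tile. The sketch below assumes suffix matching on tag paths and uses hypothetical tile sizes and names; in Python it shows only the loop structure, not the cache effect or the Cilk++ scheduling from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def matches(selector, path):
    """Assumed matching rule: selector is a contiguous suffix of the tag path."""
    n = len(selector)
    return 0 < n <= len(path) and path[-n:] == selector


def tiles(items, size):
    """Split items into consecutive chunks of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def match_tile(node_tile, rules, rule_tile_size):
    """Match one tile of nodes against all rules, one rule tile at a time."""
    out = {node_id: [] for node_id, _ in node_tile}
    for rule_tile in tiles(rules, rule_tile_size):
        # Inner loops touch only this node tile and this rule tile.
        for node_id, path in node_tile:
            for rule_id, selector in rule_tile:
                if matches(selector, path):
                    out[node_id].append(rule_id)
    return out


def match_all(paths, selectors, node_tile_size=2, rule_tile_size=2, workers=4):
    """Return {node index: [indices of matching selectors]}."""
    nodes = list(enumerate(paths))
    rules = list(enumerate(selectors))
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Parallel map over node tiles; tiles do not share output entries.
        for partial in pool.map(lambda t: match_tile(t, rules, rule_tile_size),
                                tiles(nodes, node_tile_size)):
            result.update(partial)
    return result
```

Because every node tile writes only its own entries, the workers need no locking; the tile sizes would be tuned to the cache of the target machine.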
Speedup (Cilk++)
[Plot: speedup vs. fastest sequential (Slashdot), y-axis 0 to 5, x-axis 1 to 16 cores. Series: Naïve (seq), Naïve (Cilk), Naïve + tiling (Cilk), Redundancy opt. + tiling (seq.), Redundancy opt. + tiling (Cilk)]
Machine: 2 sockets x 4 cores x 2 threads (2.6 GHz, 12 x 1 GB)
Layout Solving (2/2)
[Diagram: browser pipeline: page request, web servers, decompress, lex, parse + build DOM, script, plugin (decode image, …), layout, render]
Problem: Layout a Page
[Diagram: DOM tree (<body>, two <p>, <img>, <b>, and text nodes) annotated with font size (fs), width (w), height (h), x/y position, and float=left; edges between nodes are labeled with the attributes that flow along them (fs, Δ, w)]
It looks rather sequential...
[Diagram: the same annotated DOM tree, stepping through the attribute dependencies (fs, Δ, w) from node to node]
But not entirely
[Diagram: the same annotated DOM tree, showing attribute dependencies (fs, Δ, w) that can be resolved in separate subtrees at the same time]
5 Phases: Each Exhibits Tree Parallelism
Phase 1: font size, temporary width
Phase 2: preferred max & min width
Phase 3: solved width
Phase 4: height, relative x/y position
Phase 5: absolute x/y position
[Diagram: two copies of the DOM tree, one annotated with font size (fs) and temporary width (w), the other with preferred width (w_p) and minimum width (w_m)]
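To illustrate why a phase exposes tree parallelism, here is a toy sketch of two passes: a top-down pass in the spirit of Phase 1 (font size) and a bottom-up pass in the spirit of Phase 2 (minimum width). The Node fields and the width rule (a node's minimum width is the largest minimum width below it) are simplifying assumptions, not the layout rules from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical layout node with inputs (fs_scale, own_min_width) and results."""
    tag: str
    fs_scale: float = 1.0      # relative font size, e.g. 0.5 for fs=50%
    own_min_width: int = 0     # intrinsic minimum width of the node itself
    children: list = field(default_factory=list)
    fs: float = 0.0            # result of the top-down pass
    min_width: int = 0         # result of the bottom-up pass


def phase_font_size(node, parent_fs):
    """Top-down: a node needs only its parent's font size."""
    node.fs = parent_fs * node.fs_scale
    for child in node.children:
        # Sibling subtrees are independent here and could run in parallel.
        phase_font_size(child, node.fs)


def phase_min_width(node):
    """Bottom-up: a node needs only its children's results."""
    for child in node.children:
        # Sibling subtrees are again independent of each other.
        phase_min_width(child)
    node.min_width = max([node.own_min_width] + [c.min_width for c in node.children])


if __name__ == "__main__":
    body = Node("body", children=[
        Node("p", fs_scale=0.5, children=[Node("img", own_min_width=40)]),
        Node("p", children=[Node("b", own_min_width=30)]),
    ])
    phase_font_size(body, 12.0)
    phase_min_width(body)
    print(body.children[0].fs, body.min_width)
```

Splitting layout into passes like these means each pass has dependencies in one direction only (parent to child or child to parent), which is what lets separate subtrees proceed concurrently within a pass.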