CS137: Electronic Design Automation
Day 8: January 27, 2006
Cellular Placement
Primary Sources
– Wrighton & DeHon, FPGA 2003
– Wrighton MS Thesis, 2003
CALTECH CS137 Winter2006 -- DeHon

Today
• Problem
• Parallelism
• Cellular Automata
• Idea
• Details
  – Avoid Local Minima
  – Update locations
• Results
• Directions

Placement
• Problem: Pick locations for all building blocks
  – minimizing energy, delay, area
  – really:
    • minimize wire length
    • minimize channel density
  – surrogates:
    • Minimizing squared wire length
    • Minimize bounding box

Parallelism
• What parallelism exists in placement?
  – Evaluate costs of prospective moves
    • One set to many prospective locations
    • Many moves each to single location
  – Perform moves

Cellular Automata
• Basic idea: regular array of identical cells with nearest-neighbor communication
  – E.g. Conway’s LIFE

CA Model
• On each cycle:
  – Each cell exchanges values with neighbors
  – Updates state/value based on own state and that of neighbors
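As a concrete instance of the CA model above, here is a minimal sketch of one synchronous cycle of Conway's LIFE, the example the slide names. The function name `life_step` and the set-of-live-cells representation are illustrative choices, not something taken from the lecture.

```python
from collections import Counter

def life_step(live):
    """Advance a set of live (row, col) cells by one synchronous cycle."""
    # "Exchange values with neighbors": every live cell contributes one
    # count to each of its eight surrounding cells.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # "Update state from own state and neighbors": a cell is live next
    # cycle with exactly 3 live neighbors, or with 2 if it is already live.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A horizontal "blinker" flips to vertical and back.
blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))  # [(0, 1), (1, 1), (2, 1)]
```

Every cell applies the same rule using only its neighbors' values, which is the property the placement scheme in this lecture relies on.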
System Architecture Taxonomy
(Subject to continuing refinement and embellishment)

Cellular Automata
• Physical Advantage:
  – No long wires
• Area linear in number of nodes
• Minimum delay → small cycle time
• Good scaling properties

CA Placement
• Can we perform placement in a CA?

Mapping
• Each cell is a physical placement location
• State is a logical node assigned to the cell
• Assume:
  – Cell knows own location
  – State knows location of connected nodes

Costs
• Assume:
  – Cell knows own location
  – State knows location of connected nodes
• Cell computes: its cost at that location
  $\sum_{e \in g.edges} \left( L(e.src) - L(e.snk) \right)^2$

Moves
• Two adjacent cells can exchange graph nodes
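The per-cell cost on the Costs slide can be sketched as follows. This assumes two-dimensional locations, so the squared difference is read as squared Euclidean distance; `node_cost`, `loc`, and `edges` are hypothetical names for illustration.

```python
def node_cost(node, loc, edges):
    """Squared wire length summed over every edge touching `node`.

    loc maps node -> (x, y); edges is a list of (src, snk) pairs.
    Assumption: L(.) is a 2-D grid location.
    """
    total = 0
    for src, snk in edges:
        if node in (src, snk):
            dx = loc[src][0] - loc[snk][0]
            dy = loc[src][1] - loc[snk][1]
            total += dx * dx + dy * dy
    return total

loc = {"a": (0, 0), "b": (3, 0), "c": (0, 2)}
edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(node_cost("a", loc, edges))  # 9 + 4 = 13
```

Each cell needs only its own location and the locations of the nodes connected to its resident graph node, matching the two assumptions listed on the slide.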
Moves
• Evaluate goodness of proposed swap
  – Each cell considers impact of its graph node being in the other cell
  – Keep if swap reduces cost

Move Costs
• Only really need to evaluate delta cost
• $(src.x - sink.x)^2$
• Moving sink
• $d/dx = -2\,(src.x - sink.x)$
• Delta move cost is linear in distance

Movement
• Alternate pairings with N, S, E, W neighbor → move in any direction

Parallel Swaps
• Pair up and perform N/2 swaps in parallel

Basic Idea
• Pair up PEs
• Compute impact of swaps in parallel
• Perform swaps in parallel
• Repeat until converged

Problems/Details
• Greedy swaps → local minima?
• How to update location of neighbors?
  – …they are moving, too
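The alternating pairings on the Movement and Parallel Swaps slides can be sketched as a schedule of disjoint neighbor pairs. The four-phase even/odd ordering below and the choice to leave edge cells idle (no wraparound) are assumptions for illustration; `pairings` is a hypothetical helper.

```python
def pairings(rows, cols, phase):
    """Disjoint neighbor pairs for one of four phases on a rows x cols mesh.

    Phases 0 and 1 pair east/west starting at even/odd columns;
    phases 2 and 3 pair north/south starting at even/odd rows.
    (Assumed schedule; unpaired edge cells simply sit out the phase.)
    """
    pairs = []
    offset = phase % 2
    if phase < 2:   # east/west partners
        for r in range(rows):
            for c in range(offset, cols - 1, 2):
                pairs.append(((r, c), (r, c + 1)))
    else:           # north/south partners
        for r in range(offset, rows - 1, 2):
            for c in range(cols):
                pairs.append(((r, c), (r + 1, c)))
    return pairs

for phase in range(4):
    print(phase, len(pairings(4, 4, phase)))
```

On a 4 x 4 mesh (N = 16), phase 0 yields 8 pairs, i.e. N/2 swaps that can be evaluated at once. Because no cell appears in two pairs within a phase, all swaps in a phase are independent.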
Avoid Greedy Swap?
• Insert randomness in swaps
• → Simulated Annealing
• Shake up system to get out of local minima
• Swap if
  – Randomly decide to swap
  – OR beneficial to swap
• Change swap thresholds over time

Impact of Randomness

Range Limiting
Eguro, Hauck, & Sharma DAC 2005

Local Swap Random Walk
• Assume there’s an ideal location
• Each node takes a biased Random Walk away from minimum cost location
• Gives node a distribution function around the minimum cost location
• If wander into a better “minimum cost” home, then wanders around new centerpoint
• Decreasing temperature restricts effective radius of walk

Local Swaps Only
• Decreasing temperature restricts effective radius of walk
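The swap rule above ("randomly decide to swap OR beneficial to swap", with thresholds that change over time) can be sketched as below. This follows the slide's wording literally rather than the usual Metropolis criterion, and the geometric decay schedule is an assumption; `should_swap` and `threshold_schedule` are hypothetical names.

```python
import random

def should_swap(delta_cost, threshold, rng):
    """Swap if it lowers cost, OR if a random draw falls below the threshold."""
    return delta_cost < 0 or rng.random() < threshold

def threshold_schedule(start, decay, steps):
    """Geometrically shrinking random-swap probability (assumed schedule)."""
    return [start * decay ** k for k in range(steps)]

rng = random.Random(0)
for t in threshold_schedule(0.5, 0.5, 4):
    # Count how many cost-increasing swaps are still accepted at this threshold.
    accepted = sum(should_swap(+1, t, rng) for _ in range(1000))
    print(f"threshold={t:.4f} uphill swaps accepted: {accepted}/1000")
```

As the threshold falls, fewer cost-increasing swaps get through, which is what shrinks the effective radius of each node's random walk.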
How update locations?
• Broadcast?
• Pipelined Ring?
• Send to neighbors?
  – Routing network?
• Tree?
• For whom?
  – Everyone? Only things moved? Only things moved a lot?

Simple Solution: Ring
• Drop value in ring
• Shift around entire array
• Everyone listens for updates

Simple Solution: Ring
• Weakness?
  – Serial
  – N cycles to complete
  – N/2 swaps in O(1)
  – Then O(N) to update?

Simple Solution: Ring
• Linear update bad
• Idea: allow staleness
  – Things move slowly
  – Estimate of position not that bad…
  – …and continued operation will correct…

Algorithm

Algorithm: Update Locations
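A small simulation shows why the ring update is linear in N. This is a sketch under simplifying assumptions: one value per PE, one hop per cycle, and every PE records everything that passes. `ring_update` is a hypothetical name.

```python
def ring_update(positions, cycles):
    """Simulate `cycles` shifts of a one-value-per-PE ring.

    positions[i] is the location PE i drops into the ring.
    Returns known[j]: the {pe: position} entries PE j has heard so far.
    """
    n = len(positions)
    ring = [(i, positions[i]) for i in range(n)]    # slot i starts at PE i
    known = [{i: positions[i]} for i in range(n)]   # each PE knows itself
    for _ in range(cycles):
        ring = [ring[-1]] + ring[:-1]               # shift one hop around
        for j, (pe, pos) in enumerate(ring):        # everyone listens
            known[j][pe] = pos
    return known

positions = [(0, 0), (0, 1), (1, 1), (1, 0)]
for cycles in range(4):
    print(cycles, [len(k) for k in ring_update(positions, cycles)])
```

Each PE learns one new position per cycle, so a full refresh costs O(N) cycles while a round of swaps costs O(1). That gap is the motivation for tolerating stale position estimates.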
Quality vs. Parameters

Algorithm: Try Moves

Iso-Quality
Pick point on Iso-Quality Curve that minimizes time

FPGA Implementation
• Virtex E (180nm)
• 10ns cycle (100MHz)
• 150 cycles for 4-phase swap
  – (~40 cycles/swap)
• 400 LUTs / Placement Engine
• Comparing
  – 2.2GHz Intel Xeon (L2 512KB)

Results

Tuning Quality
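The timing figures on the FPGA Implementation slide can be combined as a quick check. Reading "~40 cycles/swap" as the 150-cycle round divided across its four phases is an interpretation, not something the slide states outright.

```latex
\[
150\ \text{cycles} \times 10\ \text{ns/cycle} = 1.5\ \mu\text{s per four-phase round},
\qquad
\frac{150\ \text{cycles}}{4\ \text{phases}} = 37.5 \approx 40\ \text{cycles per swap phase}.
\]
```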
Scaling
• Processor cycles $O(N^{4/3})$
  – VPR
• Systolic cycles
  – $O(N^{1/2})$: assume geometric refinement; $O(N^{1/2})$ update
  – $O(N^{5/6})$: mesh sort, same number of swaps as VPR ($N^{4/3} / N^{1/2}$)

Scaling
Also includes technology scaling

Variations
• Update Schemes
• Cost Functions
• Larger bins than PEs

Update Scheme: Tree
• Build Reduce Tree (H-Tree)
• Route to root in $O(N^{1/2})$ time
• Route from root to leaves in $O(N^{1/2})$ time
• Pipeline
• Same bandwidth as Ring (1/cycle)
• But less staleness (only $O(N^{1/2})$)

Reducing Broadcast (Idea 1)
• Don’t update things that haven’t moved (much)
  – …or things that move and move back before broadcast
• Keep track of staleness
  – How far moved from last broadcast
• Give priority to stalest data
• Max staleness wins at each tree stage
  – Break ties with randomness

Reducing Broadcast (Idea 2)
• Update locally
• Don’t need to know if someone far away moved by 1 square
• …but need to know if near neighbor did
• Multigrid/multiscale scheme
  – Only alert nodes in same subtree
  – When change subtrees at a level, alert all nodes underneath
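The "max staleness wins at each tree stage" rule from Idea 1 can be sketched as a pairwise tournament up a binary reduce tree. The list-based tree and the name `stalest_update` are illustrative assumptions; the lecture's hardware uses an H-tree.

```python
import random

def stalest_update(staleness, rng):
    """Reduce per-PE staleness values up a binary tree; return the winning PE.

    staleness[i] is how far PE i's node has moved since its last broadcast.
    """
    level = list(enumerate(staleness))              # leaves: (pe, staleness)
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            a, b = level[k], level[k + 1]
            if a[1] != b[1]:
                nxt.append(a if a[1] > b[1] else b)  # max staleness wins
            else:
                nxt.append(rng.choice((a, b)))       # random tie-break
        if len(level) % 2:
            nxt.append(level[-1])                    # odd one out advances
        level = nxt
    return level[0][0]

rng = random.Random(0)
print(stalest_update([0, 2, 5, 1, 0, 3, 0, 0], rng))  # 2: PE 2 holds the max
```

The PE that reaches the root is the one whose location gets broadcast next. A winner with zero staleness has not moved and needs no broadcast at all.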
Update Scheme: Mesh Route
• Can route a permutation in $O(N^{1/2})$ time on a mesh
• Build mesh switching
• Make O(N) swaps
• Then take $O(N^{1/2})$ time moving/updating
• Becomes full simulated annealing
  – i.e. not just local swaps

Cost Functions

Timing
• Linear Update:
  – Topological ordering of netlist
  – Use tree to distribute updates
  – Send updates in netlist order
  – → get delay in one pass
• Mesh:
  – Compute directly with dataflow-style spreading activation
    • Wait for all inputs; then send output

Cost Functions
• Bounding Box → 2 phase update
  – Phase 1: alert source to location of all sinks
  – Phase 2: source communicates bbox extents to all sinks

Bins

Node Bins
• Keep more than one graph node per PE
• Local swap of one node from each PE node set each step
  – One with largest benefit?
  – Randomly select based on cost/benefit?
• Like rejectionless annealing
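The two-phase bounding-box update on the Cost Functions slide can be sketched for a single net as below. Using the half-perimeter of the box as the cost is an assumption (the slide only says "bounding box"), and `bbox_two_phase` is a hypothetical helper.

```python
def bbox_two_phase(source_loc, sink_locs):
    """Return (extents, half_perimeter) for one net.

    Phase 1: sink locations arrive at the source, which folds them into a box.
    Phase 2: the source would send these same extents back to every sink.
    """
    xs = [source_loc[0]] + [x for x, _ in sink_locs]
    ys = [source_loc[1]] + [y for _, y in sink_locs]
    extents = (min(xs), min(ys), max(xs), max(ys))   # phase 1 result
    # Assumed cost: half-perimeter of the bounding box.
    half_perimeter = (extents[2] - extents[0]) + (extents[3] - extents[1])
    return extents, half_perimeter

extents, cost = bbox_two_phase((2, 2), [(0, 3), (5, 1)])
print(extents, cost)  # (0, 1, 5, 3) 7
```

After phase 2 every endpoint of the net holds the same extents, so any of them can tell locally whether a proposed move would stretch the box.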
Admin
• Parallel Prefix familiarity?
• Due today: literature review
• There is class on Monday