Hardware Modeling 2: Cache Analyses
Peter Puschner
Slide credits: P. Puschner, R. Kirner, B. Huber
VU 2.0 182.101, SS 2015
Recap: Caches in WCET Analysis
• Purpose: bridge the speed gap between a fast CPU and slower memory
• Analyzing caches is essential on many architectures (example: 40 cycles for a miss on the MPC755)
• What is cached: instructions, data, BTB, TLB
• Design: direct mapped, set associative, fully associative
• Replacement policy: LRU, FIFO, PLRU, PRR
• More characteristics: read-only / write-through / write-back, write (no) allocate, multi-level caches (inclusive/exclusive), ...
Caches in WCET Analysis
• For software running on hardware with caches, computing the WCET by IPET alone (CFG + CCG) gets too complex
• Ignoring caches leads to unacceptable overestimations
⇒ Decomposition of WCET analysis into 2+ phases:
1. Categorization of memory accesses with respect to cache behavior (e.g., always hit, always miss); the low-level analysis uses this cache categorization
2. WCET computation: IPET with no or a simplified cache model
Categories of Cache Behavior
• ah (always hit): each access to the cache is a hit (MUST analysis)
• am (always miss): each access to the cache is a miss (MAY analysis → complement)
• ps(S) (persistent): each time context S is entered, the first access is nc, but all other accesses are hits (PERSISTENCE analysis)
• nc (not classified): the access is not classified as one of the above categories
Direct Mapped Cache
[Figure: an address is split into tag, ld(m) line-index bits, and ld(k) offset bits. The cache has m lines (Line 1 ... Line m); each line holds a valid bit (v), a tag, and data (k bytes, words w1 ... wk). The line is selected by the ld(m) address bits.]
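The address split on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `split_address` and `dm_access` are hypothetical, m and k are assumed to be powers of two, and the cache is modeled as a dictionary from line index to tag (data is ignored).

```python
def split_address(addr: int, m: int, k: int) -> tuple:
    """Split an address into (tag, line, offset) for a direct-mapped cache
    with m lines of k bytes each (assumption: m and k are powers of two)."""
    offset_bits = k.bit_length() - 1          # ld(k)
    line_bits = m.bit_length() - 1            # ld(m)
    offset = addr & (k - 1)                   # lowest ld(k) bits
    line = (addr >> offset_bits) & (m - 1)    # next ld(m) bits select the line
    tag = addr >> (offset_bits + line_bits)   # remaining upper bits
    return tag, line, offset


def dm_access(cache: dict, addr: int, m: int, k: int) -> bool:
    """Access addr in a direct-mapped cache (dict: line -> tag).
    Returns True on a hit; on a miss the line is overwritten with the new tag."""
    tag, line, _ = split_address(addr, m, k)
    hit = cache.get(line) == tag
    cache[line] = tag
    return hit


# Hypothetical geometry: m = 4 lines, k = 2 bytes per line.
cache = {}
first = dm_access(cache, 0, 4, 2)      # (0, 0, 0): cold miss
second = dm_access(cache, 1, 4, 2)     # (0, 0, 1): same line and tag, hit
conflict = dm_access(cache, 8, 4, 2)   # (1, 0, 0): same line, other tag, miss
```

With this geometry, address 8 decomposes into (tag, line, offset) = (1, 0, 0), so it evicts whatever block with tag 0 currently occupies line 0.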
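DM-$ Analysis Example
[Figure: control-flow graph from START to END; each instruction is labeled with its address as (tag, line, offset).]
Addresses (tag, line, offset): (0,0,0), (0,0,1), (0,1,0), (0,1,1), (0,2,0), (0,2,1), (0,3,0), (0,3,1), (1,0,0)
Compiled from e.g.:
    x, y, z = a, b, 0
    while (x > 0 && y > 0) {
        z += x-- + y--
    }
    x, y = 0, 0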
DM-$ Analysis Example (continue with 2nd loop iteration)
[Figure: the same control-flow graph from START to END, with addresses (0,0,0) through (1,0,0) as (tag, line, offset) and the source code as on the previous slide, now annotated with access classifications.]
Annotations on the figure:
• always hit: annotated at (0,1,0) / (0,1,1)
• always hit (2..n loop iteration): annotated at (0,2,0), (0,2,1), (0,3,0), (0,3,1)
• always miss: annotated at (1,0,0); conflict with (0,0,x)
Cache Classification (Hit/Miss)
Goal: a mechanized analysis that classifies each cache access in a certain context (e.g., call context) as one of:
• Always hit: in all possible executions, this access to the cache will be a cache hit (the accessed cache block is guaranteed to be in the cache)
• Always miss: in all possible executions, this access to the cache will be a cache miss (the accessed cache block is guaranteed NOT to be in the cache)
• Not classified: the accessed cache block may or may not be in the cache
Automated Categorization of Memory Accesses
→ Based on abstract interpretation and fixed-point analysis of cache states in the CFG
→ Cache update function: models changes of the cache state for memory accesses
→ Join function: combines states at control-flow joins
→ Concrete semantics: set of possible cache configurations (tags only, no data) at each program point
→ Abstract semantics: approximation of the concrete semantics in an abstract domain that is cheaper to compute with
Data-Flow Analysis (DFA)
DFA is based on the data-flow structure of the system behavior of interest (e.g., forward or backward propagation).
• PRED(n) is the set of virtual predecessors of CFG node n with respect to the data flow of interest (cache analysis: usually the CFG predecessors)
• The data domain L of the analysis forms a lattice, on which the transfer function F_n: L → L models the semantics of the system behavior of interest
• To merge two or more states, a join function ⊔: L × L → L is used to compute the least upper bound
Data-Flow Analysis (2)
Data-flow equations modeling the data flow between nodes:
    IN(n)  = ⊔ { OUT(j) | j ∈ PRED(n) }
    OUT(n) = F_n( IN(n) )
[Figure: node n with incoming state IN(n), transfer function F_n, and outgoing state OUT(n).]
Data-Flow Analysis (3)
Monotonicity requirement for solving the data-flow equations iteratively:
• The transfer functions F_n(s) as well as the join function s1 ⊔ s2 must be monotone; together with a lattice of finite height, this ensures termination of the analysis.
Monotonicity: a function f: A → B is monotone iff
    ∀ a, a′ ∈ A. (a ⊆_A a′) ⇒ (f(a) ⊆_B f(a′))
Data-Flow Analysis (4)
Iterative algorithm to find the least fixpoint of the data-flow equations:
    for i ← 1 to N do
        /* initialize node i: */
        OUT(i) = ⊥
    while ( sets are still changing ) do
        for i ← 1 to N do
            /* recompute sets at node i: */
            IN(i)  = ⊔ { OUT(j) | j ∈ PRED(i) }
            OUT(i) = F_i( IN(i) )
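The iterative algorithm can be written as a short, runnable Python sketch. Everything specific here is an assumption for illustration: the function name `solve_dataflow`, the four-node CFG with a loop between nodes 2 and 3, and the toy analysis (a powerset lattice with set union as join, collecting the memory blocks that may have been accessed so far). It is not the cache analysis itself, only the fixpoint scheme.

```python
from functools import reduce


def solve_dataflow(nodes, pred, transfer, join, bottom):
    """Round-robin iteration until no OUT set changes (least fixpoint,
    assuming monotone transfer/join functions and a finite-height lattice)."""
    out = {n: bottom for n in nodes}              # OUT(i) = bottom
    changed = True
    while changed:                                # sets are still changing
        changed = False
        for n in nodes:
            # IN(n) = join over OUT(j) for all predecessors j of n
            in_n = reduce(join, (out[j] for j in pred[n]), bottom)
            new_out = transfer(n, in_n)           # OUT(n) = F_n(IN(n))
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out


# Hypothetical CFG: 1 -> 2, 2 -> 3, 3 -> 2 (loop), 2 -> 4
nodes = [1, 2, 3, 4]
pred = {1: [], 2: [1, 3], 3: [2], 4: [2]}
block = {1: "a", 2: "b", 3: "c", 4: "d"}          # block accessed by each node

result = solve_dataflow(
    nodes,
    pred,
    transfer=lambda n, s: s | {block[n]},         # add the accessed block
    join=lambda s1, s2: s1 | s2,                  # least upper bound = union
    bottom=frozenset(),
)
```

In this toy run the loop back edge from node 3 forces a second pass: `result[2]` only picks up block c after OUT(3) has been computed once.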
Concrete & Abstract Semantics
• Concrete cache semantics: models the semantics of the relevant aspects of the program (here: cache state and update). The concrete semantics collects the set of all possible cache states for each program point.
• Abstract cache semantics: semantics in a different, usually finite domain, connected to the concrete semantics by an abstraction/concretization function.
N-way Set-Associative Cache
[Figure: an address is split into tag, ld(m) set-index bits, and ld(k) offset bits. The cache has m sets of n ways (Block 1,1 ... Block m,n); each block (line) holds a valid bit (v), a tag, and data (k bytes, words w1 ... wk).]
1. The set is selected by the ld(m) address bits
2. The replacement strategy updates the blocks in one set
Fully-associative Cache (Associativity N)
[Figure: an address is split into tag and offset. The cache consists of ways 1 ... N; each line holds a valid bit (v), a tag, and data (k bytes, words w1 ... wk).]
• The cache is updated based on the value of the tag
• The replacement policy determines the update strategy used
• LRU, FIFO: way 1 holds the youngest line; way N holds the oldest line, which is evicted on a miss
Concrete Cache Semantics (Fully Associative Cache)
• Cache configuration: mapping from cache line to tag S (data is irrelevant)
• Domain: for each program point, the set of all possible cache states
• State at start node: singleton set with the empty cache, or the set of all possible cache configurations
• Update: for a cache configuration C and a cache reference S, the new cache configuration C′ after accessing S
Concrete LRU Update (Fully Associative Cache)
Update function for a 4-way cache (1 line per way) with LRU; states are listed from youngest to oldest:
• [a, b, c, d], access c → [c, a, b, d] (HIT)
• [a, b, c, d], access e → [e, a, b, c] (MISS)
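The concrete LRU update from this slide can be sketched as follows. The function name `lru_update` and the list representation (tags ordered from youngest to oldest) are assumptions made for this illustration.

```python
def lru_update(cache, block, assoc=4):
    """Concrete LRU update for one fully associative cache.
    cache: list of tags, youngest first. Returns (new_cache, hit)."""
    hit = block in cache
    # The accessed block becomes the youngest; all other blocks keep
    # their relative order, so younger blocks age by one position.
    new_cache = [block] + [b for b in cache if b != block]
    # On a miss the list is one entry too long: drop the oldest block.
    return new_cache[:assoc], hit


after_hit, was_hit = lru_update(["a", "b", "c", "d"], "c")
after_miss, was_hit2 = lru_update(["a", "b", "c", "d"], "e")
# after_hit  == ['c', 'a', 'b', 'd'], was_hit  == True
# after_miss == ['e', 'a', 'b', 'c'], was_hit2 == False
```

The two calls reproduce the hit and the miss case shown on the slide.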
Abstract Cache Semantics for MUST / MAY Analysis
Abstract cache configuration: compact representation of a set of cache configurations
• MUST: for each tag S, the maximum age
• MAY: for each tag S, the minimum age
Join:
• MUST: for each tag S, the maximum age
• MAY: for each tag S, the minimum age
Update (LRU): the accessed tag moves to the youngest set
• MUST: for other tags, increase the age if the tag may be aged
• MAY: for other tags, increase the age if the tag must be aged
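Abstract Cache Representation
Abstract states are listed from youngest (age 1) to oldest (age 4).
MUST analysis: ⊤ = ∀x, x ≤ N+1
    age 1: {a}, age 2: { }, age 3: {b}, age 4: {c}
    or, as constraints: a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+
MAY analysis: ⊤ = ∀x, x ≥ 1
    age 1: {d,e}, age 2: {a}, age 3: { }, age 4: {b}
    or, as constraints: a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1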
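Abstract Cache Semantics (MUST Concretization)
MUST analysis, abstract state (youngest to oldest): age 1: {a}, age 2: { }, age 3: {b}, age 4: {c}
or, as constraints: a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+
Concretization (concrete cache states, youngest to oldest):
    [a, b, c, d]  [a, b, d, c]  [a, b, c, e]  [a, b, e, c]
    [a, c, b, d]  [a, c, b, e]  [a, d, b, c]  [a, e, b, c]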
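The concretization on this slide can be reproduced by brute force. The sketch below assumes that only the blocks a to e exist and that the 4-way cache is completely filled; the function name `concretize_must` and the dictionary representation (block to maximum age, with blocks bounded by 5+ simply omitted) are illustrative choices, not taken from the slides.

```python
from itertools import permutations


def concretize_must(max_age, blocks, assoc):
    """Enumerate all full concrete cache states (youngest first) in which
    every block b listed in max_age is cached with age <= max_age[b]."""
    states = []
    for state in permutations(blocks, assoc):
        ok = all(
            b in state and state.index(b) + 1 <= bound   # ages start at 1
            for b, bound in max_age.items()
        )
        if ok:
            states.append(state)
    return states


# Abstract MUST state from the slide: a <= 1, b <= 3, c <= 4
states = concretize_must({"a": 1, "b": 3, "c": 4}, "abcde", assoc=4)
```

Under these assumptions the enumeration yields the eight concrete states listed on the slide.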
Abstract Cache Semantics (MUST Join)
Abstract states are listed from youngest (age 1) to oldest (age 4).
    State 1: {a}, { }, {b}, {c}       (a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+)
    State 2: { }, {a}, { }, {c,d}     (a ≤ 2, c ≤ 4, d ≤ 4, b,e ≤ 5+)
    Join:    { }, {a}, { }, {c}       (a ≤ 2, b ≤ 5+, c ≤ 4, d,e ≤ 5+)
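A minimal sketch of the MUST join, assuming an abstract state is stored as a dictionary from block to maximum age and that blocks with bound 5+ (not guaranteed to be cached) are simply left out. The name `must_join` is hypothetical.

```python
def must_join(s1, s2):
    """MUST join: a block stays guaranteed only if both incoming states
    guarantee it; its new bound is the larger of the two maximum ages."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}


state1 = {"a": 1, "b": 3, "c": 4}          # {a}, { }, {b}, {c}
state2 = {"a": 2, "c": 4, "d": 4}          # { }, {a}, { }, {c,d}
joined = must_join(state1, state2)         # {'a': 2, 'c': 4}
```

Blocks b and d drop out of the result because each is guaranteed in only one of the two incoming states, which matches the joined state on the slide.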
Abstract Cache Update Function (LRU Cache, MUST Analysis)
When accessing block c:
    max-age′(c) = 1
    max-age(d) ≥ max-age(c) ⇒ max-age′(d) = max-age(d)
    max-age(d) < max-age(c) ⇒ max-age′(d) = max-age(d) + 1
Abstract Cache Update Function (LRU Cache, MUST Analysis)
When accessing block c:
    max-age′(c) = 1
Case max-age(d) ≥ max-age(c) ⇒ max-age′(d) = max-age(d):
    1. Assume age(d) < age(c). Then age(d) + 1 ≤ age(c) ≤ max-age(c) ≤ max-age(d), hence max-age(d) ≥ age(d) + 1 = age′(d).
    2. Assume age(d) > age(c). Then age′(d) = age(d) ≤ max-age(d).
Case max-age(d) < max-age(c) ⇒ max-age′(d) = max-age(d) + 1:
    1. If age(d) < age(c), then age′(d) = age(d) + 1 ≤ max-age(d) + 1.
    2. If age(d) > age(c), then age′(d) = age(d) ≤ max-age(d) + 1.
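The update rule above translates directly into code. This sketch assumes the same dictionary representation as a map from block to maximum age, treats blocks missing from the dictionary as having bound N+1, and drops blocks whose new bound exceeds the associativity. The name `must_update` is hypothetical.

```python
def must_update(state, block, assoc):
    """Abstract LRU update for the MUST analysis when accessing `block`.
    state: dict block -> maximum age (blocks not guaranteed are absent)."""
    age_c = state.get(block, assoc + 1)       # absent means bound N+1
    new_state = {}
    for d, age_d in state.items():
        if d == block:
            continue
        # Blocks with a smaller bound than the accessed block may age by one;
        # blocks with a bound >= max-age(c) keep their bound.
        new_age = age_d + 1 if age_d < age_c else age_d
        if new_age <= assoc:                  # otherwise no longer guaranteed
            new_state[d] = new_age
    new_state[block] = 1                      # accessed block is youngest
    return new_state


state = {"a": 1, "b": 3, "c": 4}
after_c = must_update(state, "c", assoc=4)    # {'a': 2, 'b': 4, 'c': 1}
after_e = must_update(state, "e", assoc=4)    # {'a': 2, 'b': 4, 'e': 1}
after_b = must_update(state, "b", assoc=4)    # {'a': 2, 'c': 4, 'b': 1}
```

Accessing the uncached block e pushes c beyond age 4, so c is no longer guaranteed to be in the cache afterwards.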
Cache Hit/Miss Classification using MUST and MAY Analysis
• If at some program point tag S must be in the cache, i.e., its maximum age is less than or equal to the associativity, then the cache access is classified as ALWAYS HIT.
• If at some program point it is not the case that tag S may be in the cache, i.e., its minimum age is greater than the associativity of the cache, then the cache access is classified as ALWAYS MISS.
• Otherwise, the cache access is NOT CLASSIFIED.
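The classification rule can be sketched as a small function. The abstract states in the example are hypothetical values chosen for illustration (they are not taken from the slides), and the name `classify` is an assumption; blocks missing from a dictionary are treated as having age N+1.

```python
def classify(block, must_state, may_state, assoc):
    """Classify one access from the abstract states before the access.
    must_state: block -> maximum age, may_state: block -> minimum age."""
    if must_state.get(block, assoc + 1) <= assoc:
        return "always hit"          # guaranteed to be cached
    if may_state.get(block, assoc + 1) > assoc:
        return "always miss"         # cannot possibly be cached
    return "not classified"


# Hypothetical abstract states for a 4-way cache:
must_state = {"a": 2}                # a is cached with age <= 2
may_state = {"a": 1, "b": 3}         # a may have age >= 1, b age >= 3
results = {x: classify(x, must_state, may_state, 4) for x in "abc"}
# {'a': 'always hit', 'b': 'not classified', 'c': 'always miss'}
```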
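Abstract Cache Semantics (MAY Concretization)
MAY analysis, abstract state (youngest to oldest): age 1: {d,e}, age 2: {a}, age 3: { }, age 4: {b}
or, as constraints: a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1
Concrete cache states shown (youngest to oldest):
    [d, a, e, b]  [e, a, d, b]  [d, e, a, b]  [e, d, a, b]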
Abstract Cache Semantics (MAY Join)
Abstract states are listed from youngest (age 1) to oldest (age 4).
    State 1: {d,e}, {a}, { }, {b}     (a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1)
    State 2: { }, {e}, { }, {a}       (a ≥ 4, b ≥ 5, c ≥ 5, d ≥ 5, e ≥ 2)
    Join:    {d,e}, {a}, { }, {b}     (a ≥ 2, b ≥ 4, c ≥ 5, d ≥ 1, e ≥ 1)
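A minimal sketch of the MAY join, assuming an abstract state is stored as a dictionary from block to minimum age and that blocks with minimum age 5 or more (cannot be cached) are left out. The name `may_join` is hypothetical.

```python
def may_join(s1, s2):
    """MAY join: a block may be cached if either incoming state allows it;
    its new bound is the smaller of the two minimum ages."""
    joined = dict(s1)
    for block, age in s2.items():
        joined[block] = min(age, joined.get(block, age))
    return joined


state1 = {"d": 1, "e": 1, "a": 2, "b": 4}  # {d,e}, {a}, { }, {b}
state2 = {"e": 2, "a": 4}                  # { }, {e}, { }, {a}
joined = may_join(state1, state2)          # {'d': 1, 'e': 1, 'a': 2, 'b': 4}
```

In contrast to the MUST join, the result takes the union of the blocks and the minimum of their ages, which reproduces the joined state on the slide.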