Check Suite: Full-Stack Automated MCM Analysis
▪ Suite of tools at various levels of the computing stack
▪ Automated full-stack MCM checking across litmus test suites
▪ Stack layers, top to bottom: High-Level Languages (HLL), Compiler and OS, Architecture (ISA), Microarchitecture, Processor RTL
▪ Tools:
• TriCheck [Trippel et al. ASPLOS 2017]
• COATCheck [Lustig et al. ASPLOS 2016]
• PipeCheck [Lustig et al. MICRO 2014] & CCICheck [Manerkar et al. MICRO 2015]
• RTLCheck [Manerkar et al. MICRO 2017]
▪ So far, tools have found bugs in:
• Widely used research simulator gem5
• Cache coherence paper (TSO-CC)
• IBM XL C++ compiler (fixed in v13.1.5)
• In-design commercial processors
• RISC-V draft ISA specification
• Compiler mapping proofs
• C11 memory model
• Open-source processor RTL
Modelling Microarchitecture: Going below the ISA
▪ Hardware enforces consistency model using smaller localized orderings
• In-order fetch/decode/execute…
• Orderings enforced by memory hierarchy
• …and many more
▪ Pipeline stages may be FIFO to ensure in-order execution
▪ Do individual orderings correctly work together to satisfy consistency model?
[Figure: two cores, each with pipeline stages Fetch, Dec., Exec., Mem., WB and a store buffer (SB); loads (Lds.) and stores reach a memory hierarchy made up of L1 caches and an L2]
Microarchitectural Consistency Checking
▪ Microarchitecture in µspec DSL:
Axiom "Decode_is_FIFO": ... EdgeExists ((i1, Decode), (i2, Decode)) => AddEdge ((i1, Execute), (i2, Execute)).
Axiom "PO_Fetch": ... SameCore i1 i2 /\ ProgramOrder i1 i2 => AddEdge ((i1, Fetch), (i2, Fetch)).
▪ Each axiom specifies an ordering that µarch should respect
▪ Litmus test
▪ Microarchitectural happens-before (µhb) graphs
▪ Microarch. verification checks that combination of axioms satisfies MCM
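The two axioms shown on this slide can be read as rules that add edges to a µhb graph. The Python sketch below illustrates that reading only: the function names, the `(name, core, program_order_index)` instruction encoding, and the treatment of the elided `...` parts as plain quantification over instruction pairs are all assumptions, not the µspec semantics.

```python
def po_fetch(instrs):
    """Sketch of "PO_Fetch": same core + program order => Fetch edge.

    instrs is an iterable of (name, core, program_order_index) tuples;
    this encoding is hypothetical and chosen only for illustration.
    """
    return {
        ((a, "Fetch"), (b, "Fetch"))
        for a, core_a, idx_a in instrs
        for b, core_b, idx_b in instrs
        if core_a == core_b and idx_a < idx_b
    }


def decode_is_fifo(edges):
    """Sketch of "Decode_is_FIFO": a Decode->Decode edge between two
    instructions implies an Execute->Execute edge between the same pair."""
    added = {
        ((a, "Execute"), (b, "Execute"))
        for (a, stage_a), (b, stage_b) in edges
        if stage_a == "Decode" and stage_b == "Decode"
    }
    return set(edges) | added


# Four instructions on two cores, as in the mp litmus test.
instrs = [("i1", 0, 0), ("i2", 0, 1), ("i3", 1, 0), ("i4", 1, 1)]
fetch_edges = po_fetch(instrs)
```

Each rule only adds edges, so the combined graph is the union of what every axiom contributes for a given litmus test execution.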
PipeCheck: Executions as µhb Graphs [Lustig et al. MICRO 2014]
[Figure: µhb graph for litmus test mp on Core 0 and Core 1, built up one instruction at a time for (i1) to (i4); node locations are Fetch, Dec., Exec., Mem., WB, SB, and Mem Hier.]
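For reference, the ISA-level outcomes of mp can be enumerated by brute force. The sketch below takes the instruction list from the Peekaboo example later in the deck and assumes sequential consistency with initial values x = y = 0. It only illustrates what a litmus test outcome is; it is not how PipeCheck or herd compute outcomes, and all names are illustrative.

```python
from itertools import combinations

# mp (message passing): one core stores x then y, the other loads y then x.
STORES = [("st", "x", 1), ("st", "y", 1)]
LOADS = [("ld", "y", "r1"), ("ld", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in slots else next(ib) for k in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory (SC)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "st":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return (regs["r1"], regs["r2"])


def sc_outcomes():
    """Set of (r1, r2) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(STORES, LOADS)}
```

Under these assumptions the outcome r1 = 1, r2 = 0 never appears in `sc_outcomes()`, which is why it is the interesting outcome to check for mp.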
PipeCheck: Microarchitectural Correctness
▪ Cycle in µhb graph => event has to happen before itself (impossible)
▪ Cyclic graph → unobservable on µarch
▪ Acyclic graph → observable on µarch
▪ Exhaustively enumerate and check all possible executions of litmus test on µarch
• Implemented using fast SMT solvers
• Compare against ISA-level outcome from herd [Alglave et al. TOPLAS 2014]
▪ Results for litmus test mp:
ISA-Level Outcome | Observable (≥ 1 Graph Acyclic) | Not Observable (All Graphs Cyclic)
Allowed | OK | OK (stricter than necessary)
Forbidden | Consistency violation! | OK
▪ Abstracted memory hierarchy prevents verification of complex coherence issues!
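The observability rule and the verdict table can be sketched in a few lines of Python. This is a minimal illustration that uses a depth-first search for cycle detection in place of the SMT-based search the slide mentions; the function names and the edge-list graph encoding are assumptions.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = dict.fromkeys(graph, WHITE)

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: node reaches itself
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


def observable(candidate_graphs):
    """An outcome is observable if at least one candidate µhb graph is acyclic."""
    return any(not has_cycle(g) for g in candidate_graphs)


def verdict(isa_allowed, is_observable):
    """Look up the cell of the table on this slide."""
    if is_observable:
        return "OK" if isa_allowed else "Consistency violation!"
    return "OK (stricter than necessary)" if isa_allowed else "OK"
```

For example, `verdict(False, observable(graphs))` flags a violation as soon as any graph in `graphs` is acyclic for an outcome that the ISA-level model forbids.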
CCICheck: Coherence vs Consistency
▪ Memory hierarchy is a collection of caches
• Coherence protocols ensure that all caches agree on the value of any variable
▪ CCICheck [Manerkar et al. MICRO 2015] shows that consistency verification often cannot simply treat memory hierarchy abstractly
• Nominated for Best Paper at MICRO 2015
[Figure: computing stack (HLL, Compiler, OS, ISA, Microarchitecture, Processor RTL) beside the two-core pipeline diagram, with the memory hierarchy expanded into L1 caches, an L2, and a coherence protocol (SWMR, DVI, etc.)]
Coherence Protocol Example
▪ If P1 updates the value of x to 200, the stale value of x in other processors must be invalidated
▪ If P3 wants to subsequently read/write x, it must request the new value
▪ SWMR = Single-Writer Multiple Readers, DVI = Data Value Invariant
[Figure, animated: processors P1, P2, P3 each start with x = 100 in their caches. P1 issues St x = 200 and invalidations go to the other caches. P1's cache then holds x = 200. P3 issues Ld x, sends a data request, and receives a data response carrying x = 200.]
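The store, invalidate, and reload sequence in this example can be mimicked with a toy model. The sketch below is a simplified invalidation-based scheme written for illustration only: the class name, the write-through backing memory, and the log strings are assumptions, and it models none of the transient states of a real protocol.

```python
class ToyCoherentSystem:
    """Toy invalidation-based coherence model (illustrative only)."""

    def __init__(self, num_procs, initial):
        self.memory = dict(initial)
        # Every cache starts with a valid copy of each address.
        self.caches = [dict(initial) for _ in range(num_procs)]
        self.log = []

    def store(self, pid, addr, value):
        # Single writer: invalidate every other cached copy first.
        for other, cache in enumerate(self.caches):
            if other != pid and addr in cache:
                del cache[addr]
                self.log.append(f"Inv {addr} -> P{other + 1}")
        self.caches[pid][addr] = value
        self.memory[addr] = value  # write-through, an assumption of this sketch

    def load(self, pid, addr):
        cache = self.caches[pid]
        if addr not in cache:
            # Invalid copy: request the current value before reading.
            self.log.append(f"P{pid + 1} requests {addr}")
            cache[addr] = self.memory[addr]
        return cache[addr]


system = ToyCoherentSystem(3, {"x": 100})
system.store(0, "x", 200)        # P1: St x = 200
value_at_p3 = system.load(2, "x")  # P3: Ld x
```

After the store, P2 and P3 no longer hold x, so the load by P3 has to fetch the new value rather than reuse the stale 100.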
Motivating Example – “Peekaboo” [Sorin et al. Primer 2011]
▪ Three optimizations: correct individually, but not in combination
1. Prefetching
2. Invalidation before use
• Invalidation can arrive before data
• Acknowledge Inv early rather than wait for data to arrive
• But repeated inv before use → livelock [Kubiatowicz et al. ASPLOS 1992]
3. Livelock avoidance: allow destination core to perform one operation on data when it arrives, even if already invalidated [Sorin et al. Primer 2011]
• Does not break coherence
• Sometimes intentionally returns stale data
Motivating Example – “Peekaboo”
▪ Consider mp with the livelock-avoidance mechanism
▪ Optimizations: 1. Prefetching 2. Invalidation-before-use 3. Livelock avoidance
▪ Litmus test mp on Core 0 and Core 1: stores [x] ← 1; [y] ← 1 on one core, loads r1 ← [y]; r2 ← [x] on the other
▪ Initial cache states: storing core has x: Shared, y: Modified; loading core has x: Invalid, y: Invalid
▪ Sequence of events:
• Loading core issues Prefetch x
• Data (x = 0) is sent in response
• Inv for x reaches the loading core, which replies with Inv-Ack
• Storing core now has x: Modified and performs [x] ← 1, then [y] ← 1
• Loading core sends Request y and receives Data (y = 1); y is now Shared on both cores, so r1 = 1
• r2 ← [x] uses the prefetched data, so r2 = 0
▪ Outcome: r1 = 1, r2 = 0
The Coherence-Consistency Interface (CCI)
▪ CCI = guarantees that the coherence protocol provides to the microarch. + orderings that the microarch. expects from the coherence protocol
[Figure, animated: coherence guarantees of SWMR, DVI, and No Stale Data combine with the expected orderings to give consistency. When the guarantees are instead SWMR, DVI, and No Livelock, the result is a CCI mismatch and a consistency violation.]
ViCL: Value in Cache Lifetime
▪ Need a way to model cache occupancy and coherence events for:
• Coherence protocol optimizations (e.g. Peekaboo)
• Partial incoherence and lazy coherence (GPUs, etc.)
▪ A ViCL is a 4-tuple: (cache_id, address, data_value, generation_id)
▪ cache_id and generation_id uniquely identify each cache line
▪ A ViCL 4-tuple maps onto the period of time over which the cache line serves the data value for the address
ViCLs in µhb Graphs
▪ ViCLs start at a ViCL Create event and end at a ViCL Expire event
• Correspond to nodes in µhb graphs
• Axioms over these nodes and edges enforce coherence and data movement orderings
▪ Use pipeline model from PipeCheck, but add ViCL nodes and edges
[Figure: µhb graph for litmus test co-mp with ViCL nodes and edges added]
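A ViCL and its two graph nodes can be written down directly from the 4-tuple definition. The sketch below is one possible encoding: the field types, the helper names, and the choice to add a Create → Expire edge for every ViCL are assumptions made for illustration, not CCICheck's internal representation.

```python
from typing import NamedTuple


class ViCL(NamedTuple):
    """Value in Cache Lifetime: (cache_id, address, data_value, generation_id)."""
    cache_id: int
    address: str
    data_value: int
    generation_id: int


def line_key(vicl):
    """cache_id and generation_id together identify the cache line."""
    return (vicl.cache_id, vicl.generation_id)


def vicl_nodes(vicl):
    """The two µhb nodes that bound a ViCL's lifetime."""
    return (vicl, "Create"), (vicl, "Expire")


def lifetime_edge(vicl):
    """Assumed ordering: a ViCL is created before it expires."""
    create, expire = vicl_nodes(vicl)
    return (create, expire)


# Two generations of x in the same cache, holding different values.
stale = ViCL(cache_id=1, address="x", data_value=0, generation_id=0)
fresh = ViCL(cache_id=1, address="x", data_value=1, generation_id=1)
```

Keeping the generation in the tuple is what lets a graph distinguish a stale copy of x from a later copy in the same cache.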
µhb Graph for the Peekaboo Problem
▪ Additional nodes represent ViCL requests and invalidations
▪ Solution: Invalidated data only usable if accessing load/store is oldest in program order at time of request [Sorin et al. Primer 2011]
▪ TSO-CC protocol [Elver and Nagarajan HPCA 2014] was vulnerable to variant of Peekaboo!
• Now fixed
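The effect of this fix on the Peekaboo sequence can be traced with a hard-coded replay of the events from the motivating example. The sketch below is a scripted walk-through under stated assumptions: the function name, the dictionary fields, and the single shared `memory` standing in for the coherence protocol are all hypothetical simplifications.

```python
def peekaboo(require_oldest_at_request):
    """Replay the Peekaboo event sequence and return (r1, r2).

    require_oldest_at_request=False models plain livelock avoidance;
    True models the fix stated on this slide.
    """
    memory = {"x": 0, "y": 0}

    # Loading core, program order: r1 <- [y]; r2 <- [x].
    # Prefetch x is issued for the younger load, so the requester is
    # not the oldest instruction in program order at time of request.
    prefetch = {"data": memory["x"], "oldest_at_request": False,
                "invalidated": False}

    # Inv for x is acknowledged before the prefetched data is used.
    prefetch["invalidated"] = True

    # Storing core performs [x] <- 1, then [y] <- 1.
    memory["x"] = 1
    memory["y"] = 1

    # r1 <- [y] requests y and receives the new value.
    r1 = memory["y"]

    # Prefetched data for x is consumed although already invalidated.
    if prefetch["invalidated"]:
        usable = (not require_oldest_at_request) or prefetch["oldest_at_request"]
    else:
        usable = True
    # If the stale data is not usable, the load re-requests x.
    r2 = prefetch["data"] if usable else memory["x"]
    return (r1, r2)
```

In this replay, `peekaboo(False)` reproduces the outcome r1 = 1, r2 = 0 from the motivating example, while `peekaboo(True)` forces the load of x to fetch the current value.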
CCICheck Takeaways
▪ Coherence & consistency often closely coupled in implementations
▪ In such cases, coherence & consistency cannot be verified separately
▪ CCICheck: CCI-aware microarchitectural MCM checking
• Uses ViCL (Value in Cache Lifetime) abstraction
▪ Discovered bug in TSO-CC lazy coherence protocol
ISA-level MCMs in the Hardware-Software Stack
▪ A new ISA-level MCM sits between High-Level Languages (HLLs) and hardware
▪ Which orderings does the compiler need to enforce?
▪ Which orderings must be guaranteed by hardware?
▪ TriCheck checks that HLL, compiler, ISA, and hardware align on MCM requirements
TriCheck: Layers of the Stack are Intertwined
▪ ISA-level MCMs should allow microarchitectural optimizations but also be compatible with HLLs
▪ TriCheck [Trippel et al. ASPLOS 2017] enables holistic analysis of HLL memory model, ISA-level MCM, compiler mappings, and microarchitectures
• Mapping: translation of HLL synchronization primitives to one or more assembly language instructions
▪ Also useful for checking HLL compiler mappings to ISA-level MCMs
▪ Selected as one of 12 “Top Picks of Comp. Arch. Conferences” for 2017
[Figure: computing stack (HLL, Compiler, OS, ISA, Microarchitecture, Processor RTL)]
TriCheck: Comparing HLL to Microarchitecture
▪ Four primary inputs: HLL litmus test variants, HLL model (e.g. C11), HLL-to-ISA compiler mapping, µspec microarch. model
▪ Examine all C11 memory_order combinations (release, acquire, relaxed, seq_cst) for HLL litmus tests
▪ Translate HLL litmus tests to ISA-level litmus tests
▪ Use Herd [Alglave et al. TOPLAS 2014] to check HLL outcomes: Forbidden/Allowed?
▪ Use µhb analysis with Check to check microarch. outcomes: Observable/Unobservable?
▪ Compare HLL and microarch. outcomes
• HLL outcome Forbidden but microarch. outcome Observable → BUG!
▪ If bugs found, iterate by changing the inputs and re-run
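The variant-generation and comparison steps of this flow can be sketched as follows. The sketch assumes that stores take only relaxed, release, or seq_cst and loads only relaxed, acquire, or seq_cst (the orders C11 permits for plain atomic stores and loads); how TriCheck itself enumerates variants is not stated on the slide, and all names are illustrative.

```python
from itertools import product

# Memory orders that C11 permits for atomic stores and atomic loads.
STORE_ORDERS = ("relaxed", "release", "seq_cst")
LOAD_ORDERS = ("relaxed", "acquire", "seq_cst")

# Accesses of the mp litmus test, in the order (kind, address).
MP_ACCESSES = (("store", "x"), ("store", "y"), ("load", "y"), ("load", "x"))


def memory_order_variants(accesses):
    """All assignments of a legal memory_order to each access."""
    choices = [STORE_ORDERS if kind == "store" else LOAD_ORDERS
               for kind, _ in accesses]
    return list(product(*choices))


def compare(hll_allowed, microarch_observable):
    """Flag the case the slide marks as a bug: forbidden yet observable."""
    if not hll_allowed and microarch_observable:
        return "BUG!"
    return "OK"


variants = memory_order_variants(MP_ACCESSES)
```

Each tuple in `variants` would then be run through the HLL model and, after compilation via the mapping, through the µhb analysis, with `compare` applied to the pair of results.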
Using TriCheck for ISA MCM Design: RISC-V
▪ Ran TriCheck on draft RISC-V ISA MCM with
• C11 HLL MCM [Batty et al. POPL 2011] [Batty et al. POPL 2016]
• Compiler mappings based on RISC-V manual
• Variety of microarchitectures that relaxed various memory orderings
− All legal according to draft RISC-V spec
− Ranging from SC microarchitecture to one with reorderings allowed by ARM/Power
▪ Draft RISC-V MCM for Base ISA incapable of correctly compiling C11:
• C11 outcome forbidden, but impossible to forbid on hardware
• RISC-V fences too weak to restore orderings that implementations could relax
Current RISC-V Status
▪ In response to our findings, RISC-V Memory Model Working Group was formed (we are members)
• Mandate to create an MCM for RISC-V that satisfies community needs
▪ Working Group has developed an MCM proposal that fixes the aforementioned bugs (and other issues)
▪ MCM proposal recently passed the 45-day public feedback period!
• Well on its way to being included in the next version of the RISC-V ISA spec