An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors

Kathryn E. Gray (1), Gabriel Kerneis (1+), Dominic Mulligan (1), Christopher Pulte (1), Susmit Sarkar (2), Peter Sewell (1)

(1) University of Cambridge
(1+) During work
(2) University of St Andrews
What is an architecture spec?
• Typically prose
• Sometimes pseudocode
Version 2.06

Branch I-form

    b   target_addr   (AA=0 LK=0)
    ba  target_addr   (AA=1 LK=0)
    bl  target_addr   (AA=0 LK=1)
    bla target_addr   (AA=1 LK=1)

    Fields:        18 | LI | AA | LK
    Bit positions: 0  | 6  | 30 | 31

    if AA then NIA ←iea EXTS(LI || 0b00)
    else NIA ←iea CIA + EXTS(LI || 0b00)
    if LK then LR ←iea CIA + 4

target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of LI || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value LI || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered: LR (if LK=1)

Branch Conditional B-form

    bc   BO,BI,target_addr   (AA=0 LK=0)
    bca  BO,BI,target_addr   (AA=1 LK=0)
    bcl  BO,BI,target_addr   (AA=0 LK=1)
    bcla BO,BI,target_addr   (AA=1 LK=1)

    Fields:        16 | BO | BI | BD | AA | LK
    Bit positions: 0  | 6  | 11 | 16 | 30 | 31

    if (64-bit mode) then M ← 0
    else M ← 32
    if ¬BO_2 then CTR ← CTR - 1
    ctr_ok ← BO_2 | ((CTR_{M:63} ≠ 0) ⊕ BO_3)
    cond_ok ← BO_0 | (CR_{BI+32} ≡ BO_1)
    if ctr_ok & cond_ok then
      if AA then NIA ←iea EXTS(BD || 0b00)
      else NIA ←iea CIA + EXTS(BD || 0b00)
    if LK then LR ←iea CIA + 4

BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. target_addr specifies the branch target address.

If AA=0 then the branch target address is the sum of BD || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If AA=1 then the branch target address is the value BD || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode.

If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register.

Special Registers Altered: CTR (if BO_2=0), LR (if LK=1)

Extended Mnemonics: examples of extended mnemonics for Branch Conditional:

    Extended:          Equivalent to:
    blt target         bc 12,0,target
    bne cr2,target     bc 4,10,target
    bdnz target        bc 16,0,target
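To make the "not executable" point concrete, the Branch Conditional pseudocode above can be transcribed into a small executable function. This is only an illustrative Python sketch, not the Sail model from the talk. The state dictionary, the helper names (`exts`, `bit`, `iea`, `branch_conditional`) and the fall-through to CIA + 4 when the branch is not taken are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1

def exts(value, width):
    """Sign-extend a `width`-bit value to a 64-bit two's-complement integer."""
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & MASK64

def bit(value, index, width):
    """Bit `index` of a `width`-bit value, in Power numbering (bit 0 is the MSB)."""
    return (value >> (width - 1 - index)) & 1

def iea(addr, mode64):
    """Model of the 'iea' arrow: high-order 32 bits are zeroed in 32-bit mode."""
    addr &= MASK64
    return addr if mode64 else addr & 0xFFFFFFFF

def branch_conditional(state, bo, bi, bd, aa, lk, mode64=True):
    """Hypothetical transcription of bc/bca/bcl/bcla.

    state: dict with 'CIA', 'CTR', 'LR' (64-bit) and 'CR' (32-bit).
    bo: 5-bit field, bi: 5-bit field, bd: 14-bit field, aa/lk: 0 or 1.
    Returns a new dict that also contains 'NIA'.
    """
    m = 0 if mode64 else 32
    ctr = state["CTR"]
    if not bit(bo, 2, 5):
        ctr = (ctr - 1) & MASK64
    ctr_low = ctr & ((1 << (64 - m)) - 1)          # CTR[M:63]
    ctr_ok = bool(bit(bo, 2, 5)) or ((ctr_low != 0) != bool(bit(bo, 3, 5)))
    # CR bit BI+32 is bit BI of the 32-bit condition register.
    cond_ok = bool(bit(bo, 0, 5)) or (bit(state["CR"], bi, 32) == bit(bo, 1, 5))
    offset = exts(bd << 2, 16)                      # EXTS(BD || 0b00)
    # Assumption: a branch that is not taken falls through to CIA + 4.
    nia = iea(state["CIA"] + 4, mode64)
    if ctr_ok and cond_ok:
        nia = iea(offset if aa else state["CIA"] + offset, mode64)
    lr = iea(state["CIA"] + 4, mode64) if lk else state["LR"]
    return {**state, "CTR": ctr, "LR": lr, "NIA": nia}

# bdnz target == bc 16,0,target: decrement CTR, branch back 8 bytes while CTR != 0.
start = {"CIA": 0x1000, "CTR": 2, "CR": 0, "LR": 0}
taken = branch_conditional(start, bo=16, bi=0, bd=0x3FFE, aa=0, lk=0)
```

Unlike the prose, a transcription like this can be run against hardware or another simulator, which is the sense in which the talk asks for an executable oracle.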
But …
• Not executable test oracles
  • You can’t test h/w or s/w against prose
• Not a clear guide to concurrent behaviour
  • Especially for weakly consistent IBM Power and ARM
• A mass of instruction set detail
Specification as Artefact
We (show how to) make architecture specs that are real technical artefacts
• Executable as test oracle
• Mathematically precise
• Related to vendor pseudocode and intuition
• Clarify the interface between ISA and concurrency
Specifically: IBM POWER, all of the non-FP, non-vector "user" ISA (153 instructions), and the concurrency model

Applicable to ARM as well: see Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA, POPL 2016
Not just an emulator

Emulator
• Written in C etc.
  • A language with many faults
• Intermingling of emulation detail & semantics

PPCMEM2
• Written in Lem & Sail
  • Languages for logic, mathematics, and ISAs
• Only spec detail
• Emulation separated
Running concurrent code: consider a lock

Emulator
• T1 Lock
• T1 Set critical section
• T1 Unlock
• T2 Lock
• …
• repeat
PPCMEM2
[Figure: execution graph for test SPINLOCK_UNROLL. Columns for Thread 0 and Thread 1 show instruction instances (for example LDAXR, STXR, STLRH) with their register dependencies, the memory events on spin_lock_unlocked and crit, and the rf and co edges between them.]
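The contrast on this slide, one schedule versus all allowed executions, can be illustrated with a toy explorer. The sketch below is an assumption-laden simplification. It uses an unlocked shared increment rather than the spin lock, it enumerates interleavings only under sequential consistency, and every name in it (`interleavings`, `run`, `INCREMENT`) is invented for the example. The model in the talk additionally admits the weaker behaviours of the architecture.

```python
from itertools import combinations

def interleavings(n0, n1):
    """Yield every schedule merging n0 steps of thread 0 with n1 steps of thread 1."""
    total = n0 + n1
    for slots in combinations(range(total), n0):
        chosen = set(slots)
        yield tuple(0 if i in chosen else 1 for i in range(total))

# Each step acts on shared memory and the running thread's private registers.
def load(mem, regs):
    regs["r"] = mem["x"]

def add(mem, regs):
    regs["r"] += 1

def store(mem, regs):
    mem["x"] = regs["r"]

INCREMENT = [load, add, store]  # a non-atomic x := x + 1

def run(schedule, programs):
    """Execute one schedule and return the final value of x."""
    mem = {"x": 0}
    regs = [{}, {}]
    pcs = [0, 0]
    for tid in schedule:
        programs[tid][pcs[tid]](mem, regs[tid])
        pcs[tid] += 1
    return mem["x"]

programs = [INCREMENT, INCREMENT]

# What a thread-at-a-time emulator observes: thread 0 to completion, then thread 1.
emulator_outcome = run((0, 0, 0, 1, 1, 1), programs)

# What an exhaustive explorer observes: the set of outcomes over all schedules.
all_outcomes = {run(s, programs) for s in interleavings(3, 3)}
```

Here `emulator_outcome` is 2, while `all_outcomes` also contains the lost-update result 1. A single emulator run therefore cannot serve as an oracle for which results are allowed.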
Beneficiaries
• Compiler writers
• Concurrency primitive implementors
• Security developers
• Hardware developers
[Figure: tool architecture]
• ISA model: Power 2.06B Framemaker → Framemaker export → Power 2.06B XML → parse, analyse, patch → Power 2.06B Sail → Sail typecheck → Power 2.06B Lem (Sail AST); Sail interpreter (Lem)
• Litmus frontend: test.litmus → Litmus parser (OCaml)
• Binary frontend: a.out → ELF model (Lem)
• Concurrency model: Storage semantics (Lem), System semantics (Lem), Thread semantics (Lem)
• Harness: Text UI; Web UI (OCaml, CSS, JS); executions
Sample Instruction

Vendor specification:

    Store Word with Update D-form

    stwu RS,D(RA)

    Fields:        37 | RS | RA | D
    Bit positions: 0  | 6  | 11 | 16 to 31

    EA ← (RA) + EXTS(D)
    MEM(EA, 4) ← (RS)_{32:63}
    RA ← EA

    Let the effective address (EA) be the sum (RA) + D. (RS)_{32:63} are stored into the word in storage addressed by EA.
    EA is placed into register RA.
    If RA=0, the instruction form is invalid.
    Special Registers Altered: None

Sail:

    union ast member (bit[5], bit[5], bit[16]) Stwu

    function clause decode (0b100101 :
      (bit[5]) RS :
      (bit[5]) RA :
      (bit[16]) D as instr) =
        Stwu (RS,RA,D)

    function clause execute (Stwu (RS, RA, D)) = {
      (bit[64]) EA := 0;
      EA := GPR[RA] + EXTS(D);
      GPR[RA] := EA;
      MEMw(EA,4) := (GPR[RS])[32 .. 63]
    }
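For readers unfamiliar with Sail, the decode and execute clauses above correspond roughly to the following Python sketch. It is not generated from the Sail source. The register file as a list, memory as a byte-addressed dictionary, and raising an error for the invalid RA=0 form are all assumptions made for illustration. The sketch also reads both source registers before writing anything, so that its result does not depend on statement order when RS and RA name the same register.

```python
MASK64 = (1 << 64) - 1

def exts16(d):
    """Sign-extend a 16-bit displacement to 64 bits."""
    return ((d ^ 0x8000) - 0x8000) & MASK64

def decode(word):
    """Decode a 32-bit instruction word; return ('Stwu', rs, ra, d) or None."""
    if (word >> 26) != 0b100101:       # primary opcode 37 in bits 0:5
        return None
    rs = (word >> 21) & 0x1F           # bits 6:10
    ra = (word >> 16) & 0x1F           # bits 11:15
    d = word & 0xFFFF                  # bits 16:31
    return ("Stwu", rs, ra, d)

def execute(instr, gpr, mem):
    """Apply a decoded Stwu to `gpr` (list of 32 ints) and `mem` (dict addr -> byte)."""
    _, rs, ra, d = instr
    if ra == 0:
        # Assumption: this sketch rejects the invalid instruction form outright.
        raise ValueError("stwu with RA=0 is an invalid instruction form")
    ea = (gpr[ra] + exts16(d)) & MASK64
    data = gpr[rs] & 0xFFFFFFFF        # (RS)[32..63], the low-order word
    gpr[ra] = ea
    for offset, byte in enumerate(data.to_bytes(4, "big")):
        mem[(ea + offset) & MASK64] = byte

# stwu r3,-16(r1): opcode 37, RS=3, RA=1, D=0xFFF0
gpr = [0] * 32
gpr[1] = 0x2000
gpr[3] = 0x1122334455667788
mem = {}
instr = decode(0x9461FFF0)
execute(instr, gpr, mem)
```

After this runs, `gpr[1]` holds the new address 0x1FF0 and the four bytes starting there hold 0x55667788 in big-endian order.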