BUS
Electronic Computers M
Some drawings are from the Intel book «Weaving High Performance Multiprocessor Fabric»

Traditional bus
[Figure: block diagram of a traditional bus system. The CPU (ALU, registers, cache), the main memory, the I/O interfaces (network, local input/output), the graphic processor and the DMA controller all share the address bus, the data bus and the bus control signals]
• Active agents (processor, graphic controller, DMA, etc.) first issue the address (and, for a write, the data on the data bus) and then pulse a bus control line (read or write) to read or store the data on the data bus
• The data destination (or source) can be either the memory OR the input/output
• Parallel bus: 64/128 address lines (how many GB?), 64/128 data lines, 30 control lines (Rd/Wr, Memory/IO, interrupts…)
• All data transfers require the bus: BOTTLENECK

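The "how many GB?" question on this slide can be answered with a short calculation. The sketch below assumes byte-addressable memory (one byte per address), which the slide does not state; the function name `addressable_bytes` is illustrative only.

```python
GIB = 2 ** 30  # bytes in one GiB


def addressable_bytes(address_lines: int) -> int:
    """Distinct byte addresses reachable with the given number of address lines.

    Assumption: each address selects exactly one byte.
    """
    return 2 ** address_lines


for lines in (32, 64):
    total = addressable_bytes(lines)
    print(f"{lines} address lines -> {total // GIB} GiB")
```

With 32 lines the result is 4 GiB; with 64 lines it is 2^34 GiB (16 EiB), far more than any installed memory, which is why real systems implement fewer physical address lines.
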
Bus evolution
• DIB: Dual Independent Buses
• DHSI: Dedicated High Speed Interconnects
• Quick Path (QPI): serial evolution of the bus
  • Point-to-point packetized bus
  • Greater bandwidth
  • Snoop/coherency protocol
  • Communication paths dynamically reconfigurable

FSB (Front Side Bus) until 2004
[Figure: monoprocessor and multiprocessor FSB configurations]
• MCH: Memory Controller Hub
• ICH: I/O Controller Hub

DIB (2005-2007)
Snoop traffic must, however, still involve both buses

DHSI (2007-2008)
• Still snoop problems
• Centralized control
• Similar to twisted-pair Ethernet

New requirements
• Bigger overall bandwidth
• Total hw/sw transparency after initialization
• Reconfigurability and scalability
• Low …. costs
• Reliability
• Provision for future needs
• Lower number of connections (wires) between blocks
Quick Path (84 wires/connection maximum)

Quick Path fully connected
[Figure: interconnections between different devices]

Quick Path
• Full width link: 80 wires
• Half width link: 40 wires

Quick Path partially connected
In this architecture the connection between A and D, for instance, requires the use of the AC and CD connections

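The multi-hop case can be illustrated with a small path search. This is only a sketch: the four-node topology in `LINKS` is hypothetical (the slide's figure is not reproduced here), and breadth-first search merely stands in for whatever routing tables a real system would be configured with.

```python
from collections import deque

# Hypothetical partially connected topology: there is no direct A-D or B-C link.
LINKS = {
    "A": ["C", "B"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["C", "B"],
}


def route(links, src, dst):
    """Return a shortest path from src to dst as a list of nodes, or None."""
    previous = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(route(LINKS, "A", "D"))  # ['A', 'C', 'D']
```

The path from A to D crosses the AC and CD links, as on the slide; in a fully connected system every pair of nodes would be one hop apart.
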
Hierarchically connected Quick Path (Network)

Terminology
• Each node (which can be a multicore chip; note the difference between multinode architectures and a multicore chip) is connected to the system through a high-efficiency cache and acts on the QPI bus as a caching agent
• Each node has a private memory, that is, it directly controls a portion of the global address space. This portion can be handled by one or more memory controllers, each of which is called a home agent
• The devices which control the I/O are called I/O agents
• The devices which control the system boot are called firmware agents
• Several cores, which are called sockets, can coexist in each node
• The interconnections (with different parallelism, see later) are called links
• Interrupts and power-down messages are also transmitted via QPI, that is, they too are network messages

Quick Path
Cores are crossbar connected
[Figure: block diagram of a single node with multiple cores (codenames Nehalem, Westmere, Sandy Bridge, etc.; commercial names i5, i7…)]

Architecture layers
• Physical Layer: controls the physical information exchange and the transmission errors (for example by means of the Cyclic Redundancy Code). It consists of unidirectional connections in both directions (transmission unit: Phit)
• Link Layer: reconstructs the messages from the Phits and controls the information flow (message unit: Flit)
• Routing Layer: handles the routing of messages (see, for instance, the partially connected system analysed previously)
• Protocol Layer: multiple tasks. It implements the cache coherency protocol and handles the non-coherent messages, the interrupts, the memory-mapped I/O, etc. More generally, it handles the messages sent over multiple links which involve multiple agents
NB: The Quick Path protocol caters for cache snooping and allows direct cache-to-cache transfers

Architecture layers (ISO)
[Figure: layer stacks of two communicating agents, from the electrical transfers of the physical layer up to the high-level protocol communications]
• Physical Layer: high-speed electrical transfer
• Link Layer: reliable transmission, flow control between agents
• Routing Layer: framework for routing capability
• Transport Layer: advanced routing capability for reliable end-to-end transmission (optional layer, seldom implemented)
• Protocol Layer: high-level protocol information exchange (packets): high-level commands (MEMRD, IOWR, INT, etc.), coherency, ordering, interrupts, packet reordering

Architecture layers
[Figure: the same layer stack, annotated with the unit each layer operates on]
• Physical Layer: operates on PHITs. 1 PHIT = 20 bits: 2 bytes of data + 2 control bits + 2 CRC bits. The PHIT is the minimum unit of raw data
• Link Layer: operates on FLITs. 1 FLIT = 4 PHITs = 4 x (2 bytes/PHIT) = 8 bytes of data. The FLIT is the minimum unit of protocol
• Routing, Transmission and Protocol Layers: operate on PACKETS. 1 packet = 1 or more FLITs
• Only the protocol layer is aware of the meaning of the transmitted data

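The phit and flit sizes quoted on this slide can be checked with a little arithmetic. The sketch below uses only the slide's figures (20-bit phits carrying 2 data bytes, 4 phits per flit). The order in which the 16-bit fields are sent (least significant first) and all names are assumptions made for illustration, and packet headers are ignored.

```python
import math

PHIT_BITS = 20        # bits moved per phit (from the slide)
PHIT_DATA_BITS = 16   # 2 bytes of data per phit (from the slide)
PHITS_PER_FLIT = 4    # from the slide

FLIT_BITS = PHIT_BITS * PHITS_PER_FLIT            # 80 bits on the wire
FLIT_DATA_BITS = PHIT_DATA_BITS * PHITS_PER_FLIT  # 64 bits of data


def split_into_phit_data(word64: int) -> list[int]:
    """Split a 64-bit word into four 16-bit data fields.

    Assumption: least significant field first; the real ordering is not
    given in the source.
    """
    return [(word64 >> (16 * i)) & 0xFFFF for i in range(PHITS_PER_FLIT)]


def join_phit_data(fields: list[int]) -> int:
    """Inverse of split_into_phit_data."""
    return sum(field << (16 * i) for i, field in enumerate(fields))


def data_flits_needed(payload_bytes: int) -> int:
    """Flits needed to carry the payload alone (headers not counted)."""
    return math.ceil(payload_bytes * 8 / FLIT_DATA_BITS)


word = 0x0123456789ABCDEF
fields = split_into_phit_data(word)
print([hex(f) for f in fields])
print(data_flits_needed(64))       # flits for a 64-byte payload
print(FLIT_DATA_BITS / FLIT_BITS)  # fraction of wire bits that are data
```

Under these assumptions a 64-byte payload needs 8 data flits, and 64 of every 80 transmitted bits (80%) are data.
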
Example
This is an example of a data message (one packet)
[Figure: a packet of 13 flits (FLT1 to FLT13). Each flit carries 8 (4x2) data bytes in 4 phits (Phit1 to Phit4). Each 20-bit phit holds 2 bytes of data, 2 control bits and 2 CRC bits]

Physical Layer
Differential transmission: an example
• Transmission of a «1»: V+ = +0.5 V, V- = -0.5 V, Vout = (V+) - (V-) = 1 V
• Transmission of a «0»: V+ = 0 V, V- = 0 V, Vout = (V+) - (V-) = 0 V
• Smaller signal dynamic (0.5 V/channel, 1 V out)
• Noise rejection!
The voltage swing in QPI is nominally 1 V. Maximum swing: 1.36-1.38 V

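The noise rejection claimed for differential transmission follows from the subtraction (V+) - (V-): any disturbance that hits both wires equally cancels out. The sketch below uses the voltage levels of this slide's example; the 0.5 V decision threshold and the function names are assumptions for illustration, not QPI parameters.

```python
def transmit(bit: int) -> tuple[float, float]:
    """Return (V+, V-) for a bit, using the levels of the slide's example."""
    return (0.5, -0.5) if bit else (0.0, 0.0)


def add_common_mode_noise(pair: tuple[float, float], noise: float) -> tuple[float, float]:
    """Add the same disturbance to both wires of the differential pair."""
    v_plus, v_minus = pair
    return (v_plus + noise, v_minus + noise)


def receive(v_plus: float, v_minus: float, threshold: float = 0.5) -> int:
    """Decide the bit from the difference only (threshold is an assumed value)."""
    return 1 if (v_plus - v_minus) > threshold else 0


for bit in (0, 1):
    for noise in (-0.3, 0.0, 0.4):
        noisy = add_common_mode_noise(transmit(bit), noise)
        print(bit, noise, receive(*noisy))
```

Every received bit equals the transmitted one, whatever the common-mode noise. A single-ended receiver comparing V+ alone against a fixed threshold would misread some of these cases.
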
Physical Layer
[Figure: Component A and Component B joined by a link pair. Each link has TX and RX lanes 0 to 19 plus a transmitted and a received clock. A pair of signal pins (traces) is physically a differential pair, logically a lane]
• A differential pair (2 wires) is a lane
• A full link has 20 data lanes (1 Phit = 20 [16+2+2] bits => 40 differential wires) for each direction, plus two clocks (one for each direction, 4 differential wires)
• In total: 20 data lanes x 2 wires x 2 directions + (2+2) clock wires = 84 wires

Physical layer
• 1 quadrant consists of 10 wires (5x2 differential) plus 2 wires for the clock, that is 12 wires. Transmission is bidirectional, so 24 wires are involved
• A quadrant transfers one fourth of a PHIT (that is 4+1 bits, differential). The fifth bit carries, in turn, control and CRC information
• 4 quadrants (5x4 = 20 lanes) make a full link, which transfers one PHIT (20 bits transmitted, only 16 of them payload)
• No matter how many quadrants are used (1 to 4), there is a single clock for each direction, common to all quadrants, so the clocks always take 4 wires. A full link has 42 lanes counting both directions, that is 84 wires
• CRC bits are always present

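The wire counts on this slide can be reproduced with a short calculation. The sketch below uses only the slide's figures (5 data lanes per quadrant, one clock lane per direction, 2 wires per lane); the function names are illustrative.

```python
import math

LANES_PER_QUADRANT = 5   # 4 payload bits + 1 control/CRC bit
CLOCK_LANES = 1          # one clock per direction, shared by all quadrants
WIRES_PER_LANE = 2       # each lane is a differential pair
DIRECTIONS = 2           # the link is bidirectional
PHIT_BITS = 20


def link_wires(quadrants: int) -> int:
    """Total wires of a link built from the given number of quadrants."""
    lanes_per_direction = LANES_PER_QUADRANT * quadrants + CLOCK_LANES
    return lanes_per_direction * WIRES_PER_LANE * DIRECTIONS


def transfers_per_phit(quadrants: int) -> int:
    """Physical transfers needed to move one 20-bit phit."""
    return math.ceil(PHIT_BITS / (LANES_PER_QUADRANT * quadrants))


for q in (1, 2, 4):
    print(q, link_wires(q), transfers_per_phit(q))
```

One quadrant gives 24 wires and four transfers per phit; four quadrants give the full 84-wire link, which moves one phit per transfer.
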
Payload
[Figure: 1 PHIT = four quadrants of 5 bits each, 20 bits in total, 2 bytes of payload]
The information unit of the link layer is the flit, which consists of 80 bits (4 phits, that is 4x4 = 16 quadrant transfers): 72 bits are data (64 bits of real data, that is 8 bytes, plus 8 (4x2) control bits) and 8 (4x2) are CRC bits (each phit carries two CRC bits). One phit is transferred on each clock edge with 20 lanes. A physical transfer can, however, use a single quadrant only, in which case more transfers are required for a single phit.
The CRC uses the polynomial $X^8 \oplus X^7 \oplus X^2 \oplus 1$, which allows the detection of
• single, double and triple errors
• all errors when their number is odd
• any error within 8 consecutive bits
• 99% of errors within 9 consecutive bits
To increase reliability further, QPI provides an additional method called Rolling CRC, which uses the CRC of the preceding flit together with that of the present one, leading to a 16th-order polynomial.

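A CRC check with the polynomial quoted on this slide can be sketched as a bitwise polynomial division. The polynomial and the 72-bit protected field come from the slide. The zero initial value, the most-significant-bit-first order and the function name `crc8` are assumptions for illustration, since the slide does not give these details.

```python
POLY = 0x185  # x^8 + x^7 + x^2 + 1, written with the x^8 term included


def crc8(value: int, nbits: int = 72) -> int:
    """Remainder of value * x^8 divided by POLY over GF(2).

    Assumptions: zero initial value, most significant bit processed first.
    """
    reg = value << 8  # make room for the 8 CRC bits
    for bit in range(nbits + 7, 7, -1):
        if reg & (1 << bit):
            reg ^= POLY << (bit - 8)  # cancel the leading 1
    return reg & 0xFF


payload = 0x0123456789ABCDEF55  # arbitrary 72-bit example value
crc = crc8(payload)

# Flip each single bit in turn: the recomputed CRC never matches.
single_bit_misses = sum(crc8(payload ^ (1 << i)) == crc for i in range(72))

# Corrupt 8 consecutive bits at every possible position.
burst_misses = sum(crc8(payload ^ (0xFF << i)) == crc for i in range(72 - 7))

print(single_bit_misses, burst_misses)
```

Both counts are zero, so no single-bit error and no 8-bit burst goes unnoticed in this example. That is detection only: the check says a flit is damaged, not which bits to repair.
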
Phits and Flits
Phits can be transmitted out of order (possibly over different paths) and reassembled in the receiver
[Figure: phits of a flit along the transmission direction]

Different size Flit
[Figure: a flit sent as 4 Phits in sequence (Phits 1 to 4), and a flit transmitted by means of single quadrants]