University of Crete, School of Sciences & Engineering, Computer Science Department
Master Thesis by Michael Papamichael
Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors
Supervisor: Prof. Manolis G.H. Katevenis
Heraklion, Crete, July 2007
Presentation Outline
- NI Queue Manager
  - Introduction
  - Key Concepts
  - Architecture & Implementation
- NI Design for CMPs
  - NI Design Goals
  - NI Design Issues
  - Proposed NI Design
- Conclusions & Future Work
NI Queue Manager - Introduction
- FPGA-based prototyping platform
  - PCI-X RDMA-capable NIC in a cluster environment
  - Buffered crossbar switch
- Goals
  - Confirm buffered crossbar behavior
  - Interprocessor communication research
NI Queue Manager - Key Concepts
Head-Of-Line (HOL) Blocking
- HOL blocking reduces switch throughput
[Figure: input queues, switching fabric, and outputs 1 and 2; one output is marked "Idle!" because of HOL blocking]
NI Queue Manager - Key Concepts
Virtual Output Queues (VOQs)
- VOQs eliminate HOL blocking
[Figure: input queues organized as VOQs, one queue per output at each input, in front of the switching fabric]
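The contrast between a single FIFO per input and VOQs can be illustrated with a toy model. This is only a sketch under stated assumptions, not the thesis design: the function names, the fixed-priority arbitration, and the one-packet-per-input-per-cycle rule are all hypothetical choices made for the example.

```python
from collections import deque

def fifo_switch_cycles(inputs):
    """Cycles to drain when each input has ONE FIFO and only its head may be served."""
    queues = [deque(q) for q in inputs]
    cycles = 0
    while any(queues):
        busy_outputs = set()
        for q in queues:  # fixed-priority arbitration (assumption)
            if q and q[0] not in busy_outputs:
                busy_outputs.add(q.popleft())
            # otherwise the head is blocked, and so is everything behind it
        cycles += 1
    return cycles

def voq_switch_cycles(inputs, num_outputs):
    """Cycles to drain when each input keeps one queue per output (VOQs)."""
    voqs = [[deque(p for p in q if p == out) for out in range(num_outputs)]
            for q in inputs]
    cycles = 0
    while any(v for inp in voqs for v in inp):
        busy_outputs = set()
        for inp in voqs:
            for out in range(num_outputs):
                if inp[out] and out not in busy_outputs:
                    inp[out].popleft()
                    busy_outputs.add(out)
                    break  # one packet per input per cycle (assumption)
        cycles += 1
    return cycles

# Two inputs, each holding a packet for output 0 followed by one for output 1.
traffic = [[0, 1], [0, 1]]
```

With this traffic the FIFO model needs three cycles, because the second input's packet for output 1 waits behind a blocked head while output 1 idles, whereas the VOQ model finishes in two.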
NI Queue Manager - Key Concepts
Traffic Segmentation Schemes
- Traffic is segmented to optimize switching
- Variable-Size MultiPacket (VSMP) segmentation is well suited to the buffered crossbar
[Figure: the same packet stream (sizes in bytes) segmented under four schemes]
- Variable-size unipacket segments: 5 segments, S ≤ 256 B, 580 bytes
- Fixed-size unipacket segments: 11 segments, S = 64 B, 704 bytes
- Fixed-size multipacket segments: 3 segments, S = 256 B, 768 bytes
- Variable-size multipacket segments: 3 segments, S ≤ 256 B, 580 bytes
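The segment counts and byte totals on this slide can be reproduced with a short sketch. The packet sizes 40, 300, 160, and 80 bytes are inferred from the figure residue and are an assumption, as are the function names. The fixed-size multipacket case here pads the last segment; a real implementation might instead fill it with data from later packets.

```python
def segment_unipacket(packets, max_seg, fixed):
    """Segments never span packet boundaries. Returns on-the-wire segment sizes."""
    segments = []
    for size in packets:
        while size > 0:
            payload = min(size, max_seg)
            # Fixed-size schemes always send max_seg bytes (padding if needed).
            segments.append(max_seg if fixed else payload)
            size -= payload
    return segments

def segment_multipacket(packets, max_seg, fixed):
    """Segments may carry data from several packets of the same queue."""
    remaining = sum(packets)
    segments = []
    while remaining > 0:
        payload = min(remaining, max_seg)
        segments.append(max_seg if fixed else payload)
        remaining -= payload
    return segments

packets = [40, 300, 160, 80]  # bytes; assumed from the slide's figure

vs_uni = segment_unipacket(packets, 256, fixed=False)     # variable-size unipacket
fs_uni = segment_unipacket(packets, 64, fixed=True)       # fixed-size unipacket
fs_multi = segment_multipacket(packets, 256, fixed=True)  # fixed-size multipacket
vsmp = segment_multipacket(packets, 256, fixed=False)     # variable-size multipacket
```

Under these assumptions, VSMP yields the fewest segments (3) with no padding overhead (580 bytes), which matches the figures quoted on the slide.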
NI Queue Manager – Architecture & Implementation
Overview
- Virtual Output Queues (VOQs)
- VOQ migration to external memory
- Hardware-managed linked lists
- VSMP segmentation
- Scheduling
- Flow control
- 3 major versions implemented
NI Queue Manager – Architecture & Implementation
Architecture
[Figure: block diagram with Host (PCI-X), Packet Sorter, On-Chip VOQs, Linked List Manager, Memory Controller, External Memory (Off-chip VOQs), Scheduler, Flow Control, Packet Processor, and Network (RocketIO)]
NI Queue Manager – Architecture & Implementation
Packet Sorter
- Sorts packets according to:
  - destination
  - other criteria (e.g. priority)
- Notifies the Scheduler about incoming traffic
- Light-weight packet processing
  - e.g. enforce maximum packet size
- 2 versions implemented
  - with packet segmentation
  - without packet segmentation
NI Queue Manager – Architecture & Implementation
On-Chip VOQs
- Accumulate traffic in VOQs
- VOQs implemented as:
  - Circular buffers in a single, statically partitioned on-chip memory
  - Xilinx FIFOs
- 2 versions implemented
  - VOQs in BRAM
  - VOQs in Xilinx FIFOs
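The circular-buffer variant can be pictured as one memory array carved into equal regions, one per VOQ, each with its own head index and occupancy counter. The sketch below is a software model under that assumption; the class name, method names, and word-granularity interface are hypothetical and are not taken from the thesis hardware.

```python
class PartitionedVOQs:
    """Several circular buffers sharing one statically partitioned memory."""

    def __init__(self, num_voqs, words_per_voq):
        self.size = words_per_voq
        self.mem = [None] * (num_voqs * words_per_voq)  # the single shared memory
        self.head = [0] * num_voqs    # read index within each partition
        self.count = [0] * num_voqs   # occupancy of each VOQ, in words

    def push(self, voq, word):
        """Append a word to a VOQ; returns False when its partition is full."""
        if self.count[voq] == self.size:
            return False
        base = voq * self.size                                # start of partition
        tail = (self.head[voq] + self.count[voq]) % self.size  # wraps around
        self.mem[base + tail] = word
        self.count[voq] += 1
        return True

    def pop(self, voq):
        """Remove and return the oldest word of a VOQ, or None when empty."""
        if self.count[voq] == 0:
            return None
        base = voq * self.size
        word = self.mem[base + self.head[voq]]
        self.head[voq] = (self.head[voq] + 1) % self.size
        self.count[voq] -= 1
        return word
```

Static partitioning keeps address generation trivial (base plus offset), at the cost of not letting a busy VOQ borrow space from idle ones.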
Linked List Manager
- Performs segment transfers
  - variable-size segments
  - fixed-size segments
- Manages linked lists
  - Head and Tail pointers in on-chip memory
  - Next-Block pointers in DRAM (alongside the data)
- Optimization techniques
  - Free Block Preallocation
  - Free-List Bypass
- FSM-based implementation
Linked List Manager
Segment Transfers
1. From On-Chip VOQs to Packet Processor
2. From On-Chip to Off-Chip VOQs
3. From Off-Chip VOQs to Packet Processor
[Figure: the three transfers between the Packet Sorter, On-Chip VOQs, Off-Chip VOQs, and Packet Processor. Labels: variable-size segments, fixed-size blocks, max. segment size = 1 block]
Linked List Manager
Linked List Management
- Large VOQs migrate to DRAM
- Traffic is stored in linked lists of fixed-size blocks
- Dynamic allocation of external memory
- Block size needs to be:
  - Large, to benefit from the DRAM burst length
  - Small, to minimize the size of the On-Chip VOQs
- 2 basic operations
  - Enqueue
  - Dequeue
- Free blocks are stored in the Free-Block List
Linked List Manager
Basic Linked List Operations
- Enqueue
  1. Get a free block from the Free-Block List
  2. Write data into the new block
  3. Update the Next-Block pointer of the last block
  4. Update the VOQ Tail pointer
- Dequeue
  1. Read data from the first block
  2. Read the Next-Block pointer from the first block
  3. Update the VOQ Head pointer
  4. Put the freed block in the Free-Block List
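The enqueue and dequeue steps above can be sketched as a small software model. It assumes one payload per block, a free list with its own head and tail pointers, and at least one block; the class name `BlockQueues`, the `NIL` sentinel, and the helper names are hypothetical. The optimizations named earlier in the talk (Free Block Preallocation, Free-List Bypass) are not modeled.

```python
NIL = -1  # sentinel meaning "no block" (assumption)

class BlockQueues:
    """Per-VOQ linked lists of fixed-size blocks plus a Free-Block List."""

    def __init__(self, num_voqs, num_blocks):
        self.data = [None] * num_blocks                       # block payloads ("DRAM")
        self.next_block = list(range(1, num_blocks)) + [NIL]  # Next-Block pointers
        self.free_head, self.free_tail = 0, num_blocks - 1    # all blocks start free
        self.head = [NIL] * num_voqs                          # VOQ Head pointers
        self.tail = [NIL] * num_voqs                          # VOQ Tail pointers

    def _pop_free(self):
        block = self.free_head
        if block == NIL:
            raise MemoryError("no free blocks")
        self.free_head = self.next_block[block]
        if self.free_head == NIL:
            self.free_tail = NIL
        return block

    def _push_free(self, block):
        self.next_block[block] = NIL
        if self.free_tail == NIL:
            self.free_head = block
        else:
            self.next_block[self.free_tail] = block
        self.free_tail = block

    def enqueue(self, voq, payload):
        block = self._pop_free()                          # 1. get a free block
        self.data[block] = payload                        # 2. write the data
        self.next_block[block] = NIL
        if self.tail[voq] == NIL:
            self.head[voq] = block                        # queue was empty
        else:
            self.next_block[self.tail[voq]] = block       # 3. link from last block
        self.tail[voq] = block                            # 4. update Tail

    def dequeue(self, voq):
        block = self.head[voq]
        if block == NIL:
            return None
        payload = self.data[block]                        # 1. read the data
        self.head[voq] = self.next_block[block]           # 2-3. follow Next-Block
        if self.head[voq] == NIL:
            self.tail[voq] = NIL
        self._push_free(block)                            # 4. recycle the block
        return payload
```

Because blocks are allocated on demand from a shared pool, a heavily loaded VOQ can grow while idle VOQs consume no external memory.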
Linked List Manager
Enqueue/Dequeue Example
[Figure: DRAM blocks shown next to the per-VOQ Head/Tail pointer table and the Free List Head/Tail pointers, stepped through two enqueues into VOQ 5 followed by a dequeue from VOQ 5]
Linked List Manager
Finite State Machine (FSM)
[Figure: state diagram around the Idle state, with state groups ENQUEUE (ENQ1, ENQ2), DEQUEUE (DEQ1, DEQ2), POP FREE BLOCK (POPFB2), PUSH FREE BLOCK (PUSHFB2), and SRAM2XBAR (SRAM2XBAR1, SRAM2XBAR2)]
Linked List Manager
Finite State Machine (FSM)
[Figure: simplified state diagram with IDLE, ENQUEUE, DEQUEUE, POP FREE BLOCK, PUSH FREE BLOCK, and On-Chip VOQs to Packet Processor]
Scheduler
- Keeps track of each VOQ
  - On-chip occupancy
  - Off-chip occupancy
- Employs flow control (network & local)
  - Number of sent data words
  - Number of credits
- Implements scheduling
  - Builds VOQ eligibility masks
  - Enforces the scheduling policy
  - Instructs the Linked List Manager
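One way to picture an eligibility mask is as a bit vector with one bit per VOQ, set when that VOQ may be served. The slide does not state the exact eligibility condition, so the rule below (the VOQ holds data and has enough credits for its next segment) is an assumption made for illustration, as are the function and parameter names.

```python
def eligibility_mask(occupancy, credits, segment_words):
    """Bit i is set when VOQ i holds data and has credits for its next segment.

    occupancy[i]  -- words queued in VOQ i
    credits[i]    -- flow-control credits available to VOQ i, in words
    segment_words -- maximum segment size, in words
    """
    mask = 0
    for i, (occ, cred) in enumerate(zip(occupancy, credits)):
        next_segment = min(occ, segment_words)  # variable-size segments
        if occ > 0 and cred >= next_segment:
            mask |= 1 << i
    return mask

# VOQ 0 is empty, VOQ 1 lacks credits, VOQs 2 and 3 are eligible.
mask = eligibility_mask([0, 10, 4, 8], [16, 2, 4, 8], segment_words=8)
```

In this example `mask` equals `0b1100`. A separate mask could be kept per kind of transfer, each with its own condition.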
Scheduler
Scheduling Issues
- Determining eligibility
  - One eligibility mask for each kind of transfer
  - Eager approach
  - Lazy approach
- Scheduling policy
  - Round-Robin
  - Weighted Round-Robin
  - Deficit Round-Robin
  - Strict Priority
  - Starvation?
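As an illustration of the simplest policy listed, the sketch below picks the next VOQ in round-robin order from an eligibility mask. It is a software model only; the function name and the convention of passing the previously served VOQ are assumptions, and the weighted and deficit variants are not shown.

```python
def round_robin_pick(mask, num_voqs, last):
    """Return the first eligible VOQ after `last`, wrapping around, or None.

    mask     -- eligibility bit vector, bit i set when VOQ i may be served
    num_voqs -- number of VOQs covered by the mask
    last     -- index of the VOQ served most recently
    """
    for offset in range(1, num_voqs + 1):
        candidate = (last + offset) % num_voqs
        if mask & (1 << candidate):
            return candidate
    return None  # nothing is eligible this round

# With VOQs 2 and 3 eligible, service alternates between them.
first = round_robin_pick(0b1100, 4, last=2)
second = round_robin_pick(0b1100, 4, last=first)
```

Because the search always starts just past the last served VOQ, every eligible VOQ is reached within one full rotation, which avoids the starvation that strict priority can cause.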
Packet Processor
- Processes network traffic
  - Receives variable-size segments
  - Creates autonomous network packets
- Performs 3 basic operations
  - Insert header
  - Modify header
  - Delete header
- Implemented as a 3-stage pipeline
- Greatly depends on packet nature
  - RDMA packets are well suited
Packet Processor
Example of packet processing
[Figure: traffic passing through the Packet Processor; Segments 1 to 5 are shown against Packets 1 to 5, annotated with Insert, Modify, and Delete header operations. Legend: packet header, packet body, end of packet]
NI Queue Manager
Implementation
- 3 major versions
  - Full
  - “No external memory”
  - “No VSMP segmentation”
- Variations of individual modules
  - Packet Processor with/without segmentation
  - On-Chip VOQs with BRAM/Xilinx FIFOs
  - Linked List Manager with/without external memory support