Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors


  1. University of Crete, School of Sciences & Engineering, Computer Science Department
      - Master Thesis by Michael Papamichael
      - Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors
      - Supervisor: Prof. Manolis G.H. Katevenis
      - Heraklion, Crete, July 2007

  2. Presentation Outline
      - NI Queue Manager
        - Introduction
        - Key Concepts
        - Architecture & Implementation
      - NI Design for CMPs
        - NI Design Goals
        - NI Design Issues
        - Proposed NI Design
      - Conclusions & Future Work

  4. NI Queue Manager: Introduction
      - FPGA-based Prototyping Platform
        - PCI-X RDMA-capable NIC in cluster environment
        - Buffered crossbar switch
      - Goals
        - Confirm buffered crossbar behavior
        - Interprocessor communication research

  6. NI Queue Manager - Key Concepts: Head-Of-Line (HOL) Blocking
      - HOL blocking reduces switch throughput
      - [Figure: input queues feeding a switching fabric with two outputs; one output is marked "Idle!" because of HOL blocking]

  7. NI Queue Manager - Key Concepts: Virtual Output Queues (VOQs)
      - VOQs eliminate HOL blocking
      - [Figure: the same switching fabric with the input queues organized as VOQs]
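
The difference between the two queueing schemes can be illustrated with a one-slot toy model. This is only a sketch under assumed conditions: a 2x2 switch, one packet per input and per output in each time slot, and a simple greedy arbiter. The function names `fifo_slot` and `voq_slot` and the traffic pattern are hypothetical and are not taken from the thesis.

```python
from collections import deque


def fifo_slot(input_queues):
    """One time slot with a single FIFO per input: only head packets compete."""
    winners = {}  # output -> input granted in this slot
    for i, queue in enumerate(input_queues):
        if queue and queue[0] not in winners:
            winners[queue[0]] = i
    for i in winners.values():
        input_queues[i].popleft()
    return winners


def voq_slot(voqs, num_outputs):
    """One time slot with VOQs: voqs[i][o] holds input i's packets for output o."""
    winners = {}
    busy_inputs = set()
    for output in range(num_outputs):
        for i, per_output in enumerate(voqs):
            if i not in busy_inputs and per_output[output]:
                per_output[output].popleft()
                winners[output] = i
                busy_inputs.add(i)
                break
    return winners


# Both inputs hold a packet for output 0 followed by a packet for output 1.
fifo_result = fifo_slot([deque([0, 1]), deque([0, 1])])
voq_result = voq_slot(
    [[deque([0]), deque([1])], [deque([0]), deque([1])]], num_outputs=2
)
print(fifo_result)  # {0: 0}: output 1 stays idle
print(voq_result)   # {0: 0, 1: 1}: both outputs are busy
```

With FIFOs, input 1 cannot reach its packet for output 1 because the packet for output 0 sits in front of it. With VOQs, that packet is directly visible to the arbiter.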

  8. NI Queue Manager - Key Concepts: Traffic Segmentation Schemes
      - Traffic is segmented to optimize switching
      - Variable-Size MultiPacket (VSMP) segmentation is well suited to the buffered crossbar
      - [Figure: four packets of 40, 300, 160 and 80 bytes (580 bytes in total) segmented under each scheme]

        Scheme                              Segment size   Segments   Total bytes
        Variable-size unipacket segments    S ≤ 256 B      5          580
        Fixed-size unipacket segments       S = 64 B       11         704
        Fixed-size multipacket segments     S = 256 B      3          768
        Variable-size multipacket segments  S ≤ 256 B      3          580
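
The segment counts and byte totals on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes that the four packets are 40, 300, 160 and 80 bytes (the sizes that add up to the 580 bytes shown), that all of them are queued for the same destination, and that fixed-size segments are padded to the full segment size. The function names are illustrative only.

```python
def variable_unipacket(packets, s):
    """Segments of at most s bytes; a segment never spans two packets."""
    segments = []
    for size in packets:
        while size > 0:
            take = min(size, s)
            segments.append(take)
            size -= take
    return segments


def fixed_unipacket(packets, s):
    """Segments of exactly s bytes (padded); a segment never spans two packets."""
    count = sum(-(-size // s) for size in packets)  # ceiling division per packet
    return [s] * count


def fixed_multipacket(packets, s):
    """Segments of exactly s bytes (padded); packets are packed back to back."""
    count = -(-sum(packets) // s)
    return [s] * count


def variable_multipacket(packets, s):
    """Segments of at most s bytes; packets are packed back to back (VSMP)."""
    total = sum(packets)
    segments = [s] * (total // s)
    if total % s:
        segments.append(total % s)
    return segments


packets = [40, 300, 160, 80]
for name, segs in [
    ("variable-size unipacket", variable_unipacket(packets, 256)),
    ("fixed-size unipacket", fixed_unipacket(packets, 64)),
    ("fixed-size multipacket", fixed_multipacket(packets, 256)),
    ("variable-size multipacket", variable_multipacket(packets, 256)),
]:
    print(f"{name}: {len(segs)} segments, {sum(segs)} bytes")
```

Only the variable-size multipacket scheme combines the low segment count (3) with no padding overhead (580 bytes), which is the property the slide highlights.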

  10. NI Queue Manager – Architecture & Implementation: Overview
      - Virtual Output Queues (VOQs)
      - VOQ migration to external memory
      - Hardware-managed linked lists
      - VSMP Segmentation
      - Scheduling
      - Flow Control
      - 3 major versions implemented

  11. NI Queue Manager – Architecture & Implementation: Architecture
      - [Figure: block diagram with Host (PCI-X), Packet Sorter, On-Chip VOQs, Linked List Manager, Memory Controller, External Memory (Off-chip VOQs), Scheduler, Flow Control, Packet Processor and Network (RocketIO)]

  12. NI Queue Manager – Architecture & Implementation: Packet Sorter
      - [Figure: the architecture block diagram of slide 11]

  13. NI Queue Manager – Architecture & Implementation: Packet Sorter
      - Sorts packets according to:
        - destination
        - other criteria (e.g. priority)
      - Notifies Scheduler about incoming traffic
      - Light-weight packet processing
        - e.g. enforce maximum packet size
      - 2 versions implemented
        - with packet segmentation
        - without packet segmentation
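
A software analogue of the sorter's three jobs might look like the sketch below. Everything specific in it is an assumption made for illustration: the `(destination, priority, payload)` packet tuple, the 256-byte limit, the choice to reject oversize packets, and the `notify_scheduler` callback. The slides do not say how the hardware enforces the size limit.

```python
from collections import defaultdict, deque

MAX_PACKET_BYTES = 256  # hypothetical limit, not taken from the slides


def sort_packets(packets, notify_scheduler):
    """Place each (destination, priority, payload) packet in its own queue."""
    queues = defaultdict(deque)
    for destination, priority, payload in packets:
        if len(payload) > MAX_PACKET_BYTES:
            # Assumed policy: reject oversize packets outright.
            raise ValueError("packet exceeds maximum size")
        queues[(destination, priority)].append(payload)
        # Tell the scheduler how many bytes arrived for this queue.
        notify_scheduler(destination, priority, len(payload))
    return queues
```

A caller would pass the incoming packets and a callback that updates the scheduler's per-queue occupancy counters.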

  14. NI Queue Manager – Architecture & Implementation: On-Chip VOQs
      - [Figure: the architecture block diagram of slide 11]

  15. NI Queue Manager – Architecture & Implementation: On-Chip VOQs
      - Accumulates traffic in VOQs
      - VOQs implemented as:
        - Circular buffers in a single statically partitioned on-chip memory
        - Xilinx FIFOs
      - 2 versions implemented
        - VOQs in BRAM
        - VOQs in Xilinx FIFOs
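
The first option, circular buffers in one statically partitioned memory, can be modelled in software as follows. This is a minimal sketch: the class name `PartitionedVOQs`, the word-granularity interface, and the equal-size partitions are assumptions for illustration and do not describe the actual BRAM layout.

```python
class PartitionedVOQs:
    """Circular buffers carved out of one flat memory, one partition per VOQ."""

    def __init__(self, num_voqs, words_per_voq):
        self.size = words_per_voq
        self.mem = [None] * (num_voqs * words_per_voq)  # the single shared memory
        self.head = [0] * num_voqs    # read offset inside each partition
        self.count = [0] * num_voqs   # occupancy of each partition

    def push(self, voq, word):
        if self.count[voq] == self.size:
            return False  # partition full: the caller has to apply back-pressure
        tail = (self.head[voq] + self.count[voq]) % self.size
        self.mem[voq * self.size + tail] = word
        self.count[voq] += 1
        return True

    def pop(self, voq):
        if self.count[voq] == 0:
            return None
        word = self.mem[voq * self.size + self.head[voq]]
        self.head[voq] = (self.head[voq] + 1) % self.size
        self.count[voq] -= 1
        return word
```

Because the partitioning is static, a full VOQ cannot borrow space from an empty neighbour, which is one reason to let large VOQs migrate to external memory.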

  16. NI Queue Manager – Architecture & Implementation: Linked List Manager
      - [Figure: the architecture block diagram of slide 11]

  17. Linked List Manager
      - Performs Segment Transfers
        - variable-size segments
        - fixed-size segments
      - Manages Linked Lists
        - Head, Tail pointers in on-chip memory
        - Next Block pointers in DRAM (along with the data)
      - Optimization Techniques
        - Free Block Preallocation
        - Free-List Bypass
      - FSM-based Implementation

  18. Linked List Manager: Segment Transfers
      1. From On-Chip VOQs to Packet Processor
      2. From On-Chip to Off-Chip VOQs
      3. From Off-Chip VOQs to Packet Processor
      - [Figure: data path from the Packet Sorter through the On-Chip VOQs and Off-Chip VOQs to the Packet Processor, with transfers 1 to 3 marked; labels: fixed-size blocks, max. segment size = 1 block, variable-size segments]

  22. Linked List Manager: Linked List Management
      - Large VOQs migrate to DRAM
      - Traffic stored in linked lists of fixed-size blocks
      - Dynamic allocation of external memory
      - Block size needs to be:
        - Large, to benefit from DRAM burst length
        - Small, to minimize size of On-Chip VOQs
      - 2 Basic Operations
        - Enqueue
        - Dequeue
      - Free blocks stored in Free-Block List

  23. Linked List Manager: Basic Linked List Operations
      - Enqueue
        - Get a free block from the Free-Block List
        - Write data into the new block
        - Update the Next-Block pointer of the last block
        - Update the VOQ Tail pointer
      - Dequeue
        - Read data from the first block
        - Read the Next-Block pointer from the first block
        - Update the VOQ Head pointer
        - Put the freed block into the Free-Block List
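
The enqueue and dequeue steps listed above can be sketched in software as follows. This is an illustrative model, not the FSM-based hardware: the class and attribute names are hypothetical, Python lists stand in for DRAM and for the on-chip pointer memory, and the free list is kept as a simple stack with a single head pointer, whereas the slides show a Free List with both Head and Tail pointers.

```python
class LinkedListManager:
    """Per-VOQ linked lists of fixed-size blocks plus a Free-Block list."""

    def __init__(self, num_blocks, num_voqs):
        self.data = [None] * num_blocks   # block payloads ("DRAM")
        self.next = [None] * num_blocks   # Next-Block pointer kept with each block
        self.head = [None] * num_voqs     # VOQ Head pointers ("on-chip")
        self.tail = [None] * num_voqs     # VOQ Tail pointers ("on-chip")
        for block in range(num_blocks - 1):
            self.next[block] = block + 1  # initially every block is free
        self.free_head = 0 if num_blocks else None

    def _pop_free(self):
        block = self.free_head
        if block is None:
            raise MemoryError("no free blocks")
        self.free_head = self.next[block]
        self.next[block] = None
        return block

    def _push_free(self, block):
        self.next[block] = self.free_head
        self.free_head = block

    def enqueue(self, voq, payload):
        block = self._pop_free()                 # get free block
        self.data[block] = payload               # write data in the new block
        if self.tail[voq] is None:
            self.head[voq] = block               # queue was empty
        else:
            self.next[self.tail[voq]] = block    # update Next-Block of last block
        self.tail[voq] = block                   # update VOQ Tail pointer
        return block

    def dequeue(self, voq):
        block = self.head[voq]
        if block is None:
            return None
        payload = self.data[block]               # read data from the first block
        following = self.next[block]             # read Next-Block from first block
        self.head[voq] = following               # update VOQ Head pointer
        if following is None:
            self.tail[voq] = None
        self._push_free(block)                   # return block to the free list
        return payload
```

Each operation touches a constant number of pointers, which is what makes the scheme practical for a hardware state machine.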

  24. Linked List Manager: Enqueue/Dequeue Example
      - [Figure: DRAM blocks, the per-VOQ Head and Tail pointers, and the Free List Head and Tail pointers, stepped through three operations: Enqueue into VOQ 5, Enqueue into VOQ 5, Dequeue from VOQ 5]

  25. Linked List Manager: Finite State Machine (FSM)
      - [Figure: state diagram around an Idle state, with state groups ENQUEUE (ENQ1, ENQ2), DEQUEUE (DEQ1, DEQ2), SRAM2XBAR (SRAM2XBAR1, SRAM2XBAR2), POP FREE BLOCK (POPFB2, shown twice) and PUSH FREE BLOCK (PUSHFB2, shown twice)]

  26. Linked List Manager: Finite State Machine (FSM)
      - [Figure: simplified state diagram with IDLE, ENQUEUE, DEQUEUE, POP FREE BLOCK, PUSH FREE BLOCK and "On-Chip VOQs to Packet Processor"]

  27. NI Queue Manager: Scheduler
      - [Figure: the architecture block diagram of slide 11]

  28. Scheduler
      - Keeps track of each VOQ
        - On-chip occupancy
        - Off-chip occupancy
      - Employs Flow Control (network & local)
        - Number of sent data words
        - Number of credits
      - Implements Scheduling
        - Builds VOQ eligibility masks
        - Enforces scheduling policy
        - Instructs Linked List Manager
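
One way to picture how per-VOQ occupancy and credit state turn into eligibility masks is the sketch below, with one bit per VOQ and one mask per segment transfer type (on-chip to network, on-chip to DRAM, DRAM to network). The eligibility conditions are assumptions chosen for illustration, in particular the rule that on-chip data waits behind any off-chip data of the same VOQ; the slides do not spell out the exact conditions used in the design.

```python
def eligibility_masks(on_chip, off_chip, credits, block_words):
    """Return (chip_to_net, chip_to_dram, dram_to_net) bit masks, bit v = VOQ v."""
    chip_to_net = chip_to_dram = dram_to_net = 0
    for v in range(len(on_chip)):
        has_credit = credits[v] > 0
        # Assumed rule: on-chip data may leave only if nothing older sits off-chip.
        if on_chip[v] > 0 and off_chip[v] == 0 and has_credit:
            chip_to_net |= 1 << v
        # Assumed rule: migrate once a full block has accumulated on-chip.
        if on_chip[v] >= block_words:
            chip_to_dram |= 1 << v
        # Assumed rule: off-chip data needs a credit before it can be sent.
        if off_chip[v] > 0 and has_credit:
            dram_to_net |= 1 << v
    return chip_to_net, chip_to_dram, dram_to_net


masks = eligibility_masks(
    on_chip=[3, 0, 8, 2], off_chip=[0, 1, 0, 0], credits=[1, 1, 0, 0], block_words=8
)
print([bin(m) for m in masks])  # ['0b1', '0b100', '0b10']
```

VOQ 3 appears in no mask: it holds data but has no credits and has not yet filled a block.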

  29. Scheduler: Scheduling Issues
      - Determining Eligibility
        - One eligibility mask for each kind of transfer
        - Eager approach
        - Lazy approach
      - Scheduling Policy
        - Round-Robin
        - Weighted Round-Robin
        - Deficit Round-Robin
        - Strict Priority
        - Starvation?
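
The starvation question can be made concrete by running two of the listed policies over the same eligibility mask. This is a sketch only: the bit-mask encoding (bit v set means VOQ v is eligible), the function names, and the convention that VOQ 0 has the highest priority are assumptions for illustration.

```python
def round_robin(mask, num_voqs, last):
    """Pick the first eligible VOQ after `last`, wrapping around."""
    for step in range(1, num_voqs + 1):
        v = (last + step) % num_voqs
        if mask & (1 << v):
            return v
    return None  # nothing is eligible


def strict_priority(mask, num_voqs):
    """Always pick the lowest-numbered eligible VOQ (VOQ 0 = highest priority)."""
    for v in range(num_voqs):
        if mask & (1 << v):
            return v
    return None


mask = 0b1011  # VOQs 0, 1 and 3 stay eligible in every round
rr_picks, last = [], -1
for _ in range(6):
    last = round_robin(mask, 4, last)
    rr_picks.append(last)
sp_picks = [strict_priority(mask, 4) for _ in range(6)]
print(rr_picks)  # [0, 1, 3, 0, 1, 3]
print(sp_picks)  # [0, 0, 0, 0, 0, 0]
```

As long as VOQ 0 remains eligible, strict priority never serves VOQs 1 and 3, while round-robin visits every eligible VOQ in turn.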

  30. NI Queue Manager: Packet Processor
      - [Figure: the architecture block diagram of slide 11]

  31. Packet Processor
      - Processes Network Traffic
        - Receives variable-size segments
        - Creates autonomous network packets
      - Performs 3 Basic Operations
        - Insert header
        - Modify header
        - Delete header
      - Implemented as 3-stage pipeline
      - Greatly depends on packet nature
      - RDMA packets well suited

  32. Packet Processor: Example of packet processing
      - [Figure: traffic passing through the Packet Processor; Segments 1 to 5 are turned into Packets 1 to 5 by Insert, Modify and Delete header operations; legend: Packet Header, Packet Body, End of Packet]

  33. NI Queue Manager: Implementation
      - 3 major versions
        - Full
        - “No external memory”
        - “No VSMP segmentation”
      - Variations of individual modules
        - Packet Processor: with/without segmentation
        - On-Chip VOQs: with BRAM/Xilinx FIFOs
        - Linked List Manager: with/without external memory support
