Platform IO DMA Transaction Acceleration ICS/CACHES Steen Larsen

Platform IO DMA Transaction Acceleration. ICS/CACHES. Steen Larsen (steen.larsen@intel.com), Ben Lee (benl@eecs.oregonstate.edu). June 4, 2011. Outline: Introduction & Motivation • Background • Proposal • Experiments & Analysis


  1. Platform IO DMA Transaction Acceleration ICS/CACHES Steen Larsen (steen.larsen@intel.com) Ben Lee (benl@eecs.oregonstate.edu) June 4 2011

  2. Outline • Introduction & Motivation • Background • Proposal • Experiments & Analysis • Related & Future work

  3. 10,000 foot view of IO: IO growth is not matching CPU and memory bandwidth growth. • Multi-core processors (CMP, SMT) • NUMA

  4. Typical platform configuration and IO interface

  5. Legacy TX

  6. Legacy RX

  7. Critical path latency (10GbE 64B)

  8. IO transmit breakdown (10GbE 64B)

  9. PCIe bandwidth utilization

  10. Basic proposal claims

      | Factor | Measurement unit | Descriptor DMA | iDMA | Estimated improvement | Comment/justification |
      | --- | --- | --- | --- | --- | --- |
      | Latency | Microseconds to send a TCP/IP message between two systems | 8.8 | 7.38 | 16% | Descriptors are no longer latency critical |
      | Bandwidth-per-pin | Gbps per serial lane link | 2.5 | 2.67 | 17% | Descriptors no longer consume chip-to-chip bandwidth |

  11. Proposed TX

  12. Proposed RX

  13. iDMA internals

  14. Related work: Sun Niagara2; memory coherent IO

  15.

      | Factor | Measurement unit | Descriptor DMA | iDMA | Estimated improvement | Comment/justification |
      | --- | --- | --- | --- | --- | --- |
      | Latency | Microseconds to send a TCP/IP message between two systems | 8.8 | 7.38 | 16% | Descriptors are no longer latency critical |
      | Bandwidth-per-pin | Gbps per serial lane link | 2.5 | 2.67 | 17% | Descriptors no longer consume chip-to-chip bandwidth |
      | Bandwidth scalability | Not quantifiable | | | | Reduced silicon area and power |
      | Power efficiency | Normalized core power (maximum) | 100% | 29% | 71% | Power reduction due to more efficient core allocation of IO |
      | Quality of service | Nanoseconds to control connection priority from software perspective | 600 | 50 | 92% | Round trip latency to queuing control reduced from PCIe to system memory |
      | Multiple IO complexity | Die cost reduction | 100% | <50% | >50% | Silicon, power regulation and cooling cost reduction of multiple IO controllers into a single iDMA instance |
      | Security | | na | na | na | Not quantifiable |
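The improvement percentages for the lower-is-better rows of slide 15 (latency, core power, and QoS control time) can be checked with simple arithmetic. The following is a minimal sketch, assuming each percentage is the relative reduction from the descriptor DMA value to the iDMA value. The helper name `relative_reduction` and the dictionary keys are illustrative and do not come from the slides.

```python
def relative_reduction(baseline, proposed):
    """Fractional reduction of a lower-is-better metric (assumed definition)."""
    return (baseline - proposed) / baseline


# (descriptor DMA value, iDMA value) taken from the slide 15 table.
rows = {
    "latency_us": (8.8, 7.38),        # microseconds per TCP/IP message
    "core_power_pct": (100.0, 29.0),  # normalized core power (maximum)
    "qos_control_ns": (600.0, 50.0),  # nanoseconds to control priority
}

# Round to whole percent, as the slide reports them.
improvements = {
    name: round(100 * relative_reduction(base, new))
    for name, (base, new) in rows.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct}%")
```

Under that assumption the three rows come out to 16%, 71%, and 92%, which matches the table. The bandwidth-per-pin row is left out because it is a higher-is-better metric and the slides do not state how its percentage was derived.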

  16. Thank you! Questions?
