UDP Performance and PCI-X Activity of the Intel 10 Gigabit Ethernet Adapter on: HP rx2600 Dual Itanium 2, SuperMicro P4DP8-G2 Dual Xeon, Dell PowerEdge 2650 Dual Xeon

Richard Hughes-Jones

Many people helped, including:
• Sverre Jarp and Glen Hisdal, CERN OpenLab
• Sylvain Ravot, Olivier Martin and Elise Guyot, DataTAG project
• Les Cottrell, Connie Logg and Gary Buhrmaster, SLAC
• Stephen Dallison, MB-NG

PFLDNet Argonne Feb 2004


Outline

• Introduction
• 10 GigE on Itanium IA64
• 10 GigE on Xeon IA32
• 10 GigE on Dell Xeon IA32
• Tuning the PCI-X bus
• SC2003 Phoenix


Latency & Throughput Measurements

• UDP/IP packets sent between back-to-back systems
  - Similar processing to TCP/IP, but no flow-control or congestion-avoidance algorithms
  - Used the UDPmon test program

• Latency
  - Round-trip times measured using Request-Response UDP frames
  - Latency measured as a function of frame size
    - Slope s given by the formula below, summing the mem-mem copy(s), PCI and Gigabit Ethernet transfer costs (a fit sketch follows the formula)
    - Intercept indicates processing times + hardware latencies
  - Histograms of 'singleton' measurements
  - Tells us about:
    - behaviour of the IP stack
    - the way the hardware operates
    - interrupt coalescence

• UDP Throughput
  - Send a controlled stream of UDP frames spaced at regular intervals
  - Vary the frame size and the frame transmit spacing, and measure:
    - the time of the first and last frames received
    - the number of packets received, lost, and out of order
    - a histogram of the inter-packet spacing of received packets
    - the packet-loss pattern
    - 1-way delay
    - CPU load
    - the number of interrupts
  - Tells us about:
    - behaviour of the IP stack
    - the way the hardware operates
    - capacity & available throughput of the LAN / MAN / WAN

The slope s is the sum, over the stages of the data path (memory-to-memory copies, PCI transfers, Gigabit Ethernet), of the inverse of each stage's transfer rate:

$$ s = \sum_{\text{data paths}} \left( \frac{db}{dt} \right)^{-1} $$
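To make the slope and intercept extraction concrete, here is a minimal sketch (not part of UDPmon; the sample values are hypothetical placeholders) of the straight-line fit applied to latency-vs-size data, as used for the fits quoted on the latency slides:

```python
import numpy as np

# Hypothetical (message length, mean latency) pairs of the kind produced by
# UDPmon request-response runs; real values come from the measurements.
sizes_bytes = np.array([64, 512, 1024, 1400, 2000, 3000])
latency_us = np.array([100.2, 101.8, 103.5, 104.7, 106.8, 110.1])

# Least-squares straight-line fit: latency = slope * size + intercept
slope_us_per_byte, intercept_us = np.polyfit(sizes_bytes, latency_us, 1)

print(f"slope     = {slope_us_per_byte:.4f} us/byte  "
      "(sum of per-byte memory, PCI and Ethernet costs)")
print(f"intercept = {intercept_us:.1f} us  "
      "(fixed processing time + hardware latencies)")
```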


The Throughput Measurements

• UDP Throughput
• Send a controlled stream of UDP frames spaced at regular intervals

The UDPmon control sequence between sender and receiver (a minimal paced-send sketch follows):
  1. Zero stats -> "OK done"
  2. Send data frames at regular intervals (n bytes per frame, a set number of packets, a chosen wait time); the sender records the time to send, the receiver records the time to receive and the inter-packet time (histogram)
  3. Signal end of test -> "OK done"
  4. Get remote statistics; the receiver sends:
     - No. received
     - No. lost + loss pattern
     - No. out-of-order
     - CPU load & no. of interrupts
     - 1-way delay
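A minimal sketch of the paced-send idea on the sending side (this is not UDPmon itself; the destination host, port, frame size, packet count and spacing are illustrative assumptions):

```python
import socket
import time

DEST = ("receiver.example.org", 14196)  # assumed test endpoint
FRAME_BYTES = 1472                      # UDP payload that fits a 1500-byte MTU
N_PACKETS = 10000
WAIT_US = 20                            # requested inter-frame spacing

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = bytes(FRAME_BYTES)
spacing_s = WAIT_US * 1e-6

start = time.perf_counter()
next_send = start
for seq in range(N_PACKETS):
    # Busy-wait until the scheduled transmit time to keep the spacing regular
    while time.perf_counter() < next_send:
        pass
    # Carry a sequence number so the receiver can count loss and reordering
    sock.sendto(seq.to_bytes(4, "big") + payload[4:], DEST)
    next_send += spacing_s
elapsed = time.perf_counter() - start

bits_sent = N_PACKETS * FRAME_BYTES * 8
print(f"sent {N_PACKETS} frames of {FRAME_BYTES} bytes in {elapsed:.3f} s, "
      f"user-data rate {bits_sent / elapsed / 1e6:.1f} Mbit/s")
```

The receiving side would timestamp arrivals, histogram the inter-packet gaps, and count lost and out-of-order sequence numbers, as listed in the statistics above.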


The PCI Bus & Gigabit Ethernet Measurements

• PCI Activity
• Logic Analyzer with:
  - PCI probe cards in the sending PC
  - Gigabit Ethernet fibre probe card
  - PCI probe cards in the receiving PC

[Diagram: sending and receiving PCs (CPU, memory, chipset, NIC) connected by Gigabit Ethernet fibre; the PCI probe cards and the Gigabit Ethernet probe feed the logic analyser display.]


Example: The 1 Gigabit NIC, Intel PRO/1000

[Figure: latency, throughput and bus-activity panels. Throughput panel (gig6-7, Intel, PCI 66 MHz, 27 Nov 02): receive wire rate (Mbit/s) vs. transmit time per frame (µs) for frame sizes from 50 to 1472 bytes.]

• Motherboard: Supermicro P4DP6
• Chipset: E7500 (Plumas)
• CPU: Dual Xeon 2.2 GHz with 512k L2 cache
• Mem bus 400 MHz; PCI-X 64 bit 66 MHz
• HP Linux Kernel 2.4.19 SMP
• MTU 1500 bytes
• Intel PRO/1000 XT

[Figures (Intel, 64 bit / 66 MHz): UDP request-response latency (µs) vs. message length (bytes) with straight-line fits y = 0.0093x + 194.67 and y = 0.0149x + 201.75; latency histograms N(t) for 64, 512, 1024 and 1400 byte messages; PCI trace showing the receive transfer and send transfer.]


Data Flow: SuperMicro 370DLE: SysKonnect

• Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE
• CPU: PIII 800 MHz; PCI: 64 bit, 66 MHz
• RedHat 7.1, Kernel 2.4.14
• 1400 bytes sent, wait 100 µs
• ~8 µs for send or receive
• Stack & application overhead ~10 µs / node

[Logic-analyser traces: send PCI and receive PCI buses ~36 µs apart, showing the send CSR setup, send transfer, the packet on the Ethernet fibre, and the receive transfer.]


10 Gigabit Ethernet NIC with the PCI-X probe card.


Intel PRO/10GbE LR Adapter in the HP rx2600 system


10 GigE on Itanium IA64: UDP Latency

• Motherboard: HP rx2600 IA64
• Chipset: HP zx1
• CPU: Dual Itanium 2 1 GHz with 512k L2 cache
• Mem bus dual 622 MHz, 4.3 GByte/s
• PCI-X 133 MHz
• HP Linux Kernel 2.5.72 SMP
• Intel PRO/10GbE LR Server Adapter
• NIC driver with:
  - RxIntDelay=0
  - XsumRX=1 XsumTX=1
  - RxDescriptors=2048 TxDescriptors=2048
• MTU 1500 bytes
• Latency 100 µs & very well behaved
• Latency slope 0.0033 µs/byte
• B2B expect: 0.00268 µs/byte (see the worked sum below)
  - PCI 0.00188 **
  - 10GigE 0.0008
  - PCI 0.00188
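The quoted back-to-back expectation is consistent with summing the per-byte cost of one PCI-X crossing and the 10 GigE wire time; on this reading the second PCI-X crossing (also 0.00188 µs/byte, marked **) does not add to the slope, presumably because it overlaps with the transfers ahead of it:

$$ 0.00188\ \mu\mathrm{s/byte}\;(\text{PCI-X}) \;+\; 0.0008\ \mu\mathrm{s/byte}\;(\text{10 GigE}) \;=\; 0.00268\ \mu\mathrm{s/byte} $$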


10 GigE on Itanium IA64: Latency Histograms

• Double-peak structure, with the peaks separated by 3-4 µs
• Peaks are ~1-2 µs wide
• Similar to that observed with 1 Gbit Ethernet NICs on IA32 architectures


10 GigE on Itanium IA64: UDP Throughput

• HP Linux Kernel 2.5.72 SMP
• MTU 16114 bytes
• Max throughput 5.749 Gbit/s
• Interrupt on every packet
• No packet loss in 10M packets
• Sending host: 1 CPU is idle
  - for 14000-16080 byte packets, one CPU is 40% in kernel mode
  - as the packet size decreases, the load rises to ~90% for packets of 4000 bytes or less
• Receiving host: both CPUs busy
  - 16114 bytes: 40% kernel mode
  - small packets: 80% kernel mode
• TCP gensink data rate was 745 MBytes/s = 5.96 Gbit/s

[Plots (Oplab29-30, 10GE, Xsum, 512k buffers, MTU 16114, 30 Jul 03): receive wire rate (Mbit/s) and sender/receiver kernel-mode CPU (%) vs. spacing between frames (µs), for frame sizes from 1472 to 16080 bytes.]


10 GigE on Itanium IA64: UDP Throughput [04]

• HP Linux Kernel 2.6.1 #17 SMP
• MTU 16114 bytes
• Max throughput 5.81 Gbit/s
• Interrupt on every packet
• Some packet loss for packets < 4000 bytes
• Sending host: 1 CPU is idle, but the CPUs swap over
  - for 14000-16080 byte packets, one CPU is 20-30% in kernel mode
  - as the packet size decreases, the load rises to ~90% for packets of 4000 bytes or less
• Receiving host: 1 CPU is idle, but the CPUs swap over
  - 16114 bytes: 40% kernel mode
  - small packets: 70% kernel mode

[Plots (Openlab98-99, 10GE, MTU 16114, 12 Feb 04): receive wire rate (Mbit/s) and sender/receiver kernel-mode CPU (%) vs. spacing between frames (µs), for frame sizes from 1472 to 16080 bytes.]

10 GigE on Itanium IA64: PCI-X bus Activity

• 16080 byte packets every 200 µs; Intel PRO/10GbE LR Server Adapter, MTU 16114
• setpci -s 02:1.0 e6.b=2e (22 26 2a) sets mmrbc to 4096 bytes (512, 1024, 2048) - the register encoding is sketched below
• PCI-X signals, transmit (memory to NIC):
  - Interrupt and processing: 48.4 µs after start
  - Data transfer takes ~22 µs
  - Data transfer rate over PCI-X: 5.86 Gbit/s
  - Made up of 4 PCI-X sequences of ~4.55 µs, then a gap of 700 ns
  - Each sequence contains 16 PCI-X bursts of 256 bytes
  - Sequence length 4096 bytes (= mmrbc)

[Logic-analyser trace: CSR access, then the transfer of 16114 bytes as PCI-X sequences of 4096 bytes (16 bursts of 256 bytes) separated by 700 ns gaps.]
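The byte values in the setpci command map onto mmrbc as follows. This is a small sketch assuming the standard PCI-X command-register layout, in which the Maximum Memory Read Byte Count field sits in bits 3:2 of the byte written at offset 0xe6 on this adapter:

```python
# Decode / encode the mmrbc field of the byte the setpci command writes.
# Assumes the standard PCI-X command-register layout (mmrbc in bits 3:2).

def mmrbc_from_byte(value: int) -> int:
    """Return the mmrbc (in bytes) encoded in the register byte."""
    return 512 << ((value >> 2) & 0x3)

def byte_with_mmrbc(value: int, mmrbc: int) -> int:
    """Return the register byte with its mmrbc field changed, other bits kept."""
    code = {512: 0, 1024: 1, 2048: 2, 4096: 3}[mmrbc]
    return (value & ~0x0C) | (code << 2)

# The four values quoted on the slide decode to the stated byte counts:
for reg in (0x22, 0x26, 0x2A, 0x2E):
    print(f"e6.b={reg:02x} -> mmrbc {mmrbc_from_byte(reg)} bytes")
```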

10 GigE on Itanium IA64: PCI-X bus Activity (transmit & receive)

• 16080 byte packets every 200 µs; Intel PRO/10GbE LR Server Adapter, MTU 16114
• setpci -s 02:1.0 e6.b=2e (22 26 2a) sets mmrbc to 4096 bytes (512, 1024, 2048)
• PCI-X signals, transmit (memory to NIC):
  - Interrupt and processing: 48.4 µs after start
  - Data transfer takes ~22 µs
  - Data transfer rate over PCI-X: 5.86 Gbit/s
• PCI-X signals, receive (NIC to memory):
  - Interrupt on every packet
  - Data transfer takes ~18.4 µs
  - Data transfer rate over PCI-X: 7.014 Gbit/s
  - Note: receive is faster, cf. the 1 GigE NICs

[Logic-analyser traces: transmit shows the CSR access and the transfer of 16114 bytes in 256-byte PCI-X bursts; receive shows the interrupt, CSR access and the transfer of 16114 bytes in 512-byte PCI-X bursts as a PCI-X sequence.]

  • R. Hughes-Jones Manchester

16

10 GigE on Xeon IA32: UDP Latency

• Motherboard: Supermicro P4DP8-G2
• Chipset: Intel E7500 (Plumas)
• CPU: Dual Xeon 2.2 GHz with 512k L2 cache
• Mem bus 400 MHz
• PCI-X 133 MHz
• RedHat Kernel 2.4.21 SMP
• Intel PRO/10GbE Network Driver v1.0.45
• Intel PRO/10GbE LR Server Adapter
• NIC driver with:
  - RxIntDelay=0
  - XsumRX=1 XsumTX=1
  - RxDescriptors=2048 TxDescriptors=2048
• MTU 1500 bytes
• Latency 144 µs & reasonably behaved
• Latency slope 0.0032 µs/byte
• B2B expect: 0.00268 µs/byte


10 GigE on Xeon IA32: Latency Histograms

• Double-peak structure, with the peaks separated by 3-4 µs
• Peaks are ~1-2 µs wide
• Similar to that observed with 1 Gbit Ethernet NICs on IA32 architectures


10 GigE on Xeon IA32: Throughput

• MTU 16114 bytes
• Max throughput 2.75 Gbit/s with mmrbc 512 bytes
• Max throughput 3.97 Gbit/s with mmrbc 4096 bytes
• Interrupt on every packet
• No packet loss in 10M packets
• Sending host: for closely spaced packets, the other CPU is ~60-70% in kernel mode
• Receiving host:
  - small packets: 80% in kernel mode
  - >9000 bytes: ~50% in kernel mode

[Plots (DataTAG3-6, 10GE, Xsum, 512k buffers, MTU 16114, 31 Jul 03): receive wire rate (Mbit/s) and sender/receiver kernel-mode CPU (%) vs. spacing between frames (µs), for frame sizes from 1472 to 16000 bytes.]


10 GigE on Xeon IA32: PCI-X bus Activity

• 16080 byte packets every 200 µs; Intel PRO/10GbE LR, mmrbc 512 bytes
• PCI-X signals, transmit (memory to NIC):
  - Interrupt and processing: 70 µs after start
  - Data transfer takes ~44.7 µs
  - Data transfer rate over PCI-X: 2.88 Gbit/s
• PCI-X signals, receive (NIC to memory):
  - Interrupt on every packet
  - Data transfer takes ~18.29 µs
  - Data transfer rate over PCI-X: 7.014 Gbit/s, the same as on the Itanium

[Logic-analyser traces: interrupt, CSR access and the transfer of 16114 bytes in 256-byte PCI-X bursts (transmit); interrupt, transfer of 16114 bytes and CSR access (receive).]


10 GigE on Dell Xeon: Throughput

• MTU 16114 bytes
• Max throughput 5.4 Gbit/s
• Interrupt on every packet
• Some packet loss for packets < 4000 bytes
• Sending host: for closely spaced packets, one CPU is ~70% in kernel mode; CPU usage swaps between CPUs
• Receiving host: 1 CPU is idle but CPU usage swaps; for closely spaced packets ~80% in kernel mode

[Plots (SLAC Dell, 10GE, MTU 16114): receive wire rate (Mbit/s) and sender/receiver kernel-mode CPU (%) vs. spacing between frames (µs), for frame sizes from 1472 to 16080 bytes.]


10 GigE on Dell Xeon : UDP Latency

• Motherboard: Dell PowerEdge 2650
• Chipset: Intel E7500 (Plumas)
• CPU: Dual Xeon 3.06 GHz with 512k L2 cache
• Mem bus 533 MHz
• PCI-X 133 MHz
• RedHat Kernel 2.4.20 altAIMD
• Intel PRO/10GbE Network Driver v1.0.45
• Intel PRO/10GbE LR Server Adapter
• NIC driver with:
  - RxIntDelay=0
  - XsumRX=1 XsumTX=1
  - RxDescriptors=2048 TxDescriptors=2048
• MTU 16114 bytes
• Latency 36 µs, with some steps
• Latency slope 0.0017 µs/byte
• B2B expect: 0.00268 µs/byte

[Plots (SLAC Dell 10 GE): latency (µs) vs. message length (bytes), average and minimum times, with the straight-line fit y = 0.0017x + 36.579; a second panel extends the message length to 12000 bytes.]


10 GigEthernet: Throughput

• 1500 byte MTU gives ~2 Gbit/s
• Used 16114 byte MTU, maximum user length 16080 bytes
• DataTAG SuperMicro PCs
  - Dual 2.2 GHz Xeon CPU, FSB 400 MHz
  - PCI-X mmrbc 512 bytes
  - wire-rate throughput of 2.9 Gbit/s
• SLAC Dell PCs
  - Dual 3.0 GHz Xeon CPU, FSB 533 MHz
  - PCI-X mmrbc 4096 bytes
  - wire rate of 5.4 Gbit/s
• CERN OpenLab HP Itanium PCs
  - Dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz
  - PCI-X mmrbc 4096 bytes
  - wire rate of 5.7 Gbit/s

[Plot (an-al, 10GE, Xsum, 512k buffers, MTU 16114, 27 Oct 03): receive wire rate (Mbit/s) vs. spacing between frames (µs), for frame sizes from 1472 to 16080 bytes.]


Tuning PCI-X: Variation of mmrbc IA32

[Logic-analyser traces for mmrbc = 512, 1024, 2048 and 4096 bytes, showing the CSR access, the PCI-X sequences of the data transfer, and the interrupt & CSR update.]

• 16080 byte packets every 200 µs
• Intel PRO/10GbE LR Adapter
• PCI-X bus occupancy vs. mmrbc
• Plot shows (an illustrative model of the expected time is sketched below):
  - measured times
  - times based on PCI-X times from the logic analyser
  - expected throughput

[Plot: PCI-X transfer time (µs) and PCI-X transfer rate (Gbit/s) vs. Max Memory Read Byte Count, showing the measured PCI-X transfer time, the expected time, the rate from the expected time, and the PCI-X maximum throughput.]
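As a rough illustration of why larger mmrbc values shorten the PCI-X occupancy, the toy model below combines the numbers read from the logic-analyser traces (a ~4.55 µs, 4096-byte sequence and a ~700 ns inter-sequence gap). The assumption that the per-byte streaming cost is independent of mmrbc and that each sequence adds one gap is mine, not a statement from the talk:

```python
import math

PACKET_BYTES = 16114        # one jumbo frame DMA'd across PCI-X
US_PER_BYTE = 4.55 / 4096   # streaming cost inside a sequence (from the trace)
GAP_US = 0.7                # inter-sequence gap (from the trace)

def pci_x_transfer_time_us(mmrbc: int) -> float:
    """Estimated PCI-X occupancy to move one packet for a given mmrbc."""
    n_sequences = math.ceil(PACKET_BYTES / mmrbc)
    return PACKET_BYTES * US_PER_BYTE + n_sequences * GAP_US

for mmrbc in (512, 1024, 2048, 4096):
    t_us = pci_x_transfer_time_us(mmrbc)
    rate_gbit = PACKET_BYTES * 8 / (t_us * 1e-6) / 1e9
    print(f"mmrbc {mmrbc:4d}: ~{t_us:4.1f} us  ->  ~{rate_gbit:.1f} Gbit/s")
```

Small mmrbc values pay the inter-sequence gap many more times per packet, which is the trend the measured and expected curves show.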


Tuning PCI-X: Throughput vs mmrbc

[Plots (Kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04 and Kernel 2.4.22, Dell, Intel 10GE, standard UDP): measured rate, rate from the expected time, and maximum PCI-X throughput (Gbit/s) vs. Max Memory Read Byte Count; plus kernel-mode CPU (%) for the local (sending) and remote (receiving) hosts vs. Max Memory Read Byte Count.]

• DataTAG IA32: 2.2 GHz Xeon, 400 MHz FSB, 2.7 - 4.0 Gbit/s
• SLAC Dell: 3.0 GHz Xeon, 533 MHz FSB, 3.5 - 5.2 Gbit/s
• OpenLab IA64: 1.0 GHz Itanium, 622 MHz FSB, 3.2 - 5.7 Gbit/s
• CPU load is for 1 CPU, not an average

[Plots (standard UDP, Kernel 2.4.22, Dell, Intel 10GE): kernel-mode CPU (%) for the local (sending) and remote (receiving) hosts vs. Max Memory Read Byte Count.]


10 GigEthernet at SC2003 BW Challenge

• Three server systems with 10 GigE NICs
• Used the DataTAG altAIMD stack, 9000 byte MTU
• Sent memory-to-memory iperf TCP streams from the SLAC/FNAL booth in Phoenix to:
  - Palo Alto PAIX
    - rtt 17 ms, window 30 MB
    - shared with the Caltech booth
    - 4.37 Gbit HS-TCP, I=5%
    - then 2.87 Gbit, I=16%; the fall corresponds to 10 Gbit on the link
    - 3.3 Gbit Scalable TCP, I=8%
    - tested 2 flows, sum 1.9 Gbit, I=39%
  - Chicago Starlight
    - rtt 65 ms, window 60 MB
    - Phoenix CPU 2.2 GHz
    - 3.1 Gbit HS-TCP, I=1.6%
  - Amsterdam SARA
    - rtt 175 ms, window 200 MB
    - Phoenix CPU 2.2 GHz
    - 4.35 Gbit HS-TCP, I=6.9%
    - very stable
    - both used Abilene to Chicago

[Plots: 10 Gbit/s throughput from SC2003 to PAIX (Phoenix-PAIX HS-TCP and Scalable-TCP flows, with router traffic to LA/PAIX) and to Chicago & Amsterdam (Phoenix-Chicago and Phoenix-Amsterdam, with router traffic to Abilene), 19 Nov 2003, 15:59-17:25.]


Summary & Conclusions

• The Intel PRO/10GbE LR Adapter and driver gave stable throughput and worked well
• Need a large MTU (9000 or 16114) - 1500 bytes gives ~2 Gbit/s
• PCI-X tuning: mmrbc = 4096 bytes gave an increase of 55% (3.2 to 5.7 Gbit/s)
• PCI-X sequences are clear on transmit, with gaps of ~950 ns
• Transfers: transmission (22 µs) takes longer than receiving (18 µs)
• Tx rate 5.85 Gbit/s, Rx rate 7.0 Gbit/s (Itanium); PCI-X max 8.5 Gbit/s
• CPU load is considerable: 60% Xeon, 40% Itanium
• Bandwidth of the memory system is important - the data crosses it 3 times!
• Sensitive to OS / driver updates
• More study needed


Test setup with the CERN Open Lab Itanium systems


1 GigE on IA32: PCI bus Activity at 33 & 66 MHz

• 1472 byte packets every 15 µs, Intel PRO/1000
• PCI 64 bit, 33 MHz: 82% usage
• PCI 64 bit, 66 MHz: 65% usage; data transfers take half as long

[Logic-analyser traces at 33 and 66 MHz: send PCI and receive PCI, showing the send setup, data transfers and receive transfers.]