Linux Multiqueue Networking

David S. Miller
Red Hat Inc.
Portland, 2009

Outline: Background · RX Multiqueue · TX Multiqueue · Application-based and SW Steering · The End
Trends

- More CPUs: either less powerful than existing CPUs (high arity) or the same (low arity)
- Flow counts increasing
- Networking hardware adjusting to horizontal scaling
- The single-queue model no longer works
- Routers and firewalls have different needs than servers
CPU Design

- Traditionally single CPUs or very low-count SMP
- The move to high-arity CPU counts
- One model: Sun's Niagara
  - Lower-powered CPUs, but many of them
- The other model: x86-based systems
  - High-powered CPUs, but not as high an increase in arity as the Niagara approach, starting with hyperthreading
- Future: the best of both worlds, high arity and power
End Nodes vs. Intermediate Nodes

- End nodes: servers
- Intermediate nodes: routers and firewalls
- Intermediate nodes have good flow distribution implicit in their traffic
  - Also, processing a packet occurs purely within the networking stack itself; no application-level work
- End nodes also usually have good flow distribution
  - However, there is the added aspect of application CPU usage
- Completely stateless flow steering, or application-oriented flow steering
Networking Hardware Design

- Traditionally a single-queue model
  - Limitations of bus technology, e.g. PCI
- Advent of MSI and MSI-X interrupts
- RSS-based flow hashing
- Multiple TX and RX queues
- Stateless flow distribution (see the sketch after this list)
- Extra sophistication:
  - Sun's Neptune 10G Ethernet: TCAMs and more fine-grained flow steering
  - Intel's IXGBE "Flow Director"
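As a toy illustration of stateless RSS-style flow distribution, consider hashing the flow tuple and mapping the result onto an RX queue. Real NICs compute a Toeplitz hash in hardware; jhash here is only a stand-in to show the idea, and the function name is made up.

```c
#include <linux/jhash.h>
#include <linux/types.h>

/* Illustrative only: hash the flow 4-tuple and pick an RX queue.
 * Stateless: the same tuple always lands on the same queue. */
static u16 rss_pick_rx_queue(u32 saddr, u32 daddr,
                             u16 sport, u16 dport, u16 nqueues)
{
        u32 hash = jhash_3words(saddr, daddr,
                                ((u32)sport << 16) | dport, 0);
        return hash % nqueues;
}
```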
NAPI: "New API"

- Interrupt mitigation scheme designed by Jamal Hadi Salim and Robert Olsson
- On interrupt, further interrupts are disabled and a software interrupt is scheduled
- The software interrupt "polls" the driver, which processes RX packets until no more packets are pending or the quota is hit (sketched below)
- The quota provides DRR (Deficit Round Robin) sharing between links
- When polling is complete, chip interrupts are re-enabled
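A minimal sketch of such a poll routine follows, assuming a hypothetical driver "foo". The foo_* helpers and struct foo_priv are made up; the poll signature and napi_complete() are the stock NAPI API.

```c
static int foo_poll(struct napi_struct *napi, int budget)
{
        struct foo_priv *priv = container_of(napi, struct foo_priv, napi);
        int work_done = 0;

        /* Process RX packets until none are pending or the quota
         * ("budget") is hit. */
        while (work_done < budget && foo_rx_pending(priv)) {
                foo_process_one_rx_packet(priv);
                work_done++;
        }

        /* Polling is complete: leave polled mode and re-enable
         * chip interrupts. */
        if (work_done < budget) {
                napi_complete(napi);
                foo_enable_rx_interrupts(priv);
        }

        return work_done;
}
```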
Limitations of NAPI

- All state embedded literally inside of struct net_device
- Ideally we want some kind of "NAPI instance" for each chip interrupt source
- But we had no direct way to instantiate such instances structurally
- Fixes were in order
Stephen Hemminger to the Rescue

- Extracted NAPI state into a separate structure
- A device driver can create as many instances as necessary
- Multiple RX queues can be represented using multiple NAPI instances
- And this is exactly what multiqueue drivers do (a registration sketch follows below)
- Oh BTW: nasty hacks...
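A sketch of what the extracted-state API enables: one NAPI instance registered per RX queue. struct foo_priv, foo_rx_ring, and foo_poll are hypothetical; netif_napi_add() is the real registration call.

```c
static void foo_setup_napi(struct net_device *dev)
{
        struct foo_priv *priv = netdev_priv(dev);
        int i;

        for (i = 0; i < priv->num_rx_rings; i++) {
                struct foo_rx_ring *ring = &priv->rx_ring[i];

                /* One NAPI instance per chip RX interrupt source. */
                netif_napi_add(dev, &ring->napi, foo_poll, 64);
        }
}
```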
Packet Scheduler

- Sits between the network stack and the device transmit method
- Supports arbitrary packet classification and an assortment of queueing disciplines
- Has to lock the qdisc and then the device TX queue to get a packet to the device
- SMP-unfriendly, and just like NAPI it had its state embedded in the net_device struct
- Root qdiscs cannot be shared
- Complicated qdisc and classifier state has "device scope"
- Luckily the default configuration is a stateless and simple qdisc
Driver TX Method

- Manages TX queue flow control assuming one queue
- Need to add a queue specifier to the flow control APIs
- But do so without breaking multiqueue-unaware drivers
- With NAPI we could totally break the API and just fix all the drivers at once
  - Only a relative handful of drivers use NAPI
- Breaking the flow control API would require changes to roughly 450 drivers
- So, backward-compatible solutions only (see the transmit sketch below)
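A sketch of a multiqueue-aware transmit routine for a hypothetical driver "foo", showing flow control with a queue specifier. netif_stop_subqueue() and skb_get_queue_mapping() are the real per-queue interfaces; the foo_* helpers are invented. Multiqueue-unaware drivers keep calling netif_stop_queue() and are unaffected.

```c
static int foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct foo_priv *priv = netdev_priv(dev);
        u16 queue = skb_get_queue_mapping(skb); /* chosen earlier, stored in the SKB */
        struct foo_tx_ring *ring = &priv->tx_ring[queue];

        foo_post_tx_descriptor(ring, skb);      /* hypothetical HW enqueue */

        /* Per-queue flow control: stop only this TX queue when its
         * ring fills; the device's other queues keep transmitting. */
        if (foo_tx_ring_full(ring))
                netif_stop_subqueue(dev, queue);

        return NETDEV_TX_OK;
}
```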
TX Queue Selection

- The selected queue is stored in the SKB
- The queue selection function differs depending upon packet origin (a sketch follows below):
  - Forwarded packet: a function of the RX queue selected by the input device
  - Locally generated packet: use the hash value of the attached socket
- Thorny cases: devices with unequal RX and TX queue counts
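A rough sketch of that selection logic; this is illustrative, not the exact kernel code. skb_rx_queue_recorded(), skb_get_rx_queue(), sk_hash, and real_num_tx_queues are real interfaces.

```c
static u16 pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
{
        /* Forwarded packet: a function of the RX queue the input
         * device selected.  The modulo handles the thorny case of
         * unequal RX and TX queue counts. */
        if (skb_rx_queue_recorded(skb))
                return skb_get_rx_queue(skb) % dev->real_num_tx_queues;

        /* Locally generated packet: use the hash of the attached
         * socket, scaled into the TX queue range. */
        if (skb->sk && skb->sk->sk_hash)
                return (u16)(((u64)skb->sk->sk_hash *
                              dev->real_num_tx_queues) >> 32);

        return 0;
}
```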
Picture of TX Engine

[Diagram: dev_queue_xmit() sets the SKB queue mapping, runs the qdisc under dev->queue_lock, then takes the TX lock and calls hard_start_xmit; the driver feeds one of several TXQs.]
Picture of Default Configuration

[Diagram: dev_queue_xmit fans out to one qdisc per TXQ; each qdisc has its own ->q.lock, and each TXQ its own TX lock into the driver.]
Picture with Non-trivial Qdisc

[Diagram: SKBs funnel through a single qdisc under one ->q.lock, which then feeds multiple TXQs, each taken under its own TX lock.]
Motivation

- Performance, duh...
- Many networking devices out there are not multiqueue-capable
- Whilst stateless RX queue hashing is great for forwarding applications...
- ...it is decidedly suboptimal for end nodes
- Problem: figuring out the packet's "destination" before it's "too late"
Example Scenario

[Diagram-only slide.]
Early Efforts

- Influenced by Jens Axboe's remote block I/O completion experiments
- Up to 10 percent improvement in benchmarks where usually a 3 percent improvement is something to brag heavily about
- A generalization of remote software interrupt invocation
- Counterpart usage implemented for networking
- Basically SW multiqueue on receive
- Detrimental for loopback traffic
More Recent Work

- Patch posted by Tom Herbert at Google
- Per-device "packet steering" table, set by the user via sysctl
- When packet steering is enabled, received packets are hashed and the hash indexes into the table
- The entry found in the table is the CPU to steer the packet to (see the lookup sketch below)
- Packets are steered to foreign CPUs using remote SMP calls and a special software interrupt
- The whole mechanism is also enabled via sysctl
- If disabled, or no valid entry is found in the table, the existing behavior applies
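A conceptual sketch of that table lookup; the names are illustrative, not the posted patch's. The received packet's hash indexes a per-device table whose entry names the target CPU.

```c
struct steer_table {
        unsigned int len;     /* 0 when steering is disabled */
        u16 cpus[];           /* table of target CPUs */
};

static int pick_steer_cpu(const struct steer_table *tbl, u32 rxhash)
{
        if (!tbl || !tbl->len)
                return -1;    /* disabled or empty: keep existing behavior */
        return tbl->cpus[rxhash % tbl->len];
}
```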
Another Idea: SW "Flow Director"

- The CPU on which transmits for a flow occur is "remembered"
- On receive for that flow, the remembered CPU is looked up and the packet is steered to that CPU (a conceptual sketch follows below)
- Problems of space
- Problems of time
- Problems of locality
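A conceptual sketch of the idea; all names are made up, and the comments note where the space, time, and locality problems bite.

```c
#define FLOW_TABLE_SIZE 4096            /* space: the table must be bounded */

static u16 flow_cpu[FLOW_TABLE_SIZE];

static void remember_tx_cpu(u32 flow_hash)
{
        /* time: this entry goes stale once the flow ends or the
         * application is rescheduled elsewhere */
        flow_cpu[flow_hash % FLOW_TABLE_SIZE] = smp_processor_id();
}

static u16 steer_rx_cpu(u32 flow_hash)
{
        /* locality: the remembered CPU may share nothing (cache,
         * NUMA node) with where the data is ultimately consumed */
        return flow_cpu[flow_hash % FLOW_TABLE_SIZE];
}
```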
Credits

- Linus Torvalds, for sharing his kernel instead of keeping it to himself
- Nivedita Singhvi, for asking me to give this keynote
- Stephen Hemminger and Rusty Russell, for early RX multiqueue work
- Jarek Poplawski, Patrick McHardy, Jamal Hadi Salim, and Eric Dumazet, for help with the TX multiqueue implementation
- Robert Olsson and Herbert Xu, for continuing help throughout all of this
- Tom Herbert and others at Google, for ongoing efforts