Linux multi-queue NICs

embedded linux

shdnzwy

1. Hardware implementation of multi-queue NICs

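Figure 1.1 shows the hardware logic of the Intel 82575, which has four hardware queues. When a packet arrives, the NIC hashes the 4-tuple in the packet header (source IP, source port, destination IP, destination port), so all packets of one flow always land in the same queue. It then raises the interrupt bound to that queue.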
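The flow-to-queue mapping described above can be sketched in shell. This is only an illustration: the function name queue_for_flow is hypothetical, and cksum stands in for the hardware hash, which is not the function a real NIC uses. The point is that the same 4-tuple always yields the same queue index.

```shell
# Illustrative sketch only: cksum is a stand-in for the NIC's hardware hash.
queue_for_flow() {
    # $1=source IP  $2=source port  $3=destination IP  $4=destination port
    # $5=number of hardware queues
    hash=$(printf '%s:%s:%s:%s' "$1" "$2" "$3" "$4" | cksum | cut -d' ' -f1)
    # Reduce the hash to a queue index in the range 0 .. $5-1.
    echo $((hash % $5))
}
```

Calling queue_for_flow 10.0.0.1 1234 10.0.0.2 80 4 twice returns the same index both times, which is the property that keeps one flow on one queue.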

Figure 1.1 82575 hardware logic diagram

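2. NIC driver implementation before 2.6.21

Kernels before 2.6.21 do not support multiple queues. A NIC can request only one interrupt number, so at any given moment only one core processes the packets the NIC receives. As Figure 2.1 shows, the protocol stack uses NAPI polling to pull the packets from each hardware queue into the net_device structure of Figure 2.2, and sends packets to the NIC through the qdisc queue.

Figure 2.1 Kernel protocol stack before 2.6.21

Figure 2.2 net_device before 2.6.21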

3. NIC driver implementation after 2.6.21

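Starting with 2.6.21 the kernel supports multiple queues. When the NIC driver loads, it determines the number of hardware queues from the NIC model and combines that with the number of CPU cores: sum = min(NIC queues, CPU cores) is the number of queues to activate. The driver then requests sum interrupt numbers and assigns one to each active queue.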
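The sum = min(NIC queues, CPU cores) rule is simple enough to write down directly. The sketch below uses a hypothetical helper name, active_queues, and does not reflect any particular driver's code.

```shell
# Sketch of sum = min(NIC queues, CPU cores); active_queues is a made-up name.
active_queues() {
    # $1: number of hardware queues on the NIC
    # $2: number of CPU cores
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}
```

For example, a four-queue NIC on an eight-core machine activates four queues, while the same NIC on a two-core machine activates only two.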

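As Figure 3.1 shows, when a queue receives a packet it raises its interrupt. The core that takes the interrupt adds the work to its own NET_RX_SOFTIRQ queue, which does the receive processing for the protocol stack (every core has its own NET_RX_SOFTIRQ instance). The softirq calls the NAPI receive interface and pulls the packet into the net_device structure of Figure 3.2, which contains multiple netdev_queue entries.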
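One way to see whether receive work is really spread across cores is to look at the NET_RX row of /proc/softirqs. The helper below is a minimal sketch with a hypothetical name, net_rx_per_cpu. It assumes the columns of that row correspond to CPU0, CPU1, and so on in order, which may not hold if some CPUs are offline.

```shell
# Sketch: print the per-core NET_RX softirq counters.
# Assumes column N of the NET_RX row belongs to CPU N-1.
net_rx_per_cpu() {
    # Reads /proc/softirqs-style text on stdin; prints "cpuN count" per core.
    awk '$1 == "NET_RX:" {
        for (i = 2; i <= NF; i++) printf "cpu%d %s\n", i - 2, $i
    }'
}
```

Typical use is net_rx_per_cpu < /proc/softirqs, run twice a few seconds apart to compare how fast each counter grows.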

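This way the CPU cores can receive packets concurrently, and network I/O performance no longer drops because a single core cannot keep up.

Figure 3.1 Kernel protocol stack after 2.6.21

Figure 3.2 net_device after 2.6.21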

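4. Interrupt binding

Once the cores can receive packets in parallel, different cores may end up pulling packets from the same queue, which causes packet reordering. The fix is to bind each queue's interrupt to exactly one core. Under heavy traffic this also lets the softirq load be spread evenly across the cores, so that no single CPU becomes the bottleneck.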

Figure 4.1 /proc/interrupts
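Binding an interrupt to one core comes down to writing a hexadecimal CPU bitmask into /proc/irq/<irq>/smp_affinity, where bit N selects CPU N. A minimal sketch of the mask computation follows; cpu_mask is a hypothetical helper name.

```shell
# Sketch: build the smp_affinity mask that selects a single CPU.
cpu_mask() {
    # $1: CPU index; prints the hex bitmask with only that bit set.
    printf '%x\n' $((1 << $1))
}

# Usage (needs root; replace <irq> with a real interrupt number):
#   cpu_mask 1 > /proc/irq/<irq>/smp_affinity
```

CPU 0 gives mask 1, CPU 1 gives 2, and CPU 5 gives 20 (hexadecimal).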

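5. Correcting interrupt affinity

Some multi-queue NIC drivers are not implemented very well: after initialization, the tx and rx interrupts of the same queue can be bound to different cores, as in Figure 4.1. Data then moves back and forth between core0 and core1, which increases cross-core traffic, lowers the cache hit rate, and reduces efficiency.

Figure 5.1 Unreasonable interrupt binding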

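David Miller, the maintainer of the Linux networking subsystem, provided a script for this. It first scans /proc/interrupts, derives the interrupt mask from vec in names such as eth0-rx-0 ($vec) in Figure 4.1, and writes that mask to the smp_affinity file of the corresponding interrupt, number 53 here. Because eth0-rx-0 and eth0-tx-0 have the same vec, the tx and rx interrupts of one queue are bound to the same core, as shown in Figure 4.3.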

Figure 4.2 set_irq_affinity

Figure 4.3 Reasonable interrupt binding
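The two lookups the script relies on, finding the IRQ number for a named queue and taking the queue index from the name, can be sketched separately. The helper names irq_for_queue and queue_index are hypothetical, and the sketch assumes the queue name is the last field of its /proc/interrupts line.

```shell
# Sketch: look up the IRQ number of a named queue interrupt.
irq_for_queue() {
    # $1: queue interrupt name, e.g. eth0-rx-0
    # Reads /proc/interrupts-style text on stdin.
    awk -v name="$1" '$NF == name { sub(/:$/, "", $1); print $1 }'
}

# Sketch: extract the queue index (the vec) from a queue interrupt name.
queue_index() {
    # eth0-rx-0 -> 0, eth0-tx-1 -> 1
    echo "${1##*-}"
}
```

Since eth0-rx-0 and eth0-tx-0 both yield index 0, they get the same mask and end up on the same core.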

The full set_irq_affinity script is reproduced after section 6.

6. Identifying a multi-queue NIC

#lspci -vvv

Look at the Ethernet controller entry in the output. If it shows MSI-X && Enable+ && TabSize > 1, the NIC is a multi-queue NIC, as shown in Figure 4.4.

Figure 4.4 lspci output
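The same check can be scripted. The sketch below uses a hypothetical helper name, is_multiqueue, and reads the lspci -vvv text for one device on stdin. It assumes the MSI-X capability line carries Enable+ together with either TabSize=N or Count=N; the second spelling is an assumption about other lspci versions, not something this post shows.

```shell
# Sketch: succeed if the lspci text shows MSI-X enabled with more than
# one table entry. The Count= spelling is an assumption.
is_multiqueue() {
    awk '
        /MSI-X:/ && /Enable[+]/ {
            if (match($0, /(TabSize|Count)=[0-9]+/)) {
                size = substr($0, RSTART, RLENGTH)
                sub(/.*=/, "", size)      # keep only the number
                if (size + 0 > 1) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    '
}
```

A possible use is lspci -vvv -s <bus:dev.fn> | is_multiqueue && echo "multi-queue", where <bus:dev.fn> is the address of the Ethernet controller.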

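Message Signaled Interrupts (MSI) is a mechanism defined by the PCI specification. It gets around the CPU's limit of 256 interrupts and makes it possible for a single device to have multiple interrupt lines; a multi-queue NIC driver requests an MSI for each queue. MSI-X is an array of MSIs: Enable means it is enabled, and TabSize is the size of the array.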

# setting up irq affinity according to /proc/interrupts
# 2008-11-25 robert olsson
# 2009-02-19 updated by jesse brandeburg
#
# > dave miller:
# (to get consistent naming in /proc/interrupts)
# i would suggest that people use something like:
# char buf[IFNAMSIZ+6];
#
# sprintf(buf, "%s-%s-%d",
#         netdev->name,
#  (rx_interrupt ? "rx" : "tx"),
#  queue->index);
#
# assuming a device with two rx and tx queues.
# this script will assign:
#
# eth0-rx-0 cpu0
# eth0-rx-1 cpu1
# eth0-tx-0 cpu0
# eth0-tx-1 cpu1
#

set_affinity()
{
    mask=$((1<<$vec))
    printf "%s mask=%x for /proc/irq/%d/smp_affinity\n" $dev $mask $irq
    printf "%x" $mask > /proc/irq/$irq/smp_affinity
    #echo $dev mask=$mask for /proc/irq/$irq/smp_affinity
    #echo $mask > /proc/irq/$irq/smp_affinity
}

if [ "$1" = "" ] ; then
    echo "description:"
    echo "    this script attempts to bind each queue of a multi-queue nic"
    echo "    to the same numbered core, ie tx0|rx0 --> cpu0, tx1|rx1 --> cpu1"
    echo "usage:"
    echo "    $0 eth0 [eth1 eth2 eth3]"
    # nothing to do without at least one device name
    exit 1
fi

# check for irqbalance running
irqbalance_on=`ps ax | grep -v grep | grep -q irqbalance; echo $?`
if [ "$irqbalance_on" = "0" ] ; then
echo " warning: irqbalance is running and will"
echo "          likely override this script's affinitization."
echo "          please stop the irqbalance service and/or execute"
echo "          'killall irqbalance'"
fi

#
# set up the desired devices.
#

for dev in "$@"
do
for dir in rx tx txrx
do
     # count the vectors named <dev>-<dir>-N; fall back to the <dev>:vN-<dir> style
     max=`grep $dev-$dir /proc/interrupts | wc -l`
     if [ "$max" = "0" ] ; then
       max=`egrep -i "$dev:.*$dir" /proc/interrupts | wc -l`
     fi
     if [ "$max" = "0" ] ; then
       echo no $dir vectors found on $dev
       continue
       #exit 1
     fi
     for vec in `seq 0 1 $max`
     do
        irq=`cat /proc/interrupts | grep -i $dev-$dir-$vec"$" | cut -d: -f1 | sed "s/ //g"`
        if [ -n "$irq" ]; then
          set_affinity
        else
           irq=`cat /proc/interrupts | egrep -i $dev:v$vec-$dir"$" | cut -d: -f1 | sed "s/ //g"`
           if [ -n "$irq" ]; then
             set_affinity
           fi
        fi
     done
done
done
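
After running the script, it can be useful to read the masks back and confirm which cores each interrupt is bound to. The sketch below decodes a mask as found in /proc/irq/<irq>/smp_affinity into a list of CPU indexes. The helper name cpus_in_mask is hypothetical, and the sketch assumes fewer than 63 CPUs so that the mask fits in shell arithmetic.

```shell
# Sketch: decode a hex smp_affinity mask into CPU indexes.
# Assumes the mask fits in a signed 64-bit shell integer.
cpus_in_mask() {
    # $1: hex mask, optionally comma-grouped, e.g. 00000000,00000003
    hex=$(printf '%s' "$1" | tr -d ',')
    mask=$((0x$hex))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"   # append, space-separated
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}
```

For example, cpus_in_mask "$(cat /proc/irq/<irq>/smp_affinity)" prints the cores that interrupt may run on; a correctly bound queue should list exactly one.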
