Background
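Network programmers are probably all familiar with the maximum transmission unit (MTU). On any link, the payload a single frame can carry is limited; on Ethernet, for example, the MTU is usually set to 1500 bytes (the maximum length of an IP packet). TCP, the most widely used transport protocol, sits on top of IP. Because TCP is stream-based, it has to cut its data into chunks, wrap each chunk in a TCP segment, and hand the segment to IP for encapsulation and transmission. For efficiency, TCP should size those chunks so that each segment fits into a single IP packet without exceeding the MTU. (If a segment is too large, the IP layer has to split it into several smaller fragments when sending and reassemble them at the receiver, which is inefficient.) To achieve this, the client and server use a TCP option during connection setup to negotiate a suitable maximum segment size (MSS), so that neither side needs IP fragmentation during the transfer.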
After this negotiation things usually just work. That changed once ISPs introduced PPPoE for network access.
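PPPoE is PPP carried over Ethernet. It inserts 8 bytes between the IP and Ethernet layers (a 6-byte PPPoE header plus a 2-byte PPP header), which in effect reduces the link MTU from 1500 to 1492 bytes. The problem is that PPPoE is invisible to the TCP/IP layers: when TCP negotiates the MSS, the MTU it sees is still the Ethernet MTU, with the PPPoE header length not subtracted.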
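The numbers used in this post follow from simple subtraction, sketched below in C. The helper names (`tcp_mss_for_mtu`, `pppoe_mtu`) are made up for illustration, and the 20-byte header sizes assume that neither the IPv4 header nor the TCP header carries options.

```c
#include <stdint.h>

enum {
    IPV4_HEADER_LEN = 20, /* IPv4 header without options (assumption) */
    TCP_HEADER_LEN  = 20, /* TCP header without options (assumption) */
    PPPOE_OVERHEAD  = 8   /* 6-byte PPPoE header + 2-byte PPP protocol field */
};

/* Largest TCP payload that fits in one unfragmented IPv4 packet of size mtu. */
static uint16_t tcp_mss_for_mtu(uint16_t mtu)
{
    if (mtu <= IPV4_HEADER_LEN + TCP_HEADER_LEN)
        return 0; /* no room for any payload */
    return (uint16_t)(mtu - IPV4_HEADER_LEN - TCP_HEADER_LEN);
}

/* MTU left for IP once PPPoE and PPP share the Ethernet payload. */
static uint16_t pppoe_mtu(uint16_t ethernet_mtu)
{
    if (ethernet_mtu <= PPPOE_OVERHEAD)
        return 0;
    return (uint16_t)(ethernet_mtu - PPPOE_OVERHEAD);
}
```

With a 1500-byte Ethernet MTU, `tcp_mss_for_mtu(1500)` gives 1460, while `tcp_mss_for_mtu(pppoe_mtu(1500))` gives 1452, the largest MSS that still fits behind PPPoE.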
Consider the typical topology below: the client reaches the Internet through PPPoE and tries to access a resource on the server:
(client)-------(pppoe cli) ------- (pppoe serv) ------(internet) ------(server)
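step1: the client sends a TCP connection request to the server and announces that the largest segment it accepts is 1460 bytes (mtu - tcp_header_len - ip_header_len);
step2: the server replies to the client and announces that it also accepts segments of at most 1460 bytes;
step3: the client sends a request for the resource;
step4-1: the server cuts the resource into chunks of at most 1460 bytes, wraps each chunk in a TCP segment and then in an IP packet, frames it for Ethernet and sends it toward the client.
step4-2: on the way from the server to the client the packet passes through pppoe serv, which has to add the PPPoE header and finds that the packet then exceeds the MTU. (What now?)
step4-3: one of three things happens: (i) the server marked the packet as "don't fragment", so pppoe serv can only drop it; (ii) pppoe serv sees that the packet is too long and silently drops it, as if nothing had happened; (iii) pppoe serv fragments the packet and sends the fragments to the client one by one.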
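As the steps show, how pppoe serv treats an "oversized" packet is not predictable; likewise, how pppoe cli treats an "oversized" packet sent by the client is not predictable either. Unfortunately, the vast majority of pppoe cli implementations do not fragment packets, and pppoe serv does not always fragment either. The resulting symptom is that the client can usually establish a connection to the server, because the handshake packets are small, but the transfer fails as soon as full-sized data segments are sent.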
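A small sketch may make the symptom concrete. It assumes option-free IPv4 and TCP headers (40 bytes together) and a 1500-byte limit on the Ethernet payload; the function names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETHERNET_PAYLOAD_MAX 1500u /* what one Ethernet frame can carry */
#define PPPOE_OVERHEAD       8u    /* PPPoE header + PPP protocol field */
#define IP_TCP_HEADERS       40u   /* 20-byte IPv4 + 20-byte TCP, no options */

/* Bytes a TCP segment occupies in the Ethernet payload on the PPPoE link. */
static size_t pppoe_payload_len(size_t tcp_payload_len)
{
    return PPPOE_OVERHEAD + IP_TCP_HEADERS + tcp_payload_len;
}

/* True if the segment crosses the PPPoE link without fragmentation. */
static bool fits_pppoe_link(size_t tcp_payload_len)
{
    return pppoe_payload_len(tcp_payload_len) <= ETHERNET_PAYLOAD_MAX;
}
```

A bare SYN with no payload needs only 48 bytes, so the handshake goes through. A full segment built for an MSS of 1460 needs 1508 bytes and does not fit, which matches the "connects fine, stalls on large transfers" behavior described above.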
Solution
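Once the cause is known, the fix is not hard. In the topology above, pppoe cli (or pppoe serv) watches connection setup and, when the client and server negotiate the MSS, steps in and corrects it:
(client)-------(pppoe cli) ------- (pppoe serv) ------(internet) ------(server)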
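step1: the client sends a TCP connection request to the server and announces that the largest segment it accepts is 1460 bytes;
step1-1: pppoe cli intercepts the MSS option and rewrites it to 1412 bytes (a conservative value; anything up to 1492 - 40 = 1452 bytes would still fit);
step2: the server acknowledges the connection request and learns that the client accepts at most 1412 bytes; it announces that 1412 bytes is fine on its side as well;
step2-1: pppoe cli intercepts the MSS option in the reply, sees that the value is not larger than 1412, and lets it pass unchanged;
step2-2: the client learns that the server accepts segments of at most 1412 bytes;
step3: the client requests the resource;
step4: the server cuts the resource into chunks of at most 1412 bytes, builds the packets and sends them to the client;
step5: the client receives the packets and acknowledges them with ACKs;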
......
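The step-by-step walk-through above should make the problem clear. On a router that uses PPPoE for its uplink, TCP MSS clamping is usually required. Below is the code that adds TCP MSS clamping to the kernel PPPoE module: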
#include <linux/types.h>
#include <linux/string.h>
#include <linux/skbuff.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/if_pppox.h>   /* struct pppoe_hdr, pppoe_hdr() */

/*
 * compute the tcp checksum over pseudo-header, tcp header and data.
 * words are summed as they sit in memory, so the result can be stored
 * into the header as is; over a packet whose checksum is valid it returns 0.
 */
static uint16_t tcp_checksum(uint8_t *piphdr, uint8_t *ptcphdr)
{
    uint32_t sum = 0;
    uint16_t count;
    uint16_t tmp;

    uint8_t *addr;
    uint8_t pseudo_header[12];
    int i;

    /* count number of bytes in tcp header and data: ip total length - ip header length */
    count = piphdr[2] * 256 + piphdr[3];
    count -= (piphdr[0] & 0x0f) * 4;

    /* ip src addr, dest addr, protocol, payload length */
    memcpy(pseudo_header, piphdr + 12, 8);
    pseudo_header[8] = 0;
    pseudo_header[9] = piphdr[9];
    pseudo_header[10] = (count >> 8) & 0xff;
    pseudo_header[11] = (count & 0xff);

    /* checksum the pseudo-header (memcpy avoids unaligned 16-bit loads) */
    for (i = 0; i < 12; i += 2)
    {
        memcpy(&tmp, pseudo_header + i, sizeof(tmp));
        sum += (uint32_t) tmp;
    }

    /* checksum the tcp header and data */
    addr = ptcphdr;
    while (count > 1)
    {
        memcpy(&tmp, addr, sizeof(tmp));
        sum += (uint32_t) tmp;
        addr += sizeof(tmp);
        count -= sizeof(tmp);
    }

    /* odd trailing byte: pad with a zero byte, keeping memory order */
    if (count > 0)
    {
        tmp = 0;
        memcpy(&tmp, addr, 1);
        sum += (uint32_t) tmp;
    }

    /* fold the carries back into the low 16 bits */
    while (sum >> 16)
    {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ((~sum) & 0xffff);
}

/**
 * detect syn of tcp inside a pppoe session frame, clamp mss
 */
static void clamp_mss(struct sk_buff *skb, int clamp_mss)
{
    struct tcphdr *ptcphdr;
    struct iphdr *piphdr;
    uint8_t *pppphdr;
    struct pppoe_hdr *ppppoehdr;
    int len;
    int minlen;
    int optlen;

    uint16_t csum;
    uint16_t mss = 0;
    uint8_t *opt;
    uint8_t *mssopt = NULL;

    ppppoehdr = pppoe_hdr(skb);

    /* the ppp header follows the 6-byte pppoe header */
    pppphdr = (uint8_t *)ppppoehdr + sizeof(struct pppoe_hdr);

    /* check ppp protocol type */
    if (pppphdr[0] & 0x01)
    {
        /* low bit set in the first byte: compressed 8 bit protocol type */
        if (pppphdr[0] != 0x21)
        {
            return;
        }

        piphdr = (struct iphdr *)(pppphdr + 1);
        minlen = 41; // tcp header len + ip header len + ppp header len (20 + 20 + 1)
    }
    else
    {
        /* 16 bit protocol type, upper layer is ip, and the protocol value is 0x0021 */
        if (pppphdr[0] != 0x00 || pppphdr[1] != 0x21)
        {
            return;
        }
        piphdr = (struct iphdr *)(pppphdr + 2);
        minlen = 42; // 20 + 20 + 2
    }

    /* is it too short? */
    len = (int)ntohs(ppppoehdr->length);
    if (len < minlen)
    {
        return;
    }

    /* verify once more that it's ipv4 */
    if (piphdr->version != 4)
    {
        return;
    }

    /* is it a fragment that's not at the beginning of the packet? */
    if (ntohs(piphdr->frag_off) & 0x1fff)
    {
        return;
    }

    /* is it tcp? */
    if (piphdr->protocol != 0x06)
    {
        return;
    }

    /* get start of tcp header */
    ptcphdr = (struct tcphdr *)((uint8_t *)piphdr + (piphdr->ihl) * 4);

    /* is syn set? */
    if (!ptcphdr->syn)
    {
        return;
    }

    /* compute and verify tcp checksum -- do not touch a packet with a bad checksum */
    csum = tcp_checksum((uint8_t *)piphdr, (uint8_t *)ptcphdr);
    if (csum)
    {
        return;
    }

    /* look for existing mss option; doff is a 4-bit field, so no byte swap */
    optlen = ptcphdr->doff * 4 - 20;

    if (optlen <= 0)
    {
        return;
    }

    opt = (uint8_t *)ptcphdr + 20;

    while (optlen > 0)
    {
        switch (*opt)
        {
        case 0: // end of option list: no mss option present
            return;
        case 1: // no-operation, used for padding
            len = 1;
            break;
        case 2: // mss option
            if (optlen < 4 || opt[1] != 4)
            {
                return;
            }

            len = 4;
            mss = opt[2] * 256 + opt[3];
            mssopt = opt;
            break;
        case 3: // window scale
        case 4: // sack permitted
        case 5: // sack
        case 8: // timestamps
            if (optlen < 2)
            {
                return;
            }
            len = (int)opt[1];
            /* a bogus length would stall the loop or run past the options */
            if (len < 2 || len > optlen)
            {
                return;
            }
            break;
        default:
            return;
        }

        if (mss > 0)
        {
            break;
        }

        optlen -= len;
        opt += len;
    }

    /* if mss not exists or it's low enough, do nothing */
    if (!mss || mss <= clamp_mss)
    {
        return;
    }

    /* write the clamped value in network byte order */
    mssopt[2] = (((unsigned) clamp_mss) >> 8) & 0xff;
    mssopt[3] = ((unsigned) clamp_mss) & 0xff;

    /* recompute tcp checksum */
    ptcphdr->check = 0;

    csum = tcp_checksum((uint8_t *)piphdr, (uint8_t *)ptcphdr);
    ptcphdr->check = csum;
}
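The function above recomputes the checksum over the whole segment even though only one 16-bit word changed. A common alternative, which the code above does not use, is the incremental update from RFC 1624 (equation 3): HC' = ~(~HC + ~m + m'). Below is a minimal user-space sketch of that technique. The names `csum_update16` and `inet_checksum` are hypothetical, and every value is treated as the numeric value of a big-endian 16-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference Internet checksum over a byte buffer, read as big-endian words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)
        sum += (uint32_t)data[i] << 8; /* odd trailing byte, zero padded */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: new checksum after one 16-bit word changes. */
static uint16_t csum_update16(uint16_t old_check, uint16_t old_word,
                              uint16_t new_word)
{
    uint32_t sum = (uint16_t)~old_check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For MSS clamping, `old_word` would be the original MSS (for example 1460, 0x05b4) and `new_word` the clamped one (1412, 0x0584). The result should match a full recomputation over the modified bytes, at the cost of a few additions instead of a pass over the whole segment.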