TCP/IP Stack Linux Source Analysis (5): IPv6 Fragment Reassembly, Part 1
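While working on a firewall module I ran into a problem with IPv6 fragments: netfilter could not filter IPv6 fragments by transport-layer port, although it could do so for IPv4 fragments. Reading the kernel source showed why: netfilter's connection tracking module reassembled IPv4 fragments but did not reassemble IPv6 fragments. Kernels after 3.10.x changed this, and netfilter now reassembles IPv6 fragments before PRE_ROUTING.
I have already written several posts on IPv4 fragment handling; this one looks at IPv6. IPv6 reassembly works on basically the same principle as IPv4: both maintain a list of fragments per datagram and both rely on timers for garbage collection. They differ only slightly in the details.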
Kernel version: 3.4.39
When the IPv6 module is initialized, it registers the handler for fragments as well as the handler for IPv6 packets.
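When the NIC driver receives an IPv6 packet, the stack looks up the protocol handler, which is ipv6_rcv(). This function does some basic processing and passes the packet to the netfilter PRE_ROUTING hook. If nothing goes wrong, the packet reaches ip6_rcv_finish(), which looks up the route and calls the matching input function. Packets addressed to the local host go to ip6_input(), which runs the hooks on the LOCAL_IN chain; if they accept the packet, it reaches ip6_input_finish(). That function walks the chain of next headers and calls the handler registered for each one: the extension headers (routing, destination options, fragment) and finally the upper-layer protocol such as TCP or UDP. Here we care about the fragment header. The complete flow is:
[Figure: flowchart of the IPv6 receive path]
Fragments are handled by ipv6_frag_rcv(). Before reading that function, look at how the IPv6 fragment table is organized: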
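[Figure: organization of the IPv6 fragment table]
When a fragment arrives, a hash value is computed from the packet's triple (saddr, daddr, fragment identification) combined with a random number rnd. The hash value indexes a hash array in which every slot is a list of the fragment queues that share that hash value. The kernel searches that list for the queue matching the fragment and inserts the fragment into it; once all fragments of the datagram have arrived, reassembly starts. The random number rnd is not fixed: a timer changes it periodically and the existing fragment queues are then rehashed, which makes it harder for an attacker to target a single bucket. Fragment queues consume system memory, so a queue that never completes is eventually reclaimed by a garbage-collection timer. When fragment memory has to be reclaimed, the oldest queues go first; the lru list in the figure above exists for this purpose.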
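The bucket lookup described above can be sketched in user-space C. This is only an illustration under stated assumptions: the bucket count FRAG_HASHSZ, the struct frag_key layout and the FNV-1a style mixing are hypothetical stand-ins, not the kernel's actual hash function or data structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAG_HASHSZ 64u  /* hypothetical bucket count, must be a power of two */

/* The triple that identifies one fragmented datagram. */
struct frag_key {
    uint8_t  saddr[16];
    uint8_t  daddr[16];
    uint32_t id;         /* Identification field of the fragment header */
};

/* FNV-1a style byte mixing, used here as a stand-in for the kernel's hash. */
static uint32_t mix(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Map (saddr, daddr, id) plus the secret rnd to a bucket index. */
static uint32_t frag_bucket(const struct frag_key *k, uint32_t rnd)
{
    uint8_t idb[4];
    uint32_t h = 2166136261u ^ rnd;

    assert((FRAG_HASHSZ & (FRAG_HASHSZ - 1)) == 0);
    memcpy(idb, &k->id, sizeof(idb));
    h = mix(h, k->saddr, sizeof(k->saddr));
    h = mix(h, k->daddr, sizeof(k->daddr));
    h = mix(h, idb, sizeof(idb));
    return h & (FRAG_HASHSZ - 1);
}

/* Queues in one bucket are told apart by comparing the full triple. */
static int frag_key_equal(const struct frag_key *a, const struct frag_key *b)
{
    return a->id == b->id &&
           memcmp(a->saddr, b->saddr, sizeof(a->saddr)) == 0 &&
           memcmp(a->daddr, b->daddr, sizeof(a->daddr)) == 0;
}
```

Because rnd is mixed into the hash, changing it moves queues to different buckets, which is why the queues have to be rehashed whenever the timer picks a new value.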
Here is the implementation of the fragment header handler, ipv6_frag_rcv():
static int ipv6_frag_rcv(struct sk_buff *skb)
{
    struct frag_hdr *fhdr;
    struct frag_queue *fq;
    const struct ipv6hdr *hdr = ipv6_hdr(skb);
    struct net *net = dev_net(skb_dst(skb)->dev);

    // Guard against nested fragmentation. The flag is set further down once a
    // fragment header has been processed, so seeing it here means the packet
    // carries more than one fragment header.
    if (IP6CB(skb)->flags & IP6SKB_FRAGMENTED)
        goto fail_hdr;

    // Update the statistics counter
    IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMREQDS);

    /* Jumbo payload inhibits frag. header */
    // Jumbograms must not be fragmented
    if (hdr->payload_len==0)
        goto fail_hdr;

    // Make sure the whole fragment header is in the linear data area
    if (!pskb_may_pull(skb, (skb_transport_offset(skb) +
                             sizeof(struct frag_hdr))))
        goto fail_hdr;

    hdr = ipv6_hdr(skb);
    // Get a pointer to the fragment header
    fhdr = (struct frag_hdr *)skb_transport_header(skb);

    // If both the fragment offset and the M flag are zero, the packet is not
    // really fragmented: skip the fragment header, set IP6SKB_FRAGMENTED and
    // return 1 so that the caller continues with the next header.
    if (!(fhdr->frag_off & htons(0xFFF9))) {
        /* It is not a fragmented frame */
        skb->transport_header += sizeof(struct frag_hdr);
        IP6_INC_STATS_BH(net,
                         ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMOKS);

        IP6CB(skb)->nhoff = (u8 *)fhdr - skb_network_header(skb);
        // Mark the packet as having passed fragment processing
        IP6CB(skb)->flags |= IP6SKB_FRAGMENTED;
        return 1;
    }

    // If fragments use more memory than the threshold, call ip6_evictor()
    // to free some of the old fragment queues
    if (atomic_read(&net->ipv6.frags.mem) > net->ipv6.frags.high_thresh)
        ip6_evictor(net, ip6_dst_idev(skb_dst(skb)));

    // Look up the fragment queue by source address, destination address and
    // identification. An existing queue is returned, otherwise a new one is
    // created; the only way this can fail is a failed creation.
    fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr);
    if (fq != NULL) {
        int ret;

        spin_lock(&fq->q.lock);
        // Queue found: enqueue the fragment and reassemble if complete
        ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
        spin_unlock(&fq->q.lock);

        fq_put(fq);
        return ret;
    }

    IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_REASMFAILS);
    kfree_skb(skb);
    return -1;

fail_hdr:
    IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)), IPSTATS_MIB_INHDRERRORS);
    icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, skb_network_header_len(skb));
    return -1;
}

Once the fragment queue has been found, the fragment is inserted at the right position in the queue. That work is done by ip6_frag_queue():
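The htons(0xFFF9) test in ipv6_frag_rcv() is easier to follow with the layout of the 16-bit frag_off field spelled out: the upper 13 bits hold the offset in 8-byte units, two bits are reserved, and the lowest bit is the M (more fragments) flag. The following user-space sketch decodes the field; the helper names are made up for illustration and are not kernel functions.

```c
#include <arpa/inet.h>  /* htons, ntohs */
#include <assert.h>
#include <stdint.h>

#define FRAG_OFF_MASK 0xFFF8u  /* upper 13 bits: offset, already a byte count */
#define FRAG_MF       0x0001u  /* M flag: more fragments follow */

/* Byte offset of this fragment; frag_off_be is in network byte order. */
static unsigned frag_offset_bytes(uint16_t frag_off_be)
{
    return ntohs(frag_off_be) & FRAG_OFF_MASK;
}

/* Non-zero if more fragments follow this one. */
static int frag_more(uint16_t frag_off_be)
{
    return (ntohs(frag_off_be) & FRAG_MF) != 0;
}

/* Offset 0 and M flag 0: the packet carries a fragment header but is whole. */
static int frag_is_unfragmented(uint16_t frag_off_be)
{
    return (frag_off_be & htons(0xFFF9)) == 0;
}

/* Build a frag_off value in network byte order (hypothetical helper). */
static uint16_t frag_off_make(unsigned offset_bytes, int more)
{
    assert((offset_bytes & 7u) == 0 && offset_bytes <= FRAG_OFF_MASK);
    return htons((uint16_t)(offset_bytes | (more ? FRAG_MF : 0u)));
}
```

Masking with 0xFFF8 yields the byte offset directly because the 13-bit field already sits three bits up, which is the same effect as the `& ~0x7` in the kernel code. The listing of ip6_frag_queue() follows.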
static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
                          struct frag_hdr *fhdr, int nhoff)
{
    struct sk_buff *prev, *next;
    struct net_device *dev;
    int offset, end;
    struct net *net = dev_net(skb_dst(skb)->dev);

    // This flag is set once reassembly has completed or the queue has been
    // reclaimed by GC; any fragment arriving afterwards is simply dropped.
    if (fq->q.last_in & INET_FRAG_COMPLETE)
        goto err;

    // The fragment offset is always a multiple of 8 bytes
    offset = ntohs(fhdr->frag_off) & ~0x7;
    // End offset of this fragment within the fragmentable part: payload_len
    // minus the extension headers up to and including the fragment header
    end = offset + (ntohs(ipv6_hdr(skb)->payload_len) -
                    ((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1)));

    // The reassembled length must not exceed IPV6_MAXPLEN (65535)
    if ((unsigned int)end > IPV6_MAXPLEN) {
        IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
                         IPSTATS_MIB_INHDRERRORS);
        icmpv6_param_prob(skb, ICMPV6_HDR_FIELD,
                          ((u8 *)&fhdr->frag_off -
                           skb_network_header(skb)));
        return -1;
    }

    // Remove the headers, up to and including the fragment header,
    // from the checksum
    if (skb->ip_summed == CHECKSUM_COMPLETE) {
        const unsigned char *nh = skb_network_header(skb);
        skb->csum = csum_sub(skb->csum,
                             csum_partial(nh, (u8 *)(fhdr + 1) - nh,
                                          0));
    }

    /* Is this the final fragment? */
    if (!(fhdr->frag_off & htons(IP6_MF))) {
        /* If we already have some bits beyond end
         * or have different end, the segment is corrupted.
         */
        // This is the last fragment: check the length for inconsistencies;
        // if there are none, record the total length and set the flag
        if (end < fq->q.len ||
            ((fq->q.last_in & INET_FRAG_LAST_IN) && end != fq->q.len))
            goto err;
        fq->q.last_in |= INET_FRAG_LAST_IN;
        fq->q.len = end;
    } else {
        /* Check if the fragment is rounded to 8 bytes.
         * Required by the RFC.
         */
        // Unless this is the last fragment, end must be a multiple of
        // 8 bytes; otherwise report an error as the protocol requires
        if (end & 0x7) {
            /* RFC2460 says always send parameter problem in
             * this case. -DaveM
             */
            IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
                             IPSTATS_MIB_INHDRERRORS);
            icmpv6_param_prob(skb, ICMPV6_HDR_FIELD,
                              offsetof(struct ipv6hdr, payload_len));
            return -1;
        }
        if (end > fq->q.len) {
            /* Some bits beyond end -> corruption. */
            // Data beyond the end of the last fragment: drop the packet
            if (fq->q.last_in & INET_FRAG_LAST_IN)
                goto err;
            // Update the length
            fq->q.len = end;
        }
    }

    // A fragment carrying no data is dropped
    if (end == offset)
        goto err;

    // Move the data pointer to the start of the fragment data
    /* Point into the IP datagram 'data' part. */
    if (!pskb_pull(skb, (u8 *) (fhdr + 1) - skb->data))
        goto err;

    // Trim the skb to exactly end - offset bytes, adjusting the checksum
    if (pskb_trim_rcsum(skb, end - offset))
        goto err;

    /* Find out which fragments are in front and at the back of us
     * in the chain of fragments so far.  We must know where to put
     * this fragment, right?
     */
    prev = fq->q.fragments_tail;
    if (!prev || FRAG6_CB(prev)->offset < offset) {
        next = NULL;
        goto found;
    }
    prev = NULL;
    for(next = fq->q.fragments; next != NULL; next = next->next) {
        if (FRAG6_CB(next)->offset >= offset)
            break;  /* bingo! */
        prev = next;
    }

found:
    /* RFC5722, Section 4, amended by Errata ID : 3089
     *                          When reassembling an IPv6 datagram, if
     *   one or more its constituent fragments is determined to be an
     *   overlapping fragment, the entire datagram (and any constituent
     *   fragments) MUST be silently discarded.
     */

    /* Check for overlap with preceding fragment. */
    // Per RFC 5722, if fragment data overlaps, the whole queue is discarded
    if (prev &&
        (FRAG6_CB(prev)->offset + prev->len) > offset)
        goto discard_fq;

    /* Look for overlap with succeeding segment. */
    if (next && FRAG6_CB(next)->offset < end)
        goto discard_fq;

    FRAG6_CB(skb)->offset = offset;

    /* Insert this fragment in the chain of fragments. */
    skb->next = next;
    if (!next)
        fq->q.fragments_tail = skb;
    if (prev)
        prev->next = skb;
    else
        fq->q.fragments = skb;

    dev = skb->dev;
    if (dev) {
        fq->iif = dev->ifindex;
        skb->dev = NULL;
    }
    // Update the queue timestamp and the number of bytes received so far
    fq->q.stamp = skb->tstamp;
    fq->q.meat += skb->len;
    // Account for the memory used by this fragment
    atomic_add(skb->truesize, &fq->q.net->mem);

    /* The first fragment.
     * nhoffset is obtained from the first fragment, of course.
     */
    // For the first fragment, remember the offset of the Next Header field
    // that points at the fragment header and set INET_FRAG_FIRST_IN
    if (offset == 0) {
        fq->nhoffset = nhoff;
        fq->q.last_in |= INET_FRAG_FIRST_IN;
    }

    // If all fragments have arrived, call the reassembly function
    if (fq->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
        fq->q.meat == fq->q.len)
        return ip6_frag_reasm(fq, prev, dev);

    write_lock(&ip6_frags.lock);
    list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);
    write_unlock(&ip6_frags.lock);
    return -1;

discard_fq:
    fq_kill(fq);
err:
    IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
                  IPSTATS_MIB_REASMFAILS);
    kfree_skb(skb);
    return -1;
}

Once all fragments have arrived, ip6_frag_reasm() is called to reassemble them:
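The core bookkeeping of ip6_frag_queue() (sorted insertion, overlap rejection, and the meat == len completion test) can be reproduced in a small user-space model. This is a simplified sketch, not kernel code: struct frag, struct fragq, fragq_add() and the return codes are hypothetical names, and checksum handling, locking, ICMP errors and the tail shortcut are left out.

```c
#include <assert.h>
#include <stddef.h>

struct frag {
    int offset;          /* byte offset within the fragmentable part */
    int len;             /* bytes of data carried by this fragment */
    struct frag *next;
};

struct fragq {
    struct frag *head;   /* fragments sorted by offset */
    int len;             /* largest end offset seen so far */
    int meat;            /* bytes received so far */
    int first_in;        /* fragment with offset 0 has arrived */
    int last_in;         /* fragment with M flag 0 has arrived */
    int dead;            /* queue was discarded because of an overlap */
};

enum { FRAG_QUEUED, FRAG_COMPLETE, FRAG_DROPPED };

/* Insert f into q; more is the M flag of the fragment. */
static int fragq_add(struct fragq *q, struct frag *f, int more)
{
    struct frag *prev = NULL, *next;
    int end = f->offset + f->len;

    assert(f->offset >= 0);
    if (q->dead || f->len <= 0)
        return FRAG_DROPPED;

    if (!more) {
        /* Last fragment: it fixes the total length. */
        if (end < q->len || (q->last_in && end != q->len))
            return FRAG_DROPPED;
        q->last_in = 1;
        q->len = end;
    } else {
        if (end & 7)                 /* must end on an 8-byte boundary */
            return FRAG_DROPPED;
        if (end > q->len) {
            if (q->last_in)          /* data beyond the known end */
                return FRAG_DROPPED;
            q->len = end;
        }
    }

    /* Find the neighbours of the new fragment. */
    for (next = q->head; next != NULL; next = next->next) {
        if (next->offset >= f->offset)
            break;
        prev = next;
    }

    /* Any overlap discards the whole queue, as RFC 5722 requires. */
    if ((prev && prev->offset + prev->len > f->offset) ||
        (next && next->offset < end)) {
        q->dead = 1;
        return FRAG_DROPPED;
    }

    f->next = next;
    if (prev)
        prev->next = f;
    else
        q->head = f;
    q->meat += f->len;
    if (f->offset == 0)
        q->first_in = 1;

    return (q->first_in && q->last_in && q->meat == q->len)
           ? FRAG_COMPLETE : FRAG_QUEUED;
}
```

Since overlapping fragments are never accepted, the byte count meat can only reach len when the fragments cover the range without gaps, so the single comparison is enough to detect completion. The kernel's ip6_frag_reasm() listing follows.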
static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
                          struct net_device *dev)
{
    struct net *net = container_of(fq->q.net, struct net, ipv6.frags);
    struct sk_buff *fp, *head = fq->q.fragments;
    int payload_len;
    unsigned int nhoff;

    // First remove the queue from the fragment table
    fq_kill(fq);

    /* Make the one we just received the head. */
    // The caller keeps processing the skb it just handed in, so that skb has
    // to become the head of the queue. A clone takes over its place in the
    // list, then the skb itself is morphed into the first fragment and the
    // original first fragment is freed.
    if (prev) {
        head = prev->next;
        fp = skb_clone(head, GFP_ATOMIC);

        if (!fp)
            goto out_oom;

        fp->next = head->next;
        if (!fp->next)
            fq->q.fragments_tail = fp;
        prev->next = fp;

        skb_morph(head, fq->q.fragments);
        head->next = fq->q.fragments->next;

        kfree_skb(fq->q.fragments);
        fq->q.fragments = head;
    }

    WARN_ON(head == NULL);
    WARN_ON(FRAG6_CB(head)->offset != 0);

    /* Unfragmented part is taken from the first segment. */
    // Payload length of the reassembled packet: the extension headers
    // (without the fragment header) plus the fragmentable data
    payload_len = ((head->data - skb_network_header(head)) -
                   sizeof(struct ipv6hdr) + fq->q.len -
                   sizeof(struct frag_hdr));
    // A single IPv6 packet cannot carry more than 65535 bytes of payload
    if (payload_len > IPV6_MAXPLEN)
        goto out_oversize;

    /* Head of list must not be cloned. */
    if (skb_cloned(head) && pskb_expand_head(head, 0, 0, GFP_ATOMIC))
        goto out_oom;

    /* If the first fragment is fragmented itself, we split
     * it to two chunks: the first with data and paged part
     * and the second, holding only fragments. */
    // Separate head's own data from its frag_list to simplify what follows
    if (skb_has_frag_list(head)) {
        struct sk_buff *clone;
        int i, plen = 0;

        if ((clone = alloc_skb(0, GFP_ATOMIC)) == NULL)
            goto out_oom;
        clone->next = head->next;
        head->next = clone;
        skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
        skb_frag_list_init(head);
        for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
            plen += skb_frag_size(&skb_shinfo(head)->frags[i]);
        clone->len = clone->data_len = head->data_len - plen;
        head->data_len -= clone->len;
        head->len -= clone->len;
        clone->csum = 0;
        clone->ip_summed = head->ip_summed;
        atomic_add(clone->truesize, &fq->q.net->mem);
    }

    /* We have to remove fragment header from datagram and to relocate
     * header in order to calculate ICV correctly. */
    // Rebuild the headers: copy the fragment header's Next Header value into
    // the field that pointed at it, then shift the preceding headers forward
    // by the size of the fragment header
    nhoff = fq->nhoffset;
    skb_network_header(head)[nhoff] = skb_transport_header(head)[0];
    memmove(head->head + sizeof(struct frag_hdr), head->head,
            (head->data - head->head) - sizeof(struct frag_hdr));
    head->mac_header += sizeof(struct frag_hdr);
    head->network_header += sizeof(struct frag_hdr);

    // Hang the remaining fragments off frag_list
    skb_shinfo(head)->frag_list = head->next;
    skb_reset_transport_header(head);
    skb_push(head, head->data - skb_network_header(head));

    // Recompute the lengths and the checksum
    for (fp=head->next; fp; fp = fp->next) {
        head->data_len += fp->len;
        head->len += fp->len;
        if (head->ip_summed != fp->ip_summed)
            head->ip_summed = CHECKSUM_NONE;
        else if (head->ip_summed == CHECKSUM_COMPLETE)
            head->csum = csum_add(head->csum, fp->csum);
        head->truesize += fp->truesize;
    }
    atomic_sub(head->truesize, &fq->q.net->mem);

    head->next = NULL;
    head->dev = dev;
    head->tstamp = fq->q.stamp;
    ipv6_hdr(head)->payload_len = htons(payload_len);
    IP6CB(head)->nhoff = nhoff;
    IP6CB(head)->flags |= IP6SKB_FRAGMENTED;

    /* Yes, and fold redundant checksum back. 8) */
    if (head->ip_summed == CHECKSUM_COMPLETE)
        head->csum = csum_partial(skb_network_header(head),
                                  skb_network_header_len(head),
                                  head->csum);

    rcu_read_lock();
    IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
    rcu_read_unlock();
    fq->q.fragments = NULL;
    fq->q.fragments_tail = NULL;
    return 1;

out_oversize:
    if (net_ratelimit())
        printk(KERN_DEBUG "ip6_frag_reasm: payload len = %d\n", payload_len);
    goto out_fail;
out_oom:
    if (net_ratelimit())
        printk(KERN_DEBUG "ip6_frag_reasm: no memory for reassembly\n");
out_fail:
    rcu_read_lock();
    IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMFAILS);
    rcu_read_unlock();
    return -1;
}

Overall, enqueueing, eviction and reassembly of IPv6 fragment queues closely mirror IPv4. Once you understand IPv4 reassembly, you largely understand IPv6 reassembly as well.
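The header rewrite in ip6_frag_reasm() (patching the Next Header field at nhoff and the memmove() that slides the headers over the fragment header) can be tried on a plain byte buffer. The sketch below assumes a flat buffer that starts at the IPv6 header; strip_frag_hdr() and reasm_payload_len() are hypothetical helpers that mirror the arithmetic of the kernel code without using sk_buff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPV6_HDR_LEN 40u
#define FRAG_HDR_LEN 8u

/*
 * pkt:     start of the IPv6 header
 * hdr_len: bytes from the start of the IPv6 header to the end of the
 *          fragment header
 * nhoff:   offset of the Next Header byte that currently points at the
 *          fragment header
 * Returns the new start of the packet, FRAG_HDR_LEN bytes further on.
 */
static uint8_t *strip_frag_hdr(uint8_t *pkt, size_t hdr_len, size_t nhoff)
{
    assert(hdr_len >= IPV6_HDR_LEN + FRAG_HDR_LEN);
    assert(nhoff < hdr_len - FRAG_HDR_LEN);

    /* Byte 0 of the fragment header is its own Next Header field. */
    pkt[nhoff] = pkt[hdr_len - FRAG_HDR_LEN];
    /* Slide the preceding headers forward, overwriting the fragment header. */
    memmove(pkt + FRAG_HDR_LEN, pkt, hdr_len - FRAG_HDR_LEN);
    return pkt + FRAG_HDR_LEN;
}

/* Payload length of the reassembled packet, as in ip6_frag_reasm(). */
static size_t reasm_payload_len(size_t hdr_len, size_t frag_data_len)
{
    return hdr_len - IPV6_HDR_LEN - FRAG_HDR_LEN + frag_data_len;
}
```

Moving the headers towards the payload, rather than moving the payload back, keeps the copy small: only the IPv6 header and any extension headers in front of the fragment header are touched.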