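Implementation of the TCP Congestion State Machine (Part 2)
Content: This article analyzes two parts of the TCP congestion state machine implementation in detail: the handling of spurious SACKs (SACK reneging) and the marking of lost packets.
Kernel version: 2.6.37
Author: zhangskd @ csdn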
Spurious SACK (SACK reneging)
state B
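If an incoming ACK points into a SACK block that the sender has already recorded, the recorded SACKs do not reflect the receiver's real state. In other words, the receiver is either heavily congested or has a bug in its processing, so the sender handles the situation the same way it handles a retransmission timeout. In the normal flow, a received ACK should never point into a recorded SACK block; it should point to the unreceived data that follows the SACKed range. Typically, by this point the receiver has already discarded the segments it had stored in its out-of-order queue.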
    /* If ACK arrived pointing to a remembered SACK, it means that our remembered
     * SACKs do not reflect real state of receiver i.e. receiver host is heavily
     * congested or buggy.
     *
     * Do processing similar to RTO timeout.
     */
    static int tcp_check_sack_reneging(struct sock *sk, int flag)
    {
        if (flag & FLAG_SACK_RENEGING) {
            struct inet_connection_sock *icsk = inet_csk(sk);

            /* Record MIB statistics for SNMP. */
            NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);

            /* Enter the Loss state; 1 means clear the SACKED flags. */
            tcp_enter_loss(sk, 1); /* this function was analyzed in an earlier post :) */
            icsk->icsk_retransmits++; /* one more unrecovered RTO */

            /* Retransmit the first packet in the send queue. */
            tcp_retransmit_skb(sk, tcp_write_queue_head(sk));

            /* Re-arm the retransmission timer. */
            inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
                                      icsk->icsk_rto, TCP_RTO_MAX);
            return 1;
        }
        return 0;
    }

    /*
     * Returns the first packet in the send queue, or NULL.
     *
     * skb_peek - peek at the head of an &sk_buff_head
     * @list_: list to peek at
     *
     * Peek an &sk_buff. Unlike most other operations you must
     * be careful with this one. A peek leaves the buffer on the
     * list and someone else may run off with it. You must hold
     * the appropriate locks or have a private queue to do this.
     *
     * Returns %NULL for an empty list or a pointer to the head element.
     * The reference count is not incremented and the reference is therefore
     * volatile. Use with caution.
     */
    static inline struct sk_buff *skb_peek(const struct sk_buff_head *list_)
    {
        struct sk_buff *list = ((const struct sk_buff *)list_)->next;

        if (list == (struct sk_buff *)list_)
            list = NULL;
        return list;
    }

    static inline struct sk_buff *tcp_write_queue_head(const struct sock *sk)
    {
        return skb_peek(&sk->sk_write_queue);
    }

tcp_retransmit_skb() retransmits one packet. It ultimately calls tcp_transmit_skb() to send the packet. This function will be analyzed in an upcoming post.
    /* This retransmits one SKB. Policy decisions and retransmit queue
     * state updates are done by the caller. Returns non-zero if an
     * error occurred which prevented the send.
     */
    int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) { }
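As a rough illustration of the reneging condition described in this section, the user-space sketch below models the send queue as an array and reports reneging when the first segment not covered by the cumulative ACK had already been recorded as SACKed. The names `toy_seg`, `toy_before()` and `toy_detect_reneging()` are hypothetical and exist only for this example; the kernel derives FLAG_SACK_RENEGING in its own ACK-processing path, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one segment in the send queue. */
struct toy_seg {
    uint32_t seq;      /* first sequence number of the segment */
    uint32_t end_seq;  /* one past the last sequence number */
    bool sacked;       /* the sender recorded a SACK for this segment */
};

/* Wraparound-tolerant "seq1 < seq2" using a signed 32-bit difference. */
static bool toy_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/*
 * Returns true when the first segment that the cumulative ACK does not
 * fully cover was previously recorded as SACKed, i.e. the receiver seems
 * to have dropped data it had claimed to hold.
 */
static bool toy_detect_reneging(const struct toy_seg *q, size_t n, uint32_t ack)
{
    assert(q != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Skip segments fully acknowledged by the cumulative ACK. */
        if (!toy_before(ack, q[i].end_seq))
            continue;
        return q[i].sacked;
    }
    return false; /* everything in the queue was acknowledged */
}
```

With a queue of three segments where only the middle one is SACKed, an ACK that stops exactly at the start of the middle segment is reported as reneging, while an ACK that stops at the first or third segment is not.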
Resetting the retransmission timer
state B
    /** inet_connection_sock - INET connection oriented sock
     *
     * @icsk_timeout:          Timeout
     * @icsk_retransmit_timer: Resend (no ack)
     * @icsk_rto:              Retransmission timeout
     * @icsk_ca_ops:           Pluggable congestion control hook
     * @icsk_ca_state:         Congestion control state
     * @icsk_retransmits:      Number of unrecovered [RTO] timeouts
     * @icsk_pending:          scheduled timer event
     * @icsk_ack:              Delayed ACK control data
     */
    struct inet_connection_sock {
        ...
        unsigned long icsk_timeout;              /* packet timeout time */
        struct timer_list icsk_retransmit_timer; /* retransmission timer */
        struct timer_list icsk_delack_timer;     /* delayed-ACK timer */
        __u32 icsk_rto;                          /* retransmission timeout */
        const struct tcp_congestion_ops *icsk_ca_ops; /* congestion control algorithm */
        __u8 icsk_ca_state;                      /* current congestion state */
        __u8 icsk_retransmits;                   /* number of unrecovered timeouts */
        __u8 icsk_pending;                       /* pending timer event */
        ...
        struct {
            ...
            __u8 pending;          /* ACK is pending */
            unsigned long timeout; /* Currently scheduled timeout */
            ...
        } icsk_ack;                /* delayed-ACK control block */
        ...
        u32 icsk_ca_priv[16];      /* private data of the congestion control algorithm */
        ...
    #define ICSK_CA_PRIV_SIZE (16 * sizeof(u32))
    };

    #define ICSK_TIME_RETRANS 1 /* Retransmit timer */
    #define ICSK_TIME_DACK    2 /* Delayed ack timer */
    #define ICSK_TIME_PROBE0  3 /* Zero window probe timer */

    /*
     * Reset the retransmission timer
     */
    static inline void inet_csk_reset_xmit_timer(struct sock *sk, const int what,
                                                 unsigned long when,
                                                 const unsigned long max_when)
    {
        struct inet_connection_sock *icsk = inet_csk(sk);

        if (when > max_when) {
    #ifdef INET_CSK_DEBUG
            pr_debug("reset_xmit_timer: sk=%p %d when=0x%lx, caller=%p\n",
                     sk, what, when, current_text_addr());
    #endif
            when = max_when;
        }

        if (what == ICSK_TIME_RETRANS || what == ICSK_TIME_PROBE0) {
            icsk->icsk_pending = what;
            icsk->icsk_timeout = jiffies + when; /* instant at which the packet times out */
            sk_reset_timer(sk, &icsk->icsk_retransmit_timer, icsk->icsk_timeout);
        } else if (what == ICSK_TIME_DACK) {
            icsk->icsk_ack.pending |= ICSK_ACK_TIMER;
            icsk->icsk_ack.timeout = jiffies + when; /* instant at which the delayed-ACK timer fires */
            sk_reset_timer(sk, &icsk->icsk_delack_timer, icsk->icsk_ack.timeout);
        }
    #ifdef INET_CSK_DEBUG
        else {
            pr_debug("%s", inet_csk_timer_bug_msg);
        }
    #endif
    }
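The two steps of the timer reset above (clamp the delay to a maximum, then arm either the retransmit/probe timer or the delayed-ACK timer) can be tried out in user space with a sketch like the following. Everything prefixed with `toy_` or `TOY_` is a hypothetical stand-in invented for this example, and the `now` parameter is an assumption that replaces the kernel's jiffies counter; no real timer is armed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer kinds, mirroring the ICSK_TIME_* constants. */
enum toy_timer {
    TOY_TIME_RETRANS = 1,
    TOY_TIME_DACK    = 2,
    TOY_TIME_PROBE0  = 3
};

/* Hypothetical, stripped-down connection state. */
struct toy_icsk {
    int pending;               /* which retransmit-class event is armed */
    unsigned long timeout;     /* expiry of the retransmit/probe timer */
    int ack_pending;           /* nonzero when the delayed-ACK timer is armed */
    unsigned long ack_timeout; /* expiry of the delayed-ACK timer */
};

/*
 * Clamp `when` to `max_when`, record the absolute expiry time for the
 * requested timer kind and return it. Returns 0 for an unknown kind,
 * in which case nothing is changed.
 */
static unsigned long toy_reset_xmit_timer(struct toy_icsk *icsk,
                                          unsigned long now,
                                          enum toy_timer what,
                                          unsigned long when,
                                          unsigned long max_when)
{
    assert(icsk != NULL);

    if (when > max_when)
        when = max_when;

    if (what == TOY_TIME_RETRANS || what == TOY_TIME_PROBE0) {
        icsk->pending = what;
        icsk->timeout = now + when;
        return icsk->timeout;
    } else if (what == TOY_TIME_DACK) {
        icsk->ack_pending = 1;
        icsk->ack_timeout = now + when;
        return icsk->ack_timeout;
    }
    return 0; /* unknown timer kind: nothing armed */
}
```

Keeping the retransmit and delayed-ACK expiry times in separate fields matches the listing above, where the two timer classes never overwrite each other's state.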
Adding the LOST flag
state C
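Q: Once we detect that packets have been lost, how do we know which packets to retransmit?
A: tcp_mark_head_lost() tags the lost packets with TCPCB_LOST, which identifies the packets that need to be retransmitted.
If SACK reveals that segments have been lost, marking starts at the head of the retransmission queue, or at the position where the previous marking pass stopped. The LOST flag is added to segments whose scoreboard state is 0 (neither SACKed nor already marked LOST) until the number of segments counted reaches packets or the sequence number being marked exceeds high_seq.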
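Before reading the kernel function, the stopping rules just described can be seen in a deliberately simplified user-space model. This is only a sketch under stated assumptions: every array element is one packet (so no fragmentation is needed), every walked segment counts toward the limit (FACK-style counting), and the walk always starts at the head rather than at a cached hint. `toy_pkt`, `toy_seq_after()` and `toy_mark_head_lost()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-packet segment with its scoreboard bits. */
struct toy_pkt {
    uint32_t seq;
    uint32_t end_seq;
    bool sacked;  /* selectively acknowledged by the receiver */
    bool lost;    /* marked LOST by the sender */
};

/* Wraparound-tolerant "a > b" using a signed 32-bit difference. */
static bool toy_seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/*
 * Walk the queue from the head and mark unsacked, not-yet-lost packets
 * as LOST. Stop when more than `packets` entries have been counted or
 * when an entry ends beyond `high_seq`. Returns how many packets were
 * newly marked.
 */
static int toy_mark_head_lost(struct toy_pkt *q, size_t n,
                              int packets, uint32_t high_seq)
{
    int cnt = 0;
    int newly_lost = 0;

    assert(q != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (toy_seq_after(q[i].end_seq, high_seq))
            break;          /* never mark beyond high_seq */

        cnt++;              /* simplification: every packet counts */
        if (cnt > packets)
            break;          /* budget of `packets` exhausted */

        if (!q[i].sacked && !q[i].lost) {
            q[i].lost = true;
            newly_lost++;
        }
    }
    return newly_lost;
}
```

Calling the function again with a larger budget only marks packets that were not reached before, because packets that are already LOST or SACKed are skipped.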
    /* Mark head of queue up as lost. With RFC3517 SACK, the packets is against
     * sacked cnt, otherwise it's against facked cnt.
     *
     * packets = fackets_out - reordering, which represents the sum of sacked_out
     * and lost_out, so the number of segments marked LOST must not exceed packets.
     * high_seq:  the highest sequence number that may be marked LOST.
     * mark_head: 1 means only the first segment of the send queue is to be marked.
     */
    static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
    {
        struct tcp_sock *tp = tcp_sk(sk);
        struct sk_buff *skb;
        int cnt, oldcnt;
        int err;
        unsigned int mss;

        /* Cannot mark more segments lost than have been sent. */
        WARN_ON(packets > tp->packets_out);

        /* Some segments have already been marked lost. */
        if (tp->lost_skb_hint) {
            skb = tp->lost_skb_hint; /* next segment to mark */
            cnt = tp->lost_cnt_hint; /* number of segments counted so far */

            /* Head already handled? Return if the first packet of the
             * send queue has already been marked. */
            if (mark_head && skb != tcp_write_queue_head(sk))
                return;
        } else {
            skb = tcp_write_queue_head(sk);
            cnt = 0;
        }

        tcp_for_write_queue_from(skb, sk) {
            if (skb == tcp_send_head(sk))
                break; /* stop on reaching snd_nxt */

            /* Update the lost-marking hints. */
            tp->lost_skb_hint = skb;
            tp->lost_cnt_hint = cnt;

            /* Segments marked LOST must not go beyond high_seq. */
            if (after(TCP_SKB_CB(skb)->end_seq, tp->high_seq))
                break;

            oldcnt = cnt;
            if (tcp_is_fack(tp) || tcp_is_reno(tp) ||
                (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
                cnt += tcp_skb_pcount(skb); /* this segment counts as SACKed */

            /* Mainly decides when to exit. */
            if (cnt > packets) {
                if ((tcp_is_sack(tp) && !tcp_is_fack(tp)) ||
                    (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
                    (oldcnt >= packets))
                    break;

                mss = skb_shinfo(skb)->gso_size;
                err = tcp_fragment(sk, skb, (packets - oldcnt) * mss, mss);
                if (err < 0)
                    break;
                cnt = packets;
            }

            /* The marking step: mark one segment as LOST. */
            tcp_skb_mark_lost(tp, skb);

            if (mark_head)
                break;
        }

        tcp_verify_left_out(tp);
    }

Variables involved
    struct tcp_sock {
        ...
        /* Caches the next segment to mark in the retransmission queue,
         * to speed up marking passes over that queue. */
        struct sk_buff *lost_skb_hint;       /* next segment to mark */
        int lost_cnt_hint;                   /* number of segments counted so far */
        struct sk_buff *retransmit_skb_hint; /* first packet to retransmit */
        u32 retransmit_high;                 /* highest sequence number in the retransmission queue */
        struct sk_buff *scoreboard_skb_hint; /* timed-out packet with the highest sequence number */
        ...
    };

The TCP fragmentation function tcp_fragment
    /* Function to create two new TCP segments. Shrinks the given segment
     * to the specified size and appends a new segment with the rest of the
     * packet to the list. This won't be called frequently, I hope.
     * Remember, these are still headerless SKBs at this point.
     */
    int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
                     unsigned int mss_now) {}

Adding the LOST flag to a segment
    static void tcp_skb_mark_lost(struct tcp_sock *tp, struct sk_buff *skb)
    {
        if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_LOST | TCPCB_SACKED_ACKED))) {
            tcp_verify_retransmit_hint(tp, skb);   /* update the retransmit hints */
            tp->lost_out += tcp_skb_pcount(skb);   /* increase the count of LOST segments */
            TCP_SKB_CB(skb)->sacked |= TCPCB_LOST; /* set the LOST flag */
        }
    }

    /* This must be called before lost_out is incremented */
    static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb)
    {
        if ((tp->retransmit_skb_hint == NULL) ||
            before(TCP_SKB_CB(skb)->seq,
                   TCP_SKB_CB(tp->retransmit_skb_hint)->seq))
            tp->retransmit_skb_hint = skb;

        if (!tp->lost_out ||
            after(TCP_SKB_CB(skb)->end_seq, tp->retransmit_high))
            tp->retransmit_high = TCP_SKB_CB(skb)->end_seq;
    }
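The hint update above relies on before() and after(), which compare 32-bit sequence numbers in a way that tolerates wraparound. The sketch below assumes the usual signed-difference idiom for those comparisons and tracks the two values that tcp_verify_retransmit_hint() maintains: the lowest lost sequence number and the highest lost end sequence. `toy_hints`, `seq_before()`, `seq_after()` and `toy_note_lost()` are hypothetical names used only here, and plain sequence numbers stand in for sk_buff pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* "seq1 is earlier than seq2", valid while the two are less than 2^31 apart.
 * The conversion to int32_t assumes a two's-complement platform. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

static bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}

/* Hypothetical subset of the loss-tracking state. */
struct toy_hints {
    bool have_hint;            /* false until the first segment is marked */
    uint32_t hint_seq;         /* start of the earliest lost segment */
    uint32_t retransmit_high;  /* highest end_seq among lost segments */
    unsigned int lost_out;     /* number of segments marked lost */
};

/* Record one newly lost segment; hints are updated before lost_out grows. */
static void toy_note_lost(struct toy_hints *h, uint32_t seq, uint32_t end_seq)
{
    assert(h != NULL);

    if (!h->have_hint || seq_before(seq, h->hint_seq)) {
        h->have_hint = true;
        h->hint_seq = seq;
    }
    if (h->lost_out == 0 || seq_after(end_seq, h->retransmit_high))
        h->retransmit_high = end_seq;

    h->lost_out++;
}
```

A plain `<` comparison would treat a segment starting at 0xFFFFFF9C as later than one starting at 0, although it was sent 100 bytes earlier; the signed difference orders the two correctly across the wrap.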
Reposted from: https://www.cnblogs.com/aiwz/archive/2012/12/14/6333363.html