Checking for High CPU Load on a Router
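High CPU utilization is a common problem on routers. It generally falls into two kinds: high process-level CPU and high interrupt-level CPU. High interrupt CPU is often caused by live traffic and has to be analyzed against the actual state of the network; the possible causes are too many to list here. High process CPU has a different root cause for each process. Among the processes that most commonly consume CPU is one called IP Input.
In IOS, the process-switching (software switching) process is called IP Input. As the name suggests, it handles packets that were not handled by the switching cache or CEF and were instead punted to the CPU for further processing. A packet is punted to the CPU in the following cases.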
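- There is no entry for the packet in the switching cache. For example, if the packet's destination address is 1.1.1.1 and no entry for that destination prefix is found in the cache of the inbound interface, the packet is sent to the CPU for a further lookup.
- The packet is destined for the router itself.
- Broadcast packets.
- Packets that carry options in the IP header.
- Packets that require protocol translation.
- Packets that need encryption or compression (with a CSA (Compression Service Adapter) or an ESA (Encryption Service Adapter) installed, they are processed locally on the adapter instead of being sent to the CPU).
- Fragmented packets or packets that need reassembly (a poorly chosen MTU can trigger this).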
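The list above can be read as a checklist. The following is a minimal Python sketch of that checklist, not IOS's actual forwarding logic; the `Packet` fields and the `punt_reasons` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical flags, one per punt case listed in the text.
    has_cache_entry: bool = True
    to_router_itself: bool = False
    is_broadcast: bool = False
    has_ip_options: bool = False
    needs_protocol_translation: bool = False
    needs_software_crypto_or_compression: bool = False
    fragmented_or_needs_reassembly: bool = False


# (reason, predicate) pairs in the same order as the list in the text.
CHECKS = [
    ("no switching cache entry", lambda p: not p.has_cache_entry),
    ("destined for the router itself", lambda p: p.to_router_itself),
    ("broadcast", lambda p: p.is_broadcast),
    ("IP options present", lambda p: p.has_ip_options),
    ("protocol translation", lambda p: p.needs_protocol_translation),
    ("software encryption/compression",
     lambda p: p.needs_software_crypto_or_compression),
    ("fragment or reassembly", lambda p: p.fragmented_or_needs_reassembly),
]


def punt_reasons(pkt):
    """Return every reason from the checklist that applies to pkt."""
    return [reason for reason, check in CHECKS if check(pkt)]


print(punt_reasons(Packet(has_ip_options=True, is_broadcast=True)))
```

An empty result means none of the listed cases applies, so the packet would be expected to stay on the fast path.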
Troubleshooting steps
Step 1: Use show processes cpu, or sh processes cpu sorted | exclude 0.00, to confirm that the CPU is high and to see whether it is being consumed by IP Input or by some other process.
ln-t1a-rt05>sh processes cpu sorted | exclude 0.00
CPU utilization for five seconds: 97%/40%; one minute: 99%; five minutes: 99%
PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
188   112780212 404873385        278 54.14% 47.34% 48.09%   0 IP Input
452    32291128 170096816        189  1.43%  0.85%  0.60%   0 BGP Router
444    31584160  54483137        579  0.31%  0.19%  0.17%   0 SNMP ENGINE
442     7514572  54233626        138  0.15%  0.12%  0.10%   0 IP SNMP
  2      239608   2221363        107  0.07%  0.06%  0.05%   0 Load Meter
332        9940      2296       4329  0.07%  0.07%  0.05%   0 Syslog Traps
433    14477844   2116934       6839  0.07%  0.13%  0.16%   0 Tag Control
321    12935848 171140132         75  0.07%  0.04%  0.05%   0 BGP I/O
259      660104  50553708         13  0.07%  0.04%  0.05%   0 TCP Timer
264      417244   1429570        291  0.07%  0.62%  0.54%   0 XDR mcast
446      265680    424691        625  0.07%  0.19%  0.22%   0 SNMP Traps
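In the header line of this output, the first five-second figure is total CPU and the figure after the slash is the share spent at interrupt level, so the difference is the process-level share. Here is a small Python sketch that splits the line this way; the function name `split_cpu_load` is made up for illustration.

```python
import re

CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def split_cpu_load(line):
    """Split the five-second figure into interrupt and process shares."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError("not a 'CPU utilization' header line")
    total, interrupt, one_min, five_min = (int(g) for g in m.groups())
    return {
        "total": total,
        "interrupt": interrupt,
        "process": total - interrupt,  # what processes such as IP Input use
        "one_minute": one_min,
        "five_minutes": five_min,
    }


LINE = ("CPU utilization for five seconds: 97%/40%; "
        "one minute: 99%; five minutes: 99%")
print(split_cpu_load(LINE))
```

For the sample line this leaves 57% at process level, which is consistent with IP Input alone showing 54.14% in the table.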
Step 2: If the IP Input process is what is driving CPU utilization up, run show interfaces stat and show interfaces to see which interfaces have a lot of traffic to forward and which switching path they use (process switching, fast switching, or CEF switching).
sh int stat
        Switching path    Pkts In   Chars In   Pkts Out  Chars Out
             Processor     650856   73671257     526830   62471991
           Route cache  330217560  821433561    6624091 1799996569
     Distributed cache          0          0          0          0
                 Total  330868416  895104818    7150921 1862468560
Interface POS0/1/0 is disabled  <<<<<<<<
FastEthernet1/0/0
        Switching path    Pkts In   Chars In   Pkts Out  Chars Out
             Processor  376139751  259046650  373525823   58361131
           Route cache  212836164 1931396850  344019086  233164671
     Distributed cache          0          0          0          0
                 Total  588975961 2190447507  717544911  291525922
Serial1/1/0
        Switching path    Pkts In   Chars In   Pkts Out  Chars Out
             Processor      34782    9898739     869284   85447616
           Route cache   14178391 3542542743  206212532 2364427002
     Distributed cache          0          0          0          0
                 Total   14213173 3552441482  207081816 2449874618
Commtouch-SC4-7513#
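What matters in this output is the share of inbound packets on the Processor row, since those are the packets IP Input has to handle. The sketch below computes that share per interface. It assumes the layout shown above, where any non-empty line that is neither a counter row nor the "Switching path" header is taken as an interface name; `processor_share` is a hypothetical helper.

```python
import re

ROW = re.compile(
    r"^\s*(Processor|Route cache|Distributed cache|Total)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def processor_share(text):
    """Map interface name -> fraction of inbound packets process-switched."""
    shares = {}
    name, pkts_in = None, {}
    for line in text.splitlines():
        row = ROW.match(line)
        if row:
            pkts_in[row.group(1)] = int(row.group(2))  # "Pkts In" column
            if row.group(1) == "Total" and name and pkts_in["Total"] > 0:
                shares[name] = pkts_in.get("Processor", 0) / pkts_in["Total"]
        elif line.strip() and "Switching path" not in line:
            # Assumption: any other non-empty line starts a new interface.
            name, pkts_in = line.strip(), {}
    return shares


SAMPLE = """\
FastEthernet1/0/0
        Switching path    Pkts In   Chars In   Pkts Out  Chars Out
             Processor  376139751  259046650  373525823   58361131
           Route cache  212836164 1931396850  344019086  233164671
                 Total  588975961 2190447507  717544911  291525922
Serial1/1/0
        Switching path    Pkts In   Chars In   Pkts Out  Chars Out
             Processor      34782    9898739     869284   85447616
           Route cache   14178391 3542542743  206212532 2364427002
                 Total   14213173 3552441482  207081816 2449874618
"""

for name, share in processor_share(SAMPLE).items():
    print(f"{name}: {share:.1%} of inbound packets process-switched")
```

With the counters from the output above, FastEthernet1/0/0 process-switches well over half of its inbound packets while Serial1/1/0 process-switches well under one percent, which points at FastEthernet1/0/0 as the interface to investigate.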
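Step 3: The output of show ip traffic shows which kind of traffic is growing fastest. Then check whether that kind of traffic has to be punted to the CPU (see the list of punt cases in the first part). That gives a rough conclusion about which traffic is causing the high CPU.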
Commtouch-SC4-7513#sh ip traffic
IP statistics:
Rcvd:  3764825155 total, 1632039 local destination
       0 format errors, 0 checksum errors, 46716 bad hop count
       0 unknown protocol, 0 not a gateway
       0 security failures, 0 bad options, 7772 with options
Opts:  103 end, 10 nop, 387 basic security, 0 loose source route
       0 timestamp, 0 extended security, 44 record route
       0 stream ID, 0 strict source route, 7282 alert, 0 cipso
       0 other
Frags: 0 reassembled, 0 timeouts, 0 couldn't reassemble
       4 fragmented, 0 couldn't fragment
Bcast: 103181 received, 61 sent
Mcast: 0 received, 32678 sent
Sent:  2929975 generated, 3758313272 forwarded
Drop:  54546 encapsulation failed, 0 unresolved, 0 no adjacency
       589 no route, 0 unicast RPF, 0 forced drop
ICMP statistics:
Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 10299 unreachable
      319787 echo, 24545 echo reply, 0 mask requests, 0 mask replies, 1 quench
      0 parameter, 0 timestamp, 0 info request, 0 other
      0 irdp solicitations, 0 irdp advertisements
Sent: 0 redirects, 1205238 unreachable, 24671 echo, 319787 echo reply
      0 mask requests, 0 mask replies, 0 quench, 0 timestamp
      0 info reply, 45869 time exceeded, 0 parameter problem
      0 irdp solicitations, 0 irdp advertisements
UDP statistics:
Rcvd: 390403 total, 1 checksum errors, 292907 no port
Sent: 199023 total, 5 forwarded broadcasts
TCP statistics:
Rcvd: 883671 total, 0 checksum errors, 1878 no port
Sent: 1135440 total
Probe statistics:
Rcvd: 0 address requests, 0 address replies
      0 proxy name requests, 0 where-is requests, 0 other
Sent: 0 address requests, 0 address replies (0 proxy)
      0 proxy name replies, 0 where-is replies
EGP statistics:
Rcvd: 0 total, 0 format errors, 0 checksum errors, 0 no listener
Sent: 0 total
IGRP statistics:
Rcvd: 0 total, 0 checksum errors
Sent: 0 total
OSPF statistics:
Rcvd: 0 total, 0 checksum errors
      0 hello, 0 database desc, 0 link state req
      0 link state updates, 0 link state acks
Sent: 0 total
IP-IGRP2 statistics:
Rcvd: 0 total
Sent: 0 total
PIMv2 statistics: Sent/Received
Total: 0/0, 0 checksum errors, 0 format errors
Registers: 0/0, Register Stops: 0/0,  Hellos: 0/0
Join/Prunes: 0/0, Asserts: 0/0, grafts: 0/0
Bootstraps: 0/0, Candidate_RP_Advertisements: 0/0
IGMP statistics: Sent/Received
Total: 0/0, Format errors: 0/0, Checksum errors: 0/0
Host Queries: 0/0, Host Reports: 0/0, Host Leaves: 00
DVMRP: 0/0, PIM: 0/0
ARP statistics:
Rcvd: 2159586 requests, 978 replies, 0 reverse, 0 other
Sent: 8855 requests, 27586 replies (25987 proxy), 0 reverse
===========
Commtouch-SC4-7513#sh tech
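A single show ip traffic capture only gives cumulative totals, so finding the fastest-growing counter means comparing two captures taken some seconds apart. The sketch below does that comparison in Python. It only understands counters written as "number name" (it ignores the "sent/received" style used in the PIM and IGMP sections), the helper names are invented, and the numbers in the second capture are made up purely to exercise the code.

```python
import re

PAIR = re.compile(r"(\d+)\s+([A-Za-z][^,\n]*)")
PREFIX = re.compile(r"^\s*(\w+):\s*(.*)$")


def parse_ip_traffic(text):
    """Return {(section, direction, counter name): value}."""
    counters = {}
    section = prefix = ""
    for line in text.splitlines():
        body = line.strip()
        if body.endswith("statistics:"):
            section, prefix = body[:-1], ""
            continue
        m = PREFIX.match(line)
        if m:  # a line such as "Rcvd: ..." starts a new direction
            prefix, body = m.group(1), m.group(2)
        for value, name in PAIR.findall(body):
            counters[(section, prefix, name.strip())] = int(value)
    return counters


def fastest_growing(before, after, top=3):
    """Counters sorted by how much they grew between two captures."""
    old, new = parse_ip_traffic(before), parse_ip_traffic(after)
    growth = [(new[key] - old.get(key, 0), key) for key in new]
    return sorted(growth, reverse=True)[:top]


BEFORE = """\
IP statistics:
Rcvd:  3764825155 total, 1632039 local destination
ICMP statistics:
Rcvd: 319787 echo, 24545 echo reply
"""

# Hypothetical second capture: these values are invented for the example.
AFTER = """\
IP statistics:
Rcvd:  3764975155 total, 1732039 local destination
ICMP statistics:
Rcvd: 418787 echo, 24545 echo reply
"""

for delta, key in fastest_growing(BEFORE, AFTER):
    print(delta, key)
```

In this invented pair of captures, "local destination" and ICMP "echo" grow almost in step, which is the kind of pattern that would suggest pings aimed at the router itself.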
Of course, debug ip packet detail shows more directly what kind of packets are being punted to the CPU and driving it high. Because CPU utilization is already very high at that point, it is advisable to first run:
router(config)#no logging console
router(config)#no logging monitor
Then run logging buffered, so that the debug output is written straight to the log buffer instead of scrolling endlessly in the session. Afterwards, show log displays the individual packets.
Now debugging can begin:
router#debug ip packet detail
IP packet debugging is on (detailed)
Do not let the debug run for longer than 3 to 5 seconds. Stop it with the undebug all command:
router#undebug all
All possible debugging has been turned off
Check the result with the show logging command:
router#show logging
Syslog logging: enabled (0 messages dropped, 0 flushes, 0 overruns)
    Console logging: disabled
    Monitor logging: disabled
    Buffer logging: level debugging, 145 messages logged
    Trap logging: level informational, 61 message lines logged
Log Buffer (64000 bytes):
*Mar  3 03:43:27.320: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.204
  (Ethernet0/0), g=10.200.40.1, len 100, forward
*Mar  3 03:43:27.324: ICMP type=8, code=0
*Mar  3 03:43:27.324: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.205
  (Ethernet0/0), g=10.200.40.1, len 100, forward
*Mar  3 03:43:27.324: ICMP type=8, code=0
*Mar  3 03:43:27.328: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.206
  (Ethernet0/0), g=10.200.40.1, len 100, forward
*Mar  3 03:43:27.328: ICMP type=8, code=0
...
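The log shows:
- A packet arrives about every 4 ms.
- The source IP address of the packets is 192.168.40.53.
- The packets enter on interface Ethernet0/1.
- The packets punted to the CPU have different destination addresses.
- The packets are sent out of interface Ethernet0/0.
- The next hop for the packets is 10.200.40.1.
- The packets are ICMP echo requests (type=8).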
In this example, the high CPU in the IP Input process is caused by a ping flood from IP address 192.168.40.53.
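The same reading of the log can be done mechanically. The following Python sketch counts packets per source and measures the gap between consecutive packets; it assumes millisecond timestamps with exactly three fractional digits, as in the log above, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

IP_LINE = re.compile(
    r"(\d+):(\d+):(\d+)\.(\d+): IP: s=(\S+) \((\S+)\), d=(\S+)"
)


def summarize(log):
    """Return (packets per (source, inbound interface), gaps in ms)."""
    sources = Counter()
    stamps = []
    for m in IP_LINE.finditer(log):
        hours, minutes, seconds, millis = (int(g) for g in m.groups()[:4])
        stamps.append(((hours * 60 + minutes) * 60 + seconds) * 1000 + millis)
        sources[(m.group(5), m.group(6))] += 1
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return sources, gaps


LOG = """\
*Mar  3 03:43:27.320: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.204
  (Ethernet0/0), g=10.200.40.1, len 100, forward
*Mar  3 03:43:27.324: ICMP type=8, code=0
*Mar  3 03:43:27.324: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.205
  (Ethernet0/0), g=10.200.40.1, len 100, forward
*Mar  3 03:43:27.324: ICMP type=8, code=0
*Mar  3 03:43:27.328: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.206
  (Ethernet0/0), g=10.200.40.1, len 100, forward
*Mar  3 03:43:27.328: ICMP type=8, code=0
"""

sources, gaps = summarize(LOG)
print(sources.most_common(1), gaps)
```

A single source dominating the counter together with small, regular gaps is the signature of the flood described above.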
A SYN flood is just as easy to spot this way, because the SYN flag appears in the debugging output:
*Mar 3 03:54:40.436: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.204
  (Ethernet0/0), g=10.200.40.1, len 44, forward
*Mar 3 03:54:40.440: TCP src=11004, dst=53,
  seq=280872555, ack=0, win=4128 SYN
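Counting such SYN segments per source makes a flood easier to see in a long buffer. The sketch below is one way to do it in Python, under two assumptions about the log: wrapped continuation lines start with whitespace, and the TCP detail record follows the IP record it belongs to. The helper names are invented.

```python
import re
from collections import Counter

SRC = re.compile(r"IP: s=(\S+)")
TCP = re.compile(r"TCP src=(\d+), dst=(\d+)")


def join_wrapped(log):
    """Glue indented continuation lines back onto the record they wrap."""
    records = []
    for line in log.splitlines():
        if line[:1].isspace() and records:
            records[-1] += " " + line.strip()
        elif line.strip():
            records.append(line.strip())
    return records


def count_syns(log):
    """Count pure SYN segments per (source IP, destination port)."""
    syns = Counter()
    src = None
    for record in join_wrapped(log):
        ip = SRC.search(record)
        if ip:
            src = ip.group(1)  # remember the source for the detail record
            continue
        tcp = TCP.search(record)
        if tcp and src and re.search(r"\bSYN\b", record) \
                and not re.search(r"\bACK\b", record):
            syns[(src, int(tcp.group(2)))] += 1
    return syns


LOG = """\
*Mar 3 03:54:40.436: IP: s=192.168.40.53 (Ethernet0/1), d=144.254.2.204
  (Ethernet0/0), g=10.200.40.1, len 44, forward
*Mar 3 03:54:40.440: TCP src=11004, dst=53,
  seq=280872555, ack=0, win=4128 SYN
"""

print(count_syns(LOG))
```

A large count for one source, especially spread over many destination addresses, would point to a SYN flood in the same way the ICMP lines pointed to a ping flood.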
Reposted from: https://blog.51cto.com/wangzhenyu/1108726