Notes on an Unmanaged Memory Leak in a .NET Smart Water Plant API
I: Background
1. The story
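At the end of July a friend contacted me on WeChat. His program was using 8 GB of memory, but the managed heap accounted for only 1.5 GB. Where had the rest gone? His screenshot is below:
Judging from his message, my friend was far too polite and kept bringing up payment, which honestly stings a little. Anyone who follows my posts knows that I have always analyzed dumps for free. My knowledge and experience have limits, of course, and some dumps are beyond me, but I always do my best to find an answer.
I want to say a word about the workplace here. In my mind, and in my own team, hard problems like this naturally fall to the technical lead. Yet I have seen several cases that worked the other way: the tech manager can't crack it and passes it down, and when the people below can't crack it either, they are told to go find someone more capable... Can any seasoned experts offer a take on this?
Enough chit-chat. The urgent thing is to let WinDbg do the talking.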
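II: WinDbg Analysis
1. Is it really an unmanaged leak?
As I have mentioned in many of my memory-leak articles, the first step is to bisect: determine whether the leak is in managed or unmanaged memory.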
```
0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                    387     7df2`11ac1000 ( 125.946 TB)           98.39%
<unknown>                              2229      20c`a21bb000 (   2.049 TB)  99.75%    1.60%
Heap                                   1081        1`33914000 (   4.806 GB)   0.23%    0.00%
Image                                  1674        0`0e4be000 ( 228.742 MB)   0.01%    0.00%
Stack                                   973        0`0a140000 ( 161.250 MB)   0.01%    0.00%
TEB                                     324        0`00288000 (   2.531 MB)   0.00%    0.00%
Other                                    11        0`001d9000 (   1.848 MB)   0.00%    0.00%
PEB                                       1        0`00001000 (   4.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_MAPPED                              300      200`00f9e000 (   2.000 TB)  97.35%    1.56%
MEM_PRIVATE                            3869        d`dd7ed000 (  55.461 GB)   2.64%    0.04%
MEM_IMAGE                              2124        0`0fda4000 ( 253.641 MB)   0.01%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                387     7df2`11ac1000 ( 125.946 TB)           98.39%
MEM_RESERVE                            1763      20b`d9903000 (   2.046 TB)  99.60%    1.60%
MEM_COMMIT                             4530        2`14c2c000 (   8.324 GB)   0.40%    0.01%

0:000> !eeheap -gc
Number of GC Heaps: 40
------------------------------
Heap Size:               Size: 0x3322e60 (53620320) bytes.
------------------------------
GC Heap Size:            Size: 0x603046b0 (1613776560) bytes.
```

The two commands !address -summary and !eeheap -gc confirm exactly what my friend said: MEM_COMMIT is 8.3 GB while the GC heap is only 1.5 GB. Wow, this really is one of those nasty unmanaged memory leaks. Hell mode it is, so I gritted my teeth and pressed on. The next place to look is the Windows NT heap.
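The bisection above comes down to simple arithmetic on two numbers from this output. Here is a minimal JavaScript sketch (JavaScript because the WinDbg script later in this post uses it). It relies on one simplification: everything committed outside the GC heap is grouped together as "non-GC", so images, thread stacks and other non-heap memory are included too. Treat the result as a rough estimate, not an exact size for the unmanaged heap. `parseWinDbgHex` and `splitCommit` are hypothetical helper names.

```javascript
// Parse WinDbg-style hex such as "2`14c2c000" or "0x603046b0" into a Number.
// Numbers are exact up to 2^53, which covers every size in this dump.
function parseWinDbgHex(text) {
  const digits = text.replace(/`/g, "").replace(/^0x/i, "");
  return parseInt(digits, 16);
}

// !address -summary reports "GB" in binary units (2^30 bytes).
const GiB = 1024 ** 3;

// Simplification: non-GC = MEM_COMMIT - GC heap size (includes images, stacks, ...).
function splitCommit(commitHex, gcHeapHex) {
  const commit = parseWinDbgHex(commitHex);
  const gcHeap = parseWinDbgHex(gcHeapHex);
  return {
    commitGiB: commit / GiB,
    gcHeapGiB: gcHeap / GiB,
    nonGcGiB: (commit - gcHeap) / GiB,
  };
}

// MEM_COMMIT total and "GC Heap Size" copied from the output above.
const split = splitCommit("2`14c2c000", "0x603046b0");
console.log(split.commitGiB.toFixed(2), split.gcHeapGiB.toFixed(2), split.nonGcGiB.toFixed(2));
// prints "8.32 1.50 6.82"
```

So roughly 6.8 GB of committed memory sits outside the GC heap, which is why the investigation moves on to the native heaps.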
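2. Examining the Windows NT heap
Whether the code is managed C# or unmanaged C/C++, memory allocation ultimately goes through Windows APIs such as VirtualAlloc and HeapAlloc. The next step is therefore to find the NT heaps that .NET cannot see, and WinDbg's !heap -s command lists them.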
```
0:000> !heap -s
************************************************************************************************************************
                                              NT HEAP STATS BELOW
************************************************************************************************************************
LFH Key                   : 0x0e4dcfd61ab09dd9
Termination on corruption : ENABLED
          Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast
                            (k)     (k)    (k)     (k) length      blocks cont. heap
-------------------------------------------------------------------------------------
000001bacd190000 00000002 4841944 4810424 4840388  13556  2042   303    2    dd8   LFH
000001baccfb0000 00008000      64       4     64      2     1     1    0      0
000001bacd4d0000 00001002    8772    6748   7216   1045   191     4    0     38   LFH
    External fragmentation  15 % (191 free blocks)
000001bacdf90000 00001002    2636     404   1080     33     3     2    0      0   LFH
000001bace620000 00001002    8772    4052   7216   3874    13     7    0     1f   LFH
    External fragmentation  95 % (13 free blocks)
000001bace610000 00001003      60       8     60      6     1     1    0    N/A
000001bace540000 00001002    1616      24     60      4     2     1    0      1   LFH
000001baceb50000 00001002    4680    1228   3124    504    99     3    0      0   LFH
    External fragmentation  41 % (99 free blocks)
000001baceb20000 00041002      60       8     60      5     1     1    0      0
000001baceb10000 00041002    1616      68     60      4     3     1    0      0   LFH
000001c7738a0000 00001002   49336   19316  47780   8249    43    22    0    13b   LFH
    External fragmentation  42 % (43 free blocks)
000001c7753c0000 00001002   13712    8460  12156    968    29     6    0     1c   LFH
    External fragmentation  11 % (29 free blocks)
000001c7763f0000 00001002    8772    3944   7216    423    25     4    0     3f   LFH
000001ba977c0000 00001002    1080     376   1080    365     3     2    0      0
-------------------------------------------------------------------------------------
```

The output shows 14 heaps. The largest, 000001bacd190000, has 4,810,424 KB (about 4.6 GB) committed. Why is this heap so big? Let's examine it in detail with !ext.heap -stat -h 000001bacd190000.
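The script later in this post just takes the first heap row after skipping a fixed number of header lines, and in this dump that row happens to be the largest heap. Below is a slightly more robust sketch. It assumes the column layout shown above (Heap, Flags, Reserv, Commit, ...), parses every heap row and keeps the one with the largest Commit value. `largestHeap` is a hypothetical helper, and the sample rows are copied from the output above.

```javascript
// Return the heap with the largest "Commit (k)" column from !heap -s text rows.
function largestHeap(lines) {
  let best = null;
  for (const line of lines) {
    const cols = line.trim().split(/\s+/);
    // Heap rows start with a 16-hex-digit address; header, separator and
    // "External fragmentation" lines do not, so they are skipped.
    if (!/^[0-9a-f]{16}$/i.test(cols[0]) || cols.length < 4) continue;
    const commitKb = Number(cols[3]); // columns: Heap, Flags, Reserv, Commit, ...
    if (best === null || commitKb > best.commitKb) {
      best = { heap: cols[0], commitKb };
    }
  }
  return best;
}

const sample = [
  "-------------------------------------------------------------------------------------",
  "000001baccfb0000 00008000      64       4     64      2     1     1    0      0",
  "000001bacd190000 00000002 4841944 4810424 4840388  13556  2042   303    2    dd8   LFH",
  "    External fragmentation  42 % (43 free blocks)",
  "000001c7738a0000 00001002   49336   19316  47780   8249    43    22    0    13b   LFH",
];

const top = largestHeap(sample);
console.log(top.heap, top.commitKb, (top.commitKb / 1024 / 1024).toFixed(2) + " GiB");
// prints "000001bacd190000 4810424 4.59 GiB"
```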
```
0:000> !ext.heap -stat -h 000001bacd190000
 heap @ 000001bacd190000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    20034 8eee - 11df90858  (96.44)
    2ee0000 2 - 5dc0000  (1.98)
    851 1c2b - ea419b  (0.31)
    2ac00 28 - 6ae000  (0.14)
    27d8 268 - 5fdfc0  (0.13)
    24000 28 - 5a0000  (0.12)
    d51 564 - 47c8a4  (0.09)
    10d1 3e7 - 419f97  (0.09)
    fd1 415 - 409025  (0.09)
    29d1 12f - 317e5f  (0.07)
    138 18b0 - 1e1680  (0.04)
    12c 188b - 1cc2e4  (0.04)
    1000 17e - 17e000  (0.03)
    2000 8e - 11c000  (0.02)
    200 899 - 113200  (0.02)
    ad1 178 - fe2f8  (0.02)
    478 367 - f3448  (0.02)
    7c8 1b9 - d6788  (0.02)
    1c038 7 - c4188  (0.02)
    f520 c - b7d80  (0.02)
```

Many readers may find this output as cryptic as tea leaves, so let me decode it. Except for the percentage, every number is hexadecimal, and busy means blocks that have been allocated and not yet freed. The top row says there are 0x8eee = 36,590 blocks with a size of 0x20034 (131,124 bytes, about 128 KB), totaling 0x11df90858 = 4,797,827,160 bytes: about 4.5 GB, or 96.44% of all busy bytes. The obvious next question is what these blocks actually contain. To find out, list all 36,000-plus of them with !ext.heap -flt s 20034.
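Since these columns are hex, a quick way to sanity-check a row is to confirm that size × #blocks equals the total. The sketch below parses one row, assuming the "size #blocks - total (percent)" layout shown above. `parseStatRow` is a hypothetical helper.

```javascript
// Parse a !heap -stat row such as "20034 8eee - 11df90858  (96.44)".
// size, #blocks and total are hex; the percentage is decimal.
function parseStatRow(row) {
  const m = row.trim().match(/^([0-9a-f]+)\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\(\s*([\d.]+)\)$/i);
  if (m === null) return null; // header lines such as "group-by: ..." do not match
  const size = parseInt(m[1], 16);
  const count = parseInt(m[2], 16);
  const total = parseInt(m[3], 16);
  return { size, count, total, percent: Number(m[4]), consistent: size * count === total };
}

const row = parseStatRow("20034 8eee - 11df90858  (96.44)");
console.log(row.size, row.count, row.total, row.consistent);
// prints "131124 36590 4797827160 true"
```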
```
0:000> !ext.heap -flt s 20034
    _HEAP @ 1bacd190000
              HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state
        000001c771f2ad30 2004 0000  [00]   000001c771f2ad40    20034 - (busy)
        000001c774a65160 2004 2004  [00]   000001c774a65170    20034 - (busy)
        000001c774a851a0 2004 2004  [00]   000001c774a851b0    20034 - (busy)
        000001c774aa51e0 2004 2004  [00]   000001c774aa51f0    20034 - (busy)
        000001c774ac5220 2004 2004  [00]   000001c774ac5230    20034 - (busy)
        000001c774ae5260 2004 2004  [00]   000001c774ae5270    20034 - (busy)
        000001c774b052a0 2004 2004  [00]   000001c774b052b0    20034 - (busy)
        000001c774b29320 2004 2004  [00]   000001c774b29330    20034 - (busy)
        000001c774b49360 2004 2004  [00]   000001c774b49370    20034 - (busy)
        000001c774b693a0 2004 2004  [00]   000001c774b693b0    20034 - (busy)
        000001c774b893e0 2004 2004  [00]   000001c774b893f0    20034 - (busy)
          unknown!noop
        000001c774ba9420 2004 2004  [00]   000001c774ba9430    20034 - (busy)
        ......
```

There are far too many blocks to paste them all, so this is only an excerpt. HEAP_ENTRY is the address of each block's header, and the user data starts at UserPtr, 0x10 bytes later. I then dug through the blocks with dc and kept finding output like the following.
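The sketch below shows how to extract the addresses from these rows. It assumes the column layout above and uses BigInt because heap addresses are 64-bit. It also computes UserPtr minus HEAP_ENTRY, which is 0x10 in every row shown. `parseFltRow` is a hypothetical helper. Unlike the script's `indexOf("00")` filter, it accepts only lines that match the full row pattern.

```javascript
// Parse a "!heap -flt s" row:
//   HEAP_ENTRY Size Prev [Flags] UserPtr UserSize - (state)
function parseFltRow(row) {
  const m = row.trim().match(
    /^([0-9a-f]{16})\s+([0-9a-f]+)\s+([0-9a-f]+)\s+\[([0-9a-f]+)\]\s+([0-9a-f]{16})\s+([0-9a-f]+)\s+-\s+\((\w+)\)$/i
  );
  if (m === null) return null; // "_HEAP @ ...", "unknown!noop", "......", etc.
  const heapEntry = BigInt("0x" + m[1]);
  const userPtr = BigInt("0x" + m[5]);
  return {
    heapEntry,
    userPtr,
    headerBytes: userPtr - heapEntry, // 0x10n for the rows above
    userSize: parseInt(m[6], 16),
    state: m[7],
  };
}

const rows = [
  "_HEAP @ 1bacd190000",
  "000001c774a65160 2004 2004  [00]   000001c774a65170    20034 - (busy)",
  "unknown!noop",
];
for (const r of rows.map(parseFltRow).filter((x) => x !== null)) {
  console.log(r.heapEntry.toString(16), r.headerBytes.toString(16), r.userSize, r.state);
}
// prints "1c774a65160 10 131124 busy"
```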
```
0:000> dc 000001c774a65160 L 50
000001c7`74a65160  dddddddd 00000000 cada2944 0c85cc02  ........D)......
000001c7`74a65170  74a21070 000001c7 74a851b0 000001c7  p..t.....Q.t....
000001c7`74a65180  00000000 00000000 00000000 00000001  ................
000001c7`74a65190  00020000 00000000 0000007a fdfdfdfd  ........z.......
000001c7`74a651a0  00c801aa 55028000 05040355 44b60706  .......UU......D
000001c7`74a651b0  693d6f55 69502c31 32693d6e 7361502c  Uo=i1,Pin=i2,Pas
000001c7`74a651c0  726f7773 33733d64 6f72472c 693d7075  sword=s3,Group=i
000001c7`74a651d0  74532c34 54747261 3d656d69 452c3569  4,StartTime=i5,E
000001c7`74a651e0  6954646e 693d656d 75532c36 41726570  ndTime=i6,SuperA
000001c7`74a651f0  6f687475 657a6972 0a37693d 72657375  uthorize=i7.user
000001c7`74a65200  68747561 7a69726f 2c323d65 3d6e6950  authorize=2,Pin=
000001c7`74a65210  412c3169 6f687475 657a6972 656d6954  i1,AuthorizeTime
000001c7`74a65220  656e6f7a 693d6449 75412c32 726f6874  zoneId=i2,Author
000001c7`74a65230  44657a69 49726f6f 33693d64 6c6f680a  izeDoorId=i3.hol
000001c7`74a65240  79616469 482c333d 64696c6f 693d7961  iday=3,Holiday=i
000001c7`74a65250  6f482c31 6164696c 70795479 32693d65  1,HolidayType=i2
000001c7`74a65260  6f6f4c2c 33693d70 6d69740a 6e6f7a65  ,Loop=i3.timezon
000001c7`74a65270  2c343d65 656d6954 656e6f7a 693d6449  e=4,TimezoneId=i
000001c7`74a65280  75532c31 6d69546e 693d3165 75532c32  1,SunTime1=i2,Su
000001c7`74a65290  6d69546e 693d3265 75532c33 6d69546e  nTime2=i3,SunTim
```

Honestly, inspecting blocks one at a time with dc is exhausting, so I wrote a small WinDbg JavaScript script that dumps roughly the first 10,000 blocks to files so I could see what they contain.
"use?strict";var?index?=?1;function?initializeScript()?{?return?[new?host.apiVersionSupport(1,?7)];?} function?log(str)?{?host.diagnostics.debugLog(str?+?"\n");?} function?exec(str)?{?log("\n"?+?str);?return?host.namespace.Debugger.Utility.Control.ExecuteCommand(str);?}function?invokeScript()?{show_heap_s(); }function?show_heap_s()?{//get?top?1?var?output?=?exec("!heap?-s").Skip(10).First();var?h_address?=?output.split('?')[0];show_max_blocksize(h_address); }function?show_max_blocksize(address)?{var?output?=?exec("!ext.heap?-stat?-h?"?+?address).Skip(3).First();var?block_size?=?output.trim().split('?')[0];show_all_blocksize(block_size); }function?show_all_blocksize(blocksize)?{var?output?=?exec("!ext.heap?-flt?s?"?+?blocksize).Take(10000);for?(var?line?of?output)?{var?heap_entry_address?=?line.trim().split('?')[0];if?(heap_entry_address.indexOf("00")?==?-1)?continue;show_heap_entry(heap_entry_address);} }function?show_heap_entry(heap_entry_address)?{var?pageIndex?=?(index++);var?path?=?".writemem?D:\\dumps\\winform-memory-leak\\file\\"?+?pageIndex?+?".txt?"?+?heap_entry_address?+?"?L?0x500";var?output?=?exec(path);log("pageIndex="?+?pageIndex); }腳本執行后,輸出結果如下:
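I asked my friend what these strings were for and why so many of them sat in unmanaged memory without being freed. He told me they were probably part of the access-control (door entry) business, which talks to C# through a PLC. At this point I had provided all the information I could. The next step was for him to work with the access-control team to locate the cause and fix it.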
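Looking back at the dc dump of one of these blocks, the ASCII column is just the dwords read back as little-endian bytes, and the first 0x30 bytes at UserPtr are not text. The sketch below decodes them under a hypothesis that goes beyond the original analysis. The fdfdfdfd value is the MSVC debug CRT's "no man's land" guard fill, so these bytes may be an x64 debug-CRT block header (`_CrtMemBlockHeader`, 0x30 bytes). If that reading is correct, each block is a 0x20000-byte (128 KB) request from native code that uses the debug CRT, and 0x30 header + 0x20000 data + 4 trailing guard bytes comes to exactly the 0x20034 UserSize that !ext.heap -flt reports. `toView` and the field names are illustrative. The dword values are copied from the dc output above.

```javascript
// Dwords from "dc 000001c774a65160 L 50", starting at UserPtr 000001c7`74a65170
// (the 0x10-byte HEAP_ENTRY at 000001c7`74a65160 is skipped).
const dwords = [
  0x74a21070, 0x000001c7, 0x74a851b0, 0x000001c7, // +0x00
  0x00000000, 0x00000000, 0x00000000, 0x00000001, // +0x10
  0x00020000, 0x00000000, 0x0000007a, 0xfdfdfdfd, // +0x20
  0x00c801aa, 0x55028000, 0x05040355, 0x44b60706, // +0x30
  0x693d6f55, 0x69502c31, 0x32693d6e, 0x7361502c, // +0x40
];

// dc prints each dword as a number; in memory it is stored little-endian.
function toView(words) {
  const view = new DataView(new ArrayBuffer(words.length * 4));
  words.forEach((w, i) => view.setUint32(i * 4, w, true)); // true = little-endian
  return view;
}

const view = toView(dwords);

// Hypothesis: x64 MSVC debug-CRT _CrtMemBlockHeader layout (0x30 bytes).
const header = {
  next: view.getBigUint64(0x00, true),
  prev: view.getBigUint64(0x08, true),
  fileName: view.getBigUint64(0x10, true), // 0 = no file name recorded
  line: view.getInt32(0x18, true),
  blockUse: view.getInt32(0x1c, true), // 1 would be _NORMAL_BLOCK
  dataSize: view.getBigUint64(0x20, true),
  request: view.getInt32(0x28, true),
  gap: view.getUint32(0x2c, true), // 0xfdfdfdfd "no man's land"
};

console.log(header.dataSize.toString(16), header.blockUse, header.gap.toString(16));
// prints "20000 1 fdfdfdfd"

// 0x30 header + 0x20000 data + 4 trailing guard bytes = the UserSize from !heap
console.log((0x30 + Number(header.dataSize) + 4).toString(16)); // prints "20034"

// Bytes at +0x40 rendered like dc's ASCII column (non-printable -> ".").
const text = Array.from(new Uint8Array(view.buffer, 0x40))
  .map((b) => (b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : "."))
  .join("");
console.log(text); // prints "Uo=i1,Pin=i2,Pas"
```

Under this reading, prev (000001c7`74a851b0) is exactly the UserPtr of the next block in the !ext.heap -flt list, consistent with the debug CRT's linked list of blocks.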
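III: Summary
This seems to be the first of my 20 dump case studies to deal with an unmanaged leak. I once said on Bilibili that I would focus only on analyzing .NET managed memory leaks, but that looks hard to stick to: unmanaged leaks caused by C# interop with Lua, C++, COM, and embedded browsers are far too common.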
END
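Are you running into any of these at work...?
1. CPU spikes
2. Memory blowups
3. Resource leaks
4. Crashes and deadlocks
5. Hangs
...with the whole company counting on you to fix it? A crisis is when your technical value shows. I am a blogger focused on advanced .NET debugging. Search WeChat for 一線碼農聊技術 and I will help analyze your dump files for free; I hope to share your lessons learned with more people.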