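Troubleshooting a Memory Leak in a Factory MES System Built on .NET
I: Background
1. The story
Last month a friend added me on WeChat asking for help. His program's memory kept climbing as it ran until it blew up, and he wanted to know how to fix it.
Judging by the chat, he was under quite a lot of pressure. This seems to be the third MES system I have analyzed, so .NET appears to be a real heavyweight in traditional factories...
Without further ado, let's dig in with Windbg.
II: Windbg Analysis
1. Managed or unmanaged?
First, look at the process's committed memory with !address -summary.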
0:000> !address -summary

Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                             971          e7d6b000 (   3.622 GB)  95.24%   90.56%
MEM_IMAGE                              1175           ac5d000 ( 172.363 MB)   4.43%    4.21%
MEM_MAPPED                               34            d08000 (  13.031 MB)   0.33%    0.32%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_COMMIT                             1806          edfd9000 (   3.719 GB)  97.77%   92.97%
MEM_FREE                                190           c920000 ( 201.125 MB)            4.91%
MEM_RESERVE                             374           56f7000 (  86.965 MB)   2.23%    2.12%
...

As you can see, the process currently has 3.719 GB committed (MEM_COMMIT), and the addresses show that this is a 32-bit program. With only about 201 MB of address space still free (MEM_FREE), the program is on the brink of crashing 😂😂😂. Next, let's check how much of that is the managed heap, using !eeheap -gc.
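The `!address -summary` sizes are hexadecimal byte counts. The short Python sketch below converts them, using values copied from the State Summary above and assuming the "GB"/"MB" labels are binary units (which matches the decimal figures WinDbg prints). It shows that commit, free and reserve add up to exactly 64 KB short of 2^32 bytes, i.e. a 4 GB address space. That is consistent with a large-address-aware 32-bit process on 64-bit Windows, and with only about 201 MB free there is very little headroom left.

```python
# Hex sizes copied from the "State Summary" section of !address -summary.
state_summary = {
    "MEM_COMMIT": 0xEDFD9000,
    "MEM_FREE": 0x0C920000,
    "MEM_RESERVE": 0x056F7000,
}

GB = 1024 ** 3  # assumption: !address prints binary units but labels them "GB"/"MB"
MB = 1024 ** 2

for state, size in state_summary.items():
    print(f"{state:<12} {size:>14,} bytes = {size / GB:6.3f} GB = {size / MB:9.3f} MB")

total = sum(state_summary.values())
print(f"total: {total:,} bytes, which is {2**32 - total} bytes short of 2**32")
print(f"commit share of total: {state_summary['MEM_COMMIT'] / total:.2%}")
```

The last line reproduces the 92.97% in the `%ofTotal` column, which is a quick way to confirm the conversion matches WinDbg's own arithmetic.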
0:000> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0xf35a90c0
generation 1 starts at 0xf33a1000
generation 2 starts at 0x01db1000
ephemeral segment allocation context: none
 segment     begin  allocated      size
...
f7790000  f7791000  f8058854  0x8c7854(9205844)
f33a0000  f33a1000  f3ba6e84  0x805e84(8412804)
Large object heap starts at 0x02db1000
 segment     begin  allocated      size
02db0000  02db1000  0387e988  0xacd988(11327880)
Total Size:              Size: 0xdcab5ca8 (3702217896) bytes.
------------------------------
GC Heap Size:    Size: 0xdcab5ca8 (3702217896) bytes.

The output shows the managed heap occupies 3,702,217,896 bytes, about 3.7 billion bytes (roughly 3.45 GB in the binary units !address reports). That is the vast majority of the committed memory, so this is a relatively simple managed memory leak.
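As a cross-check, the sketch below recomputes the segment sizes from the `!eeheap -gc` rows (size = allocated - begin) and compares the GC heap total with MEM_COMMIT. All values are copied from the output above. Only the segment rows that are actually shown are included, because the `...` in the output elides the rest, so these three rows do not add up to the total.

```python
# Values copied from the !eeheap -gc and !address -summary output above.
gc_heap_size = 0xDCAB5CA8  # "GC Heap Size"
committed = 0xEDFD9000     # MEM_COMMIT from !address -summary

# (segment, begin, allocated) for the rows shown; the "size" column equals allocated - begin.
segments = [
    (0xF7790000, 0xF7791000, 0xF8058854),
    (0xF33A0000, 0xF33A1000, 0xF3BA6E84),
    (0x02DB0000, 0x02DB1000, 0x0387E988),  # large object heap segment
]

for seg, begin, allocated in segments:
    size = allocated - begin
    print(f"{seg:08x}: {size:#x} ({size})")

print(f"GC heap: {gc_heap_size:,} bytes = {gc_heap_size / 1024**3:.2f} GB (binary)")
print(f"share of committed memory: {gc_heap_size / committed:.1%}")
```

Roughly 93% of committed memory lives on the GC heap, which is what justifies treating this as a managed leak rather than a native one.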
2. Exploring the managed heap
Inspecting the managed heap is easy. Start with the all-in-one command !dumpheap -stat.
0:000> !dumpheap -stat
Statistics:
      MT    Count    TotalSize Class Name
...
04b045d0    67663     25711940 xxx.Product.Mes.DataStore.EF.MesDbContext
719f0100  3458387     41500644 System.Object
719f1b84   281492     42391384 System.Int32[]
0489adb0  2238394     44767880 xxx.Application.Features.FeatureChecker
71551e00  2238503     53724072 System.Collections.Generic.List`1[[System.String, mscorlib]]
07c473e0  5615923     67391076 System.Data.Entity.Core.Objects.Internal.ObjectQueryExecutionPlanFactory
07c68954  5683589     68203068 System.Data.Entity.Core.Common.Internal.Materialization.Translator
04c7e3a8  4042677     71990132 Castle.DynamicProxy.IInterceptor[]
014a80c0  3142755     80480594      Free
042ecd18  5869494     93911904 xxxx.Domain.Uow.UnitOfWorkInterceptor
096ed32c    67663     97164068 System.Collections.Generic.Dictionary`2+Entry[[System.Type, mscorlib],[System.Data.Entity.Internal.Linq.IInternalSetAdapter, EntityFramework]][]
0488edb0 12641117    151693404 xxx.Domain.Uow.AsyncLocalCurrentUnitOfWorkProvider
0488fa50 10769173    215383460 xxx.Domain.Uow.UnitOfWorkManager
07cc0fb0  5548261    355088704 System.Data.Entity.Core.Objects.EntitySqlQueryState
719efd60 11275964   1268805768 System.String

Reading the tea leaves, the types at the bottom of the list (the biggest consumers) are almost all EF-related, and the strings are generally held by those EF objects too. Something very abnormal also stands out: there are more than 60,000 MesDbContext instances (67,663), which does not look right. So let's pick a few of them and check their references with !gcroot. The output is roughly the same each time:
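When scanning `!dumpheap -stat` for leaks, the Count column (and bytes per object) is often more telling than TotalSize: a DbContext is normally short-lived, so tens of thousands of live instances is suspicious even though they only account for about 25 MB. The sketch below parses a few rows copied from the output above; `parse_stat` is a hypothetical helper that assumes the four whitespace-separated columns shown (MT, Count, TotalSize, Class Name).

```python
# A few rows copied from the !dumpheap -stat output above: MT, Count, TotalSize, Class Name.
STAT_ROWS = """\
04b045d0    67663     25711940 xxx.Product.Mes.DataStore.EF.MesDbContext
0488edb0 12641117    151693404 xxx.Domain.Uow.AsyncLocalCurrentUnitOfWorkProvider
0488fa50 10769173    215383460 xxx.Domain.Uow.UnitOfWorkManager
07cc0fb0  5548261    355088704 System.Data.Entity.Core.Objects.EntitySqlQueryState
719efd60 11275964   1268805768 System.String
"""


def parse_stat(text):
    """Return (class_name, count, total_size) tuples from !dumpheap -stat rows."""
    rows = []
    for line in text.splitlines():
        _mt, count, total, name = line.split(maxsplit=3)
        rows.append((name, int(count), int(total)))
    return rows


rows = parse_stat(STAT_ROWS)
for name, count, total in sorted(rows, key=lambda r: r[2], reverse=True):
    print(f"{total:>13,} bytes {count:>11,} objects {total // count:>5} B/object  {name}")
```

Each MesDbContext instance works out to 380 bytes, so the context objects themselves are small; the real cost is everything they keep reachable.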
0:000> !gcroot 17d2e438
HandleTable:
    014313c8 (pinned handle)
    -> 02dd9020 System.Object[]
    -> 0260abf4 System.Collections.Concurrent.ConcurrentDictionary`2[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]]
    -> b96074a4 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]]
    -> 02fcddb0 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]][]
    -> b955eecc System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]]
    -> 17d2e438 xxx.DataStore.EF.MesDbContext

The reference chain shows that these MesDbContext objects are held by a ConcurrentDictionary<DbContext, ConcurrentDictionary<string, DynamicFilterParameters>>. Next we need to find out how big this dictionary is, which is what the !objsize command is for.
0:000> !objsize 0260abf4
e06d7363 Exception in c:\mysymbols\SOS_x86_x86_4.7.3701.00.dll\5F4FF1AE6f0000\SOS_x86_x86_4.7.3701.00.dll.objsize debugger extension.
      PC: 757ea842  VA: 022ce8f4  R/W: 19930520  Parameter: 7b9bb528
0:000> !DumpObj /d 02fcddb0
Name:        System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Data.Entity.DbContext, EntityFramework],[System.Collections.Concurrent.ConcurrentDictionary`2[[System.String, mscorlib],[EntityFramework.DynamicFilters.DynamicFilterParameters, EntityFramework.DynamicFilters]], mscorlib]][]
MethodTable: 0973cb60
EEClass:     715c4fc0
Size:        573440(0x8c000) bytes
Array:       Rank 1, Number of elements 143357, Type CLASS (Print Array)
Fields:
None

After a long wait, ugh, !objsize ended in an error. Still, dumping the dictionary's internal Node[] bucket array shows 143,357 slots, so the dictionary is holding an enormous number of entries. Now comes the serious question: did my friend define this ConcurrentDictionary himself, or does it live inside a framework? The next step is to find the class that owns it.
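The `!DumpObj` figures are self-consistent under the usual 32-bit layout of an array of object references, which is assumed here: a 4-byte sync block index, a 4-byte MethodTable pointer and a 4-byte length, followed by one 4-byte reference per element. The sketch below (`ref_array_size_x86` is a hypothetical helper) reproduces the reported size. It also shows that the bucket array itself is only about 560 KB; the gigabytes are in what its slots point to, which is why `!objsize` has to walk such a large object graph.

```python
# Assumed 32-bit .NET Framework layout of an array of object references:
# sync block index (4) + MethodTable pointer (4) + length (4), then 4 bytes per element.
HEADER_X86 = 4 + 4 + 4
POINTER_X86 = 4


def ref_array_size_x86(n_elements):
    """Size in bytes of a 32-bit reference-type array with n_elements slots."""
    return HEADER_X86 + POINTER_X86 * n_elements


size = ref_array_size_x86(143357)  # "Number of elements" from !DumpObj
print(size, hex(size))             # compare with "Size: 573440(0x8c000) bytes"
```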
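3. Which class does the dictionary belong to?
Finding the class that owns a given dictionary is somewhat tricky. I recorded a dedicated episode about it on Bilibili, which you can watch if you are interested.
In short, the overall approach is:
First, find the address (address1) of the slot in 02dd9020 (the pinned object[] at the root of the !gcroot chain) that stores 0260abf4 (the dictionary). In .NET Framework, reference-type static fields are stored in pinned object[] arrays like this one, so that slot is effectively the static field.
Then search memory for places that contain address1 itself; such a location is address2.
address2 lies inside the body of the method that references this dictionary, because the JIT-compiled code embeds the static slot's address. We can then disassemble that method, look up its EEClass, and finally find the owning class name.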
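The three steps above can be sketched on a toy memory model. Everything here is hypothetical except the addresses, which come from the article: memory is a dict of `{region_base: bytes}`, the surrounding bytes are zeros, and `find_value` is an illustrative helper that does a byte-wise scan for a little-endian 32-bit value, roughly what a byte search in WinDbg does.

```python
import struct

# Addresses from the article; the memory contents around them are made up.
DICT_ADDR = 0x0260ABF4   # the ConcurrentDictionary
OBJ_ARRAY = 0x02DD9020   # pinned object[] holding static fields
SLOT = 0x02DE11A4        # slot in the object[] that stores DICT_ADDR (address1)
CODE_BASE = 0x09E94320   # start of GetOrCreateScopedFilterParameters


def build_memory():
    """Hypothetical process memory: a statics array and one JIT-compiled method."""
    statics = bytearray(0x10000)
    struct.pack_into("<I", statics, SLOT - OBJ_ARRAY, DICT_ADDR)
    code = bytearray(0x1E1)
    # The method's machine code embeds the slot address, as at 09e9438b in the article.
    struct.pack_into("<I", code, 0x09E9438B - CODE_BASE, SLOT)
    return {OBJ_ARRAY: bytes(statics), CODE_BASE: bytes(code)}


def find_value(memory, value):
    """Return every address where the little-endian encoding of value appears."""
    needle = struct.pack("<I", value)
    hits = []
    for base, data in memory.items():
        start = data.find(needle)
        while start != -1:
            hits.append(base + start)
            start = data.find(needle, start + 1)
    return sorted(hits)


memory = build_memory()
address1 = find_value(memory, DICT_ADDR)[0]  # step 1: where the dictionary pointer is stored
address2 = find_value(memory, address1)[0]  # step 2: code that references that slot
print(hex(address1), hex(address2))
```

Step 3 (mapping address2 back to a method and class) is what `!U`, `!name2ee`, `!dumpmd` and `!dumpclass` do in the debugger; the toy model simply knows the method starts at `CODE_BASE`.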
Now let's put this into practice.
Check the size of the object[].
Finding address1
Search memory with s -d.
0:000> s -d 02dd9020 L?0xfffc 0260abf4
02de11a4  0260abf4 0260ad04 0260ad2c 08320d20  ..`...`.,.`. .2.

02de11a4 is the address1 we are looking for. A quick explanation: -d searches for 32-bit (DWORD) values, -q searches for 64-bit (QWORD) values, and L?0xfffc is the size of the object[] array.
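Finding address2
This time, split the address into its bytes in little-endian order, 02de11a4 = a4 11 de 02, and search byte by byte; otherwise you will fall into a trap.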
0:000> s -b 0 L?0xffffffff a4 11 de 02
0695d2f9  a4 11 de 02 e8 be 14 f9-6b b9 18 3c 34 70 e8 bc  ........k..<4p..
09e9438b  a4 11 de 02 39 09 e8 9a-11 af 67 8b f0 a1 bc 11  ....9.....g.....

The output shows two code regions that reference the dictionary's slot. Since this was a search across the whole address space, let's just pick the last hit, address2 = 09e9438b.
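Why split the address into `a4 11 de 02`? x86 stores 32-bit values little-endian, so that is the order in which the bytes actually appear in memory. The article does not spell out the "trap"; one plausible reading, assumed here, is alignment: the hit at 09e9438b is not 4-byte aligned, so a search that only compares aligned DWORDs would skip it, whereas a byte-wise search finds the pattern at any offset. The sketch contrasts the two on made-up code bytes; `aligned_dword_scan` and `byte_scan` are hypothetical helpers, not WinDbg behavior.

```python
import struct

address1 = 0x02DE11A4
needle = struct.pack("<I", address1)  # x86 stores 32-bit values little-endian
print(needle.hex(" "))                # a4 11 de 02


def aligned_dword_scan(base, data, value):
    """Hypothetical search that only compares values at 4-byte-aligned addresses."""
    first = (-base) % 4  # offset of the first 4-byte-aligned address in the region
    return [base + i for i in range(first, len(data) - 3, 4)
            if struct.unpack_from("<I", data, i)[0] == value]


def byte_scan(base, data, value):
    """Compare the 4-byte pattern at every byte offset, like a byte-wise search."""
    pattern = struct.pack("<I", value)
    return [base + i for i in range(len(data) - 3) if data[i:i + 4] == pattern]


# Made-up code bytes, placed so that the operand starts at 09e9438b as in the search hit.
code_base = 0x09E94388
code = bytes([0x90, 0x90, 0x90]) + needle + bytes([0x39, 0x09])

print([hex(a) for a in aligned_dword_scan(code_base, code, address1)])  # []
print([hex(a) for a in byte_scan(code_base, code, address1)])           # ['0x9e9438b']
```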
Disassembling address2
Disassemble it with !U, then follow up with !name2ee, !dumpmd and !dumpclass.
0:000> !U 09e9438b
Normal JIT generated code
EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters(System.Data.Entity.DbContext, System.String)
Begin 09e94320, size 1e1
09e94320 55              push    ebp
...
09e9433a 8bf1            mov     esi,ecx
09e9433c b95088ea09      mov     ecx,9EA8850h (MT: EntityFramework.DynamicFilters.DynamicFilterExtensions+<>c__DisplayClass71_0)
09e94341 e882ed5af7      call    014430c8 (JitHelp: CORINFO_HELP_NEWSFAST)
09e94346 8bf8            mov     edi,eax
09e94348 8d5704          lea     edx,[edi+4]
09e9434b e800a5a568      call    clr!JIT_WriteBarrierESI (728ee850)
0:000> !name2ee *!EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters
Module:      0973aef4
Assembly:    EntityFramework.DynamicFilters.dll
Token:       0600005e
MethodDesc:  0973b8fc
Name:        EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters(System.Data.Entity.DbContext, System.String)
JITTED Code Address: 09e94320
0:000> !dumpmd 0973b8fc
Method Name:  EntityFramework.DynamicFilters.DynamicFilterExtensions.GetOrCreateScopedFilterParameters(System.Data.Entity.DbContext, System.String)
Class:        0974c7d8
MethodTable:  0973b938
mdToken:      0600005e
Module:       0973aef4
IsJitted:     yes
CodeAddr:     09e94320
Transparency: Critical
0:000> !dumpclass 0974c7d8
Class Name:      EntityFramework.DynamicFilters.DynamicFilterExtensions
mdToken:         02000006
File:            D:\xxx\Debug\EntityFramework.DynamicFilters.dll
Parent Class:    715415b0
Module:          0973aef4
Method Table:    0973b938
Vtable Slots:    4
Total Method Slots:  20
Class Attributes:    100181  Abstract, 
Transparency:        Critical
NumInstanceFields:   0
NumStaticFields:     5
      MT    Field   Offset                 Type VT     Attr    Value Name
0973bfcc  400000d        c ....DynamicFilters]]  0   static 0260a9d4 _GlobalParameterValues
0973c3f4  400000e       10 ...ers]], mscorlib]]  0   static 0260abf4 _ScopedParameterValues
70343c18  400000f       14 ...tring, mscorlib]]  0   static 0260ad04 _PreventDisabledFilterConditions
71a34804  4000010       43       System.Boolean  1   static        1 _Initialized
05ec9adc  4000011       18 ...rsion, mscorlib]]  0   static 0260ad2c _OracleInstanceVersions

Finally found it: the static field _ScopedParameterValues holds 0260abf4, exactly the dictionary from the !gcroot chain, and it belongs to EntityFramework.DynamicFilters.DynamicFilterExtensions, a class in the EntityFramework.DynamicFilters add-on library that the project uses underneath EF.
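A quick sanity check on picking 09e9438b: `!U` reports that `GetOrCreateScopedFilterParameters` begins at 09e94320 with a code size of 0x1e1, so the chosen search hit should fall inside that range. The sketch below uses only numbers copied from the output above.

```python
# Values from the !U / !name2ee output above.
method_begin = 0x09E94320
method_size = 0x1E1
candidates = [0x0695D2F9, 0x09E9438B]  # hits from the `s -b` search

for addr in candidates:
    inside = method_begin <= addr < method_begin + method_size
    print(f"{addr:08x} inside GetOrCreateScopedFilterParameters: {inside}")
```

Only the second hit lies within the method body, which is consistent with `!U 09e9438b` resolving to this method.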
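In the end, I took the 60,000+ MesDbContext instances and the _ScopedParameterValues dictionary with its 140,000+ bucket slots to my friend, and he found a fix.
III: Summary
According to my friend, the problem went away after he commented out the MesDbContext in the constructor. I am not very familiar with EF, so anyone who knows it well is welcome to leave a comment with an analysis.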