如何抓获JVM crash的幕后黑手
2019獨角獸企業重金招聘Python工程師標準>>>
? ?最近幾天線上jboss服務器經常莫名地突然停止運行,導致半夜都被報警短信吵醒,元旦幾天也基本就在收報警,然后重啟系統。查看jboss控制臺錯誤日志,發現只有下面一行:
/opt/.../jboss/bin/run.sh: line 181: 26430 段錯誤????????????????? "$JAVA" $JAVA_OPTS -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" -classpath "$JBOSS_CLASSPATH" org.jboss.Main "$@"
??????? 一開始對上面的錯誤日志也沒引起重視,一直在找JVM crash的日志hs_err_<pid>,看了很多網上的帖子,包括JVM官網的介紹,為什么沒有生成hs_err_pid日志,可能的原因就是沒權限或者空間不夠,查看了磁盤空間是足夠的,于是想到可能沒有權限(其實默認目錄就是JVM的工作目錄,應該是有權限的,但當時唯一能懷疑的就是這個),所以就想到JVM參數(-XX:ErrorFile=./hs_err_<pid>.log)可以配置這個日志文件的保存目錄,但這個參數是JDK1.6才引入的,而我們系統杯催的還停留在JDK1.5上,沒辦法想到另外一個參數(-XX:OnError="<cmd args>;<cmd args>"),里面配置的命令是kill -3 <pid>,這樣JVM在crash的時候可以將線程堆棧dump出來,想法是好的,可這個命令一直沒配上去,只能作罷。
?????? 拿不到hs_err_<pid>日志,唯一的線索就是上面僅有的一行錯誤日志,于是google“段錯誤”“segmentation fault”,發現可以通過core dump日志進行分析,解決思路也就自然而然有了。
1.core dump介紹
??????? 程序異常退出(crash)時會自動生成一個core文件,包含了程序運行時的內存,寄存器狀態,堆棧指針,內存管理等信息,也就是把程序當時工作的狀態存儲成一個文件。不僅僅是在出錯的時候會生成core dump文件,在系統卡住或者cpu使用率很高的時候也可以手動觸發生成core dump文件(當然這種情況也可以直接通過jmap和jstack dump出內存和線程堆棧,但JDK1.6以前,這2個命令經常執行不成功),這樣就可以很好的分析了。
2.生成core dump文件
?????? core dump文件生成開關其實是通過對生成的文件大小進行控制達到的,默認大小是0,也就是說默認是不生成core dump文件的,可以通過命令ulimit -c進行查看。將此參數修改成unlimited就可以生成core dump文件了,但值得注意的一點是,每個應用進程都會讀取自己的一套系統參數,可以查看進程對應的內存文件/proc/<pid>/limits中的信息來判斷修改后的參數值是否對此應用進程生效了,limits文件中的信息如下:
Limit???????????????????? Soft Limit?????????? Hard Limit?????????? Units?????
Max cpu time????????????? unlimited??????????? unlimited??????????? seconds???
Max file size???????????? unlimited??????????? unlimited??????????? bytes?????
Max data size???????????? unlimited??????????? unlimited??????????? bytes?????
Max stack size??????????? 10485760???????????? unlimited??????????? bytes?????
Max core file size????????unlimited???????????unlimited??????????? bytes????
Max resident set????????? unlimited??????????? unlimited??????????? bytes?????
Max processes???????????? 20480??????????????? 20480??????????????? processes?
Max open files??????????? 204800?????????????? 204800?????????????? files?????
Max locked memory???????? 32768??????????????? 32768??????????????? bytes?????
Max address space???????? unlimited??????????? unlimited??????????? bytes?????
Max file locks??????????? unlimited??????????? unlimited??????????? locks?????
Max pending signals?????? 1024???????????????? 1024???????????????? signals???
Max msgqueue size???????? 819200?????????????? 819200?????????????? bytes
????? 上述信息中的對應的參數值為unlimited,代表此應用進程的core dump開關是開著的。ulimit -c unlimited這個命令可以在很多地方修改,比如用root在命令行執行,在/etc/profile中增加(/etc/profile文件中已經有defaultulimit -S -c 0 > /dev/null 2>&1,可以直接修改0為unlimited),在用戶的.bash_profile中增加..這些方式執行后的影響范圍還是挺復雜的,到現在也沒弄清楚。當時通過root用戶命令行修改后,在shell里查詢ulimit -c是生效了,以為應用程序也生效了,覺得不用應用都不需要重啟就生效了挺好,等到下一次crash的時候發現還是沒有生成core dump文件,于是到文件/proc/<pid>/limits中檢查此參數值才發現應用程序所讀取的值還是0。后來采用一種比較簡單直接的方式,用root用戶修改/etc/profile文件中的ulimit -c的值,然后重新登錄,重啟應用,再檢查/proc/<pid>/limits中的值,確保參數已經生效,后來應用再crash時core dump文件也如期而至,心里暗爽,終于可以開始分析堆棧了。
?????? core dump文件生成的默認路徑在用戶的工作目錄(啟動腳本目錄),可以通過設置參數/proc/sys/kernel/core_pattern的值改變core dump文件的最終生成路徑及文件名格式。core dump文件成功后,在jboss控制臺錯誤日志中的日志也多了core dumped關鍵字,如下:
/opt/.../jboss/bin/run.sh: line 181:? 9176 段錯誤????????????????? (core dumped) "$JAVA" $JAVA_OPTS -Djava.endorsed.dirs="$JBOSS_ENDORSED_DIRS" -classpath "$JBOSS_CLASSPATH" org.jboss.Main "$@"
3.core dump分析
????? 有了core dump文件,接下來要做的就是通過命令去解析此文件,定位具體問題了,主要有以下三個命令:
(1)先執行gdb?$JAVA_HOME$/bin/java?core-java-16427-1325846515,再執行bt,輸出結果如下:
Loaded symbols for /opt/.../oracle/lib/libnnz10.so
Reading symbols from /opt/.../install/oracle/10.2.0.3/lib/libociei.so...(no debugging symbols found)...done.
Loaded symbols for /opt/.../oracle/lib/libociei.so
Reading symbols from /usr/lib/gconv/GB18030.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/gconv/GB18030.so
#0? 0x4662051e in ttcrd2r ()
?? from /opt/.../oracle/lib/libclntsh.so.10.1
(gdb) bt
#0? 0x4662051e in ttcrd2r () from /opt/.../oracle/lib/libclntsh.so.10.1
#1? 0x46617212 in ttcrbur () from /opt/.../oracle/lib/libclntsh.so.10.1
#2? 0x466176e6 in ttcbur () from /opt/.../oracle/lib/libclntsh.so.10.1
#3? 0x465ba8c7 in ttcdrv () from /opt/.../oracle/lib/libclntsh.so.10.1
#4? 0x464a1bc8 in nioqwa () from /opt/.../oracle/lib/libclntsh.so.10.1
#5? 0x4630d618 in upirtrc () from /opt/.../oracle/lib/libclntsh.so.10.1
#6? 0x462825f6 in kpurcsc () from /opt/.../oracle/lib/libclntsh.so.10.1
#7? 0x46236a0d in kpuexecv8 () from /opt/.../oracle/lib/libclntsh.so.10.1
#8? 0x46238ec4 in kpuexec () from /opt/.../oracle/lib/libclntsh.so.10.1
#9? 0x463121b6 in OCIStmtExecute () from /opt/.../oracle/lib/libclntsh.so.10.1
#10 0x46e725aa in Java_oracle_jdbc_driver_T2CConnection_t2cClearAllApplicationContext ()
?? from /opt/.../install/oracle/10.2.0.3/lib/libocijdbc10.so
#11 0x46e7580f in Java_oracle_jdbc_driver_T2CStatement_t2cDefineExecuteFetch ()
?? from /opt/.../install/oracle/10.2.0.3/lib/libocijdbc10.so
#12 0xb125a8b8 in ?? ()
#13 0x47e01bc8 in ?? ()
#14 0x47d46bb0 in ?? ()
#15 0x47d46c40 in ?? ()
#16 0x4babf774 in ?? ()
#17 0x00000000 in ?? ()
?????? 通過這個debug命令可以查看到應用crash時正在做什么,這里是在獲取jdbc連接,但這個并不代表問題就出在這里,有可能這只是個結果,起因還需要進一步排查。
(2)光有上述信息還不足以定位出問題在哪,可以通過命令jstack?$JAVA_HOME$/bin/java?core-java-16427-1325846515獲取應用crash時的線程堆棧,輸出結果如下:
Thread 11586: (state = BLOCKED)
?- java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
?- java.lang.Object.wait() @bci=2, line=474 (Compiled frame)
?- edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingDeque.takeFirst() @bci=20, line=389 (Compiled frame)
?- edu.emory.mathcs.backport.java.util.concurrent.LinkedBlockingDeque.take() @bci=1, line=577 (Compiled frame)
?- edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=18, line=674 (Compiled frame)
Error occurred during stack walking:
Thread 10924: (state = IN_NATIVE)
?- oracle.jdbc.driver.T2CStatement.t2cDefineExecuteFetch(oracle.jdbc.driver.OracleStatement, long, int, int, int, int, boolean, boolean, byte[], int, byte, int, int, short[], int, byte[], char[], int, int, short[], byte[], int, int, boolean, boolean, oracle.jdbc.driver.Accessor[], byte[][][], long[], byte[], int, char[], int, short[], int) @bci=0 (Interpreted frame)
?- oracle.jdbc.driver.T2CPreparedStatement.doDefineExecuteFetch() @bci=190, line=878 (Interpreted frame)
?- oracle.jdbc.driver.T2CPreparedStatement.executeForRows(boolean) @bci=34, line=760 (Interpreted frame)
?- oracle.jdbc.driver.OracleStatement.executeMaybeDescribe() @bci=154, line=1062 (Interpreted frame)
?- oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout() @bci=102, line=1126 (Interpreted frame)
?- oracle.jdbc.driver.OraclePreparedStatement.executeInternal() @bci=94, line=3339 (Interpreted frame)
?- oracle.jdbc.driver.OraclePreparedStatement.execute() @bci=17, line=3445 (Interpreted frame)
?- org.jboss.resource.adapter.jdbc.CachedPreparedStatement.execute() @bci=4, line=216 (Interpreted frame)
?- org.jboss.resource.adapter.jdbc.WrappedPreparedStatement.execute() @bci=12, line=209 (Compiled frame)
?- com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(com.ibatis.sqlmap.engine.scope.RequestScope, java.sql.Connection, java.lang.String, java.lang.Object[], int, int, com.ibatis.sqlmap.engine.mapping.statement.RowHandlerCallback) @bci=135, line=180 (Compiled frame)
?- com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(com.ibatis.sqlmap.engine.scope.RequestScope, java.sql.Connection, java.lang.String, java.lang.Object[], int, int, com.ibatis.sqlmap.engine.mapping.statement.RowHandlerCallback) @bci=15, line=205 (Interpreted frame)
?- com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(com.ibatis.sqlmap.engine.scope.RequestScope, java.sql.Connection, java.lang.Object, java.lang.Object, com.ibatis.sqlmap.client.event.RowHandler, int, int) @bci=169, line=173 (Compiled frame)
?- com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryForObject(com.ibatis.sqlmap.engine.scope.RequestScope, com.ibatis.sqlmap.engine.transaction.Transaction, java.lang.Object, java.lang.Object) @bci=28, line=104 (Compiled frame)
?- com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(com.ibatis.sqlmap.engine.scope.SessionScope, java.lang.String, java.lang.Object, java.lang.Object) @bci=58, line=561 (Compiled frame)
?- com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(com.ibatis.sqlmap.engine.scope.SessionScope, java.lang.String, java.lang.Object) @bci=5, line=536 (Interpreted frame)
?- com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(java.lang.String, java.lang.Object) @bci=10, line=93 (Interpreted frame)
?- org.springframework.orm.ibatis.SqlMapClientTemplate$1.doInSqlMapClient(com.ibatis.sqlmap.client.SqlMapExecutor) @bci=9, line=273 (Interpreted frame)
?- org.springframework.orm.ibatis.SqlMapClientTemplate.execute(org.springframework.orm.ibatis.SqlMapClientCallback) @bci=242, line=209 (Compiled frame)
?- org.springframework.orm.ibatis.SqlMapClientTemplate.queryForObject(java.lang.String, java.lang.Object) @bci=11, line=271 (Compiled frame)
......
?- org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
?- org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
?- org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
?- org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)
?- org.springframework.web.servlet.view.InternalResourceView.renderMergedOutputModel(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=244, line=240 (Interpreted frame)
?- org.springframework.web.servlet.view.AbstractView.render(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=157, line=252 (Interpreted frame)
?- org.springframework.web.servlet.DispatcherServlet.render(org.springframework.web.servlet.ModelAndView, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=225, line=1173 (Compiled frame)
?- org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
?- org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
?- org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
?- org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)
?- org.springframework.web.servlet.view.InternalResourceView.renderMergedOutputModel(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=244, line=240 (Interpreted frame)
?- org.springframework.web.servlet.view.AbstractView.render(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=157, line=252 (Interpreted frame)
?- org.springframework.web.servlet.DispatcherServlet.render(org.springframework.web.servlet.ModelAndView, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=225, line=1173 (Compiled frame)
?- org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
?- org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
?- org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
?- org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)
?- org.springframework.web.servlet.view.InternalResourceView.renderMergedOutputModel(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=244, line=240 (Interpreted frame)
?- org.springframework.web.servlet.view.AbstractView.render(java.util.Map, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=157, line=252 (Interpreted frame)
?- org.springframework.web.servlet.DispatcherServlet.render(org.springframework.web.servlet.ModelAndView, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=225, line=1173 (Compiled frame)
?- org.springframework.web.servlet.DispatcherServlet.doService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=219, line=809 (Interpreted frame)
?- org.springframework.web.servlet.FrameworkServlet.processRequest(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=10, line=571 (Compiled frame)
?- org.springframework.web.servlet.FrameworkServlet.doPost(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=511 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=139, line=717 (Interpreted frame)
?- javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=810 (Compiled frame)
?- org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=173 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.invoke(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=396, line=672 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.processRequest(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=64, line=463 (Interpreted frame)
?- org.apache.catalina.core.ApplicationDispatcher.doForward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=469, line=398 (Interpreted frame)
? - org.apache.catalina.core.ApplicationDispatcher.forward(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=55, line=301 (Interpreted frame)
......
?????? 通過這個線程堆??梢钥闯鰡栴}根本原因是一直在forward,代碼出現死循環,總的forward的次數有100多次,直到線程堆棧溢出。這也能說明前面通過gdb命令查看到的獲取jdbc連接僅僅是最近的一次循環正在執行的操作而已,不是真正的起因。這個線程的死循環forward同樣影響到了其他的線程,導致出現Error occurred during stack walking錯誤,線程一直處于BLOCKED狀態,總之線程堆棧對分析問題原因是非常有幫助的。
(3).可以通過命令jmap $JAVA_HOME$/bin/java?core-java-16427-1325846515查看到應用的內存使用,結果如下:
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
?? MinHeapFreeRatio = 40
?? MaxHeapFreeRatio = 70
?? MaxHeapSize????? = 1342177280 (1280.0MB)
?? NewSize????????? = 134217728 (128.0MB)
?? MaxNewSize?????? = 134217728 (128.0MB)
?? OldSize????????? = 402653184 (384.0MB)
?? NewRatio???????? = 15
?? SurvivorRatio??? = 1024
?? PermSize???????? = 100663296 (96.0MB)
?? MaxPermSize????? = 100663296 (96.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
?? capacity = 134152192 (127.9375MB)
?? used???? = 22525464 (21.481956481933594MB)
?? free???? = 111626728 (106.4555435180664MB)
?? 16.790977220856742% used
Eden Space:
?? capacity = 134086656 (127.875MB)
?? used???? = 22525464 (21.481956481933594MB)
?? free???? = 111561192 (106.3930435180664MB)
?? 16.799183954591275% used
From Space:
?? capacity = 65536 (0.0625MB)
?? used???? = 0 (0.0MB)
?? free???? = 65536 (0.0625MB)
?? 0.0% used
To Space:
?? capacity = 65536 (0.0625MB)
?? used???? = 0 (0.0MB)
?? free???? = 65536 (0.0625MB)
?? 0.0% used
concurrent mark-sweep generation:
?? capacity = 1207959552 (1152.0MB)
?? used???? = 764300224 (728.8934936523438MB)
?? free???? = 443659328 (423.10650634765625MB)
Exception in thread "main" java.lang.OutOfMemoryError: requested 4096 bytes for jbyte in /BUILD_AREA/jdk1.5.0_08/hotspot/src/share/vm/prims/jni.cpp. Out of swap space?
????? 最后一行出現了OOM,這是因為dump出來的文件太大,32位機器支持的最大內存也就2G,所以在解析時出現OOM問題,可以放到64位服務器上進行解析。
4.常見的core dump原因
?????? 造成coredump的原因很多,常見的有內存訪問越界,非法指針,堆棧溢出等等,也可以通過發送操作系統信號中斷應用,比如kill -SIGSEGV <pid>,這是一個無效存儲信號。如果想手動生成core dump文件,可以通過gdb -p <pid>生成。
? ? ? ? 其實通過上一篇所講的core dump方法可以比較方便找到jvm crash問題所在,但這種方法適合crash比較頻繁,容易獲取到core dump文件,如果沒有這個文件,那怎么可以查到問題呢?,F在回過頭想想通過跟蹤crash時的apache日志,其實也能找到造成crash的請求,下面是某次crash時的日志:
[07/Jan/2012:07:52:40 +0800] "GET /a.xhtml HTTP/1.1" 200 2521 "-" "Mozilla/5.0 "? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:47 +0800] "POST /a.xhtml HTTP/1.1" 503 323 "-" "Opera/9.80 "? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:47 +0800] "POST /b.xhtml HTTP/1.0" 503 323 "-" "-" -? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:51 +0800] "GET /a.xhtml HTTP/1.1" 503 323 "-" "Mozilla/5.0 "? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:52 +0800] "GET /b.xhtml HTTP/1.1" 503 323 "-" "Mozilla/5.0 "? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:48 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "Mozilla/5.0 "? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:52 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "NokiaE63/UCWEB8.1.0.104/28/800"? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:42 +0800] "POST /a.xhtml HTTP/1.1" 503 323 "-" "MAUI_WAP_Browser"? "a=-; b=-; c=-" -
[07/Jan/2012:07:50:26?+0800] "POST /e.xhtml HTTP/1.1" 503 323 "-" "MAUI_WAP_Browser"? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:42 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "MQQBrowser/2.9/Mozilla/5.0 "? "a=-; b=-; c=-" -
[07/Jan/2012:07:52:40 +0800] "GET /c.do HTTP/1.1" 503 323 "-" "MQQBrowser/2.0 "? "a=-; b=-; c=-" -
?????? 這里apache通過接收請求轉發給jboss處理,從上面的日志可以看出07:50:26的這個請求比較特殊,沒按時間先后排列,屬于“插隊”的請求。apache日志中打印出的時間是接收到請求時的時間,但日志是等到請求處理完后才打印,那么這個請求從7:50:26秒進來,處理了2分多鐘,我們應該首先懷疑它;另外一個值得注意的是請求的狀態,除了第一個請求是200狀態外,其他都是503錯誤,這說明“插隊”的請求不是瞬間就讓應用crash的,上面列出的第一個請求7:52:40進來的,還是可以正常執行完返回,同時這個“插隊”的請求在即將導致應用crash時,也會影響其他請求的處理,反映到apache日志上就是503狀態(處于第一個請求和“插隊”的請求之間的請求)。所以在apache日志中找出應用crash時刻的所有請求,然后在其中查找那個“插隊”比較明顯的請求(apache日志總的來說是按時間先后順序排列的,也會有秒級別的差異,但像出現分鐘級別的順序錯亂就需要引起重視),同時結合業務日志進一步確定代碼的執行順序,找出問題。?????
?????? 總之,碰到這種應用意外終止的情況,需要根據日志(內存日志、apache日志、應用日志)找到蛛絲馬跡方能抓到幕后黑手。這貌似一句廢話.
? ??/var/log/syslog.log
轉載于:https://my.oschina.net/CasparLi/blog/336317
總結
以上是生活随笔為你收集整理的如何抓获JVM crash的幕后黑手的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 标准数据库事务完整性约束
- 下一篇: 总结————AJAX应用的五个步骤: