Talking About the JVM (9): How All Java Threads Get Blocked When Entering a Safepoint
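The earlier post "Talking About the JVM (6): Understanding JVM Safepoints" covered the basic idea of a safepoint: before the VM thread starts a GC, it must block every Java thread, stop the world, and only then begin marking. The JVM uses a cooperative blocking scheme: a Java thread cannot be blocked at an arbitrary instruction, it has to run to a specific point called a safepoint. At these points all Java threads can be blocked and the heap is in a temporarily stable state. The OopMap records which registers and stack slots hold references at that moment, so the GC roots can be found quickly and object marking can start.

So when a Java thread reaches a safepoint, how does the JVM actually suspend it? This is a fairly involved operation. Many articles point out that in JIT-compiled mode the compiler inserts safepoint checks into the compiled code. For example, the following instructions come from "Memory: JVM Memory Reclamation, Theory and Implementation":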
0x01b6d627: call   0x01b2b210         ; OopMap{[60]=Oop off=460}
                                      ;*invokeinterface size
                                      ; - Client1::main@113 (line 23)
                                      ;   {virtual_call}
0x01b6d62c: nop                       ; OopMap{[60]=Oop off=461}
                                      ;*if_icmplt
                                      ; - Client1::main@118 (line 23)
0x01b6d62d: test   %eax,0x160100      ;   {poll}
0x01b6d633: mov    0x50(%esp),%esi
0x01b6d637: cmp    %eax,%esi
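test %eax,0x160100 is a safepoint polling page operation. When the JVM wants to stop all Java threads, it marks a specific memory page as unreadable, and any Java thread that then reads from that page gets suspended.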
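The polling idea can be imitated in plain Java. The following is only a rough simulation, not HotSpot code: a volatile flag (`armed`) stands in for the protected polling page, each worker polls it at its loop back-edge, and a coordinator plays the VM thread. All class, field, and method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation: none of these names come from HotSpot.
final class SafepointPollSketch {
    // Stands in for the polling page: true means "the page is protected".
    private static volatile boolean armed;
    private static volatile boolean done;
    private static final Object resumeLock = new Object();
    private static final AtomicLong work = new AtomicLong();

    // The poll a worker executes at its loop back-edge.
    private static void poll(CountDownLatch arrived) {
        if (!armed) {
            return; // fast path: no safepoint requested
        }
        arrived.countDown(); // report "I am at the safepoint"
        synchronized (resumeLock) {
            while (armed) {
                try {
                    resumeLock.wait(); // parked until the coordinator disarms
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Runs one simulated pause; returns true if no worker made progress during it.
    static boolean runOnce(int workers) {
        armed = false;
        done = false;
        CountDownLatch arrived = new CountDownLatch(workers);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                while (!done) {
                    work.incrementAndGet(); // "application work"
                    poll(arrived);
                }
            });
            threads[i].start();
        }
        try {
            armed = true;               // request the safepoint
            arrived.await();            // wait until every worker has reached a poll
            long before = work.get();
            Thread.sleep(20);           // the "VM operation" runs here
            long after = work.get();
            done = true;
            synchronized (resumeLock) {
                armed = false;          // disarm and release the workers
                resumeLock.notifyAll();
            }
            for (Thread t : threads) {
                t.join();
            }
            return before == after;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("world stopped during pause: " + runOnce(4));
    }
}
```

The real mechanism differs in one important way: the compiled code does not branch on a flag. It performs an unconditional read, and the stop happens because that read faults once the page is protected, which keeps the fast path to a single instruction.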
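That answer is correct, but it stops short and left me wanting more. I dug through more material, and it takes several sources read together to get the full picture. The rest of this post goes deeper into how the JVM really blocks all Java threads.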
The article Points on Safepoints clears up some of this. It starts with a few observations about safepoints:
- All commercial GCs use safepoints.
- The GC reins in all threads at safepoints. This is when it has exact knowledge of things touched by the threads.
- They can also be used for non-GC activity like optimization.
- A thread at a safepoint is not necessarily idle but it often is.
- Safepoint opportunities should be frequent.
- All threads need to reach a global safepoint typically every dozen or so instructions (for example, at the end of loops).
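The safepoint mechanism can stop the world, and GC is not its only user: many other operations also use it to stop the world and block all Java threads so that they can run safely.

Here is what the OpenJDK source says about safepoints: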
// Begin the process of bringing the system to a safepoint.
// Java threads can be in several different states and are
// stopped by different mechanisms:
//
//  1. Running interpreted
//     The interpeter dispatch table is changed to force it to
//     check for a safepoint condition between bytecodes.
//  2. Running in native code
//     When returning from the native code, a Java thread must check
//     the safepoint _state to see if we must block. If the
//     VM thread sees a Java thread in native, it does
//     not wait for this thread to block. The order of the memory
//     writes and reads of both the safepoint state and the Java
//     threads state is critical. In order to guarantee that the
//     memory writes are serialized with respect to each other,
//     the VM thread issues a memory barrier instruction
//     (on MP systems). In order to avoid the overhead of issuing
//     a mem barrier for each Java thread making native calls, each Java
//     thread performs a write to a single memory page after changing
//     the thread state. The VM thread performs a sequence of
//     mprotect OS calls which forces all previous writes from all
//     Java threads to be serialized. This is done in the
//     os::serialize_thread_states() call. This has proven to be
//     much more efficient than executing a membar instruction
//     on every call to native code.
//  3. Running compiled Code
//     Compiled code reads a global (Safepoint Polling) page that
//     is set to fault if we are trying to get to a safepoint.
//  4. Blocked
//     A thread which is blocked will not be allowed to return from the
//     block condition until the safepoint operation is complete.
//  5. In VM or Transitioning between states
//     If a Java thread is currently running in the VM or transitioning
//     between states, the safepointing code will wait for the thread to
//     block itself when it attempts transitions to a new state.
//
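As you can see, the Java threads may be in different states at the moment the JVM wants to block them all. The post "Talking About the JVM (5): Understanding Threads from the JVM's Perspective" lists all the thread states defined inside the JVM.
1. When a thread is running in interpreted mode: once the JVM issues a safepoint request, the interpreter's dispatch table is switched so that the interpreter checks the safepoint condition between bytecodes, and the thread then blocks.
2. When a Java thread is executing native code: this is the most complicated case and gets the most space in the comment. When the VM thread sees a Java thread executing native code, it does not wait for that thread to block, because when the thread returns from native code it checks the safepoint state to see whether it must block (When returning from the native code, a Java thread must check the safepoint _state to see if we must block).
The rest of that comment is about making the reads and writes of the safepoint state and the thread state happen in a strict order (serialized). There are two approaches: issue a memory barrier, or have the VM thread call the mprotect system call to force the Java threads' earlier writes to be serialized (The VM thread performs a sequence of mprotect OS calls which forces all previous writes from all Java threads to be serialized. This is done in the os::serialize_thread_states() call).
The JVM goes with the latter. A memory barrier on every call into native code is a heavy operation, because the CPU has to drain its pending writes before it can continue, so the JVM uses the serialization page instead.
Put simply, when a Java thread returns from native code it has to synchronize with the VM thread, and it does so through the serialization page rather than a memory barrier. Readers familiar with the Java memory model will know that lightweight synchronization variables such as volatile are implemented with memory barriers.
Why is this synchronization needed? The thread "A question about the Serialization Page in the HotSpot source" explains it: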
AddressLiteral sync_state(SafepointSynchronize::address_of_state());
__ set(_thread_in_native_trans, G3_scratch);
__ st(G3_scratch, thread_state);
if(os::is_MP()) {
  if (UseMembar) {
    // Force this write out before the read below
    __ membar(Assembler::StoreLoad);
  } else {
    // Write serialization page so VM thread can do a pseudo remote membar.
    // We use the current thread pointer to calculate a thread specific
    // offset to write to within the page. This minimizes bus traffic
    // due to cache line collision.
    __ serialize_memory(G2_thread, G1_scratch, G3_scratch);
  }
}
__ load_contents(sync_state, G3_scratch);
__ cmp(G3_scratch, SafepointSynchronize::_not_synchronized);
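This code first sets the current thread (call it thread A) to the _thread_in_native_trans state and then reads sync_state to see whether some thread is preparing a GC. If so, the current thread blocks and waits for the GC thread to do the GC.

The write of the thread state followed by the read of sync_state is not atomic. One possible scenario: thread A has just read sync_state and got _not_synchronized when it is preempted, and the CPU is handed to the thread that is about to start a GC (call it thread B). Thread B sets sync_state to _synchronizing and then reads the states of the other threads to see whether they are all blocked or in _thread_in_native. If so, thread B can start the GC; otherwise it has to wait.
If thread A has no membar instruction between writing its thread state and reading sync_state, the following can happen: thread A reads sync_state as _not_synchronized while thread B has not yet seen thread A's state change to _thread_in_native_trans. Thread B then believes thread A is safe for GC (because it still appears to be in _thread_in_native), and if every other thread is ready too, thread B starts the GC. Thread A, having read _not_synchronized, does not block and goes on to execute Java code. The GC then goes wrong and the system crashes.

The root cause is that the read and write of the safepoint state and the thread state are not atomic and need to be synchronized; the serialization page is a lightweight way to do that.
For the details of the serialization page implementation, see "Some questions about memory_serialize_page". My understanding after reading it: a memory barrier costs the thread a barrier on every such write, whereas the serialization page approach uses a single memory page. Each thread writes to its own location in the page, at an offset derived from the thread pointer so that threads tend not to write to the same location. The VM thread then makes the page read-only, which forces the threads' earlier state writes out to memory, and makes it writable again. This avoids the per-write barrier cost on the Java threads and handles many threads in one batch.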
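The race described above, where thread A writes its state and then reads sync_state while thread B writes sync_state and then reads A's state, is the classic store-then-load pattern. Here is a minimal Java sketch of the membar variant, assuming Java's volatile as a stand-in for the StoreLoad barrier. It is not HotSpot code; the class name and the constants are illustrative and only mirror the names in the quoted discussion.

```java
// Hypothetical sketch of the store-then-load handshake; not HotSpot code.
final class StoreLoadSketch {
    static final int NOT_SYNCHRONIZED = 0;
    static final int SYNCHRONIZING = 1;
    static final int IN_NATIVE = 0;
    static final int IN_NATIVE_TRANS = 1;

    // volatile supplies the ordering that membar(StoreLoad) supplies in the stub:
    // each thread's write is visible before its following read takes place.
    static volatile int syncState;
    static volatile int threadState;

    // Counts rounds in which BOTH threads missed the other's write.
    static int countForbidden(int rounds) {
        final int[] seenByA = new int[1]; // sync_state as read by thread A
        final int[] seenByB = new int[1]; // A's thread state as read by thread B
        int forbidden = 0;
        for (int r = 0; r < rounds; r++) {
            syncState = NOT_SYNCHRONIZED;
            threadState = IN_NATIVE;
            Thread a = new Thread(() -> {
                threadState = IN_NATIVE_TRANS;   // "I am coming back from native"
                seenByA[0] = syncState;          // "is a safepoint in progress?"
            });
            Thread b = new Thread(() -> {
                syncState = SYNCHRONIZING;       // "safepoint requested"
                seenByB[0] = threadState;        // "is thread A still in native?"
            });
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            // The dangerous outcome: A decides to keep running Java code while
            // B decides A is safely in native.
            if (seenByA[0] == NOT_SYNCHRONIZED && seenByB[0] == IN_NATIVE) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```

With both fields volatile, the Java memory model rules out the outcome in which both threads miss the other's write, so at least one side always notices the other. With plain fields that guarantee is gone, which is the failure the quoted discussion describes. The serialization page aims for the same guarantee while moving the cost from every native return to the rare moment when the VM thread starts a safepoint.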
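3. When a thread is running JIT-compiled code: this is the case described at the start. The compiled code contains checks against a global safepoint polling page; the VM thread makes the page unreadable, and the Java thread is suspended when it polls.

4. When a thread is already blocked: a safe region is used. Code inside a safe region may leave it only when it is allowed to. See "Talking About the JVM (6): Understanding JVM Safepoints".

5. When a thread is in the middle of a state transition: it checks the safepoint state and, if it needs to block, blocks itself.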
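The five cases can be condensed into a small lookup. This is only a summary aid in Java, based on my reading of the OpenJDK comment quoted earlier; the enum and method names are invented and do not correspond to HotSpot's own thread state types.

```java
// Hypothetical summary of the five cases; names are illustrative only.
final class StopMechanismSketch {
    enum State { INTERPRETED, IN_NATIVE, COMPILED, BLOCKED, IN_VM_OR_TRANSITION }

    // How a thread in the given state ends up stopped.
    static String mechanism(State s) {
        switch (s) {
            case INTERPRETED:
                return "dispatch table switched; safepoint check between bytecodes";
            case IN_NATIVE:
                return "checks the safepoint state when returning from native code";
            case COMPILED:
                return "reads the global polling page, which is set to fault";
            case BLOCKED:
                return "not allowed to return from the block until the safepoint ends";
            default:
                return "blocks itself at its next state transition";
        }
    }

    // Whether the VM thread has to wait for the thread to actually stop.
    // Threads in native code or already blocked are not waited for.
    static boolean vmThreadMustWait(State s) {
        return s != State.IN_NATIVE && s != State.BLOCKED;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + ": " + mechanism(s) + " (wait=" + vmThreadMustWait(s) + ")");
        }
    }
}
```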
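So how exactly does a thread end up blocking itself? Case 2 mentioned that the JVM can use the mprotect system call to protect memory that all threads can normally write to, so that it is no longer writable. When a thread touches the protected memory, a SIGSEGV signal is raised, which invokes the JVM's signal handler, and the handler blocks the thread (The GC thread can protect some memory to which all threads in the process can write (using the mprotect system call) so they no longer can. Upon accessing this temporarily forbidden memory, a signal handler kicks in). This is what the mprotect man page says: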
"If the calling process tries to access memory in a manner that violates the protection, then the kernel generates a SIGSEGV signal for the process."
Now look at how the JVM handles the SIGSEGV signal, in hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:
// Check to see if we caught the safepoint code in the
// process of write protecting the memory serialization page.
// It write enables the page immediately after protecting it
// so we can just return to retry the write.
if ((sig == SIGSEGV) &&
    os::is_memory_serialize_page(thread, (address) info->si_addr)) {
  // Block current thread until the memory serialize page permission restored.
  os::block_on_serialize_page_trap();
  return true;
}
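Now it should be clear why a safepoint polling page operation like test %eax,0x160100 can block a thread. Strictly speaking, the snippet above handles a fault on the memory serialization page; a fault on the polling page arrives at the same SIGSEGV handler and works on the same principle.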
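To summarize: when the JVM wants to block all Java threads, it first checks which state each Java thread is in, uses the mprotect system call to protect a global memory region, and lets the Java threads run to a safepoint where they poll that location. When a thread touches the forbidden memory, the JVM's signal handler is triggered and blocks the thread.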
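This topic also comes up in JVM performance analysis. With the JVM flag -XX:+PrintGCApplicationStoppedTime, the JVM prints how long the application was stopped, producing log lines like these: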
Total time for which application threads were stopped: 0.0041000 seconds
Total time for which application threads were stopped: 0.0044230 seconds
Total time for which application threads were stopped: 0.0043610 seconds
Total time for which application threads were stopped: 0.0056040 seconds
Total time for which application threads were stopped: 0.0051020 seconds
Total time for which application threads were stopped: 8.2834300 seconds
Total time for which application threads were stopped: 0.0110790 seconds
Total time for which application threads were stopped: 0.0098720 seconds
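One line says the application was stopped for more than 8 seconds. Why? Because some thread took a long time to reach a safepoint and block, so the threads that had already stopped kept waiting, and the VM thread could not start the GC until every Java thread had blocked at a safepoint. See "ParNew: application pause time occasionally reaches several seconds".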
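Outliers like that one are easy to miss in a long log. The following is a small Java sketch for picking them out, assuming the log lines have exactly the shape shown above (other JVM versions may append extra fields, which the pattern tolerates). The class name, method name, and threshold are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning -XX:+PrintGCApplicationStoppedTime output.
final class StoppedTimeScan {
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    // Returns every stopped time (in seconds) that exceeds the threshold.
    static List<Double> pausesOver(List<String> lines, double thresholdSeconds) {
        List<Double> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) { // find() ignores any prefix or suffix on the line
                double seconds = Double.parseDouble(m.group(1));
                if (seconds > thresholdSeconds) {
                    result.add(seconds);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Total time for which application threads were stopped: 0.0051020 seconds",
            "Total time for which application threads were stopped: 8.2834300 seconds",
            "Total time for which application threads were stopped: 0.0110790 seconds");
        System.out.println(pausesOver(sample, 1.0));
    }
}
```

Note that this number alone does not tell you whether the time went into the GC itself or into waiting for threads to reach the safepoint; it only flags the pauses worth investigating.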
When this happens, check whether the code has large loops; the JIT may have optimized those loops without inserting a safepoint check.
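The loop shape usually blamed for this is the int-indexed "counted loop". The Java sketch below contrasts it with a long-indexed loop, which HotSpot's C2 compiler has traditionally not treated as a counted loop and therefore compiles with a poll on the back-edge. Whether the poll is really dropped depends on the JVM version and flags (newer HotSpot releases bound the time to safepoint in other ways, and some versions offer -XX:+UseCountedLoopSafepoints), so treat this as an illustration of the pattern, not a guaranteed reproduction. The method names are made up.

```java
// Hypothetical illustration of the loop shapes involved; names are made up.
final class CountedLoopSketch {
    // int index with a loop-invariant bound: a "counted loop".
    // The JIT may compile this without a safepoint poll inside the loop.
    static long sumIntIndexed(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Same work with a long index: traditionally not a counted loop,
    // so a poll is kept on the back-edge.
    static long sumLongIndexed(int n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100_000_000;
        System.out.println(sumIntIndexed(n) == sumLongIndexed(n));
    }
}
```

Both methods compute the same value; the only intended difference is how quickly a thread running the loop can respond to a safepoint request. Switching the index type, or splitting a huge loop into an outer loop over smaller inner chunks, are the usual workarounds when a loop like the first one shows up in a stalled thread's stack.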
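Several posts in the high-performance VM group mention that it takes a long time for all Java threads to reach a safepoint, and that this has nothing to do with the GC itself. If you run into this, look for code where the JIT may have optimized away the safepoint check. The problem described in How to get Java stacks when JVM can't reach a safepoint is also a JVM freeze caused by a missing safepoint check: the VM thread waits for all Java threads to block at a safepoint while one Java thread is busy with a long-running operation and cannot reach one.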
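References:
Points on Safepoints
Memory: JVM Memory Reclamation, Theory and Implementation (內存篇:JVM內存回收理論與實現)
A question about the Serialization Page in the HotSpot source (請教hotspot源碼中關于Serialization Page的問題)
Some questions about memory_serialize_page (關于memory_serialize_page的一些疑問)
The mprotect man page
ParNew: application pause time occasionally reaches several seconds (ParNew 應用暫停時間偶爾會出現好幾秒的情況)
How to get Java stacks when JVM can't reach a safepoint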