

Written by arstercz, 2019-04-26

Handling the rcu_bh self-detected stall warning on Linux

Problem description

Several Linux machines recently reported the following kernel message:

Apr 24 21:02:09 cztest kernel: INFO: rcu_bh self-detected stall on CPU { 0} (t=0 jiffies)
Apr 24 21:02:09 cztest kernel: Pid: 0, comm: swapper/0 Not tainted 3.4.95.R620.CentOS6.5-x86_64.OpenBeta.KVM #1
Apr 24 21:02:09 cztest kernel: Call Trace:
Apr 24 21:02:09 cztest kernel: [] __rcu_pending+0x192/0x4e0
Apr 24 21:02:09 cztest kernel: [] ? tick_nohz_handler+0xf0/0xf0
Apr 24 21:02:09 cztest kernel: [] rcu_check_callbacks+0xcb/0xe0
Apr 24 21:02:09 cztest kernel: [] update_process_times+0x43/0x80
Apr 24 21:02:09 cztest kernel: [] tick_sched_timer+0x61/0xb0
Apr 24 21:02:09 cztest kernel: [] __run_hrtimer+0x5d/0x120
Apr 24 21:02:09 cztest kernel: [] hrtimer_interrupt+0xee/0x250
Apr 24 21:02:09 cztest kernel: [] smp_apic_timer_interrupt+0x64/0xa0
Apr 24 21:02:09 cztest kernel: [] apic_timer_interrupt+0x6a/0x70
Apr 24 21:02:09 cztest kernel: [] ? sched_clock_cpu+0xb8/0x110
Apr 24 21:02:09 cztest kernel: [] ? native_safe_halt+0x6/0x10
Apr 24 21:02:09 cztest kernel: [] ? cpuidle_idle_call+0x1f/0xf0
Apr 24 21:02:09 cztest kernel: [] default_idle+0x27/0x50
Apr 24 21:02:09 cztest kernel: [] cpu_idle+0x89/0xd0
Apr 24 21:02:09 cztest kernel: [] rest_init+0x6d/0x80
Apr 24 21:02:09 cztest kernel: [] start_kernel+0x34d/0x35a
Apr 24 21:02:09 cztest kernel: [] ? kernel_init+0x1d5/0x1d5
Apr 24 21:02:09 cztest kernel: [] x86_64_start_reservations+0x131/0x136
Apr 24 21:02:09 cztest kernel: [] x86_64_start_kernel+0x101/0x110

The host environment is as follows:

System | Dell Inc.; PowerEdge R620; vNot Specified (Rack Mount Chassis)

Platform | Linux

Kernel | 3.4.95

Total Memory | 64G

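Analysis

Linux provides the RCU (read-copy-update) mechanism to synchronize data between cores on multiprocessor systems. The rcu_bh in the message above stands for RCU bottom halves, that is, the RCU flavor tied to bottom-half interrupt processing. RCU-bh was introduced in kernel 2.6.9, mainly to defend against DDoS attacks, and in newer systems it mostly runs in softirq context. Interrupt work that must be handled quickly is usually done in the top half, while work with looser timing requirements is deferred to the bottom half. Interrupt handlers generally live in device drivers and in the kernel itself; user-space applications do not handle interrupts. According to the kernel documentation, the following situations can produce rcu_bh stall warnings:

See: kernel-source/Documentation/RCU/stallwarn.txt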

So your kernel printed an RCU CPU stall warning. The next question is
"What caused it?" The following problems can result in RCU CPU stall
warnings:

o A CPU looping with interrupts disabled. This condition can
  result in RCU-sched and RCU-bh stalls.

o A CPU looping with preemption disabled. This condition can
  result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
  stalls.

o A CPU looping with bottom halves disabled. This condition can
  result in RCU-sched and RCU-bh stalls.

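In total, three situations can produce rcu_bh messages:

1. A CPU looping with interrupts disabled;

2. A CPU looping with preemption disabled, while ksoftirqd is in use;

3. A CPU looping with bottom halves disabled.

All of these conditions relate to interrupts or kernel preemption, so a user-space application is unlikely to be the cause of the warning; it looks more like a system-level issue.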
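The three conditions can be restated as a small lookup function. This is only a user-space sketch of the stallwarn.txt excerpt quoted above; the enum and function names are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical names; the mapping comes from the quoted stallwarn.txt text. */
enum loop_condition {
    LOOP_IRQS_DISABLED,     /* CPU looping with interrupts disabled */
    LOOP_PREEMPT_DISABLED,  /* CPU looping with preemption disabled */
    LOOP_BH_DISABLED        /* CPU looping with bottom halves disabled */
};

/* Can the given condition lead to an RCU-bh stall warning? */
static bool can_stall_rcu_bh(enum loop_condition c, bool ksoftirqd_in_use)
{
    switch (c) {
    case LOOP_IRQS_DISABLED:
    case LOOP_BH_DISABLED:
        return true;
    case LOOP_PREEMPT_DISABLED:
        /* Only when softirq work has been handed to ksoftirqd. */
        return ksoftirqd_in_use;
    }
    return false;
}
```

For example, can_stall_rcu_bh(LOOP_PREEMPT_DISABLED, false) is false, which matches the "if ksoftirqd is in use" qualifier in the documentation.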

Now look at the message again:

INFO: rcu_bh self-detected stall on CPU { 0} (t=0 jiffies)

It can be read as: an rcu_bh stall (a processing delay or timeout) was detected on CPU 0. However, consider the source in linux-3.4/kernel/rcutree.c:

static void print_cpu_stall(struct rcu_state *rsp)
{
    ...
    printk(KERN_ERR "INFO: %s self-detected stall on CPU", rsp->name);
    print_cpu_stall_info_begin();
    print_cpu_stall_info(rsp, smp_processor_id());
    print_cpu_stall_info_end();
    printk(KERN_CONT " (t=%lu jiffies)\n", jiffies - rsp->gp_start);
    if (!trigger_all_cpu_backtrace())
        dump_stack();
    ...
}

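The 0 in t=0 jiffies is odd. According to the code, this value is the current jiffies minus rsp->gp_start, the jiffies value recorded when the current grace period started. (jiffies is the number of timer ticks since boot, so jiffies / HZ is roughly the uptime in seconds; HZ is set by the kernel configuration, and with HZ=100, the value assumed here, one tick lasts 10 ms.) A value of 0 means that zero ticks had elapsed, that is, less than 10 ms, yet the bottom-half processing was already reported as stalled. 10 ms is nowhere near the default timeout of 60 s, which can be read from /sys/module/rcutree/parameters/rcu_cpu_stall_timeout. This question is left open for now; upgrading the kernel may resolve it.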
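The arithmetic behind that comparison can be sketched in user-space C. The helper names below are hypothetical, HZ=100 is the value assumed in the text rather than something read from the running kernel, and the timeout check is a simplification, not the kernel's own logic.

```c
#include <stdio.h>

/* Ticks between two jiffies samples. Unsigned subtraction stays
 * correct even if the counter wrapped around between the samples. */
static unsigned long jiffies_elapsed(unsigned long now, unsigned long start)
{
    return now - start;
}

/* Convert a tick count to milliseconds for a given tick rate (HZ). */
static unsigned long long ticks_to_ms(unsigned long ticks, unsigned int hz)
{
    return (unsigned long long)ticks * 1000ULL / hz;
}

/* Has the elapsed tick count reached a timeout given in seconds? */
static int reaches_timeout(unsigned long ticks, unsigned int hz,
                           unsigned int timeout_s)
{
    return ticks >= (unsigned long long)timeout_s * hz;
}

/* Read one integer from a parameter file, for example the sysfs
 * path named in the text. Returns -1 if the file cannot be read. */
static long read_long_param(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

With hz = 100 and a 60 s timeout, the threshold is 6000 ticks, so reaches_timeout(0, 100, 60) is false. That is the mismatch described above: the reported t=0 is far below the configured timeout.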

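As for the stack trace: update_process_times is called from the timer interrupt to charge the tick to the current process, and as part of that path it also checks on RCU. The stack trace above is printed from within rcu_check_callbacks. After rcu_check_callbacks returns, the remaining accounting steps run whether or not a stack trace was printed. Seen this way, the message is only a warning and does not affect user-space programs.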

Apr 24 21:02:09 cztest kernel: [] update_process_times+0x43/0x80

source/kernel/timer.c

void update_process_times(int user_tick)
{
    struct task_struct *p = current;
    int cpu = smp_processor_id();

    /* Note: this timer irq context must be accounted for as well. */
    account_process_tick(p, user_tick);
    run_local_timers();
    rcu_check_callbacks(cpu, user_tick); /* ---> rcu_pending -> __rcu_pending -> print_cpu_stall */
    printk_tick();
#ifdef CONFIG_IRQ_WORK
    if (in_irq())
        irq_work_run();
#endif
    scheduler_tick();
    run_posix_cpu_timers(p);
}

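Summary

From this brief analysis, the message is only informational and is not caused by user-space programs. Still, keep an eye on whether this kernel message shows up frequently. Upgrading the kernel is worth trying as a fix.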
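To watch how often the message appears, the log can be scanned for lines in the format shown in this article. The following is a minimal sketch in user-space C; the struct and function names are hypothetical, and the pattern assumes exactly the "self-detected stall" line layout quoted above, which may differ on other kernel versions.

```c
#include <stdio.h>
#include <string.h>

struct stall_event {
    char flavor[32];      /* RCU flavor name, e.g. "rcu_bh" */
    int cpu;              /* CPU number printed inside the braces */
    unsigned long ticks;  /* the t= value, in jiffies */
};

/* Returns 1 and fills *ev if the line contains a self-detected stall
 * message in the format shown in this article, otherwise returns 0. */
static int parse_stall_line(const char *line, struct stall_event *ev)
{
    const char *p = strstr(line, "INFO: ");

    if (!p)
        return 0;
    return sscanf(p,
                  "INFO: %31s self-detected stall on CPU { %d} (t=%lu jiffies)",
                  ev->flavor, &ev->cpu, &ev->ticks) == 3;
}

/* Count matching lines on a stream such as an opened log file. */
static unsigned long count_stalls(FILE *in)
{
    char buf[1024];
    struct stall_event ev;
    unsigned long n = 0;

    while (fgets(buf, sizeof buf, in))
        if (parse_stall_line(buf, &ev))
            n++;
    return n;
}
```

A caller could open the system log with fopen, pass the stream to count_stalls, and compare the count across days to judge whether the warning is becoming more frequent.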

