Linux soft lockup analysis
Keywords: watchdog, soft lockup, percpu thread, lockdep.
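I recently ran into a soft lockup problem. The kernel printed a message like "[ 56.032356] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [cat:153]".
This message comes from the kernel's lockup detection mechanism, which consists of a soft lockup detector and a hard lockup detector.
I took the opportunity to analyze how the soft lockup mechanism works, what makes the soft watchdog fire, how the watchdog is configured, and how to locate the offending code.
The hard lockup detector is not covered here.
1. Soft lockup mechanism
lockup_detector_init() first computes sample_period and sets up watchdog_cpumask. If the watchdog is enabled, it then creates the threads that pet the watchdog; when each thread is set up, it starts an hrtimer that acts as the watchdog itself.
Two things deserve attention here: the API used to create the kernel threads, and struct smp_hotplug_thread.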
void __init lockup_detector_init(void)
{
    /*
     * Compute sample_period = watchdog_thresh * 2 / 5. With the default
     * watchdog_thresh of 10, the watchdog is petted every 4 seconds.
     */
    set_sample_period();
    ...
    cpumask_copy(&watchdog_cpumask, cpu_possible_mask);

    if (watchdog_enabled)
        watchdog_enable_all_cpus();
}

static int watchdog_enable_all_cpus(void)
{
    int err = 0;

    if (!watchdog_running) {
        /*
         * The watchdog is not running yet: create the per-CPU watchdog/x
         * threads. Each thread pets the watchdog every sample_period.
         * watchdog_threads describes the watchdog/x threads, and
         * watchdog_cpumask selects the CPUs whose threads are unparked.
         */
        err = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
                                                     &watchdog_cpumask);
        if (err)
            pr_err("Failed to create watchdog threads, disabled\n");
        else
            watchdog_running = 1;
    } else {
        err = update_watchdog_all_cpus();

        if (err) {
            watchdog_disable_all_cpus();
            pr_err("Failed to update lockup detectors, disabled\n");
        }
    }

    if (err)
        watchdog_enabled = 0;

    return err;
}

static void watchdog_disable_all_cpus(void)
{
    if (watchdog_running) {
        watchdog_running = 0;
        smpboot_unregister_percpu_thread(&watchdog_threads);
    }
}

static int update_watchdog_all_cpus(void)
{
    int ret;

    ret = watchdog_park_threads();
    if (ret)
        return ret;

    watchdog_unpark_threads();

    return 0;
}

static int watchdog_park_threads(void)
{
    int cpu, ret = 0;

    atomic_set(&watchdog_park_in_progress, 1);

    for_each_watchdog_cpu(cpu) {
        /*
         * Sets the KTHREAD_SHOULD_PARK bit in struct kthread->flags; the
         * watchdog/x thread then calls the park callback.
         */
        ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
        if (ret)
            break;
    }

    atomic_set(&watchdog_park_in_progress, 0);

    return ret;
}

static void watchdog_unpark_threads(void)
{
    int cpu;

    for_each_watchdog_cpu(cpu)
        /*
         * Clears the KTHREAD_SHOULD_PARK bit in struct kthread->flags; the
         * watchdog/x thread then calls the unpark callback.
         */
        kthread_unpark(per_cpu(softlockup_watchdog, cpu));
}
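To make the timing concrete, the relationship between watchdog_thresh, the soft lockup threshold and sample_period can be written out as plain arithmetic. This is only a user-space sketch: the helper names softlockup_thresh_sec() and sample_period_ns() are made up for illustration and are not the kernel's own functions.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Soft lockup threshold in seconds: twice watchdog_thresh (hypothetical helper). */
unsigned int softlockup_thresh_sec(unsigned int watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/*
 * Petting period in nanoseconds: one fifth of the soft lockup threshold,
 * which is the same as watchdog_thresh * 2 / 5 seconds (hypothetical helper).
 */
uint64_t sample_period_ns(unsigned int watchdog_thresh)
{
    return softlockup_thresh_sec(watchdog_thresh) * (NSEC_PER_SEC / 5);
}
```

With watchdog_thresh set to 10, this gives a 20-second threshold and a 4-second petting period, so the watchdog/x thread has five chances to pet the watchdog before a soft lockup is reported.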
1.1 The watchdog_threads structure
Before looking at how the watchdog/x threads are created, it is worth introducing struct smp_hotplug_thread.
struct smp_hotplug_thread {
    struct task_struct __percpu **store;            /* Pointer to the per-CPU struct task_struct pointers. */
    struct list_head list;
    int (*thread_should_run)(unsigned int cpu);     /* Checks whether the watchdog/x thread should run. */
    void (*thread_fn)(unsigned int cpu);            /* Main function of the watchdog/x thread. */
    void (*create)(unsigned int cpu);
    void (*setup)(unsigned int cpu);                /* Preparation before the watchdog/x thread starts running. */
    void (*cleanup)(unsigned int cpu, bool online); /* Cleanup after the watchdog/x thread exits. */
    void (*park)(unsigned int cpu);                 /* Temporarily stops the thread when the CPU goes offline. */
    void (*unpark)(unsigned int cpu);               /* Preparation when the CPU comes back online. */
    cpumask_var_t cpumask;                          /* CPUs whose threads are allowed to run. */
    bool selfparking;
    const char *thread_comm;                        /* Name of the watchdog/x thread. */
};

watchdog_threads is the descriptor of the soft lockup monitoring threads; the watchdog/x threads are created from it.
static struct smp_hotplug_thread watchdog_threads = {
    .store = &softlockup_watchdog,
    .thread_should_run = watchdog_should_run,
    .thread_fn = watchdog,
    .thread_comm = "watchdog/%u",
    .setup = watchdog_enable,
    .cleanup = watchdog_cleanup,
    .park = watchdog_disable,
    .unpark = watchdog_enable,
};

static void watchdog_enable(unsigned int cpu)
{
    struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);

    /* kick off the timer for the hardlockup detector */
    hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    /*
     * The hrtimer callback is watchdog_timer_fn, which checks whether
     * watchdog_touch_ts has gone more than 20 seconds without an update.
     * If so, a soft lockup is reported.
     */
    hrtimer->function = watchdog_timer_fn;

    /* Enable the perf event */
    watchdog_nmi_enable(cpu);

    /* done here because hrtimer_start can only pin to smp_processor_id() */
    /*
     * Start the hrtimer with a timeout of sample_period (4 seconds).
     * HRTIMER_MODE_REL_PINNED binds the hrtimer to the current CPU.
     */
    hrtimer_start(hrtimer, ns_to_ktime(sample_period),
                  HRTIMER_MODE_REL_PINNED);

    /* initialize timestamp */
    /*
     * Make the current thread SCHED_FIFO with real-time priority 99, the
     * highest real-time priority, so it runs ahead of all non-real-time
     * threads and of lower-priority real-time threads.
     */
    watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
    /* Update watchdog_touch_ts, which is the petting operation. */
    __touch_watchdog();
}

static void watchdog_set_prio(unsigned int policy, unsigned int prio)
{
    struct sched_param param = { .sched_priority = prio };

    sched_setscheduler(current, policy, &param);
}

/* Commands for resetting the watchdog */
static void __touch_watchdog(void)
{
    /* Petting the watchdog means storing the current timestamp in watchdog_touch_ts. */
    __this_cpu_write(watchdog_touch_ts, get_timestamp());
}

/*
 * The reverse of watchdog_enable(): turns the thread back into a normal
 * thread and cancels the hrtimer.
 */
static void watchdog_disable(unsigned int cpu)
{
    struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);

    watchdog_set_prio(SCHED_NORMAL, 0);
    hrtimer_cancel(hrtimer);
    /* disable the perf event */
    watchdog_nmi_disable(cpu);
}

static void watchdog_cleanup(unsigned int cpu, bool online)
{
    watchdog_disable(cpu);
}

static int watchdog_should_run(unsigned int cpu)
{
    /*
     * hrtimer_interrupts counts hrtimer expiries; watchdog() copies it into
     * soft_lockup_hrtimer_cnt. Equal values mean no hrtimer has fired since
     * the thread last ran, so the watchdog/x thread does not need to run.
     * Different values mean it must run.
     */
    return __this_cpu_read(hrtimer_interrupts) !=
           __this_cpu_read(soft_lockup_hrtimer_cnt);
}

static void watchdog(unsigned int cpu)
{
    /*
     * Update soft_lockup_hrtimer_cnt so that watchdog_should_run() returns
     * false, meaning the thread need not run (no petting needed) until the
     * next hrtimer expiry.
     */
    __this_cpu_write(soft_lockup_hrtimer_cnt,
                     __this_cpu_read(hrtimer_interrupts));
    /* Only one line, but this is the all-important petting operation. */
    __touch_watchdog();

    if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
        watchdog_nmi_disable(cpu);
}
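The handshake between the timer and the thread through hrtimer_interrupts and soft_lockup_hrtimer_cnt is easy to lose in the per-CPU accessors, so here is a simplified user-space model of it. The struct wd_cpu_state and the wd_* functions are hypothetical names invented for this sketch; the kernel keeps these values in per-CPU variables rather than a struct.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU variables of one CPU. */
struct wd_cpu_state {
    unsigned long hrtimer_interrupts;      /* bumped on every timer expiry */
    unsigned long soft_lockup_hrtimer_cnt; /* last expiry the thread has handled */
    unsigned long touch_ts;                /* time of the last petting */
};

/* Timer side: models watchdog_interrupt_count(). */
void wd_timer_tick(struct wd_cpu_state *s)
{
    s->hrtimer_interrupts++;
}

/* Thread side: models watchdog_should_run(). */
bool wd_should_run(const struct wd_cpu_state *s)
{
    return s->hrtimer_interrupts != s->soft_lockup_hrtimer_cnt;
}

/* Thread side: models watchdog(), which catches up with the timer and pets the watchdog. */
void wd_thread_run(struct wd_cpu_state *s, unsigned long now)
{
    s->soft_lockup_hrtimer_cnt = s->hrtimer_interrupts;
    s->touch_ts = now;
}
```

In this model, wd_should_run() is false until wd_timer_tick() has been called, and becomes false again as soon as wd_thread_run() executes. If the thread never gets the CPU, touch_ts simply stops advancing, which is exactly what the timer callback later detects.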
1.2 Creating the watchdog/x petting threads
With watchdog_threads covered, let us look at how the watchdog/x threads are created.
int smpboot_register_percpu_thread_cpumask(struct smp_hotplug_thread *plug_thread,
                                           const struct cpumask *cpumask)
{
    unsigned int cpu;
    int ret = 0;

    if (!alloc_cpumask_var(&plug_thread->cpumask, GFP_KERNEL))
        return -ENOMEM;
    cpumask_copy(plug_thread->cpumask, cpumask);

    get_online_cpus();
    mutex_lock(&smpboot_threads_lock);
    /* Walk all online CPUs and create a per-CPU watchdog/x thread for each. */
    for_each_online_cpu(cpu) {
        ret = __smpboot_create_thread(plug_thread, cpu);
        if (ret) {
            /* Creation failed: release the resources. */
            smpboot_destroy_threads(plug_thread);
            free_cpumask_var(plug_thread->cpumask);
            goto out;
        }
        /*
         * If the CPU is in cpumask, clear KTHREAD_SHOULD_PARK, which leads
         * to the unpark callback of watchdog_threads being called.
         */
        if (cpumask_test_cpu(cpu, cpumask))
            smpboot_unpark_thread(plug_thread, cpu);
    }
    list_add(&plug_thread->list, &hotplug_threads);
out:
    mutex_unlock(&smpboot_threads_lock);
    put_online_cpus();
    return ret;
}

static int __smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
{
    struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
    struct smpboot_thread_data *td;

    if (tsk)
        return 0;

    td = kzalloc_node(sizeof(*td), GFP_KERNEL, cpu_to_node(cpu));
    if (!td)
        return -ENOMEM;
    td->cpu = cpu;
    td->ht = ht;

    /* Create the watchdog/x thread on the given CPU; its function is smpboot_thread_fn(). */
    tsk = kthread_create_on_cpu(smpboot_thread_fn, td, cpu,
                                ht->thread_comm);
    if (IS_ERR(tsk)) {
        kfree(td);
        return PTR_ERR(tsk);
    }
    /*
     * Park the thread so that it could start right on the CPU
     * when it is available.
     */
    kthread_park(tsk);
    /* Take a reference on the thread. */
    get_task_struct(tsk);
    /* Save the task pointer in the per-CPU slot that store points to. */
    *per_cpu_ptr(ht->store, cpu) = tsk;
    if (ht->create) {
        if (!wait_task_inactive(tsk, TASK_PARKED))
            WARN_ON(1);
        else
            ht->create(cpu);
    }
    return 0;
}

static int smpboot_thread_fn(void *data)
{
    struct smpboot_thread_data *td = data;
    struct smp_hotplug_thread *ht = td->ht;

    while (1) {
        set_current_state(TASK_INTERRUPTIBLE);
        preempt_disable();
        /* The thread has been asked to stop: call cleanup and exit. */
        if (kthread_should_stop()) {
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            /* cleanup must mirror setup */
            if (ht->cleanup && td->status != HP_THREAD_NONE)
                ht->cleanup(td->cpu, cpu_online(td->cpu));
            kfree(td);
            return 0;
        }

        /* KTHREAD_SHOULD_PARK is set: call park() and suspend the thread. */
        if (kthread_should_park()) {
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            if (ht->park && td->status == HP_THREAD_ACTIVE) {
                BUG_ON(td->cpu != smp_processor_id());
                ht->park(td->cpu);
                td->status = HP_THREAD_PARKED;
            }
            kthread_parkme();
            /* We might have been woken for stop */
            continue;
        }

        BUG_ON(td->cpu != smp_processor_id());

        /* Check for state change setup */
        switch (td->status) {
        case HP_THREAD_NONE:
            /* First run: call setup() to initialize. */
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            if (ht->setup)
                ht->setup(td->cpu);
            td->status = HP_THREAD_ACTIVE;
            continue;

        case HP_THREAD_PARKED:
            /* Resuming from the parked state. */
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            if (ht->unpark)
                ht->unpark(td->cpu);
            td->status = HP_THREAD_ACTIVE;
            continue;
        }

        if (!ht->thread_should_run(td->cpu)) {
            /* Nothing to do: schedule() gives the CPU to other threads. */
            preempt_enable_no_resched();
            schedule();
        } else {
            __set_current_state(TASK_RUNNING);
            preempt_enable();
            /* Call struct smp_hotplug_thread->thread_fn, i.e. watchdog(), to pet the watchdog. */
            ht->thread_fn(td->cpu);
        }
    }
}

/* Removes the kernel threads that were created. */
void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread)
{
    get_online_cpus();
    mutex_lock(&smpboot_threads_lock);
    list_del(&plug_thread->list);
    smpboot_destroy_threads(plug_thread);
    mutex_unlock(&smpboot_threads_lock);
    put_online_cpus();
    free_cpumask_var(plug_thread->cpumask);
}

static void smpboot_destroy_threads(struct smp_hotplug_thread *ht)
{
    unsigned int cpu;

    /* We need to destroy also the parked threads of offline cpus */
    for_each_possible_cpu(cpu) {
        struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);

        if (tsk) {
            kthread_stop(tsk);
            put_task_struct(tsk);
            *per_cpu_ptr(ht->store, cpu) = NULL;
        }
    }
}
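The loop in smpboot_thread_fn() is essentially a small state machine over td->status. The following user-space sketch models one pass through that loop and reports which callback would be invoked. All names here (hp_status, hp_action, hp_model, hp_step) are hypothetical and exist only for this illustration; scheduling, preemption and the actual parking are left out.

```c
/* Hypothetical mirror of HP_THREAD_NONE / HP_THREAD_ACTIVE / HP_THREAD_PARKED. */
enum hp_status { HP_NONE, HP_ACTIVE, HP_PARKED };

/* Which callback one loop iteration ends up calling (ACT_SLEEP models schedule()). */
enum hp_action {
    ACT_NOTHING, ACT_CLEANUP, ACT_PARK, ACT_SETUP, ACT_UNPARK, ACT_RUN, ACT_SLEEP
};

struct hp_model {
    enum hp_status status;
};

/* One iteration of the loop, in the same order of checks as smpboot_thread_fn(). */
enum hp_action hp_step(struct hp_model *m, int should_stop, int should_park,
                       int should_run)
{
    if (should_stop)
        return m->status != HP_NONE ? ACT_CLEANUP : ACT_NOTHING;

    if (should_park) {
        if (m->status == HP_ACTIVE) {
            m->status = HP_PARKED;
            return ACT_PARK;
        }
        return ACT_NOTHING; /* already parked or never set up: no callback */
    }

    switch (m->status) {
    case HP_NONE:   /* first run */
        m->status = HP_ACTIVE;
        return ACT_SETUP;
    case HP_PARKED: /* resuming from park */
        m->status = HP_ACTIVE;
        return ACT_UNPARK;
    default:
        break;
    }

    return should_run ? ACT_RUN : ACT_SLEEP;
}
```

For the watchdog, ACT_SETUP and ACT_UNPARK both correspond to watchdog_enable(), ACT_PARK and ACT_CLEANUP to watchdog_disable(), and ACT_RUN to watchdog().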
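1.3 The hrtimer watchdog
With the watchdog/x petting threads covered, how is the watchdog itself implemented?
The watchdog is an hrtimer with a period of 4 seconds. The hrtimer is pinned to its CPU and every variable it uses is per-CPU, so CPUs do not interfere with one another.
Each time the hrtimer expires, it wakes the watchdog/x thread, which then pets the watchdog.
The hrtimer callback runs in interrupt context (hard-IRQ context in the kernel version examined here), so once the interrupt fires it gets to run ahead of any thread.
As a result, even when the watchdog/x thread is not getting the CPU, the callback still runs and uses is_softlockup() to check whether the watchdog has gone more than 20 seconds without being petted.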
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
    struct pt_regs *regs = get_irq_regs();
    int duration;
    int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;

    if (atomic_read(&watchdog_park_in_progress) != 0)
        return HRTIMER_NORESTART;

    /* kick the hardlockup detector */
    /* Each expiry increments hrtimer_interrupts, which counts hrtimer expiries. */
    watchdog_interrupt_count();

    /* kick the softlockup detector */
    /* Wake the watchdog/x thread so it can pet the watchdog. */
    wake_up_process(__this_cpu_read(softlockup_watchdog));

    /* .. and repeat */
    /* Re-arm the timer, which makes it periodic. */
    hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
    ...
    /* A non-zero return value means the watchdog has timed out. */
    duration = is_softlockup(touch_ts);
    /* Handling of a watchdog timeout. */
    if (unlikely(duration)) {
        if (kvm_check_and_clear_guest_paused())
            return HRTIMER_RESTART;

        /* only warn once */
        if (__this_cpu_read(soft_watchdog_warn) == true) {
            if (__this_cpu_read(softlockup_task_ptr_saved) !=
                current) {
                __this_cpu_write(soft_watchdog_warn, false);
                __touch_watchdog();
            }
            return HRTIMER_RESTART;
        }

        if (softlockup_all_cpu_backtrace) {
            if (test_and_set_bit(0, &soft_lockup_nmi_warn)) {
                /* Someone else will report us. Let's give up */
                __this_cpu_write(soft_watchdog_warn, true);
                return HRTIMER_RESTART;
            }
        }

        /* Print which CPU has been stuck for duration seconds and in which process. */
        pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
                 smp_processor_id(), duration,
                 current->comm, task_pid_nr(current));
        __this_cpu_write(softlockup_task_ptr_saved, current);
        print_modules();
        /*
         * Show the interrupt and softirq enable/disable history; disabled
         * interrupts or softirqs are one possible cause of a soft lockup.
         */
        print_irqtrace_events(current);
        /* Show the registers if they are available, otherwise dump the stack. */
        if (regs)
            show_regs(regs);
        else
            dump_stack();

        if (softlockup_all_cpu_backtrace) {
            trigger_allbutself_cpu_backtrace();

            clear_bit(0, &soft_lockup_nmi_warn);
            /* Barrier to sync with other cpus */
            smp_mb__after_atomic();
        }

        add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
        /* If softlockup_panic is set, call panic(). */
        if (softlockup_panic)
            panic("softlockup: hung tasks");
        __this_cpu_write(soft_watchdog_warn, true);
    } else
        __this_cpu_write(soft_watchdog_warn, false);

    return HRTIMER_RESTART;
}

static void watchdog_interrupt_count(void)
{
    __this_cpu_inc(hrtimer_interrupts);
}
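is_softlockup() itself is not shown above. Judging from how watchdog_timer_fn() uses its return value, the check it performs can be sketched roughly as follows. This is a user-space approximation under stated assumptions: soft_lockup_duration() is a made-up name, timestamps are taken to be in seconds, and the plain comparison ignores counter wrap-around, which real kernel code would have to handle.

```c
/*
 * Returns how long (in seconds) the watchdog has gone without being petted
 * if that exceeds the soft lockup threshold of 2 * watchdog_thresh,
 * otherwise 0. Hypothetical helper, not the kernel function.
 */
unsigned long soft_lockup_duration(unsigned long now, unsigned long touch_ts,
                                   unsigned int watchdog_thresh)
{
    unsigned long thresh = 2UL * watchdog_thresh;

    if (watchdog_thresh == 0)
        return 0; /* treated as "detector disabled" in this sketch */

    if (now > touch_ts + thresh)
        return now - touch_ts;

    return 0;
}
```

Because the timer only fires every sample_period, the reported duration is not exactly the threshold but whatever has elapsed at the first expiry past it, which is consistent with the "stuck for 22s" and "stuck for 23s" messages quoted in this article.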
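2. Configuring the watchdog
Watchdog behavior can be configured in two ways: through kernel command-line parameters and through the proc filesystem.
2.1 Via the kernel command line
Command-line parameters can switch the soft lockup detector on or off, choose whether to panic after a timeout, and so on.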
static int __init softlockup_panic_setup(char *str)
{
    softlockup_panic = simple_strtoul(str, NULL, 0);

    return 1;
}
__setup("softlockup_panic=", softlockup_panic_setup);

static int __init nowatchdog_setup(char *str)
{
    watchdog_enabled = 0;
    return 1;
}
__setup("nowatchdog", nowatchdog_setup);

static int __init nosoftlockup_setup(char *str)
{
    watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED;
    return 1;
}
__setup("nosoftlockup", nosoftlockup_setup);

#ifdef CONFIG_SMP
static int __init softlockup_all_cpu_backtrace_setup(char *str)
{
    sysctl_softlockup_all_cpu_backtrace =
        !!simple_strtol(str, NULL, 0);
    return 1;
}
__setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);

static int __init hardlockup_all_cpu_backtrace_setup(char *str)
{
    sysctl_hardlockup_all_cpu_backtrace =
        !!simple_strtol(str, NULL, 0);
    return 1;
}
__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
#endif
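The effect of these handlers can be mimicked in user space to see what each parameter does to the watchdog state. The sketch below is only a model: struct wd_boot_cfg and wd_parse_param() are hypothetical, the bit values chosen for the two flags are assumptions made for the example, and the exact-match comparison is simpler than the kernel's own parameter matching.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed bit assignments, for illustration only. */
#define NMI_WATCHDOG_ENABLED  (1u << 0)
#define SOFT_WATCHDOG_ENABLED (1u << 1)

/* Hypothetical container for the globals the __setup handlers modify. */
struct wd_boot_cfg {
    unsigned int watchdog_enabled;
    unsigned long softlockup_panic;
};

/* Applies one command-line token the way the handlers above would. */
void wd_parse_param(struct wd_boot_cfg *cfg, const char *arg)
{
    static const char panic_key[] = "softlockup_panic=";
    size_t key_len = sizeof(panic_key) - 1;

    if (strncmp(arg, panic_key, key_len) == 0)
        cfg->softlockup_panic = strtoul(arg + key_len, NULL, 0);
    else if (strcmp(arg, "nowatchdog") == 0)
        cfg->watchdog_enabled = 0;                       /* both detectors off */
    else if (strcmp(arg, "nosoftlockup") == 0)
        cfg->watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED; /* soft detector off */
}
```

Booting with softlockup_panic=1, for example, corresponds to the first branch: the value after the "=" is parsed and stored, and a later soft lockup then ends in panic() instead of just a warning.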
2.2 Via /proc/sys nodes
Watchdog settings can also be changed at run time through the proc filesystem.
/proc/sys/kernel/nmi_watchdog: hard lockup detector switch, handled by proc_nmi_watchdog().
/proc/sys/kernel/soft_watchdog: soft lockup detector switch, handled by proc_soft_watchdog().
/proc/sys/kernel/watchdog: master watchdog switch, handled by proc_watchdog().
/proc/sys/kernel/watchdog_cpumask: watchdog cpumask, handled by proc_watchdog_cpumask().
/proc/sys/kernel/watchdog_thresh: watchdog timeout threshold, handled by proc_watchdog_thresh().
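These nodes hold plain text values, so a diagnostic tool can check the current settings by reading them like ordinary files. Here is a minimal user-space sketch; read_sysctl_int() is a hypothetical helper written for this example and assumes the node contains a single integer, which does not apply to watchdog_cpumask.

```c
#include <stdio.h>

/*
 * Reads a single integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;

    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);

    return ok ? 0 : -1;
}
```

Calling read_sysctl_int("/proc/sys/kernel/watchdog_thresh", &v) would then give the threshold currently in effect, and the same call with /proc/sys/kernel/soft_watchdog tells whether the soft lockup detector is switched on.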
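3. Locating a soft lockup
A soft lockup is usually caused by an infinite loop or a deadlock. An infinite loop can be located from the stack backtrace; a deadlock requires enabling the kernel's lockdep feature.
For how to enable lockdep, see 《Linux死鎖檢測-Lockdep》 (Linux deadlock detection: Lockdep).
The following is an analysis of a soft lockup caused by a while(1) loop: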
[ 5656.032325] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [cat:157] ---- The CPU and process give a rough location.
[ 5656.039314] Modules linked in:
[ 5656.042386]
[ 5656.042386] CURRENT PROCESS:
[ 5656.042386]
[ 5656.048229] COMM=cat PID=157
[ 5656.051117] TEXT=00008000-000c5a68 DATA=000c6f1c-000c7175 BSS=000c7175-000c8000
[ 5656.058432] USER-STACK=7fc1ee50 KERNEL-STACK=bd0b7080
[ 5656.058432]
[ 5656.065069] PC: 0x8032a1b2 (clk_summary_show+0x62/0xb4) ---- PC points at the offending code, a more precise location.
[ 5656.070302] LR: 0x8032a186 (clk_summary_show+0x36/0xb4)
[ 5656.075531] SP: 0xbd8b1b74...
[ 5656.217622] Call Trace: ---- The call trace shows how execution reached the point that PC refers to.
[<80155c5e>] seq_read+0xc2/0x46c
[<802826ac>] full_proxy_read+0x58/0x98
[<8013239c>] do_readv_writev+0x31c/0x384
[<80132458>] vfs_readv+0x54/0x8c
[<80160b52>] default_file_splice_read+0x166/0x2b0
[<801606ee>] do_splice_to+0x76/0xb0
[<801607de>] splice_direct_to_actor+0xb6/0x21c
[<801609c2>] do_splice_direct+0x7e/0xa8
[<80132a5a>] do_sendfile+0x21a/0x45c
[<80133776>] SyS_sendfile64+0xf6/0xfc
[<80046186>] csky_systemcall+0x96/0xe0
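When many such logs have to be triaged, the first line can be picked apart mechanically, since its layout follows the pr_emerg() format string "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]". The parser below is a small user-space sketch; struct soft_lockup_report and parse_soft_lockup() are names invented for this example, and it assumes the log line carries no extra text after the closing bracket.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the fields of a soft lockup report line. */
struct soft_lockup_report {
    int cpu;
    unsigned int seconds;
    char comm[32];
    int pid;
};

/* Returns 0 on success, -1 if the line is not a soft lockup report. */
int parse_soft_lockup(const char *line, struct soft_lockup_report *r)
{
    const char *msg = strstr(line, "BUG: soft lockup - CPU#");
    const char *open, *close, *colon;
    size_t len;

    if (!msg || sscanf(msg, "BUG: soft lockup - CPU#%d stuck for %us!",
                       &r->cpu, &r->seconds) != 2)
        return -1;

    open = strchr(msg, '[');
    close = open ? strrchr(open, ']') : NULL;
    if (!close)
        return -1;

    /* The task name may itself contain ':', so split at the last one. */
    for (colon = close; colon > open && *colon != ':'; colon--)
        ;
    if (colon == open)
        return -1;

    len = (size_t)(colon - open - 1);
    if (len >= sizeof(r->comm))
        return -1;
    memcpy(r->comm, open + 1, len);
    r->comm[len] = '\0';

    return sscanf(colon + 1, "%d", &r->pid) == 1 ? 0 : -1;
}
```

Applied to the first line of the log above, this yields CPU 0, 22 seconds, task cat and PID 157, which is the "rough location" that the PC value and call trace then narrow down.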
Reposted from: https://www.cnblogs.com/arnoldlu/p/10338850.html