Linux Kernel Crash Analysis
http://blog.chinaunix.net/uid-20788636-id-4377271.html?
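Kernel crashes come up often at work. This article walks through the messages the kernel prints when it crashes and shows where each of them comes from. The kernel version used is Linux 2.6.32.
A process lives anywhere from a few milliseconds to several months, and during that time it usually interacts with the kernel, for example when a user-space program enters kernel space through a system call. At that point the process no longer runs on its user-space stack but on the corresponding kernel stack. For every process, Linux packs two different data structures compactly into a single memory area allocated for that process: one is the kernel-mode process stack, and the other is thread_info, a small structure tied to the process descriptor and known as the thread descriptor. The kernel stack is normally 8 KB, that is 8192 bytes, or two 4 KB pages. In Linux 2.6.32 the kernel stack size is defined in thread_info.h: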
#define THREAD_SIZE               8192
Linux uses the following union to hold a process's thread descriptor and its kernel stack. It is defined in include/linux/sched.h.
union thread_union {
        struct thread_info thread_info;
        unsigned long stack[THREAD_SIZE/sizeof(long)];
};
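This structure is a union. C textbooks explain unions; The C Programming Language describes a union as follows:
1) A union is a structure;
2) all of its members have offset 0 from the base address;
3) the structure is big enough to hold its "widest" member;
4) its alignment is appropriate for all of its members.
From this description, thread_union is 8192 bytes in size, which is the size of the stack array (an array of unsigned long). Because all members of a union occupy the same memory, a common rule of thumb when writing code is that only one member of a union instance can be used at a time, since writing one member overwrites the others. That rule holds only under the assumption that the members are the same size. When the members differ in size, a write overwrites only the bytes the written member covers. For thread_union we can therefore access both members at the same time, as long as we obtain the correct address for each of them.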
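The union properties listed above can be checked with a small user-space sketch. The `fake_thread_info` structure and the three helper functions are hypothetical stand-ins invented for illustration (the real `thread_info` is much larger), and `THREAD_SIZE` is taken as 8192 as in the text.

```c
#include <stddef.h>

#define THREAD_SIZE 8192

/* Hypothetical stand-in for the kernel's thread_info; fields are illustrative. */
struct fake_thread_info {
        unsigned long flags;
        int preempt_count;
        void *task;
};

union fake_thread_union {
        struct fake_thread_info thread_info;
        unsigned long stack[THREAD_SIZE / sizeof(long)];
};

/* Size of the whole union: that of its widest member, the stack array. */
size_t thread_union_size(void)
{
        return sizeof(union fake_thread_union);
}

/* Both members start at the union's base address (offset 0). */
int members_share_base(void)
{
        return offsetof(union fake_thread_union, thread_info) == 0 &&
               offsetof(union fake_thread_union, stack) == 0;
}

/*
 * Writing the last stack word leaves thread_info intact, because the two
 * members overlap only in the first sizeof(struct fake_thread_info) bytes.
 */
int high_stack_write_keeps_info(void)
{
        union fake_thread_union u;

        u.thread_info.flags = 0x5aUL;
        u.stack[THREAD_SIZE / sizeof(long) - 1] = 0xdeadbeefUL;
        return u.thread_info.flags == 0x5aUL;
}
```

Writing `u.stack[0]` instead would overwrite `flags`, which is the user-space analogue of a kernel stack growing down into thread_info.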
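When a process running in the kernel uses too much stack, the kernel stack overflows into the thread_info area, which causes serious problems (such as a system reboot). Typical causes are recursion that goes too deep and large data structures defined as local variables inside a function.
Figure: the relationship between thread_info, task_struct, and the kernel stack of a process
Now let's look at the thread_info structure: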
struct thread_info {
        unsigned long           flags;          /* low-level flags */
        int                     preempt_count;  /* 0 => preemptable, <0 => bug */
        mm_segment_t            addr_limit;     /* address space limit of the process */
        struct task_struct      *task;          /* pointer to the current process's task_struct */
        struct exec_domain      *exec_domain;   /* execution domain */
        __u32                   cpu;            /* current cpu */
        __u32                   cpu_domain;     /* cpu domain */
        struct cpu_context_save cpu_context;    /* cpu context */
        __u32                   syscall;        /* syscall number */
        __u8                    used_cp[16];    /* thread used copro */
        unsigned long           tp_value;
        struct crunch_state     crunchstate;
        union fp_state          fpstate __attribute__((aligned(8)));
        union vfp_state         vfpstate;
#ifdef CONFIG_ARM_THUMBEE
        unsigned long           thumbee_state;  /* ThumbEE Handler Base register */
#endif
        struct restart_block    restart_block;  /* used to restart system calls interrupted by signals */
};
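PS: (1) flags holds various process-specific flags. The two most important are TIF_SIGPENDING, which is set when the process has pending signals, and TIF_NEED_RESCHED, which indicates that the scheduler should pick another process to replace this one.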
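As a rough illustration of how such flag bits behave, here is a minimal user-space sketch. The bit numbers are an assumption based on the ARM headers of that era and should be checked against your own tree; `flag_is_set` and `flag_set` are hypothetical helpers, not the kernel's own accessors.

```c
/* Bit numbers assumed from the 2.6.32 ARM thread_info.h; verify in your tree. */
#define TIF_SIGPENDING   0
#define TIF_NEED_RESCHED 1

/* Hypothetical helper: returns 1 if the given bit is set in flags, else 0. */
int flag_is_set(unsigned long flags, int bit)
{
        return (int)((flags >> bit) & 1UL);
}

/* Hypothetical helper: returns flags with the given bit set. */
unsigned long flag_set(unsigned long flags, int bit)
{
        return flags | (1UL << bit);
}
```

In the kernel these bits are updated with atomic bit operations, which this sketch does not attempt to model.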
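With that background, let's see what the kernel actually prints when it dumps the stack. The output below comes from a case encountered at work. It includes the kernel stack contents; the PC is inside dev_get_by_flags, and the kernel virtual address that could not be accessed is 45685516. On this 32-bit system, addresses that the kernel can access generally have the form 0xCXXXXXXX.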
Unable to handle kernel paging request at virtual address 45685516
pgd = c65a4000
[45685516] *pgd=00000000
Internal error: Oops: 1 [#1]
last sysfs file: /sys/devices/form/tpm/cfg_l3/l3_rule_add
Modules linked in: splic mmp(P)
CPU: 0    Tainted: P            (2.6.32.11 #42)
PC is at dev_get_by_flags+0xfc/0x140
LR is at dev_get_by_flags+0xe8/0x140
pc : [<c06bee24>]    lr : [<c06bee10>]    psr: 20000013
sp : c07e9c28  ip : 00000000  fp : c07e9c64
r10: c6bcc560  r9 : c646a220  r8 : c66a0000
r7 : c6a00000  r6 : c0204e56  r5 : 30687461  r4 : 45685516
r3 : 00000000  r2 : 00000010  r1 : c0204e56  r0 : ffffffff
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005397f? Table: 065a4000? DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc07e8270)
Stack: (0xc07e9c28 to 0xc07ea000)
9c20:                   c0204e56 c6a00000 45685516 c69ffff0 c69ffff0 c69ffff0
9c40: c6a00000 30687461 c66a0000 c6a00000 00000007 c64b210c c07e9d24 c07e9c68
9c60: c071f764 c06bed38 c66a0000 c66a0000 c6a00000 c6a00000 c66a0000 c6a00000
9c80: c07e9cfc c07e9c90 c03350d4 c0334b2c 00000034 00000006 00000100 c64b2104
9ca0: 0000c4fb c0243ece c66a0000 c0beed04 c033436c c646a220 c07e9cf4 00000000
9cc0: c66a0000 00000003 c0bee8e8 c0beed04 c07e9d24 c07e9ce0 c06e4f5c 00004c68
9ce0: 00000000 faa9fea9 faa9fea9 00000000 00000000 c6bcc560 c0335138 c646a220
9d00: c66a0000 c64b2104 c085ffbc c66a0000 c0bee8e8 00000000 c07e9d54 c07e9d28
9d20: c071f9a0 c071ebc0 00000000 c071ebb0 80000000 00000007 c67fb460 c646a220
9d40: c0bee8c8 00000608 c07e9d94 c07e9d58 c002a100 c071f84c c0029bb8 80000000
9d60: c07e9d84 c0beee0c c0335138 c66a0000 c646a220 00000000 c4959800 c4959800
9d80: c67fb460 00000000 c07e9dc4 c07e9d98 c078f0f4 c0029bc8 00000000 c0029bb8
9da0: 80000000 c07e9dbc c6b8d340 c66a0520 00000000 c646a220 c07e9dec c07e9dc8
9dc0: c078f450 c078effc 00000000 c67fb460 c6b8d340 00000000 c67fb460 c64b20f2
9de0: c07e9e24 c07e9df0 c078fb60 c078f130 00000000 c078f120 80000000 c0029a94
9e00: 00000806 c6b8d340 c0bee818 00000001 00000000 c4959800 c07e9e64 c07e9e28
9e20: c002a030 c078f804 c64b2070 00000000 c64b2078 ffc45000 c64b20c2 c085c2dc
9e40: 00000000 c085c2c0 00000000 c0817398 00086c2e c085c2c4 c07e9e9c c07e9e68
9e60: c06c2684 c0029bc8 00000001 00000040 00000000 c085c2dc c085c2c0 00000001
9e80: 0000012c 00000040 c085c2d0 c0bee818 c07e9ed4 c07e9ea0 c00284e0 c06c2608
9ea0: bf00da5c 00086c30 00000000 00000001 c097e7d4 c07e8000 00000100 c08162d8
9ec0: 00000002 c097e7a0 c07e9f14 c07e9ed8 c00283d0 c0028478 56251311 00023c88
9ee0: c07e9f0c 00000003 c08187ac 00000018 00000000 01000000 c07ebc70 00023cbc
9f00: 56251311 00023c88 c07e9f24 c07e9f18 c03391e8 c0028348 c07e9f3c c07e9f28
9f20: c0028070 c03391b0 ffffffff 0000001f c07e9f94 c07e9f40 c002d4d0 c0028010
9f40: 00000000 00000001 c07e9f88 60000013 c07e8000 c07ebc78 c0868784 c07ebc70
9f60: 00023cbc 56251311 00023c88 c07e9f94 c07e9f98 c07e9f88 c025c3e4 c025c3f4
9f80: 60000013 ffffffff c07e9fb4 c07e9f98 c025c578 c025c3cc 00000000 c0981204
9fa0: c0025ca0 c0d01140 c07e9fc4 c07e9fb8 c0032094 c025c528 c07e9ff4 c07e9fc8
9fc0: c0008918 c0032048 c0008388 00000000 00000000 c0025ca0 00000000 00053975
9fe0: c0868834 c00260a4 00000000 c07e9ff8 00008034 c0008708 00000000 00000000
Backtrace:
[<c06bed28>] (dev_get_by_flags+0x0/0x140) from [<c071f764>] (arp_process+0xbb4/0xc74)
 r7:c64b210c r6:00000007 r5:c6a00000 r4:c66a0000
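(1) First, where in the kernel is this output printed? It comes from the __do_kernel_fault function in fault.c. That function prints the line "Unable to handle kernel paging request at virtual address 45685516"; the address is one that cannot be accessed from kernel space.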
static void
__do_kernel_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
                  struct pt_regs *regs)
{
        /*
         * Are we prepared to handle this kernel fault?
         */
        if (fixup_exception(regs))
                return;
        /*
         * No handler, we'll have to terminate things with extreme prejudice.
         */
        bust_spinlocks(1);
        printk(KERN_ALERT
                "Unable to handle kernel %s at virtual address %08lx\n",
                (addr < PAGE_SIZE) ? "NULL pointer dereference" :
                "paging request", addr);
        show_pte(mm, addr);
        die("Oops", regs, fsr);
        bust_spinlocks(0);
        do_exit(SIGKILL);
}
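The choice between the two message variants can be reproduced in user space. The sketch below assumes 4 KB pages and a 3G/1G user/kernel split (`PAGE_OFFSET` = 0xC0000000), neither of which is stated in the log itself; `fault_kind` and `is_kernel_address` are hypothetical helper names.

```c
#define PAGE_SIZE   4096UL        /* assumed: 4 KB pages */
#define PAGE_OFFSET 0xC0000000UL  /* assumed: 3G/1G split on 32-bit ARM */

/* Same test as the printk in __do_kernel_fault: first page => NULL dereference. */
const char *fault_kind(unsigned long addr)
{
        return (addr < PAGE_SIZE) ? "NULL pointer dereference" : "paging request";
}

/* Rough check for "does this look like a kernel virtual address?". */
int is_kernel_address(unsigned long addr)
{
        return addr >= PAGE_OFFSET;
}
```

For the faulting address 0x45685516 this yields "paging request", and the address falls below the assumed `PAGE_OFFSET`, which fits the observation that valid kernel addresses here start with 0xC.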
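(2) The following two lines are printed by the show_pte function. They involve the page global directory and page tables, which are not analyzed here and are left for a later write-up.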
pgd = c65a4000
[45685516] *pgd=00000000
void show_pte(struct mm_struct *mm, unsigned long addr)
{
        pgd_t *pgd;

        if (!mm)
                mm = &init_mm;

        printk(KERN_ALERT "pgd = %p\n", mm->pgd);
        pgd = pgd_offset(mm, addr);
        printk(KERN_ALERT "[%08lx] *pgd=%08lx", addr, pgd_val(*pgd));
        /* ... */
}
(3) Calls made in the die function
The die function obtains the address of the thread_info structure:
        struct thread_info *thread = current_thread_info();
static inline struct thread_info *current_thread_info(void)
{
        register unsigned long sp asm ("sp");
        return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}
sp: 0xc07e9c28. current_thread_info derives the thread_info address from it:
(0xc07e9c28 & 0xffffe000) = 0xc07e8000 (the address of thread_info, which is also the lowest address of the stack area)
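The masking step can be verified with a few lines of user-space C. This is only a sketch of the arithmetic, using a 32-bit integer in place of the real stack pointer register; `thread_info_base` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u

/* Round sp down to the start of its THREAD_SIZE-aligned stack area. */
uint32_t thread_info_base(uint32_t sp)
{
        /* The mask trick only works when THREAD_SIZE is a power of two. */
        assert((THREAD_SIZE & (THREAD_SIZE - 1u)) == 0);
        return sp & ~(THREAD_SIZE - 1u);
}
```

Any sp value inside the same 8 KB area maps to the same base, which is why the kernel can find thread_info from the stack pointer alone.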
(4) The following output is printed by the __die function:
Internal error: Oops: 1 [#1]
last sysfs file: /sys/devices/form/tpm/cfg_l2/l2_rule_add
Modules linked in: splic mmp(P)
CPU: 0    Tainted: P            (2.6.32.11 #42)
PC is at dev_get_by_flags+0xfc/0x140
LR is at dev_get_by_flags+0xe8/0x140
pc : [<c06bee24>]    lr : [<c06bee10>]    psr: 20000013
sp : c07e9c28  ip : 00000000  fp : c07e9c64
r10: c6bcc560  r9 : c646a220  r8 : c66a0000
r7 : c6a00000  r6 : c0204e56  r5 : 30687461  r4 : 30687461
r3 : 00000000  r2 : 00000010  r1 : c0204e56  r0 : ffffffff
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005397f? Table: 065a4000? DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc07e8270)
Stack: (0xc07e9c28 to 0xc07ea000)
Call chain: die("Oops", regs, fsr) -> __die(str, err, thread, regs)
The definition of __die follows:
static void __die(const char *str, int err, struct thread_info *thread, struct pt_regs *regs)
{
        struct task_struct *tsk = thread->task;
        static int die_counter;
        /* Internal error: Oops: 1 [#1] */
        printk(KERN_EMERG "Internal error: %s: %x [#%d]" S_PREEMPT S_SMP "\n",
               str, err, ++die_counter);
        /* last sysfs file: /sys/devices/form/tpm/cfg_l2/l2_rule_add */
        sysfs_printk_last_file();
        /* modules loaded in the kernel: Modules linked in: splic mmp(P) */
        print_modules();
        /* print the register contents */
        __show_regs(regs);
        /*
         * Process swapper (pid: 0, stack limit = 0xc07e8270)
         * tsk->comm: the comm member of task_struct is the executable name
         * with the path stripped. swapper is the idle process, pid 0, which
         * creates the kernel init process. stack limit = 0xc07e8270 is
         * thread + 1, the address where thread_info ends.
         */
        printk(KERN_EMERG "Process %.*s (pid: %d, stack limit = 0x%p)\n",
                TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), thread + 1);
        /*
         * dump_mem prints the stack contents between the current sp and
         * the highest address of the stack area.
         */
        if (!user_mode(regs) || in_interrupt()) {
                dump_mem(KERN_EMERG, "Stack: ", regs->ARM_sp,
                         THREAD_SIZE + (unsigned long)task_stack_page(tsk));
                dump_backtrace(regs, tsk);
                dump_instr(KERN_EMERG, regs);
        }
}
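The `psr` value printed by `__show_regs` can be decoded by hand to arrive at the `Flags:` line in the log. The sketch below assumes the standard ARM CPSR bit layout (N, Z, C, V in bits 31 to 28, I in bit 7, F in bit 6, mode in bits 4 to 0); the function names are hypothetical and this is not the kernel's own formatting code.

```c
#include <stdint.h>

/*
 * Condition flags as four letters, upper case when set (e.g. "nzCv").
 * Uses a static buffer, so it is simple but not reentrant.
 */
const char *psr_flags(uint32_t psr)
{
        static char buf[5];

        buf[0] = (psr & (1u << 31)) ? 'N' : 'n';
        buf[1] = (psr & (1u << 30)) ? 'Z' : 'z';
        buf[2] = (psr & (1u << 29)) ? 'C' : 'c';
        buf[3] = (psr & (1u << 28)) ? 'V' : 'v';
        buf[4] = '\0';
        return buf;
}

/* IRQs and FIQs are enabled when their mask bits (I, F) are clear. */
int psr_irqs_on(uint32_t psr) { return !(psr & (1u << 7)); }
int psr_fiqs_on(uint32_t psr) { return !(psr & (1u << 6)); }

/* Processor mode field; 0x13 is supervisor (SVC) mode. */
uint32_t psr_mode(uint32_t psr) { return psr & 0x1fu; }
```

For psr = 0x20000013 only the carry flag is set, both interrupt masks are clear, and the mode field is 0x13, which agrees with "nzCv  IRQs on  FIQs on  Mode SVC_32" in the log.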
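The function above relies on the relationships among thread_info, task_struct, and sp. The stack member of task_struct points to the lowest address of the stack area, which is also the address of the corresponding thread_info. Stack data is stored starting at that address + 8 KB and grows downward, and sp points to the current top of the stack. Two macros are involved:
#define task_stack_page(task)        ((task)->stack)
Given a task_struct, this macro returns the lowest address of the stack area, which is the thread_info address.
#define task_thread_info(task)       ((struct thread_info *)(task)->stack)
Given a task_struct, this macro returns the thread_info pointer.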
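The addresses in the log fit this layout, as the following user-space sketch of the arithmetic shows. All four function names are hypothetical; the inputs are the values printed in the oops (stack page 0xc07e8000, stack limit 0xc07e8270, sp 0xc07e9c28).

```c
#include <stdint.h>

#define THREAD_SIZE 8192u

/* Highest address of the stack area: task_stack_page(tsk) + THREAD_SIZE. */
uint32_t stack_top(uint32_t stack_page)
{
        return stack_page + THREAD_SIZE;
}

/* Size of thread_info implied by the printed "stack limit" (thread + 1). */
uint32_t thread_info_size(uint32_t stack_page, uint32_t stack_limit)
{
        return stack_limit - stack_page;
}

/* Bytes of kernel stack currently in use, i.e. the range dump_mem prints. */
uint32_t stack_bytes_used(uint32_t stack_page, uint32_t sp)
{
        return stack_top(stack_page) - sp;
}

/* Bytes left before the stack would grow down into thread_info. */
uint32_t stack_bytes_free(uint32_t stack_limit, uint32_t sp)
{
        return sp - stack_limit;
}
```

With the logged values, the stack top comes out as 0xc07ea000, matching "Stack: (0xc07e9c28 to 0xc07ea000)", and the three regions (thread_info, free stack, used stack) add up to exactly 8192 bytes.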
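(5) The dump_backtrace function
This function prints the chain of function calls. fp is the frame pointer, which is used to trace backwards through the calling functions. The function mainly validates fp to decide whether a backtrace is possible; if it is, it calls c_backtrace, which is written in assembly in arch/arm/lib/backtrace.S.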
static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
{
        unsigned int fp, mode;
        int ok = 1;

        printk("Backtrace: ");

        if (!tsk)
                tsk = current;

        if (regs) {
                fp = regs->ARM_fp;
                mode = processor_mode(regs);
        } else if (tsk != current) {
                fp = thread_saved_fp(tsk);
                mode = 0x10;
        } else {
                asm("mov %0, fp" : "=r" (fp) : : "cc");
                mode = 0x10;
        }

        if (!fp) {
                printk("no frame pointer");
                ok = 0;
        } else if (verify_stack(fp)) {
                printk("invalid frame pointer 0x%08x", fp);
                ok = 0;
        } else if (fp < (unsigned long)end_of_stack(tsk))
                printk("frame pointer underflow");
        printk("\n");

        if (ok)
                c_backtrace(fp, mode);
}
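To make the idea of walking frame pointers concrete, here is a user-space sketch that follows the chain through a few words copied from the Stack: dump. It assumes the classic APCS frame layout of a kernel built with frame pointers (fp points at the saved pc, with the saved lr at fp - 4, the saved sp at fp - 8, and the caller's fp at fp - 12). That layout is an assumption about how this kernel was built, and `read_word` and `caller_at_depth` are hypothetical helpers, not a transcription of c_backtrace.

```c
#include <stddef.h>
#include <stdint.h>

struct word { uint32_t addr, val; };

/* Words copied from the Stack: dump above (only those the walk needs). */
static const struct word stack_words[] = {
        { 0xc07e9c58u, 0xc07e9d24u },   /* frame 0: caller's fp */
        { 0xc07e9c5cu, 0xc07e9c68u },   /* frame 0: saved sp    */
        { 0xc07e9c60u, 0xc071f764u },   /* frame 0: saved lr    */
        { 0xc07e9c64u, 0xc06bed38u },   /* frame 0: saved pc    */
        { 0xc07e9d18u, 0xc07e9d54u },   /* frame 1: caller's fp */
        { 0xc07e9d1cu, 0xc07e9d28u },   /* frame 1: saved sp    */
        { 0xc07e9d20u, 0xc071f9a0u },   /* frame 1: saved lr    */
        { 0xc07e9d24u, 0xc071ebc0u },   /* frame 1: saved pc    */
};

/* Look up one stack word; returns 0 if the address is not in the table. */
static int read_word(uint32_t addr, uint32_t *val)
{
        size_t i;

        for (i = 0; i < sizeof(stack_words) / sizeof(stack_words[0]); i++) {
                if (stack_words[i].addr == addr) {
                        *val = stack_words[i].val;
                        return 1;
                }
        }
        return 0;
}

/*
 * Follow the fp chain 'depth' frames up and return the saved lr found
 * there, or 0 if the chain cannot be followed that far.
 */
uint32_t caller_at_depth(uint32_t fp, unsigned depth)
{
        uint32_t lr, next_fp;

        for (;;) {
                if (!read_word(fp - 4, &lr) || !read_word(fp - 12, &next_fp))
                        return 0;
                if (depth == 0)
                        return lr;
                if (next_fp <= fp)      /* frames must move to higher addresses */
                        return 0;
                fp = next_fp;
                depth--;
        }
}
```

Starting from fp = 0xc07e9c64, depth 0 returns 0xc071f764, the same address the Backtrace line reports as arp_process+0xbb4. The four words just below the caller's fp slot in that frame (0xc07e9c48 to 0xc07e9c54) also match the r4 to r7 values printed under the Backtrace line.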
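(6) dump_instr
Based on the PC and the instruction mode, this function prints the instruction words around the current instruction; the word in parentheses is the one at the PC: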
Code: 0a000008 e5944000 e2545000 0a000005 (e4153010)
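The layout of that line (four words before the PC, then the word at the PC in parentheses) can be reproduced with a short user-space sketch. It covers only the ARM-mode case with 8-digit words, and `format_code_line` and `oops_code` are hypothetical names; the real dump_instr reads the words from memory at the faulting PC.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The five words from the Code: line above; the last one is the word at pc. */
const uint32_t oops_code[5] = {
        0x0a000008u, 0xe5944000u, 0xe2545000u, 0x0a000005u, 0xe4153010u
};

/*
 * Format words[0..4] (pc-16 .. pc) the way the Code: line is laid out.
 * Uses a static buffer, so it is simple but not reentrant.
 */
const char *format_code_line(const uint32_t words[5])
{
        static char buf[80];
        char *p;
        int i;

        strcpy(buf, "Code: ");
        p = buf + strlen(buf);
        for (i = -4; i < 1; i++)
                p += sprintf(p, i == 0 ? "(%08x) " : "%08x ",
                             (unsigned)words[i + 4]);
        return buf;
}
```

Feeding the disassembly of these words to a tool such as objdump is the usual next step for finding which load or store at the PC touched the bad address.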
Function call relationships in the kernel