Kernel Stack Overflow [Repost]
Source: http://linuxperf.com/?p=116
On Linux, a process runs in either user mode or kernel mode, and once it enters kernel mode it uses the kernel stack. As a basic protection mechanism, user programs cannot access the kernel stack directly, so although the kernel stack belongs to the process's address space, it is kept separate from the user stack. The kernel stack has a fixed size: starting with kernel 2.6.32-520 the default is 16KB, while earlier kernel versions defaulted to 8KB. The size can be changed, but only by recompiling the kernel. It is defined in the following file:
arch/x86/include/asm/page_64_types.h
8KB:
#define THREAD_ORDER 1
16KB:
#define THREAD_ORDER 2
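Note that THREAD_ORDER is an allocation order (a power-of-two page count), not a byte count: the actual stack size is derived from the page size. A simplified excerpt from the same header, assuming 4KB pages as on stock x86_64:
/* arch/x86/include/asm/page_64_types.h (excerpt) */
#define THREAD_ORDER 1                            /* 2^1 = 2 pages */
#define THREAD_SIZE  (PAGE_SIZE << THREAD_ORDER)  /* 4KB << 1 = 8KB */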
Because the kernel stack is a fixed size, it can overflow: deeply nested call chains or functions with too many parameters and local variables can push usage beyond the configured limit. A kernel stack overflow usually ends in a system crash, because the overflow clobbers data it must not touch. The first victim is thread_info, which sits at the very bottom of the kernel stack; since the stack grows from high addresses toward low addresses, an overflow tramples thread_info first. thread_info holds critical data such as the pointer to the owning process, which will sooner or later be dereferenced, and at that point a crash is inevitable.
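This layout is visible directly in the kernel source: the kernel stack and thread_info share a single THREAD_SIZE allocation, with thread_info at the low end. A sketch of the 2.6.32-era definition from include/linux/sched.h:
/* include/linux/sched.h (2.6.32-era): the kernel stack and thread_info
 * are overlaid on one THREAD_SIZE allocation; thread_info occupies the
 * lowest addresses, directly in the path of a downward-growing overflow. */
union thread_union {
        struct thread_info thread_info;
        unsigned long stack[THREAD_SIZE/sizeof(long)];
};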
[Side note] Placing thread_info at the bottom of the kernel stack is an elegant design. High-end CPUs such as PowerPC and Itanium typically reserve a dedicated register for the current process pointer, since that pointer is used extremely often; x86, however, has so few registers that dedicating one would be too extravagant. Linux therefore makes clever use of the stack pointer register: with thread_info at the bottom of the kernel stack, its address can be computed cheaply from the stack pointer, and the first field of thread_info is the pointer to the process.
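The computation is just a mask: rounding the stack pointer down to a THREAD_SIZE boundary lands on the bottom of the stack, i.e. on thread_info. A sketch modeled on the 32-bit x86 implementation in arch/x86/include/asm/thread_info.h (the 64-bit variant of this era goes through a per-cpu variable instead, but the idea is the same):
/* Read the stack pointer from C. */
register unsigned long current_stack_pointer asm("esp") __used;

static inline struct thread_info *current_thread_info(void)
{
        /* The stack allocation is THREAD_SIZE-aligned, so masking off the
         * low bits yields its lowest address, where thread_info lives. */
        return (struct thread_info *)
                (current_stack_pointer & ~(THREAD_SIZE - 1));
}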
A system crash caused by kernel stack overflow is sometimes reported explicitly. For example, you may see:
...
Call Trace:
 [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
BUG: unable to handle kernel NULL pointer dereference at 00000000000009e8
IP: [<ffffffff8100f4dd>] print_context_stack+0xad/0x140
PGD 5fdb8ae067 PUD 5fdbee9067 PMD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
...
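The telltale line "Thread overran stack, or stack corrupted" comes from an explicit check in the page-fault path: at fork time the kernel writes a magic value (STACK_END_MAGIC, 0x57AC6E9D) at the usable end of the stack, just past thread_info, and reports when a fault finds it clobbered. A sketch of that check, based on no_context() in the 2.6.32-era arch/x86/mm/fault.c:
/* end_of_stack(tsk) points just past thread_info, at the lowest usable
 * stack word; dup_task_struct() planted STACK_END_MAGIC there. If it no
 * longer matches, the stack overran or something corrupted it. */
unsigned long *stackend = end_of_stack(tsk);
if (*stackend != STACK_END_MAGIC)
        printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");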
More often, though, there is no explicit report, just strange panics of various kinds. When analyzing the vmcore, the common thread is that thread_info has been destroyed. Here is a real example. Note that the stack field of task_struct points straight at the bottom of the kernel stack, i.e. at thread_info, and this thread_info has clearly been trashed: the cpu value is absurdly large, and the task pointer does not match the actual address of the task_struct:
crash64> struct task_struct ffff8800374cb540
struct task_struct {
  state = 2,
  stack = 0xffff8800bae2a000,
...
crash64> thread_info 0xffff8800bae2a000
struct thread_info {
  task = 0xffff8800458efba0,
  exec_domain = 0xffffffff,
  flags = 0,
  status = 0,
  cpu = 91904,
  preempt_count = 0,
...
As a troubleshooting aid, you can monitor kernel stack usage and depth with the kernel's stack tracer, enabled as follows:
# mount -t debugfs nodev /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
Then read the following files to see the peak kernel stack usage observed so far, together with the backtrace that produced it:
# cat /sys/kernel/debug/tracing/stack_max_size
# cat /sys/kernel/debug/tracing/stack_trace
Collecting these values periodically with a script can help pinpoint the code path responsible for the deepest stack usage; a minimal collector sketch follows.
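A minimal sketch of such a collector, assuming debugfs is mounted at /sys/kernel/debug; the log path and the 60-second interval are arbitrary choices, not from the original article:
#!/bin/sh
# Periodically append the peak kernel stack usage and the backtrace
# behind it to a log, so the worst offender can be identified later.
LOG=/var/log/kstack-depth.log   # hypothetical log location
while true; do
        {
                date
                echo '--- stack_max_size ---'
                cat /sys/kernel/debug/tracing/stack_max_size
                echo '--- stack_trace ---'
                cat /sys/kernel/debug/tracing/stack_trace
        } >> "$LOG"
        sleep 60
done
After running for a while, the collected data looks like the example below: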
# cat /sys/kernel/debug/tracing/stack_max_size
7272
# cat /sys/kernel/debug/tracing/stack_trace
        Depth    Size   Location    (61 entries)
        -----    ----   --------
  0)     7080     224   select_task_rq_fair+0x3be/0x980
  1)     6856     112   try_to_wake_up+0x14a/0x400
  2)     6744      16   wake_up_process+0x15/0x20
  3)     6728      16   wakeup_softirqd+0x35/0x40
  4)     6712      48   raise_softirq_irqoff+0x4f/0x90
  5)     6664      48   __blk_complete_request+0x132/0x140
  6)     6616      16   blk_complete_request+0x25/0x30
  7)     6600      32   scsi_done+0x2f/0x60
  8)     6568      48   megasas_queue_command+0xd1/0x140 [megaraid_sas]
  9)     6520      48   scsi_dispatch_cmd+0x1ac/0x340
 10)     6472      96   scsi_request_fn+0x415/0x590
 11)     6376      32   __generic_unplug_device+0x32/0x40
 12)     6344     112   __make_request+0x170/0x500
 13)     6232     224   generic_make_request+0x21e/0x5b0
 14)     6008      80   submit_bio+0x8f/0x120
 15)     5928     112   _xfs_buf_ioapply+0x194/0x2f0 [xfs]
 16)     5816      48   xfs_buf_iorequest+0x4f/0xe0 [xfs]
 17)     5768      32   xlog_bdstrat+0x2a/0x60 [xfs]
 18)     5736      80   xlog_sync+0x1e0/0x3f0 [xfs]
 19)     5656      48   xlog_state_release_iclog+0xb3/0xf0 [xfs]
 20)     5608     144   _xfs_log_force_lsn+0x1cc/0x270 [xfs]
 21)     5464      32   xfs_log_force_lsn+0x18/0x40 [xfs]
 22)     5432      80   xfs_alloc_search_busy+0x10c/0x160 [xfs]
 23)     5352     112   xfs_alloc_get_freelist+0x113/0x170 [xfs]
 24)     5240      48   xfs_allocbt_alloc_block+0x33/0x70 [xfs]
 25)     5192     240   xfs_btree_split+0xbd/0x710 [xfs]
 26)     4952      96   xfs_btree_make_block_unfull+0x12d/0x190 [xfs]
 27)     4856     224   xfs_btree_insrec+0x3ef/0x5a0 [xfs]
 28)     4632     144   xfs_btree_insert+0x93/0x180 [xfs]
 29)     4488     176   xfs_free_ag_extent+0x414/0x7e0 [xfs]
 30)     4312     224   xfs_alloc_fix_freelist+0xf4/0x480 [xfs]
 31)     4088      96   xfs_alloc_vextent+0x173/0x600 [xfs]
 32)     3992     240   xfs_bmap_btalloc+0x167/0x9d0 [xfs]
 33)     3752      16   xfs_bmap_alloc+0xe/0x10 [xfs]
 34)     3736     432   xfs_bmapi+0x9f6/0x11a0 [xfs]
 35)     3304     272   xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]
 36)     3032     208   xfs_iomap+0x389/0x440 [xfs]
 37)     2824      32   xfs_map_blocks+0x2d/0x40 [xfs]
 38)     2792     272   xfs_page_state_convert+0x2f8/0x750 [xfs]
 39)     2520      80   xfs_vm_writepage+0x86/0x170 [xfs]
 40)     2440      32   __writepage+0x17/0x40
 41)     2408     304   write_cache_pages+0x1c9/0x4a0
 42)     2104      16   generic_writepages+0x24/0x30
 43)     2088      48   xfs_vm_writepages+0x5e/0x80 [xfs]
 44)     2040      16   do_writepages+0x21/0x40
 45)     2024     128   __filemap_fdatawrite_range+0x5b/0x60
 46)     1896      48   filemap_write_and_wait_range+0x5a/0x90
 47)     1848     320   xfs_write+0xa2f/0xb70 [xfs]
 48)     1528      16   xfs_file_aio_write+0x61/0x70 [xfs]
 49)     1512     304   do_sync_readv_writev+0xfb/0x140
 50)     1208     224   do_readv_writev+0xcf/0x1f0
 51)      984      16   vfs_writev+0x46/0x60
 52)      968     208   nfsd_vfs_write+0x107/0x430 [nfsd]
 53)      760      96   nfsd_write+0xe7/0x100 [nfsd]
 54)      664     112   nfsd3_proc_write+0xaf/0x140 [nfsd]
 55)      552      64   nfsd_dispatch+0xfe/0x240 [nfsd]
 56)      488     128   svc_process_common+0x344/0x640 [sunrpc]
 57)      360      32   svc_process+0x110/0x160 [sunrpc]
 58)      328      48   nfsd+0xc2/0x160 [nfsd]
 59)      280      96   kthread+0x96/0xa0
 60)      184     184   child_rip+0xa/0x20
Reposted from: https://www.cnblogs.com/sky-heaven/p/8566384.html