The FP Register
Role in the procedure call standard:
r15  PC  The Program Counter.
r14  LR  The Link Register.
r13  SP  The Stack Pointer.
r12  IP  The Intra-Procedure-call scratch register. (It can be thought of simply as a temporary holder for SP.)
In addition, r11 is optionally used as FP, the frame pointer.
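1. Stack frame
As we all know, every process has its own stack. Consider what happens when a function call occurs while the process is running: the caller and the callee share the same stack, and normally we do not need to distinguish which part of the stack each of them uses. When we need to backtrace the function calls during execution, however, this information becomes important. Put simply, a stack frame is the part of the stack used by one function, and the stack frames of all active functions chained together make up the whole stack. The two boundaries of a stack frame are marked by FP and SP.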
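To make the idea of "one frame per function" concrete, the following minimal sketch prints the frame address of a caller and of its callee. It assumes GCC or Clang, because `__builtin_frame_address` and `__attribute__((noinline))` are compiler extensions rather than standard C; the function names `child_frame` and `frames_are_distinct` are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: GCC or Clang. __builtin_frame_address(0) returns the frame
   address of the function it appears in. noinline keeps the callee from
   being merged into the caller, so it really gets a frame of its own. */
__attribute__((noinline))
void *child_frame(void)
{
    return __builtin_frame_address(0);
}

/* Returns 1 when the caller and the callee report different frame addresses. */
__attribute__((noinline))
int frames_are_distinct(void)
{
    void *parent = __builtin_frame_address(0);
    void *child = child_frame();

    printf("parent frame: %p\nchild frame:  %p\n", parent, child);
    return (uintptr_t)parent != (uintptr_t)child;
}
```

Calling `frames_are_distinct()` from `main` should print two different addresses. On x86 and ARM, where the stack grows downward, the callee's frame normally sits at the lower address, but the exact values depend on the platform and compiler.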
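2. Backtrace
While a program is running (usually when something unexpected has happened and debugging is needed), the stack frame delimited by SP and FP gives access to the caller's SP and FP, and therefore to the caller's stack frame (PC, LR, SP, and FP are pushed onto the stack at the very start of the called function). Following this chain back frame by frame recovers the order in which all the functions were called.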
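The chain-following step can be sketched on a simplified model. The `struct frame` record below is hypothetical: it keeps only the saved caller FP and a return address, whereas real frame layouts are ABI-specific and hold more. The addresses in `demo_walk` are arbitrary placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record, for illustration only. */
struct frame {
    const struct frame *caller_fp; /* saved FP of the caller; NULL at the outermost frame */
    uintptr_t return_addr;         /* where execution resumes in the caller */
};

/* Follow the saved frame pointers from the newest frame to the oldest.
   Writes up to max return addresses into out and returns how many were written. */
size_t walk_frames(const struct frame *fp, uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        out[n++] = fp->return_addr;
        fp = fp->caller_fp;
    }
    return n;
}

/* Build a three-deep chain (main -> f -> g) and walk it starting from g. */
size_t demo_walk(uintptr_t *out, size_t max)
{
    struct frame main_frame = { NULL, 0x1000 };
    struct frame f_frame = { &main_frame, 0x2000 };
    struct frame g_frame = { &f_frame, 0x3000 };

    return walk_frames(&g_frame, out, max);
}
```

A real unwinder does the same thing with raw memory reads relative to FP, and maps each return address to a function name through the symbol table.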
3. gcc's stack-frame optimization option
It looks as though FP is only useful for backtrace, so if we have no need for backtrace, can we do without FP? In fact gcc has an optimization option for exactly this:
-fomit-frame-pointer
==================================================================================
Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.
On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro "FRAME_POINTER_REQUIRED" controls whether a target machine supports this flag.
==================================================================================
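Below I quote someone else's experiment with this option instead of running my own.
The experiment shows a clear difference in the generated code once the option is enabled. How much performance it actually gains is hard to say.
Also, the EBP register on x86 plays the same role as the FP register on ARM.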
==================================================================================
http://blog.csdn.net/byzs/article/details/2220461
Environment: X86 + Redhat 9.0, gcc 3.2.2
The source file is as follows:
$ cat test.c
void a(unsigned long a, unsigned int b)
{
        unsigned long i;
        unsigned int j;
        i = a;
        j = b;
        i++;
        j += 2;
}
With the default compile options:
$ gcc -c test.c -o with_SFP.o
The disassembly looks like this:
$ objdump -D with_SFP.o
with_SFP.o:     file format elf32-i386
Disassembly of section .text:
00000000 <a>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   8b 45 08                mov    0x8(%ebp),%eax
   9:   89 45 fc                mov    %eax,0xfffffffc(%ebp)
   c:   8b 45 0c                mov    0xc(%ebp),%eax
   f:   89 45 f8                mov    %eax,0xfffffff8(%ebp)
  12:   8d 45 fc                lea    0xfffffffc(%ebp),%eax
  15:   ff 00                   incl   (%eax)
  17:   8d 45 f8                lea    0xfffffff8(%ebp),%eax
  1a:   83 00 02                addl   $0x2,(%eax)
  1d:   c9                      leave
  1e:   c3                      ret
Disassembly of section .data:
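On entry, the function first pushes the caller's EBP onto the stack and sets up its own EBP, then adjusts ESP according to the number of local variables and the alignment requirements. This creates the function's stack frame.
Now look at the return: the "leave" instruction is equivalent to "mov %ebp,%esp; pop %ebp", which undoes the two entry instructions. The "ret" that follows pairs with the caller's "call".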
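The claim that "leave" undoes the entry sequence can be checked on a toy model. The sketch below is not real x86 emulation: `struct cpu`, `prologue`, `leave`, and `leave_undoes_prologue` are invented for this illustration, and the stack is modeled as an array of 32-bit words with ESP and EBP holding word indices rather than byte addresses.

```c
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Toy machine state: a word-indexed stack plus the two registers of interest. */
struct cpu {
    uint32_t mem[STACK_WORDS];
    uint32_t esp; /* index of the current top-of-stack word */
    uint32_t ebp; /* index of the current frame base */
};

static void push(struct cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(struct cpu *c) { return c->mem[c->esp++]; }

/* push %ebp; mov %esp,%ebp; sub $(4*local_words),%esp */
void prologue(struct cpu *c, uint32_t local_words)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= local_words;
}

/* leave, i.e. mov %ebp,%esp; pop %ebp */
void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* Returns 1 if ESP and EBP are back to their entry values after leave. */
int leave_undoes_prologue(void)
{
    struct cpu c = { .esp = 32, .ebp = 40 }; /* arbitrary starting values */
    uint32_t esp0 = c.esp, ebp0 = c.ebp;

    prologue(&c, 2); /* sub $0x8,%esp reserves two 4-byte words */
    leave(&c);
    return c.esp == esp0 && c.ebp == ebp0;
}
```

After `leave`, ESP again points at the return address pushed by "call", which is exactly what "ret" expects to find.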
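From here, backtrace can use the current function's EBP to find the caller's EBP: the word EBP points at is the caller's saved EBP, and the word just above it is the return EIP. Repeating this step reveals the whole call path.
The SFP can be optimized away at compile time with the "-fomit-frame-pointer" option.
Compile: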
$ gcc -fomit-frame-pointer -c test.c -o no_SFP.o
$ objdump -D no_SFP.o
no_SFP.o:     file format elf32-i386
Disassembly of section .text:
00000000 <a>:
   0:   83 ec 08                sub    $0x8,%esp
   3:   8b 44 24 0c             mov    0xc(%esp,1),%eax
   7:   89 44 24 04             mov    %eax,0x4(%esp,1)
   b:   8b 44 24 10             mov    0x10(%esp,1),%eax
   f:   89 04 24                mov    %eax,(%esp,1)
  12:   8d 44 24 04             lea    0x4(%esp,1),%eax
  16:   ff 00                   incl   (%eax)
  18:   89 e0                   mov    %esp,%eax
  1a:   83 00 02                addl   $0x2,(%eax)
  1d:   83 c4 08                add    $0x8,%esp
  20:   c3                      ret
Disassembly of section .data:
Here EBP has been dropped, and ESP takes over part of its job (indexing the local variables).
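The two listings address the same arguments through different base registers, which a little arithmetic confirms. In the sketch below, `E` stands for the value of ESP on entry to `a()` (the address of the return address); the function names are made up for this illustration, and the offsets are taken directly from the two x86 disassemblies.

```c
#include <stdint.h>

/* With a frame pointer: "push %ebp" leaves ESP = E - 4, then EBP = E - 4. */
uint32_t arg_a_via_ebp(uint32_t E) { uint32_t ebp = E - 4; return ebp + 0x8; }
uint32_t arg_b_via_ebp(uint32_t E) { uint32_t ebp = E - 4; return ebp + 0xc; }

/* Without a frame pointer: "sub $0x8,%esp" leaves ESP = E - 8. */
uint32_t arg_a_via_esp(uint32_t E) { uint32_t esp = E - 8; return esp + 0xc; }
uint32_t arg_b_via_esp(uint32_t E) { uint32_t esp = E - 8; return esp + 0x10; }

/* Returns 1 when both addressing schemes reach the same argument slots. */
int same_argument_addresses(uint32_t E)
{
    return arg_a_via_ebp(E) == arg_a_via_esp(E) &&
           arg_b_via_ebp(E) == arg_b_via_esp(E);
}
```

Both variants find the first argument at E + 4 and the second at E + 8. The ESP-relative offsets are valid only while ESP stays where the "sub" left it, which is why the compiler has to track every stack adjustment when the frame pointer is omitted.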
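Clearly the code is harder to read ;-P. Fewer instructions are executed (11 instead of 13), which should help efficiency, although the encoded function is actually slightly larger (33 bytes versus 31) because ESP-relative addressing needs an extra SIB byte. The annoying part is that frame-pointer-based backtrace can no longer be used for debugging.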
Now look at the situation on ARM.
The version with the SFP:
$ arm-linux-objdump -D SFP_arm.o
SFP_arm.o:     file format elf32-littlearm
Disassembly of section .text:
00000000 <a>:
   0:   e1a0c00d        mov     ip, sp
   4:   e92dd800        stmdb   sp!, {fp, ip, lr, pc}
   8:   e24cb004        sub     fp, ip, #4      ; 0x4
   c:   e24dd010        sub     sp, sp, #16     ; 0x10
  10:   e50b0010        str     r0, [fp, -#16]
  14:   e50b1014        str     r1, [fp, -#20]
  18:   e51b3010        ldr     r3, [fp, -#16]
  1c:   e50b3018        str     r3, [fp, -#24]
  20:   e51b3014        ldr     r3, [fp, -#20]
  24:   e50b301c        str     r3, [fp, -#28]
  28:   e51b3018        ldr     r3, [fp, -#24]
  2c:   e2833001        add     r3, r3, #1      ; 0x1
  30:   e50b3018        str     r3, [fp, -#24]
  34:   e51b301c        ldr     r3, [fp, -#28]
  38:   e2833002        add     r3, r3, #2      ; 0x2
  3c:   e50b301c        str     r3, [fp, -#28]
  40:   e91ba800        ldmdb   fp, {fp, sp, pc}
Disassembly of section .data:
The optimized version:
$ arm-linux-objdump -D no_SFP_arm.o
no_SFP_arm.o:     file format elf32-littlearm
Disassembly of section .text:
00000000 <a>:
   0:   e24dd010        sub     sp, sp, #16     ; 0x10
   4:   e58d000c        str     r0, [sp, #12]
   8:   e58d1008        str     r1, [sp, #8]
   c:   e59d300c        ldr     r3, [sp, #12]
  10:   e58d3004        str     r3, [sp, #4]
  14:   e59d3008        ldr     r3, [sp, #8]
  18:   e58d3000        str     r3, [sp]
  1c:   e59d3004        ldr     r3, [sp, #4]
  20:   e2833001        add     r3, r3, #1      ; 0x1
  24:   e58d3004        str     r3, [sp, #4]
  28:   e59d3000        ldr     r3, [sp]
  2c:   e2833002        add     r3, r3, #2      ; 0x2
  30:   e58d3000        str     r3, [sp]
  34:   e28dd010        add     sp, sp, #16     ; 0x10
  38:   e1a0f00e        mov     pc, lr
Disassembly of section .data:
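Here "fp" plays the role of "EBP". On x86, ESP is restored implicitly by "leave", so there is no need to set it explicitly; in the ARM version with the frame pointer, the final "ldmdb" reloads sp from the saved ip instead.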
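The ARM entry and exit sequence in the frame-pointer listing can be traced with a small model. This is a sketch, not an ARM emulator: `struct arm`, `apcs_prologue`, `apcs_epilogue`, and `apcs_round_trip` are names invented here, memory is a small word array indexed by byte address divided by 4, and the stored "pc" is just a placeholder value rather than what real hardware would write.

```c
#include <stdint.h>

enum { WORDS = 64 };

/* Toy machine state: a tiny memory plus the five registers involved. */
struct arm {
    uint32_t mem[WORDS]; /* byte address / 4 indexes this array */
    uint32_t fp, ip, sp, lr, pc;
};

static uint32_t *at(struct arm *m, uint32_t addr) { return &m->mem[addr / 4]; }

/* mov ip, sp; stmdb sp!, {fp, ip, lr, pc}; sub fp, ip, #4 */
void apcs_prologue(struct arm *m)
{
    m->ip = m->sp;
    /* stmdb stores the lowest-numbered register at the lowest address. */
    *at(m, m->sp - 16) = m->fp;
    *at(m, m->sp - 12) = m->ip;
    *at(m, m->sp - 8) = m->lr;
    *at(m, m->sp - 4) = m->pc;
    m->sp -= 16;
    m->fp = m->ip - 4; /* fp now points at the saved pc slot */
}

/* ldmdb fp, {fp, sp, pc}: reload from the three words just below fp. */
void apcs_epilogue(struct arm *m)
{
    uint32_t base = m->fp;

    m->fp = *at(m, base - 12); /* caller's fp */
    m->sp = *at(m, base - 8);  /* saved ip, i.e. sp on entry */
    m->pc = *at(m, base - 4);  /* saved lr, so execution returns to the caller */
}

/* Returns 1 if fp and sp are restored and pc ends up holding the old lr. */
int apcs_round_trip(void)
{
    struct arm m = { .fp = 0xC0, .sp = 0x80, .lr = 0x1234, .pc = 0x5678 };

    apcs_prologue(&m);
    m.sp -= 16; /* sub sp, sp, #16: room for locals */
    apcs_epilogue(&m);
    return m.fp == 0xC0 && m.sp == 0x80 && m.pc == 0x1234;
}
```

The saved fp word at the bottom of each frame is the link a backtrace follows on ARM, in the same way that the saved EBP is used on x86.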
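On the ARM platform the effect of "-fomit-frame-pointer" looks even more pronounced: 17 instructions drop to 15, and the function no longer stores four registers on entry and reloads three on exit.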