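CS:APP Chapter 3 Summary (Assembly Language, Machine Code, Registers, Compiler Optimization, Low-Level Function Implementation, Floating-Point Instructions)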

Published: 2024/3/26

Contents

Advantages of high-level languages over assembly language
Compiler optimization options

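Advantages of high-level languages over assembly language

Higher development productivity. The IDE and the compiler point out your mistakes. Thanks to compiler optimization, the run-time performance penalty of a high-level language is small.
Lower likelihood of errors.
Portability across platforms.

cc stands for "C compiler".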

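Compiler optimization options

-Og keeps the structure of the machine code close to that of the source code and avoids heavy code transformations, so it is typically used for teaching. In practice, higher optimization levels such as -O1 or -O2 are generally used.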

On invalid-address errors:
At any given time, only limited sub-ranges of virtual addresses are considered valid. For example, x86-64 virtual addresses are represented by 64-bit words. In current implementations of these machines, the upper 16 bits must be set to zero, and so an address can potentially specify a byte over a range of 2^48, or 64 terabytes [sic; 2^48 bytes is 256 terabytes]. The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory.

$ gcc -Og -S mstore.c
$ gcc -Og -c mstore.c
$ objdump -d mstore.o
objdump is the disassembler from GNU Binutils, the toolset commonly used alongside gcc.

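The link step requires a main function. A .c file that contains only non-main functions can still be compiled to .s assembly and to .o object code. The linked executable is much larger than the object file, because it contains not just the machine code for the procedures we provided but also code used to start and terminate the program as well as to interact with the operating system.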

What the linker actually does:
1. Shifts the location of the code to a different range of addresses.
2. Matches function calls with the locations of the executable code for those functions (that is, each call instruction now specifies the address of the called function).
3. Inserts NOP instructions to grow the code for the function to 16 bytes, enabling a better placement of the next block of code in terms of memory system performance.

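P205: AT&T-format and Intel-format assembly code differ. I learned the Intel format as an undergraduate; GCC and this book use the AT&T format by default.

P208: the historical names and roles of the registers.

P209: how the three kinds of operands (immediate, register, memory) are written.

P212: when a mov instruction writes to a 64-bit register, writing the low 1 byte or the low 2 bytes leaves the upper bytes unchanged, while writing the low 4 bytes sets the upper 4 bytes to zero.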

Recall that when performing a cast that involves both a size change and a change of “signedness” in C, the operation should change the size first (Section 2.2.6).
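The rule can be checked with a small sketch. The function names below are illustrative, not from the book, and the expected values assume the usual x86-64 sizes (16-bit short, 32-bit int).

```c
/* Direct cast from short to unsigned int. */
unsigned cast_direct(short sx) {
    return (unsigned) sx;
}

/* Size first (sign-extend to int), then signedness.
   This is what the direct cast is defined to do. */
unsigned cast_size_first(short sx) {
    return (unsigned) (int) sx;
}

/* The other order: signedness first (unsigned short), then size.
   This zero-extends and gives a different result for negative input. */
unsigned cast_sign_first(short sx) {
    return (unsigned) (unsigned short) sx;
}
```

For sx = -12345, the first two functions should agree (4294954951), while the third yields 53191.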

Since the stack is contained in the same memory as the program code and other forms of program data, programs can access arbitrary positions within the stack using the standard memory addressing methods. (So this stack supports random access, in contrast to the STL stack, which exposes only its top element.)

In addition, LEA can be used to compactly describe common arithmetic operations.
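As a hedged illustration, a C function of the following shape is the kind of code for which GCC on x86-64 often emits leaq instead of multiply instructions. The function name is illustrative, and the exact instructions depend on the compiler version and flags.

```c
/* x + 4*y + 12*z can be built from address-style computations, e.g.
   leaq (%rdi,%rsi,4), %rax   computes x + 4*y
   leaq (%rdx,%rdx,2), %rdx   computes 3*z
   leaq (%rax,%rdx,4), %rax   computes (x + 4*y) + 4*(3*z)
   This sequence is one possible translation, not a guaranteed one. */
long scale(long x, long y, long z) {
    return x + 4 * y + 12 * z;
}
```

Compiling with gcc -Og -S and reading the .s file shows what a particular compiler actually chose.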

Using the cl register as the shift amount:
The higher-order bits are ignored. So, for example, when register %cl has hexadecimal value 0xFF, then instruction salb would shift by 7, while salw would shift by 15, sall would shift by 31, and salq would shift by 63.

One reason for representing signed numbers in two's complement:
We see that most of the instructions shown in Figure 3.10 can be used for either unsigned or two’s-complement arithmetic. This is one of the features that makes two’s-complement arithmetic the preferred way to implement signed integer arithmetic.
Where signed and unsigned numbers differ:
They use different versions of right shifts, division and multiplication instructions, and different combinations of condition codes.
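A small sketch of both points, with illustrative function names. Addition produces the same bit pattern either way, while right shift differs. Note that right-shifting a negative signed value is implementation-defined in ISO C; the arithmetic behaviour shown here is what GCC and Clang document for x86-64.

```c
#include <stdint.h>

/* Addition: the same add instruction serves both interpretations,
   so the resulting bit patterns match. */
uint32_t add_as_unsigned(uint32_t a, uint32_t b) { return a + b; }
uint32_t add_as_signed(int32_t a, int32_t b)     { return (uint32_t) (a + b); }

/* Right shift: unsigned uses a logical shift (fills with zeros) ... */
uint32_t shr_logical(uint32_t x, int k)   { return x >> k; }

/* ... signed uses an arithmetic shift (fills with the sign bit) on GCC/Clang. */
int32_t shr_arithmetic(int32_t x, int k)  { return x >> k; }
```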

rdx and rax are concatenated into an oct word (16 bytes) for multiplication and division:
multiplying two 64-bit signed or unsigned integers can yield a product that requires 128 bits to represent.
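A minimal sketch of a full 64x64-bit unsigned multiply in C. It relies on unsigned __int128, which is a GCC/Clang extension rather than ISO C, and the function names are illustrative.

```c
#include <stdint.h>

typedef unsigned __int128 uint128_t;   /* GCC/Clang extension, not ISO C */

/* High 64 bits of the 128-bit product (the half that mulq leaves in %rdx). */
uint64_t umul_hi(uint64_t x, uint64_t y) {
    uint128_t p = (uint128_t) x * y;
    return (uint64_t) (p >> 64);
}

/* Low 64 bits of the 128-bit product (the half that mulq leaves in %rax). */
uint64_t umul_lo(uint64_t x, uint64_t y) {
    uint128_t p = (uint128_t) x * y;
    return (uint64_t) p;
}
```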

Conditional jumps rely on the flag (condition code) register:
In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.
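Different combinations of bits in the flag register indicate whether a result is positive or negative, zero or nonzero, greater or smaller.
The SET family of instructions reads the condition codes and stores the result in a general-purpose register.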

The label in jmp label (a direct jump) is resolved when the .o file is generated:
In generating the object-code file, the assembler determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.
jmp reg and jmp mem (indirect jumps) are also possible.
Conditional jumps can only be direct.

Why rep ret appears in the assembly code:
AMD recommends using the combination of rep followed by ret to avoid making the ret instruction the destination of a conditional jump instruction. According to AMD, their processors cannot properly predict the destination of a ret instruction when it is reached from a jump instruction. The rep instruction serves as a form of no-operation here, and so inserting it as the jump destination does not change behavior of the code, except to make it faster on AMD processors.
Similar oddities may require consulting the Intel or AMD documentation.

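Besides conditional jumps such as jne, there are conditional moves such as cmovge (full list on P245). Both can implement conditional branches. In the book's example, where the branch outcome is random, the conditional move runs faster: the conditional jump pays a high penalty for every branch misprediction, while the conditional move needs no branch prediction at all.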
The flow of control does not depend on data, and this makes it easier for the processor to keep its pipeline full. （P243）
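At the C level, if the body of the if contains only an assignment, the compiler can emit a conditional move instead of a conditional jump.
The compiler cannot reliably decide between a conditional jump and a conditional move, because it does not know how the branch outcomes are distributed. A conditional move evaluates both alternatives, so if the branch bodies are expensive it may be slower.
(In my own experiment I saw no conditional move at first, but cmovg appeared once I added -O1. With no -O flag GCC defaults to -O0, which does not apply this optimization.)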
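The two styles can be sketched in C along the lines of the book's absdiff example. The second function only mirrors the data flow of a cmov-based translation; whether a compiler really emits cmov for either function depends on the optimization level and target.

```c
/* Branching form: may compile to a compare plus a conditional jump. */
long absdiff(long x, long y) {
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}

/* Data-flow form: both candidates are computed up front,
   then one is selected, as a conditional move would do. */
long cmovdiff(long x, long y) {
    long rval = y - x;
    long eval = x - y;
    long ntest = x >= y;
    if (ntest)
        rval = eval;   /* the only conditional step is a plain assignment */
    return rval;
}
```

Comparing the output of gcc -O0 -S and gcc -O1 -S on this file is a quick way to see when cmov shows up.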

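A while loop is translated into do-while form: the condition is first tested once on its own to decide whether to jump straight to done, and the rest is identical to a do-while loop.
Under -O1, while, do-while, and for loops are all translated into the guarded-do form.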
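A sketch of the guarded-do transformation written out with goto, using a factorial loop as the example. The function names are illustrative.

```c
/* Original while loop. */
long fact_while(long n) {
    long result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* Guarded-do form: one initial test guards entry,
   then the body runs as a do-while loop. */
long fact_guarded_do(long n) {
    long result = 1;
    if (n <= 1)          /* initial test: skip the loop entirely */
        goto done;
loop:
    result *= n;
    n = n - 1;
    if (n > 1)           /* loop test at the bottom, as in do-while */
        goto loop;
done:
    return result;
}
```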

Using switch in place of a long chain of if-else:
They are particularly useful when dealing with tests where there can be a large number of possible outcomes. Not only do they make the C code more readable, but they also allow an efficient implementation using a data structure called a jump table.
The advantage of using a jump table over a long sequence of if-else statements is that the time taken to perform the switch is independent of the number of switch cases.
P262 has a very good example.
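A sketch of how a jump table can be written out by hand, modelled on the style of the book's example. The `&&label` and `goto *ptr` syntax is a GCC/Clang extension (labels as values), not ISO C, and the case values are chosen only for illustration.

```c
/* Ordinary switch with gaps (101, 105) and a fall-through case. */
long switch_eg(long x, long n) {
    long val = x;
    switch (n) {
    case 100: val *= 13; break;
    case 102: val += 10;          /* fall through */
    case 103: val += 11; break;
    case 104:
    case 106: val *= val; break;
    default:  val = 0;
    }
    return val;
}

/* Same logic with an explicit jump table indexed by n - 100. */
long switch_eg_impl(long x, long n) {
    static void *jt[7] = {
        &&loc_A, &&loc_def, &&loc_B, &&loc_C,
        &&loc_D, &&loc_def, &&loc_D
    };
    unsigned long index = (unsigned long) n - 100;  /* negative n wraps to a huge value */
    long val = x;
    if (index > 6)
        goto loc_def;
    goto *jt[index];              /* one indexed jump, whatever the case count */
loc_A:   val = x * 13;   goto done;
loc_B:   val = x + 10;            /* fall through */
loc_C:   val = val + 11; goto done;
loc_D:   val = x * x;    goto done;
loc_def: val = 0;
done:
    return val;
}
```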

How procedures are implemented internally: passing control, passing data, and allocating memory
As P calls Q, control and data information are added to the end of the stack.

many procedures have six or fewer arguments, and so all of their parameters can be passed in registers.

CALL instruction pushes an address A onto the stack and sets the PC to the beginning
of Q. The counterpart instruction ret pops an address A off the stack and sets the PC to A.

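If a function has more than six integer arguments, the seventh and later ones are passed on the stack. Note that arguments passed on the stack are aligned to 8 bytes.

When a caller invokes a callee, the caller's own incoming argument may not have been used yet while rdi is needed to pass an argument to the callee, so rdi must be saved first. Likewise, the callee's return value may not be used right away while rax is needed by the next callee, so rax must be saved.

At the assembly level, a two-dimensional array is laid out with the first index as the high-order part and the second index as the low-order part (row-major order).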
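The row-major layout can be checked with a short sketch. The array dimensions and function names are arbitrary choices for illustration.

```c
#include <stddef.h>

#define ROWS 5
#define COLS 3

/* Byte offset of A[i][j] from the start of A, computed by formula:
   L * (COLS * i + j), where L = sizeof(int). */
size_t offset_formula(size_t i, size_t j) {
    return sizeof(int) * (COLS * i + j);
}

/* The same offset measured from the real element address. */
size_t offset_measured(size_t i, size_t j) {
    static int A[ROWS][COLS];
    return (size_t) ((char *) &A[i][j] - (char *) A);
}
```

Both functions should return the same value for any in-range i and j.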

For data type T and integer constant N, consider a declaration of the form T A[N];
Let us denote the starting location as x_A. The declaration has two effects. First, it allocates a contiguous region of L · N bytes in memory, where L is the size (in bytes) of data type T. Second, it introduces an identifier A that can be used as a pointer to the beginning of the array. The value of this pointer will be x_A.

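When the loop variable is an array index, the compiler may optimize the index away and update a pointer inside the loop instead. This saves many multiplications: computing index = N*i + j turns into ptr += size or ptr += N*size (the latter computed with a shift here, since size is a power of 2).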
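A sketch in the spirit of the book's fixed-size matrix example. The second function is a hand-written C rendering of what the optimized code does, not compiler output. The helper diag_versions_agree is a hypothetical addition used only to compare the two.

```c
#include <string.h>

#define N 16
typedef int fix_matrix[N][N];

/* Index-based version: each A[i][i] conceptually needs N*i + i. */
void fix_set_diag(fix_matrix A, int val) {
    long i;
    for (i = 0; i < N; i++)
        A[i][i] = val;
}

/* Pointer-style version: the index is replaced by a running offset
   that advances by N + 1 elements per iteration, with no multiply. */
void fix_set_diag_opt(fix_matrix A, int val) {
    int *Abase = &A[0][0];
    long i = 0;
    long iend = N * (N + 1);
    do {
        Abase[i] = val;
        i += (N + 1);
    } while (i != iend);
}

/* Hypothetical helper: returns 1 if both versions produce the same matrix. */
int diag_versions_agree(int val) {
    static fix_matrix a, b;   /* static storage, zero-initialized */
    fix_set_diag(a, val);
    fix_set_diag_opt(b, val);
    return memcmp(a, b, sizeof a) == 0 && a[N - 1][N - 1] == val;
}
```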

The struct data type constructor is the closest thing C provides to the objects of C++ and Java. The objects of C++ and Java are more elaborate than structures in C, in that they also associate a set of methods with an object that can be invoked to perform computation.

Access to a struct member is converted at compile time into the struct's base address plus a constant offset.
The selection of the different fields of a structure is handled completely at compile time. The machine code contains no information about the field declarations or the names of the fields.
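A minimal sketch that writes the base-plus-offset access out by hand, using the standard offsetof macro. The struct layout and function names are illustrative.

```c
#include <stddef.h>

struct rec {
    int  i;
    int  j;
    int  a[2];
    int *p;
};

/* Normal field access. */
int get_j(const struct rec *r) {
    return r->j;
}

/* The same access as the machine code sees it:
   base address plus a compile-time constant offset. */
int get_j_by_offset(const struct rec *r) {
    return *(const int *) ((const char *) r + offsetof(struct rec, j));
}
```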

Pros and cons of unions:
Unions can be useful in several contexts. However, they can also lead to nasty bugs, since they bypass the safety provided by the C type system. One application is when we know in advance that the use of two different fields in a data structure will be mutually exclusive. Then, declaring these two fields as part of a union rather than a structure will reduce the total space allocated.
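Type conversion through a union (P299): in this case the bytes stored are identical before and after reinterpreting between an integer type and a floating-point type; only the interpretation changes.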
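A sketch of reading the bit pattern of a double through a union, contrasted with an ordinary value cast. It assumes double is IEEE 754 binary64, as on x86-64.

```c
#include <stdint.h>

/* Reinterpret the bytes of a double as a 64-bit integer: no conversion occurs. */
uint64_t double2bits(double d) {
    union {
        double   d;
        uint64_t u;
    } temp;
    temp.d = d;
    return temp.u;
}

/* Ordinary cast: converts the numeric value, producing different bits. */
uint64_t double2int(double d) {
    return (uint64_t) d;
}
```

For 1.0, the union version should return 0x3FF0000000000000 while the cast returns 1.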

Alignment: (K is the size of the data type)
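A sketch showing the padding the compiler inserts to satisfy alignment. The struct names are illustrative, and the concrete numbers in the comments assume a typical x86-64 ABI with 4-byte int.

```c
#include <stddef.h>

/* On x86-64: i at offset 0, c at offset 4, 3 bytes of padding,
   j at offset 8, total size 12. */
struct S1 {
    int  i;
    char c;
    int  j;
};

/* On x86-64: i at 0, j at 4, c at 8, then 3 bytes of trailing padding
   so that every element of an array of S2 stays 4-byte aligned; size 12. */
struct S2 {
    int  i;
    int  j;
    char c;
};

/* Number of padding bytes inserted between c and j in S1. */
size_t s1_padding_before_j(void) {
    return offsetof(struct S1, j) - (offsetof(struct S1, c) + sizeof(char));
}
```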

On pointers:
Casting from one type of pointer to another changes its type but not its value. Pointers can also point to functions. The value of a function pointer is the address of the first instruction in the machine-code representation of the function.

The DDD debugger is a graphical front end for GDB. See also gef and pwntools, mentioned earlier.

ASLR:
Thus, even if many machines are running identical code, they would all be using different stack addresses. This is implemented by allocating a random amount of space between 0 and n bytes on the stack at the start of a program, for example, by using the allocation function alloca, which allocates space for a specified number of bytes on the stack.
If we set up a 256-byte nop sled, then the randomization over n = 2^23 can be cracked by enumerating 2^15 = 32,768 starting addresses, which is entirely feasible for a determined attacker. For the 64-bit case, trying to enumerate 2^24 = 16,777,216 is a bit more daunting. We can see that stack randomization and other aspects of ASLR can increase the effort required to successfully attack a system, and therefore greatly reduce the rate at which a virus or worm can spread, but it cannot provide a complete safeguard.

Buffer canary：
Stack protection does a good job of preventing a buffer overflow attack from corrupting state stored on the program stack. It incurs only a small performance penalty, especially because gcc only inserts it when there is a local buffer of type char in the function.

Non-executable stack (NX bit):
Some types of programs require the ability to dynamically generate and execute code. For example, “just-in-time” compilation techniques dynamically generate code for programs written in interpreted languages, such as Java, to improve execution performance. Whether or not the run-time system can restrict the executable code to just that part generated by the compiler in creating the original program depends on the language and the operating system.

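For a variable-size stack frame (a local array whose length in the brackets is a variable), rbp is needed to locate the fixed-size local variables. (If rsp were used for addressing, the offsets would depend on the variable in the brackets, whereas using rbp keeps the offsets constant.)
Not every function uses rbp. If a function does use it, remember that rbp is a callee-saved register: save it on entry and restore it before returning. This is a convention (which registers are callee-saved, and what each register is used for, are all conventions).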

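Instruction-level optimization for image, video, and audio processing:
single instruction, multiple data, or SIMD (P322)
The media registers are called MM; their extended versions are XMM and YMM, which are used to hold floating-point data.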

When "scalar" is contrasted with "vector":
operations like y=a+r, where y and a are vectors, while r is a real scalar. It essentially adds the scalar r to every element of a.
When "scalar" is contrasted with "compound":
C++, on the other hand, as well as other higher-level languages, supports operations on user-defined types, which are by definition not scalar, or on other types that have no immediate support from hardware. (Built-in types are generally scalar; user-defined types are generally compound.)

the code optimization guidelines recommend that 32-bit memory data satisfy a 4-byte alignment and that 64-bit data satisfy an 8-byte alignment.

Up to eight floating-point arguments can be passed in XMM registers %xmm0–%xmm7. These registers are used in the order the arguments are listed. Additional floating-point arguments can be passed on the stack.

A function that returns a floating-point value does so in register %xmm0.

All XMM registers are caller saved. The callee may overwrite any of these registers without first saving it.

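Note that in the figure below, the tens digits are in decimal.

The mnemonics of the floating-point instructions (for example the move and compare families) are all long; look them up in the book when you run into them.

Comparing two floating-point numbers has four possible outcomes (the "unordered" outcome arises from NaN). C code that compares floating-point values should therefore also handle all four cases.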
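A sketch of handling all four outcomes in C. The enum and function names are illustrative; isunordered is the standard C99 macro from math.h.

```c
#include <math.h>

typedef enum { NEG, ZERO, POS, OTHER } range_t;

/* Classify x relative to 0, covering less, equal, greater, and unordered. */
range_t find_range(float x) {
    if (x < 0)
        return NEG;
    if (x == 0)
        return ZERO;
    if (x > 0)
        return POS;
    return OTHER;   /* reached only for NaN: all three comparisons are false */
}

/* Nonzero when a and b cannot be ordered, i.e. at least one is NaN. */
int is_unordered(float a, float b) {
    return isunordered(a, b);
}
```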