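Mach-O Format Analysis
0x00 Abstract
Human life has no root or stem; it drifts like dust upon the road. Scattered, it turns with the wind; this is no longer the constant self.
Tao Yuanming, "Miscellaneous Poems" (雜詩)
Mach-O is the executable file format on OS X, comparable to PE on Windows and ELF on Linux. Doing other research without a thorough understanding of the Mach-O format and the knowledge around it is no different from building a castle in the air.
Every Mach-O file contains a Mach-O header, followed by the load commands (Load Commands), and finally the data blocks (Data).
What follows is a detailed analysis of the whole Mach-O format.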
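0x01 A brief introduction to the Mach-O format
The layout of a Mach-O file is shown in the figure below:
It consists of the following parts:
- Header: stores basic information about the Mach-O file, including the platform, the file type, and the number of load commands.
- LoadCommands: this region immediately follows the Header. When a Mach-O file is loaded, the data here determines how memory is laid out.
- Data: the actual contents of every segment are stored here, including the code, the data, and so on.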
0x02 Headers
2.1 Data structures
The header definitions can be found in the open-source kernel code.
```c
/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
};

/* Constant for the magic field of the mach_header (32-bit architectures) */
#define MH_MAGIC 0xfeedface /* the mach magic number */
#define MH_CIGAM 0xcefaedfe /* NXSwapInt(MH_MAGIC) */

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* reserved */
};

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
```
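From the definitions of mach_header and mach_header_64, it is clear that the header's main job is to let the system quickly determine the runtime environment (CPU type and subtype) and the file type of a Mach-O file.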
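As a rough illustration of how the magic field is used, the sketch below classifies the first four bytes of a file. The constants are copied from the definitions above; `struct magic_info`, `classify_magic`, and `header_size` are hypothetical helper names introduced here, and the header sizes of 28 and 32 bytes assume that `cpu_type_t` and `cpu_subtype_t` are 4-byte integers.

```c
#include <stdint.h>

/* Magic values copied from the header definitions quoted above. */
#define MH_MAGIC    0xfeedface
#define MH_CIGAM    0xcefaedfe
#define MH_MAGIC_64 0xfeedfacf
#define MH_CIGAM_64 0xcffaedfe

/* Hypothetical helper type: what the magic value tells us about the file. */
struct magic_info {
    int is_macho;   /* 1 if the value is one of the four magics above */
    int is_64bit;   /* 1 for MH_MAGIC_64 / MH_CIGAM_64 */
    int needs_swap; /* 1 if header fields are in the opposite byte order */
};

/* Classify a magic value that was read in the host's byte order. */
static struct magic_info classify_magic(uint32_t magic)
{
    struct magic_info info = {0, 0, 0};

    switch (magic) {
    case MH_MAGIC:
        info.is_macho = 1;
        break;
    case MH_CIGAM:
        info.is_macho = 1;
        info.needs_swap = 1;
        break;
    case MH_MAGIC_64:
        info.is_macho = 1;
        info.is_64bit = 1;
        break;
    case MH_CIGAM_64:
        info.is_macho = 1;
        info.is_64bit = 1;
        info.needs_swap = 1;
        break;
    default:
        break; /* not a thin Mach-O file */
    }
    return info;
}

/*
 * Size of the header that precedes the load commands:
 * 7 four-byte fields for mach_header, 8 for mach_header_64
 * (assumption: cpu_type_t and cpu_subtype_t are 4 bytes wide).
 */
static uint32_t header_size(struct magic_info info)
{
    return info.is_64bit ? 32u : 28u;
}
```

A caller would read the first four bytes of the file into a `uint32_t`, pass it to `classify_magic`, and use `header_size` to find where the load commands begin.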
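2.2 Example
Let's analyze a Mach-O file with some tools to see what the Mach-O header looks like in practice.
otool prints the fields of the Mach header, although the output is not very readable.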
```
➜  bin otool -h git
git:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80           2    17       1432 0x00200085
```
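Another tool, MachOView, presents the same information more clearly.
- The magic number is 0xFEEDFACF (MH_MAGIC_64), so this is a file for a 64-bit platform.
- CPU Type and CPU SubType are easy to read as well: the file runs on x86_64 CPUs.
- File Type (value 2, MH_EXECUTE) indicates that this is an executable file; it is analyzed in detail below.
- Flags (0x00200085) marks four properties of this Mach-O file; it is analyzed in detail below.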
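The decimal cputype and the hexadecimal flags in the otool output can be checked by hand. The following sketch does that arithmetic in code. The constants `CPU_ARCH_MASK`, `CPU_ARCH_ABI64`, and `CPU_TYPE_X86` are taken from `<mach/machine.h>` rather than from this article, and the three function names are hypothetical.

```c
#include <stdint.h>

/* Values as defined in <mach/machine.h> (not quoted in this article). */
#define CPU_ARCH_MASK  0xff000000u /* top byte holds the ABI bits */
#define CPU_ARCH_ABI64 0x01000000u /* 64-bit ABI */
#define CPU_TYPE_X86   7u

/* Returns 1 if the cputype has the 64-bit ABI bit set. */
static int cputype_is_64bit(uint32_t cputype)
{
    return (cputype & CPU_ARCH_ABI64) != 0;
}

/* Returns the CPU family with the ABI bits masked off. */
static uint32_t cputype_family(uint32_t cputype)
{
    return cputype & ~CPU_ARCH_MASK;
}

/* Counts how many individual flag bits are set in the flags word. */
static unsigned count_flags(uint32_t flags)
{
    unsigned n = 0;

    while (flags != 0) {
        flags &= flags - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For the values above, 16777223 is 0x01000007, that is `CPU_ARCH_ABI64 | CPU_TYPE_X86`, and 0x00200085 has exactly four bits set (0x1, 0x4, 0x80, and 0x200000), which matches the four properties mentioned in the list.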
2.3 Field details
2.3.1 FileType
Mach-O is used not only for executables but also for other kinds of files:
- Kernel extensions
- Libraries
- Core dumps
- …
The definitions in the source are as follows:
```c
#define MH_OBJECT      0x1 /* relocatable object file */
#define MH_EXECUTE     0x2 /* demand paged executable file */
#define MH_FVMLIB      0x3 /* fixed VM shared library file */
#define MH_CORE        0x4 /* core file */
#define MH_PRELOAD     0x5 /* preloaded executable file */
#define MH_DYLIB       0x6 /* dynamically bound shared library */
#define MH_DYLINKER    0x7 /* dynamic link editor */
#define MH_BUNDLE      0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB  0x9 /* shared library stub for static */
                           /* linking only, no section contents */
#define MH_DSYM        0xa /* companion file with only debug */
                           /* sections */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kexts */
```
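Here is an explanation of some commonly encountered file types.
| File type | Description | Example |
| --- | --- | --- |
| MH_OBJECT | Object file (*.o) produced during compilation | gcc -c xxx.c produces xxx.o |
| MH_EXECUTE | Executable binary | /usr/bin/git |
| MH_CORE | Core dump | The dump file written on a crash |
| MH_DYLIB | Dynamic library | The library files under /usr/lib/ |
| MH_DYLINKER | Dynamic linker | /usr/lib/dyld |
| MH_KEXT_BUNDLE | Kernel extension | A simple kernel module you develop yourself |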
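When reading raw header dumps it helps to turn the numeric filetype back into its symbolic name. The sketch below is one way to do that; the numeric values are the ones from the definitions above, while the function name `filetype_name` is a hypothetical helper and not part of any Apple API.

```c
#include <stdint.h>

/*
 * Map a mach_header filetype value to the name of its constant.
 * The values mirror the #define list quoted above.
 */
static const char *filetype_name(uint32_t filetype)
{
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x3: return "MH_FVMLIB";
    case 0x4: return "MH_CORE";
    case 0x5: return "MH_PRELOAD";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0x8: return "MH_BUNDLE";
    case 0x9: return "MH_DYLIB_STUB";
    case 0xa: return "MH_DSYM";
    case 0xb: return "MH_KEXT_BUNDLE";
    default:  return "unknown";
    }
}
```

For the git binary examined earlier, the filetype column of the otool output is 2, so `filetype_name(2)` yields `"MH_EXECUTE"`.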
2.3.2 flags
The Mach-O header also carries some important loading parameters for dyld. They are defined in the code as follows:
```c
#define MH_INCRLINK   0x2  /* the object file is the output of an
                              incremental link against a base file
                              and can't be link edited again */
#define MH_DYLDLINK   0x4  /* the object file is input for the
                              dynamic linker and can't be staticly
                              link edited again */
#define MH_BINDATLOAD 0x8  /* the object file's undefined
                              references are bound by the dynamic
                              linker when loaded. */
#define MH_PREBOUND   0x10 /* the file has its dynamic undefined
                              references prebound. */
#define MH_SPLIT_SEGS 0x20 /* the file has its read-only and
                              read-write segments split */
#define MH_LAZY_INIT  0x40 /* the shared library init routine is
                              to be run lazily via catching memory
                              faults to its writeable segments
                              (obsolete) */
#define MH_TWOLEVEL   0x80 /* the image is using two-level name
                              space bindings */
...
// Too long to list in full; see the source if you are interested
// EXTERNAL_HEADERS/mach-o/x86_64/loader.h
```
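Again, here is a brief look at a few of the more important ones.
| Flag | Meaning |
| --- | --- |
| MH_NOUNDEFS | The object has no undefined symbols, so there are no outstanding link dependencies |
| MH_DYLDLINK | The object file is input for dyld and cannot be statically linked again |
| MH_PIE | The image may be loaded at a randomized address (ASLR) |
| MH_ALLOW_STACK_EXECUTION | Code may execute from stack memory; normally turned off by default |
| MH_NO_HEAP_EXECUTION | Code cannot execute from heap memory |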
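To see which of these flags a given header carries, each bit can be tested in turn. The sketch below assumes the bit values from Apple's `mach-o/loader.h` for the flags that the excerpt above elides (MH_NOUNDEFS 0x1, MH_ALLOW_STACK_EXECUTION 0x20000, MH_PIE 0x200000, MH_NO_HEAP_EXECUTION 0x1000000); the table covers only a handful of flags, and `describe_flags` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flag_name {
    uint32_t    bit;
    const char *name;
};

/* Partial table: only the flags discussed in this section plus MH_TWOLEVEL. */
static const struct flag_name k_flag_names[] = {
    { 0x1,       "MH_NOUNDEFS" },
    { 0x4,       "MH_DYLDLINK" },
    { 0x80,      "MH_TWOLEVEL" },
    { 0x20000,   "MH_ALLOW_STACK_EXECUTION" },
    { 0x200000,  "MH_PIE" },
    { 0x1000000, "MH_NO_HEAP_EXECUTION" },
};

/*
 * Write a '|'-separated list of the recognized flag names into out and
 * return how many names were written. Names that no longer fit in the
 * buffer are dropped rather than truncated.
 */
static int describe_flags(uint32_t flags, char *out, size_t outlen)
{
    size_t used = 0;
    int found = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof(k_flag_names) / sizeof(k_flag_names[0]); i++) {
        if ((flags & k_flag_names[i].bit) == 0)
            continue;
        int n = snprintf(out + used, outlen - used, "%s%s",
                         found ? "|" : "", k_flag_names[i].name);
        if (n < 0 || (size_t)n >= outlen - used) {
            out[used] = '\0'; /* undo the partial write and stop */
            break;
        }
        used += (size_t)n;
        found++;
    }
    return found;
}
```

Applied to the flags value 0x00200085 from the otool output in section 2.2, this reports four names: MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL, and MH_PIE.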
2.4 Headers summary
0x03 Load Commands
This is the load_command data structure:
```c
struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};
```
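The load commands follow the header directly, and the total size in bytes of all the commands is already given in the Mach-O header (the sizeofcmds field). Once the header has been read, the rest of the file is loaded by parsing the load commands. I took a brief look at how the kernel parses Mach-O data; setting aside the kernel's implementation details, the logic is actually quite simple.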
```c
static
load_return_t
parse_machfile(
    struct vnode        *vp,
    vm_map_t            map,
    thread_t            thread,
    struct mach_header  *header,
    off_t               file_offset,
    off_t               macho_size,
    int                 depth,
    int64_t             aslr_offset,
    int64_t             dyld_aslr_offset,
    load_result_t       *result
)
{
    [...] // extensive initialization and checks omitted here

        /*
         * Loop through each of the load_commands indicated by the
         * Mach-O header; if an absurd value is provided, we just
         * run off the end of the reserved section by incrementing
         * the offset too far, so we are implicitly fail-safe.
         */
        offset = mach_header_sz;
        ncmds = header->ncmds;

        while (ncmds--) {
            /*
             * Get a pointer to the command.
             */
            lcp = (struct load_command *)(addr + offset);
            // lcp points at the command currently being parsed
            oldoffset = offset;
            // oldoffset: offset from the start of the Mach-O image in memory to the current command
            offset += lcp->cmdsize;
            // advance offset by this command's length; it is now the offset
            // from the start of the image to the next command

            /*
             * Perform prevalidation of the struct load_command
             * before we attempt to use its contents. Invalid
             * values are ones which result in an overflow, or
             * which can not possibly be valid commands, or which
             * straddle or exist past the reserved section at the
             * start of the image.
             */
            if (oldoffset > offset ||
                lcp->cmdsize < sizeof(struct load_command) ||
                offset > header->sizeofcmds + mach_header_sz) {
                ret = LOAD_BADMACHO;
                break;
            }
            // a sanity check; unrelated to how the data is loaded into memory

            /*
             * Act on struct load_command's for which kernel
             * intervention is required.
             */
            switch(lcp->cmd) {
            case LC_SEGMENT:
                [...]
                ret = load_segment(lcp, header->filetype, control,
                                   file_offset, macho_size, vp,
                                   map, slide, result);
                break;
            case LC_SEGMENT_64:
                [...]
                ret = load_segment(lcp, header->filetype, control,
                                   file_offset, macho_size, vp,
                                   map, slide, result);
                break;
            case LC_UNIXTHREAD:
                if (pass != 1)
                    break;
                ret = load_unixthread(
                         (struct thread_command *) lcp,
                         thread, slide, result);
                break;
            case LC_MAIN:
                if (pass != 1)
                    break;
                if (depth != 1)
                    break;
                ret = load_main(
                         (struct entry_point_command *) lcp,
                         thread, slide, result);
                break;
            case LC_LOAD_DYLINKER:
                if (pass != 3)
                    break;
                if ((depth == 1) && (dlp == 0)) {
                    dlp = (struct dylinker_command *)lcp;
                    dlarchbits = (header->cputype & CPU_ARCH_MASK);
                } else {
                    ret = LOAD_FAILURE;
                }
                break;
            case LC_UUID:
                if (pass == 1 && depth == 1) {
                    ret = load_uuid((struct uuid_command *) lcp,
                                    (char *)addr + mach_header_sz + header->sizeofcmds,
                                    result);
                }
                break;
            case LC_CODE_SIGNATURE:
                [...]
                ret = load_code_signature(
                    (struct linkedit_data_command *) lcp,
                    vp, file_offset, macho_size,
                    header->cputype, result);
                [...]
                break;
#if CONFIG_CODE_DECRYPTION
            case LC_ENCRYPTION_INFO:
            case LC_ENCRYPTION_INFO_64:
                if (pass != 3)
                    break;
                ret = set_code_unprotect(
                    (struct encryption_info_command *) lcp,
                    addr, map, slide, vp, file_offset,
                    header->cputype, header->cpusubtype);
                if (ret != LOAD_SUCCESS) {
                    printf("proc %d: set_code_unprotect() error %d "
                           "for file \"%s\"\n",
                           p->p_pid, ret, vp->v_name);
                    /*
                     * Don't let the app run if it's
                     * encrypted but we failed to set up the
                     * decrypter. If the keys are missing it will
                     * return LOAD_DECRYPTFAIL.
                     */
                    if (ret == LOAD_DECRYPTFAIL) {
                        /* failed to load due to missing FP keys */
                        proc_lock(p);
                        p->p_lflag |= P_LTERM_DECRYPTFAIL;
                        proc_unlock(p);
                    }
                    psignal(p, SIGKILL);
                }
                break;
#endif
            default:
                /* Other commands are ignored by the kernel */
                ret = LOAD_SUCCESS;
                break;
            }
            if (ret != LOAD_SUCCESS)
                break;
        }
        if (ret != LOAD_SUCCESS)
            break;
    } // closes an enclosing loop over `pass` that sits in the omitted code above

    [...] // post-load handling omitted here
}
```
3.1 The cmdsize field
Focus on the first few lines inside the while loop to see how the cmdsize field of load_command is used to step from one command to the next while parsing a Mach-O file.
```c
...
lcp = (struct load_command *)(addr + offset);
// lcp points at the command currently being parsed
oldoffset = offset;
// oldoffset: offset from the start of the Mach-O image in memory to the current command
offset += lcp->cmdsize;
// advance offset by this command's length; it is now the offset
// from the start of the image to the next command
...
```
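The same stepping logic can be tried outside the kernel. The sketch below walks a buffer that holds only the load commands (the bytes right after the header) and applies bounds checks in the spirit of the kernel excerpt. It is a simplified user-space illustration, not the kernel's code: `walk_load_commands` and its callback signature are hypothetical, and the commands are assumed to be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};

/*
 * Walk ncmds load commands stored in cmds[0 .. sizeofcmds).
 * Calls visit (if not NULL) for each command and returns the number of
 * commands visited, or -1 if a command is too small or runs past the
 * end of the load command area.
 */
static int walk_load_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                              uint32_t ncmds,
                              void (*visit)(uint32_t cmd, uint32_t cmdsize,
                                            void *ctx),
                              void *ctx)
{
    uint32_t offset = 0; /* invariant: offset <= sizeofcmds */
    int visited = 0;

    while (ncmds--) {
        struct load_command lc;

        /* There must be room for at least the 8-byte command prefix. */
        if (sizeofcmds - offset < sizeof lc)
            return -1;
        memcpy(&lc, cmds + offset, sizeof lc);

        /* cmdsize must cover the prefix and stay inside the area. */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1;

        if (visit != NULL)
            visit(lc.cmd, lc.cmdsize, ctx);

        offset += lc.cmdsize; /* step to the next command */
        visited++;
    }
    return visited;
}
```

Copying each prefix with `memcpy` avoids unaligned reads, and comparing `cmdsize` against the remaining space (instead of adding first) sidesteps the integer overflow that the kernel guards against with its `oldoffset > offset` test.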
3.2 The cmd field
```c
switch(lcp->cmd) {
case LC_SEGMENT:
    [...]
    ret = load_segment(lcp, header->filetype, control,
                       file_offset, macho_size, vp,
                       map, slide, result);
    break;
case LC_SEGMENT_64:
    [...]
    ret = load_segment(lcp, header->filetype, control,
                       file_offset, macho_size, vp,
                       map, slide, result);
    break;
case LC_UNIXTHREAD:
    if (pass != 1)
        break;
    ret = load_unixthread(
             (struct thread_command *) lcp,
             thread, slide, result);
    break;
case LC_MAIN:
    if (pass != 1)
        break;
    if (depth != 1)
        break;
    ret = load_main(
             (struct entry_point_command *) lcp,
             thread, slide, result);
    break;
case LC_LOAD_DYLINKER:
    if (pass != 3)
        break;
    if ((depth == 1) && (dlp == 0)) {
        dlp = (struct dylinker_command *)lcp;
        dlarchbits = (header->cputype & CPU_ARCH_MASK);
    } else {
        ret = LOAD_FAILURE;
    }
    break;
case LC_UUID:
    if (pass == 1 && depth == 1) {
        ret = load_uuid((struct uuid_command *) lcp,
                        (char *)addr + mach_header_sz + header->sizeofcmds,
                        result);
    }
    break;
case LC_CODE_SIGNATURE:
    [...]
    ret = load_code_signature(
        (struct linkedit_data_command *) lcp,
        vp, file_offset, macho_size,
        header->cputype, result);
    [...]
    break;
#if CONFIG_CODE_DECRYPTION
case LC_ENCRYPTION_INFO:
case LC_ENCRYPTION_INFO_64:
    if (pass != 3)
        break;
    ret = set_code_unprotect(
        (struct encryption_info_command *) lcp,
        addr, map, slide, vp, file_offset,
        header->cputype, header->cpusubtype);
    if (ret != LOAD_SUCCESS) {
        printf("proc %d: set_code_unprotect() error %d "
               "for file \"%s\"\n",
               p->p_pid, ret, vp->v_name);
        /*
         * Don't let the app run if it's
         * encrypted but we failed to set up the
         * decrypter. If the keys are missing it will
         * return LOAD_DECRYPTFAIL.
         */
        if (ret == LOAD_DECRYPTFAIL) {
            /* failed to load due to missing FP keys */
            proc_lock(p);
            p->p_lflag |= P_LTERM_DECRYPTFAIL;
            proc_unlock(p);
        }
        psignal(p, SIGKILL);
    }
    break;
#endif
default:
    /* Other commands are ignored by the kernel */
    ret = LOAD_SUCCESS;
    break;
}
```
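As this code shows, a different function does the loading depending on the value of the cmd field. The table below gives a brief overview of what the different command types do in the kernel code.
| Command | Kernel function | Purpose |
| --- | --- | --- |
| LC_SEGMENT; LC_SEGMENT_64 | load_segment | Loads the data in the segment and maps it into the process's address space |
| LC_LOAD_DYLINKER | load_dylinker | Invokes the /usr/lib/dyld program |
| LC_UUID | load_uuid | Loads the 128-bit unique ID |
| LC_THREAD | load_thread | Starts a Mach thread, but does not allocate stack space |
| LC_UNIXTHREAD | load_unixthread | Starts a UNIX thread |
| LC_CODE_SIGNATURE | load_code_signature | Loads the binary's code signature (digital signature) |
| LC_ENCRYPTION_INFO | set_code_unprotect | Sets up decryption for an encrypted binary |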
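0x04 Segment & Section
When data is loaded, most of what gets loaded comes from LC_SEGMENT or LC_SEGMENT_64 commands. The purposes of the other load commands were introduced briefly in the previous section and are not explored further here.
The data structures for LC_SEGMENT and LC_SEGMENT_64 look like this.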
```c
struct segment_command { /* for 32-bit architectures */
    uint32_t  cmd;         /* LC_SEGMENT */
    uint32_t  cmdsize;     /* includes sizeof section structs */
    char      segname[16]; /* segment name */
    uint32_t  vmaddr;      /* memory address of this segment */
    uint32_t  vmsize;      /* memory size of this segment */
    uint32_t  fileoff;     /* file offset of this segment */
    uint32_t  filesize;    /* amount to map from the file */
    vm_prot_t maxprot;     /* maximum VM protection */
    vm_prot_t initprot;    /* initial VM protection */
    uint32_t  nsects;      /* number of sections in segment */
    uint32_t  flags;       /* flags */
};

struct segment_command_64 { /* for 64-bit architectures */
    uint32_t  cmd;         /* LC_SEGMENT_64 */
    uint32_t  cmdsize;     /* includes sizeof section_64 structs */
    char      segname[16]; /* segment name */
    uint64_t  vmaddr;      /* memory address of this segment */
    uint64_t  vmsize;      /* memory size of this segment */
    uint64_t  fileoff;     /* file offset of this segment */
    uint64_t  filesize;    /* amount to map from the file */
    vm_prot_t maxprot;     /* maximum VM protection */
    vm_prot_t initprot;    /* initial VM protection */
    uint32_t  nsects;      /* number of sections in segment */
    uint32_t  flags;       /* flags */
};
```
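As you can see, most of these fields help the kernel map the segment into virtual memory. The field to pay most attention to is nsects, which gives the number of sections in the segment. Sections are where the actual useful data is stored.
The section data structures are as follows: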
```c
struct section { /* for 32-bit architectures */
    char     sectname[16]; /* name of this section */
    char     segname[16];  /* segment this section goes in */
    uint32_t addr;         /* memory address of this section */
    uint32_t size;         /* size in bytes of this section */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section alignment (power of 2) */
    uint32_t reloff;       /* file offset of relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes)*/
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
};

struct section_64 { /* for 64-bit architectures */
    char     sectname[16]; /* name of this section */
    char     segname[16];  /* segment this section goes in */
    uint64_t addr;         /* memory address of this section */
    uint64_t size;         /* size in bytes of this section */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section alignment (power of 2) */
    uint32_t reloff;       /* file offset of relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes)*/
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
    uint32_t reserved3;    /* reserved */
};
```
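The comment on cmdsize ("includes sizeof section_64 structs") implies that the section headers are stored directly after the segment command, nsects of them in a row. The sketch below makes that relationship concrete for the 64-bit case. The struct definitions repeat the ones above so the block is self-contained, `vm_prot_t` is assumed to be a plain `int`, and `segment_size_ok` and `section_at` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

typedef int vm_prot_t; /* assumption: 4-byte integer, as on OS X */

struct segment_command_64 {
    uint32_t  cmd;
    uint32_t  cmdsize;     /* includes sizeof section_64 structs */
    char      segname[16];
    uint64_t  vmaddr;
    uint64_t  vmsize;
    uint64_t  fileoff;
    uint64_t  filesize;
    vm_prot_t maxprot;
    vm_prot_t initprot;
    uint32_t  nsects;      /* number of sections in segment */
    uint32_t  flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

/* Returns 1 if cmdsize is large enough to hold nsects section headers. */
static int segment_size_ok(const struct segment_command_64 *seg)
{
    uint64_t need = sizeof(struct segment_command_64) +
                    (uint64_t)seg->nsects * sizeof(struct section_64);
    return need <= seg->cmdsize;
}

/*
 * Returns the i-th section header of the segment, or NULL if i is out of
 * range or the command is too small for the sections it claims to have.
 */
static const struct section_64 *
section_at(const struct segment_command_64 *seg, uint32_t i)
{
    if (!segment_size_ok(seg) || i >= seg->nsects)
        return NULL;
    /* The section headers start right after the segment command. */
    const struct section_64 *first = (const struct section_64 *)(seg + 1);
    return first + i;
}
```

With these layouts the segment command is 72 bytes and each section header is 80 bytes, so a segment with two sections needs a cmdsize of at least 72 + 2 × 80 = 232.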
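Apart from fields that likewise assist with memory mapping, all you need to know when learning the Mach-O format is that different sections serve different purposes.
| Section | Contents |
| --- | --- |
| __text | Code |
| __cstring | Hard-coded strings |
| __const | Variables declared with the const keyword |
| __DATA.__bss | The bss section |
Because sections are already the finest-grained classification, the many more complex sections are not listed one by one here; when you meet a section type you have not seen before, you can look it up in Apple's documentation.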
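Looking up a section by name has one pitfall worth a short example: sectname and segname are fixed 16-byte fields, and a name that uses all 16 bytes is not NUL-terminated, so a plain `strcmp` is not safe. The helper below is a minimal sketch of a bounded comparison; `name16_equals` is a hypothetical name.

```c
#include <string.h>

/*
 * Compare a fixed 16-byte Mach-O name field (sectname or segname) with a
 * regular C string. The field is NUL-padded when the name is shorter than
 * 16 bytes and has no terminator when it is exactly 16 bytes long.
 * Returns 1 on a match, 0 otherwise.
 */
static int name16_equals(const char field[16], const char *name)
{
    if (strlen(name) > 16)
        return 0; /* cannot fit in the field at all */
    /* strncmp stops at the first NUL or after 16 bytes, whichever is first. */
    return strncmp(field, name, 16) == 0;
}
```

A typical use is `name16_equals(sect->sectname, "__text") && name16_equals(sect->segname, "__TEXT")` while iterating over the section headers of a segment.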
0x05 Summary
A careful analysis of the Mach-O format gives a better understanding of how Mach-O files are loaded and lays the groundwork for studying dyld and other components of OS X.
Original article: http://turingh.github.io/2016/03/07/mach-o%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E5%88%86%E6%9E%90/