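The previous post, "Mach-O in Practice: Dynamically Rebinding C Functions with fishhook", covered how fishhook works. This post walks through its source code to see, step by step, how it replaces the implementation of an existing function.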
Let's look again at rebind_symbols, the public entry point. It relies on the following dyld C functions:
_dyld_image_count(void)
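Returns the number of images currently loaded by dyld.
_dyld_get_image_header(uint32_t image_index)
Returns the address of the Mach header of the image at that index.
_dyld_get_image_vmaddr_slide(uint32_t image_index)
Returns the virtual memory address slide (offset) of the image at that index.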
The analysis uses rebind_symbols as its entry point. First, the call chain:

    int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel);
    └── extern void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide));

    static void _rebind_symbols_for_image(const struct mach_header *header, intptr_t slide)
    └── static void rebind_symbols_for_image(struct rebindings_entry *rebindings, const struct mach_header *header, intptr_t slide)
        └── static void perform_rebinding_with_section(struct rebindings_entry *rebindings, section_t *section, intptr_t slide, nlist_t *symtab, char *strtab, uint32_t *indirect_symtab)

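The call chain is short because the library contains only a handful of functions. rebind_symbols is the public interface; its main job is to register a function that dyld calls back whenever an image is loaded: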

    int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
      int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
      if (retval < 0) {
        return retval;
      }
      if (!_rebindings_head->next) {
        _dyld_register_func_for_add_image(_rebind_symbols_for_image);
      } else {
        uint32_t c = _dyld_image_count();
        for (uint32_t i = 0; i < c; i++) {
          _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
        }
      }
      return retval;
    }

The first thing rebind_symbols does is call prepend_rebindings, which adds the whole rebindings array to the head of the private data structure _rebindings_head:

    static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                                  struct rebinding rebindings[],
                                  size_t nel) {
      struct rebindings_entry *new_entry = malloc(sizeof(struct rebindings_entry));
      if (!new_entry) {
        return -1;
      }
      new_entry->rebindings = malloc(sizeof(struct rebinding) * nel);
      if (!new_entry->rebindings) {
        free(new_entry);
        return -1;
      }
      memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
      new_entry->rebindings_nel = nel;
      new_entry->next = *rebindings_head;
      *rebindings_head = new_entry;
      return 0;
    }

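In other words, every call to rebind_symbols copies the rebindings array it was given, together with its length, into a rebindings_entry and pushes that entry onto the front of the private linked list _rebindings_head: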

    struct rebindings_entry {
      struct rebinding *rebindings;
      size_t rebindings_nel;
      struct rebindings_entry *next;
    };

    static struct rebindings_entry *_rebindings_head;

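The list handling can be tried in isolation. The following is a minimal sketch, not fishhook's own code: ll_rebinding, ll_entry, ll_prepend, and ll_is_first_call are hypothetical names, and the structs are reduced to what the list logic needs. It illustrates that prepending makes the newest batch the head, and that head->next is NULL only right after the very first prepend.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for fishhook's types, reduced to the list logic. */
struct ll_rebinding {
    const char *name;
    void *replacement;
};

struct ll_entry {
    struct ll_rebinding *items;
    size_t count;
    struct ll_entry *next;
};

/* Copy the caller's array into a new node and push it at the front.
   Returns 0 on success, -1 if an allocation fails. */
static int ll_prepend(struct ll_entry **head,
                      const struct ll_rebinding items[], size_t count) {
    assert(head != NULL);
    struct ll_entry *entry = malloc(sizeof(struct ll_entry));
    if (!entry) {
        return -1;
    }
    entry->items = malloc(sizeof(struct ll_rebinding) * count);
    if (!entry->items) {
        free(entry);
        return -1;
    }
    memcpy(entry->items, items, sizeof(struct ll_rebinding) * count);
    entry->count = count;
    entry->next = *head;   /* the old head becomes the second node */
    *head = entry;
    return 0;
}

/* Right after a prepend, a NULL next pointer means the list was empty before. */
static int ll_is_first_call(const struct ll_entry *head) {
    return head != NULL && head->next == NULL;
}
```

Calling ll_prepend twice on an initially NULL head leaves the second batch at the front, and ll_is_first_call reports true only after the first of the two calls. The sketch never frees its nodes, which is acceptable for a list meant to live as long as the process.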
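This makes it possible to detect the first call by checking _rebindings_head->next. On the first call, _dyld_register_func_for_add_image registers _rebind_symbols_for_image as a callback (dyld also invokes a newly registered callback once for each image that is already loaded). On later calls, rebind_symbols invokes _rebind_symbols_for_image directly for every image currently loaded: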

    static void _rebind_symbols_for_image(const struct mach_header *header, intptr_t slide) {
      rebind_symbols_for_image(_rebindings_head, header, slide);
    }

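_rebind_symbols_for_image is only a thin wrapper around the similarly named rebind_symbols_for_image, and this is where symbol rebinding actually begins. Because the function is fairly long, the analysis below splits it into three parts and omits code that is not needed to follow the logic: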

    static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                         const struct mach_header *header,
                                         intptr_t slide) {
      segment_command_t *cur_seg_cmd;
      segment_command_t *linkedit_segment = NULL;
      struct symtab_command* symtab_cmd = NULL;
      struct dysymtab_command* dysymtab_cmd = NULL;

      uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
      for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
        cur_seg_cmd = (segment_command_t *)cur;
        if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
          if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
            linkedit_segment = cur_seg_cmd;
          }
        } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
          symtab_cmd = (struct symtab_command*)cur_seg_cmd;
        } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
          dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
        }
      }

      ...
    }

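This part locates three load commands in the image: the __LINKEDIT segment command (linkedit_segment), symtab_command, and dysymtab_command. The scan starts just past the mach_header_t, casts the current position to segment_command_t, and compares the cmd field to pick out the commands it needs, advancing by cmdsize on every iteration.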
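The scanning pattern itself, a run of variable-size records that each begin with an id and a byte size, can be sketched without any Mach-O headers. Everything here is an assumption made for illustration: lc_cmd, lc_find, and the LC_DEMO_* ids are hypothetical and only mimic the cmd/cmdsize layout described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command ids; the real constants live in <mach-o/loader.h>. */
enum { LC_DEMO_SEGMENT = 1, LC_DEMO_SYMTAB = 2, LC_DEMO_DYSYMTAB = 3 };

/* Every record starts with these two fields, like a Mach-O load command. */
struct lc_cmd {
    uint32_t cmd;       /* record type */
    uint32_t cmdsize;   /* size in bytes of the whole record */
};

/* Scan ncmds records starting at start and return the byte offset of the
   first record whose id equals wanted, or -1 if there is none. */
static long lc_find(const void *start, uint32_t ncmds, uint32_t wanted) {
    const unsigned char *base = start;
    const unsigned char *cur = base;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct lc_cmd c;
        memcpy(&c, cur, sizeof c);        /* copy out to avoid alignment issues */
        assert(c.cmdsize >= sizeof c);    /* a smaller size would stall the scan */
        if (c.cmd == wanted) {
            return (long)(cur - base);
        }
        cur += c.cmdsize;                 /* jump to the next record */
    }
    return -1;
}
```

The key point is that the step size comes from the record itself rather than from sizeof of a fixed struct, which is why the loop above advances cur by cmdsize.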
With these key load commands located, the in-memory addresses of the corresponding tables can be derived from them:

    static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                         const struct mach_header *header,
                                         intptr_t slide) {
      ...

      uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
      nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
      char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);
      uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

      ...
    }

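The linkedit_segment structure provides the segment's virtual address and file offset, and the following formula gives the base address used to locate data in the __LINKEDIT segment:

    slide + vmaddr - fileoff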
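Similarly, symtab_command supplies the symbol table offset and the string table offset, and dysymtab_command supplies the indirect symbol table offset. Adding each offset to the base yields pointers to the symbol table, the string table, and the indirect symbol table.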
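A worked example may make the arithmetic easier to follow. The offsets stored in the load commands are file offsets, and __LINKEDIT maps file offset fileoff to address slide + vmaddr, so subtracting fileoff gives the address that file offset 0 would have. The sketch below assumes 64-bit addresses; the function names and all numbers are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Address that file offset 0 would map to, derived from the __LINKEDIT
   segment: slide + vmaddr - fileoff. */
static uint64_t le_linkedit_base(int64_t slide, uint64_t vmaddr, uint64_t fileoff) {
    return (uint64_t)slide + vmaddr - fileoff;
}

/* Runtime address of a table, given the table's file offset. */
static uint64_t le_table_address(uint64_t linkedit_base, uint32_t table_fileoff) {
    return linkedit_base + table_fileoff;
}

/* Made-up numbers: the image slid by 0x4000, and __LINKEDIT has
   vmaddr 0x100008000 at file offset 0x8000. */
static void le_worked_example(void) {
    uint64_t base = le_linkedit_base(0x4000, 0x100008000ULL, 0x8000);
    assert(base == 0x100004000ULL);
    /* A symbol table at file offset 0x8100 then sits at base + 0x8100. */
    assert(le_table_address(base, 0x8100) == 0x10000C100ULL);
}
```

In this example the symbol table starts 0x100 bytes into __LINKEDIT in the file (0x8100 - 0x8000), and it ends up 0x100 bytes past the segment's slid address 0x10000C000 in memory, as expected.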
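The last part of the function iterates over the sections of the image, looking for those whose SECTION_TYPE is S_LAZY_SYMBOL_POINTERS or S_NON_LAZY_SYMBOL_POINTERS, and passes each of them to the next function, perform_rebinding_with_section, which processes the symbols in that section: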

    static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                               section_t *section,
                                               intptr_t slide,
                                               nlist_t *symtab,
                                               char *strtab,
                                               uint32_t *indirect_symtab) {
      uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
      void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
      for (uint i = 0; i < section->size / sizeof(void *); i++) {
        uint32_t symtab_index = indirect_symbol_indices[i];
        uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
        char *symbol_name = strtab + strtab_offset;

        struct rebindings_entry *cur = rebindings;
        while (cur) {
          for (uint j = 0; j < cur->rebindings_nel; j++) {
            if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
              if (cur->rebindings[j].replaced != NULL &&
                  indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
                *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
              }
              indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
              goto symbol_loop;
            }
          }
          cur = cur->next;
        }
      symbol_loop:;
      }
    }

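The core of this function is comparing each symbol_name from the symbol table with the name field of each rebinding. On a match (taking open as the example), it stores the address of the original implementation in the original_open function pointer and installs the new implementation, new_open, in its place: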

    if (cur->rebindings[j].replaced != NULL &&
        indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
      *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];  // save the original implementation
    }
    indirect_symbol_bindings[i] = cur->rebindings[j].replacement;  // replace the original implementation with new_open

Once this snippet is clear, the rest of the function is easy to follow:
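1. Compute indirect_symtab + section->reserved1 to get indirect_symbol_indices, the array of symbol table indices belonging to this section.
2. Compute (void **)((uintptr_t)slide + section->addr) to get indirect_symbol_bindings, the section's list of function pointers.
3. Iterate over indirect_symbol_indices and read each symbol table index, symtab_index.
4. Use symtab_index to fetch the corresponding nlist_t entry from the symbol table and read its string table offset, symtab[symtab_index].n_un.n_strx.
5. Finally, read the symbol's name from the string table as char *symbol_name.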
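The lookup chain in the steps above can be reproduced on small hand-built tables. This is a sketch under stated assumptions: nm_nlist is a hypothetical one-field stand-in for the real nlist entry, nm_symbol_name is an illustrative helper, and no bounds checks are performed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced symbol table entry; the real nlist has more fields. */
struct nm_nlist {
    uint32_t strx;   /* offset of the symbol's name inside the string table */
};

/* Resolve the name behind slot i of a pointer section:
   indirect table -> symbol table index -> string table offset -> name. */
static const char *nm_symbol_name(const uint32_t *indirect_symtab,
                                  uint32_t reserved1,
                                  const struct nm_nlist *symtab,
                                  const char *strtab,
                                  uint32_t i) {
    assert(indirect_symtab != NULL && symtab != NULL && strtab != NULL);
    /* reserved1 is where this section's run of indices begins. */
    const uint32_t *indices = indirect_symtab + reserved1;
    uint32_t symtab_index = indices[i];
    return strtab + symtab[symtab_index].strx;
}
```

For instance, with the string table "\0_open\0_close\0_read", symbol entries whose offsets are 14, 1 and 7, and an indirect table of {0, 0, 2, 0, 1} with reserved1 equal to 2, slot 0 resolves to _close and slot 2 to _open. Skipping the first character of the result drops the leading underscore, which is what &symbol_name[1] does before the comparison.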
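That completes the preparation for the comparison. The remaining code walks the whole rebindings_entry linked list, looks for a matching symbol, and replaces the function implementation: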

    while (cur) {
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          goto symbol_loop;
        }
      }
      cur = cur->next;
    }

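From then on, any call to the function (open, for example) resolves through the patched pointer and lands in new_open. When new_open calls original_open, the original implementation still runs, because *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i] saved the original function's address in that pointer before the slot was overwritten.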
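The swap can be simulated in portable C on an ordinary array of function pointers. This is only a sketch of the idea, not fishhook's code: rb_fn, rb_rebinding, rb_rebind and the rb_* functions are hypothetical, a concrete function pointer type replaces void * to stay within standard C, and the array stands in for a symbol pointer section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*rb_fn)(int);

/* Hypothetical rebinding request, shaped like the name/replacement/replaced
   triple used in the article. */
struct rb_rebinding {
    const char *name;
    rb_fn replacement;
    rb_fn *replaced;   /* optional slot that receives the previous implementation */
};

static int rb_original_open(int x) { return x + 1; }

static rb_fn rb_saved_open;   /* plays the role of original_open */

/* The replacement forwards to the saved original and then alters the result. */
static int rb_new_open(int x) { return rb_saved_open(x) * 10; }

/* Patch every slot whose name, minus the leading underscore, matches a
   request. Returns the number of slots that matched. */
static size_t rb_rebind(rb_fn *bindings, const char *const *names, size_t nslots,
                        const struct rb_rebinding *reqs, size_t nreqs) {
    size_t matched = 0;
    for (size_t i = 0; i < nslots; i++) {
        assert(names[i][0] != '\0');   /* names carry a leading underscore */
        for (size_t j = 0; j < nreqs; j++) {
            if (strcmp(&names[i][1], reqs[j].name) != 0) {
                continue;
            }
            /* Save the old pointer unless the slot is already patched;
               otherwise the replacement would end up calling itself. */
            if (reqs[j].replaced != NULL && bindings[i] != reqs[j].replacement) {
                *reqs[j].replaced = bindings[i];
            }
            bindings[i] = reqs[j].replacement;
            matched++;
            break;
        }
    }
    return matched;
}
```

With a table whose slot named _open holds rb_original_open, one call to rb_rebind redirects that slot to rb_new_open, and calls through the slot still reach the original via rb_saved_open. Running rb_rebind a second time leaves rb_saved_open untouched, which illustrates why the comparison against replacement comes before the save.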