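Our in-house serverless platform has had a long-standing problem: ever since CPython was introduced, tcmalloc could no longer be used,
because the process would coredump immediately. As long as this stays unsolved, users of the platform have no way to analyze memory leaks.
The problem finally blew up on a mixed Python/C++ script maintained by several departments. Memory was allocated in one department's module and freed in another's, and tracking down a leak across departments was just too hard.
So the problem had to be solved on the platform side.
The coredump problem
At first we asked the business team to remove Python and check whether the pure C++ code leaked anywhere. Unfortunately, the coredump still happened.
Judging from the key stack trace, the problem was triggered when dlopen opened a certain shared library.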
```text
#0  0x00007fd71df27428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fd71df2902a in __GI_abort () at abort.c:89
#2  0x00007fd71df697ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fd71e082ed8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007fd71df7237a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>, str=0x7fd71e07fcaf "free(): invalid pointer", action=3) at malloc.c:5006
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#5  0x00007fd71df7653c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#6  0x00007fd6e809616c in ?? () from so/libnuma.so.1
#7  0x00007fd7253c267a in call_init (l=0xd65ca00, argc=argc@entry=1, argv=argv@entry=0x7ffe88af4708, env=env@entry=0xa20e6c0) at dl-init.c:58
#8  0x00007fd7253c27cb in call_init (env=0xa20e6c0, argv=0x7ffe88af4708, argc=1, l=<optimized out>) at dl-init.c:30
#9  _dl_init (main_map=main_map@entry=0xd654800, argc=1, argv=0x7ffe88af4708, env=0xa20e6c0) at dl-init.c:120
#10 0x00007fd7253c78e2 in dl_open_worker (a=a@entry=0x7ffe88af0a50) at dl-open.c:575
#11 0x00007fd7253c2564 in _dl_catch_error (objname=objname@entry=0x7ffe88af0a40, errstring=errstring@entry=0x7ffe88af0a48, mallocedp=mallocedp@entry=0x7ffe88af0a3f, operate=operate@entry=0x7fd7253c74d0 <dl_open_worker>, args=args@entry=0x7ffe88af0a50) at dl-error.c:187
#12 0x00007fd7253c6da9 in _dl_open (file=0xa9b9140 "npm/OneLeafProxy@7.2.20/lib/libhycodecsa.so", mode=-2147483638, caller_dlopen=0x1054469 <leafcore::Loader::sysDLOpen(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+111>, nsid=-2, argc=<optimized out>, argv=<optimized out>, env=0xa20e6c0) at dl-open.c:660
#13 0x00007fd7251aef09 in dlopen_doit (a=a@entry=0x7ffe88af0c80) at dlopen.c:66
#14 0x00007fd7253c2564 in _dl_catch_error (objname=0xa1c4010, errstring=0xa1c4018, mallocedp=0xa1c4008, operate=0x7fd7251aeeb0 <dlopen_doit>, args=0x7ffe88af0c80) at dl-error.c:187
#15 0x00007fd7251af571 in _dlerror_run (operate=operate@entry=0x7fd7251aeeb0 <dlopen_doit>, args=args@entry=0x7ffe88af0c80) at dlerror.c:163
#16 0x00007fd7251aefa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#17 0x0000000001054469 in leafcore::Loader::sysDLOpen (this=0xa8ec780, filename="npm/OneLeafProxy@7.2.20/lib/libhycodecsa.so") at src/loader.cpp:145
#18 0x0000000001054c40 in leafcore::Loader::loadDynamicLibraries (this=0xa8ec780, dylibs=std::vector of length 23, capacity 32 = {...}, libraryDir="", handles=std::map with 9 elements = {...}) at src/loader.cpp:197
#19 0x000000000105f086 in leafcore::CppLoader::load (this=0xa8ec780, binary="\006\000\026\000*\a\000\261vX\177ELF\002\001\001\000\000\000\000\000\000\000\000\000\001\000>\000\001", '\000' <repeats 19 times>, "\330\354\244\000\000\000\000\000\000\000\000\000@\000\000\000\000\000@\000&2\001\000UH\211\345AVSH\203\354 H\211}\320H\211u\330L\213u\320H\270\000\000\000\000\000\000\000\000L\211\367\377\320H\270\000\000\000\000\000\000\000\000H\203\300\020I\211\006L\211\363H\203\303\bH\213u\330H\270\000\000\000\000\000\000\000\000H\211\337\377\320\353\000A\307F(\000\000\000\000H\270\000\000\000\000\000\000\000\000L\211\367\377\320\353\000H\203\304 [A^]\303H\211E\340\211U\354\353\026"..., libraryDir="") at cpp/cpp_loader.cpp:113
#20 0x0000000000bf7fc7 in Engine::addModule (this=this@entry=0xa296000, scriptInfo=..., errMsg="", doPrepare=doPrepare@entry=0) at Engine.cpp:418
#21 0x0000000000bfdc7e in Engine::doLoadModule (this=0xa296000, sScriptName="97b893b5-2aa1-4def-8b9a-6b9e2446c8ea_0", sVersion="2", iError=@0x7ffe88af458c: HUYA::PreinstallRetValue_Success, errMsg="", doPrepare=0) at Engine.cpp:932
#22 0x0000000000b00dda in main (argc=1, argv=0x7ffe88af4708) at main.cpp:121
```
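Yet a simple local test case that links tcmalloc and then opens the same shared library works without any problem.
Thinking it over, the key piece of information in the stack is that a call to glibc's free inside dlopen failed, complaining about an invalid pointer: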
```text
#3  0x00007fd71df7237a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>, str=0x7fd71e07fcaf "free(): invalid pointer", action=3) at malloc.c:5006
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#5  0x00007fd71df7653c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
...
#16 0x00007fd7251aefa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
```
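Reproducing the problem
Hmm. free should have been hooked by tcmalloc, and tcmalloc gets its memory from mmap and sbrk, so where does a glibc free come from?
Could it be a tcmalloc bug, with the hook failing to take effect?
The serverless platform does do something unusual when it dlopens shared libraries (it passes RTLD_DEEPBIND and RTLD_LOCAL), so I wrote a simple demo to reproduce the problem.
The idea is to simulate the coredump scenario: allocate memory in the main program and free it inside the shared library.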
```mermaid
flowchart LR
    Main[Main program]
    Lib[Shared library]
    Mem[Memory block#40;allocated by the main program#41;]
    Main -->|malloc#40;ptr#41;| Mem
    Main -->|freeMemory#40;ptr#41;| Lib
    Lib -->|free#40;ptr#41;| Mem
```
The shared library code is simple: it just frees the memory with free, which inside the library resolves to glibc's
https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_coredump/dynamic_lib.cpp
```cpp
#include <cstdlib>

extern "C" void freeMemory(void* ptr) {
    free(ptr);
}
```
Next is the main program. To compare the behaviours, the code calls the library both through direct linking and through dlopen/dlsym
https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_coredump/main.cpp
```cpp
#include <gperftools/tcmalloc.h>
#include <dlfcn.h>
#include <cstdlib>

extern "C" void freeMemory(void* ptr);

using namespace std;

void my_free(void* ptr) {
    tc_free(ptr);
}

int main() {
    // Case 1: allocate and free in the main program
    void* ptr = malloc(128);
    if (!ptr) {
        return 1;
    }
    free(ptr);

    // Case 2: free through the directly linked libdynamic.so
    ptr = malloc(128);
    if (!ptr) {
        return 1;
    }
    freeMemory(ptr);

    // Case 3: free through a copy of the library opened with RTLD_DEEPBIND
    ptr = malloc(128);
    if (!ptr) {
        return 1;
    }
    void* handle = dlopen("./libdynamic1.so", RTLD_NOW | RTLD_DEEPBIND | RTLD_LOCAL);
    if (!handle) {
        return 1;
    }
    using FreeMemoryFuncT = decltype(&freeMemory);
    FreeMemoryFuncT freeMemory1 = (FreeMemoryFuncT)dlsym(handle, "freeMemory");
    if (!freeMemory1) {
        return 1;
    }
    freeMemory1(ptr);
    return 0;
}
```
The libdynamic1.so opened by dlopen here is a copy made by the Makefile
https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_coredump/Makefile
```makefile
MAIN_DIR := $(shell git rev-parse --show-toplevel)

all: libdynamic.so
	g++ -std=c++14 -g -o main main.cpp -L. -ldynamic -ltcmalloc -lpthread -ldl
	patchelf --set-rpath . main

libdynamic.so: dynamic_lib.cpp
	g++ -g -shared -fPIC -o libdynamic.so dynamic_lib.cpp
	cp libdynamic.so libdynamic1.so

clean:
	rm -f main libdynamic.so libdynamic1.so
```
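Why do it this way?
Because calling dlopen on a library that is already loaded (a library the program links against directly counts as loaded too) just returns the cached object, even if the flags passed this time differ from the ones used the first time.
Opening it through a different path or a symlink does not help either, because both resolve to the same underlying file. Only dlopening a copy of the file works.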
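The caching behaviour described above can be checked with a small sketch like the one below. It is only an illustration: the helper name SameHandleOnReopen is made up for this example, and it assumes a glibc system where the loader can find the library passed in.

```cpp
#include <dlfcn.h>

// Hypothetical helper: opens `path` twice with different flags and reports
// whether both calls returned the same handle (i.e. the cached object).
bool SameHandleOnReopen(const char* path) {
    void* first = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!first) {
        return false;
    }
    // Second open adds RTLD_DEEPBIND; the object is already loaded at this point.
    void* second = dlopen(path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
    bool same = (second != nullptr) && (first == second);
    if (second) {
        dlclose(second);
    }
    dlclose(first);
    return same;
}
```

Called with the path of a library the program already links against, it would be expected to return true, which is why the Makefile has to copy libdynamic.so before the demo can load it with different flags.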
Compile and run it, and sure enough it cores. Here is what gdb shows
```text
(gdb) bt
#0  0x00007f79ff5e7428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f79ff5e902a in __GI_abort () at abort.c:89
#2  0x00007f79ff6297ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f79ff742ed8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f79ff63237a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>, str=0x7f79ff73fcaf "free(): invalid pointer", action=3) at malloc.c:5006
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#5  0x00007f79ff63653c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#6  0x00007f7a00190168 in freeMemory (ptr=0x172a000) at dynamic_lib.cpp:4
#7  0x0000000000401289 in main () at main.cpp:44
```
A perfect reproduction: the stack is exactly the same as in the real problem
Root cause analysis
So it looks as if passing RTLD_DEEPBIND and RTLD_LOCAL triggers a bug in tcmalloc
Testing shows that:
dlopen("./libdynamic1.so", RTLD_NOW | RTLD_DEEPBIND);
cores, while
dlopen("./libdynamic1.so", RTLD_NOW | RTLD_LOCAL);
does not
So RTLD_DEEPBIND is the culprit. Here is what man dlopen says
RTLD_DEEPBIND (since glibc 2.3.4)
Place the lookup scope of the symbols in this shared object ahead of the global scope. This means that a self-contained object will use its own symbols in preference to global symbols with the same name contained in objects that have already been loaded.
So what is "the lookup scope of the symbols in this shared object"?
Take libdynamic.so as an example: the libraries it depends on are recorded in its ELF file
```text
~ readelf -d libdynamic.so|grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
```
ldd additionally shows where the system search path resolves each of them
```text
~ ldd libdynamic.so
	linux-vdso.so.1 =>  (0x00007ffffaf49000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2a20cf8000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2a210c2000)
```
A more complex example is ssh
```text
~ readelf -d /usr/bin/ssh|grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libselinux.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libcrypto.so.1.0.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libresolv.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libgssapi_krb5.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
```
Its libraries depend on other libraries in turn, so the ldd output is a lot longer
```text
~ ldd /usr/bin/ssh
	linux-vdso.so.1 =>  (0x00007ffc493ca000)
	libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f7ae1732000)
	libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f7ae12ee000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7ae10ea000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f7ae0ed0000)
	libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f7ae0cb5000)
	libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f7ae0a6b000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7ae06a1000)
	libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f7ae0431000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7ae1c04000)
	libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f7ae015f000)
	libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f7adff30000)
	libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f7adfd2c000)
	libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f7adfb21000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7adf904000)
	libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f7adf700000)
```
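So "the lookup scope of the symbols in this shared object" means the object itself plus the libraries it depends on
And "placing it ahead of the global scope" means: no matter which definition of a function the global scope would pick, I only call the one that comes from my own dependencies
tcmalloc hooks glibc's malloc, free and the other allocation functions by overriding symbols at dynamic-link time.
The underlying details (the GOT and so on) are covered in an earlier post: https://weakyon.com/2022/09/12/magical-effect-of-hook.html
I won't go into them here. In short, tcmalloc implements its own function named malloc, and because it comes ahead of glibc's in the lookup order, it effectively hooks glibc's malloc
RTLD_DEEPBIND breaks this simple hook rule: a library loaded with it resolves free to glibc's free first, so a pointer that tcmalloc handed out in the main program ends up in glibc's free, which rejects it as an invalid pointer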
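One way to see which object a name resolves to in a given scope is a small helper along the following lines. This is a sketch under assumptions: the name ProviderOf is invented for the example, it relies on the glibc extension dladdr, and a lookup through a dlopen handle only approximates what a DEEPBIND-loaded library would bind to.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: looks `name` up in the scope identified by `handle`
// (RTLD_DEFAULT for the global scope, or a handle returned by dlopen) and
// returns the path of the object that defines it, or "" on failure.
std::string ProviderOf(void* handle, const char* name) {
    void* addr = dlsym(handle, name);
    if (!addr) {
        return "";
    }
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return "";
    }
    return info.dli_fname;
}
```

In the demo, comparing ProviderOf(RTLD_DEFAULT, "free") with ProviderOf(handle, "free"), where handle comes from the dlopen of libdynamic1.so, would be expected to show tcmalloc for the first and libc.so.6 for the second.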
The fix
Do a deeper hook: patch the in-memory machine code of glibc's malloc, free and the other allocation functions so that they jump to the hook functions. That solves the problem
The hook code is as follows:
https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_fix_coredump/hook.cpp
The idea is to first change the page permissions from read-only to writable, then write the following at the start of the hooked function:
the 6-byte absolute indirect jump instruction 0xFF2500000000 (its meaning is explained in another post: https://weakyon.com/2025/08/28/analyzing-the-source-of-LLVM-MCJIT.html#ff-25jmpq)
the 8-byte address of the hook function
14 bytes in total
```cpp
#include <sys/mman.h>
#include <dlfcn.h>
#include <cstring>
#include <stdlib.h>
#include <stdint.h>

void simple_hook(void* sym, void* targetFunc) {
    // FF 25 00 00 00 00 is `jmpq *0x0(%rip)`; the 8-byte target address follows it
    unsigned char patch[14] = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
    memcpy(&patch[6], &targetFunc, 8);

    // Make the page containing sym writable
    void* pstart = reinterpret_cast<void*>(reinterpret_cast<uint64_t>(sym) & 0xFFFFFFFFFFFFF000);
    if (mprotect(pstart, 4096, PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        abort();

    memcpy(sym, patch, sizeof(patch));

    // Restore the original permissions
    if (mprotect(pstart, 4096, PROT_READ | PROT_EXEC) != 0)
        abort();
}
```
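The byte layout of the patch can be examined in isolation with a sketch like this one. The names MakeJumpPatch, PageRange and PagesCovering are hypothetical, and the code assumes a little-endian x86-64 target with 4096-byte pages. The second helper covers a case simple_hook does not appear to handle: a 14-byte patch that starts near the end of a page spills into the next page, so both pages would need to be made writable.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(void*) == 8, "this sketch assumes a 64-bit target");

// Builds the 14-byte patch: 6-byte `jmpq *0x0(%rip)` + 8-byte absolute target.
std::array<unsigned char, 14> MakeJumpPatch(const void* target) {
    std::array<unsigned char, 14> patch = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
    std::memcpy(patch.data() + 6, &target, sizeof(target));
    return patch;
}

struct PageRange {
    std::uintptr_t start;  // page-aligned start address
    std::size_t length;    // multiple of the page size
};

// Returns the page-aligned range that fully covers [addr, addr + size).
PageRange PagesCovering(std::uintptr_t addr, std::size_t size,
                        std::size_t page_size = 4096) {
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
    const std::uintptr_t first = addr & mask;
    const std::uintptr_t last = (addr + size - 1) & mask;
    return {first, static_cast<std::size_t>(last - first) + page_size};
}
```

Passing the result of PagesCovering to mprotect instead of a fixed single page would be one way to make the patching robust against a function that happens to start within the last 13 bytes of a page.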
main has to load the address of the free to be hooked from glibc and pass it in as sym
Likewise, it has to load the address of the hook, tc_free, and pass it in as targetFunc
Because the hook in the main program already works, the free symbol in RTLD_DEFAULT points to tcmalloc's free
The complete code:
https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_fix_coredump/main.cpp
```cpp
void hookGlibc() {
    const char* library_path = "/lib/x86_64-linux-gnu/libc.so.6";
    // DEEPBIND lookup through this handle yields glibc's own free
    void* handle = dlopen(library_path, RTLD_LOCAL | RTLD_DEEPBIND | RTLD_NOW);
    if (!handle) {
        cerr << "Failed to open library: " << dlerror() << endl;
        quick_exit(0);
    }
    void* symbol = dlsym(handle, "free");
    if (symbol) {
        cout << "Symbol found in " << library_path << " at address: " << symbol << endl;
    } else {
        cerr << "Failed to find symbol in " << library_path << ": " << dlerror() << endl;
        quick_exit(0);
    }
    // The global scope resolves free to tcmalloc's implementation
    void* hookSymbol = dlsym(RTLD_DEFAULT, "free");
    if (hookSymbol) {
        cout << "Symbol found in RTLD_DEFAULT at address: " << hookSymbol << endl;
    } else {
        cerr << "Failed to find symbol in RTLD_DEFAULT: " << dlerror() << endl;
        quick_exit(0);
    }
    simple_hook(symbol, hookSymbol);
}

int main() {
    hookGlibc();
    // ... rest omitted
}
```
Verification
After this change the program no longer coredumps. Problem solved
Let's use gdb to verify that glibc's free really has been overwritten
Comparing the free symbol without and with tcmalloc
First, a simple program to see which symbol free resolves to when tcmalloc is not linked in
```cpp
#include <stdlib.h>
int main() {
    void* ptr = malloc(128);
    free(ptr);
    return 0;
}
```
Compile and run
```text
~ g++ -std=c++14 -g -o main main.cpp
~ gdb main
(gdb) b main
Breakpoint 1 at 0x401419: file main.cpp, line 2.
(gdb) r
Starting program: /root/tcmalloc_hook_debug/tcmalloc_fix_coredump/main

Breakpoint 1, main () at main.cpp:37
    void* ptr = malloc(128);
(gdb) n
    free(ptr)
(gdb) n
    return 0;
```
At this point free has already run, so its GOT entry has been filled in
Searching for the PLT stubs of free turns up several of them
```text
(gdb) info func free\@plt
All functions matching regular expression "free\@plt":

Non-debugging symbols:
0x00000000004010b0  free@plt
0x00007ffff7bd3ce0  free@plt
0x00007ffff78d8db0  free@plt
0x00007ffff6f6a7e0  free@plt
```
The first one looks like the main program's. Let's confirm
```text
(gdb) info symbol 0x00000000004010b0
free@plt in section .plt of /root/tcmalloc_hook_debug/tcmalloc_fix_coredump/main
```
Next, check which GOT entry the PLT stub's code points to
```text
(gdb) disassemble 0x00000000004010b0
Dump of assembler code for function free@plt:
   0x00000000004010b0 <+0>:     jmpq   *0x2f8a(%rip)        # 0x404040
   0x00000000004010b6 <+6>:     pushq  $0x8
   0x00000000004010bb <+11>:    jmpq   0x401020
```
Then print the real address of free stored in that GOT entry
```text
(gdb) x/gx 0x404040
0x404040:       0x00007ffff750b4f0
(gdb) disassemble 0x00007ffff750b4f0
Dump of assembler code for function __GI___libc_free:
   0x00007ffff750b4f0 <+0>:     push   %r13
   0x00007ffff750b4f2 <+2>:     push   %r12
   0x00007ffff750b4f4 <+4>:     push   %rbp
   0x00007ffff750b4f5 <+5>:     push   %rbx
   0x00007ffff750b4f6 <+6>:     sub    $0x28,%rsp
   0x00007ffff750b4fa <+10>:    mov    0x33f9f7(%rip),%rax        # 0x7ffff784aef8
```
So glibc's free is __GI___libc_free. Likewise, after building with tcmalloc and following the same steps, free points to tc_free
What remains to be verified is that __GI___libc_free now jumps to tc_free
Verifying that the hook works
Print __GI___libc_free before the hook runs
```text
(gdb) b main
Breakpoint 1 at 0x401409: file main.cpp, line 36.
(gdb) r
Starting program: /root/tcmalloc_hook_debug/tcmalloc_fix_coredump/main

Breakpoint 1, main () at main.cpp:36
36          hookGlibc();
(gdb) disassemble __GI___libc_free
Dump of assembler code for function __GI___libc_free:
   0x00007ffff71144f0 <+0>:     push   %r13
   0x00007ffff71144f2 <+2>:     push   %r12
   0x00007ffff71144f4 <+4>:     push   %rbp
   0x00007ffff71144f5 <+5>:     push   %rbx
   0x00007ffff71144f6 <+6>:     sub    $0x28,%rsp
   0x00007ffff71144fa <+10>:    mov    0x33f9f7(%rip),%rax        # 0x7ffff7453ef8
```
Then print the contents of glibc's free again after the hook
```text
(gdb) n
Symbol found in /lib/x86_64-linux-gnu/libc.so.6 at address: 0x7ffff71144f0
Symbol found in RTLD_DEFAULT at address: 0x7ffff7a16f00
40          void *ptr = malloc(128);
(gdb) disassemble __GI___libc_free
Dump of assembler code for function __GI___libc_free:
   0x00007ffff71144f0 <+0>:     jmpq   *0x0(%rip)        # 0x7ffff71144f6 <__GI___libc_free+6>
```
It has become jmpq *0x0(%rip), that is, a jump to the address stored immediately after the instruction, at 0x7ffff71144f6
Let's see what is stored there
```text
(gdb) x/gx 0x7ffff71144f6
0x7ffff71144f6 <__GI___libc_free+6>:    0x00007ffff7a16f00
(gdb) disassemble 0x00007ffff7a16f00
Dump of assembler code for function tc_free(void*):
   0x00007ffff7a16f00 <+0>:     mov    0x3b81f9(%rip),%rax        # 0x7ffff7dcf100 <_ZN4base8internal13delete_hooks_E>
   0x00007ffff7a16f07 <+7>:     test   %rax,%rax
   0x00007ffff7a16f0a <+10>:    jne    0x7ffff7a16fa0 <tc_free(void*)+160>
   0x00007ffff7a16f10 <+16>:    mov    0x211e11(%rip),%rax        # 0x7ffff7c28d28
```
Yes, it is tc_free. Verification done
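The check done by hand in gdb above can also be sketched in code. The helper below is hypothetical (DecodeJumpPatch is not part of the repository) and assumes a little-endian x86-64 target: it recognizes the FF 25 00 00 00 00 prefix and reads the 8-byte target that follows it.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true and stores the jump target if `code`
// starts with `jmpq *0x0(%rip)` (FF 25 00 00 00 00) followed by an 8-byte address.
bool DecodeJumpPatch(const unsigned char* code, std::uint64_t* target) {
    static const unsigned char kJmp[6] = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
    if (std::memcmp(code, kJmp, sizeof(kJmp)) != 0) {
        return false;  // not patched, or patched with a different sequence
    }
    std::memcpy(target, code + 6, sizeof(*target));
    return true;
}
```

Pointing it at the address returned by dlsym(handle, "free") after hookGlibc() has run, and comparing the decoded target with dlsym(RTLD_DEFAULT, "free"), would give a self-check that does not need a debugger.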
The list of functions to hook
The APIs that need hooking are all listed in a single tcmalloc header
https://github.com/gperftools/gperftools/blob/gperftools-2.7/src/gperftools/tcmalloc.h.in#L87-L106
A bit of simple code that hooks every one of them is all it takes
```cpp
#include <gperftools/tcmalloc.h>
#include <dlfcn.h>
#include <cassert>
#include <cstdlib>
#include <iostream>

using namespace std;

void* getGlibc() {
    const char* library_path = "/lib/x86_64-linux-gnu/libc.so.6";
    void* handle = dlopen(library_path, RTLD_LOCAL | RTLD_DEEPBIND | RTLD_NOW);
    if (!handle) {
        cerr << "Failed to open library: " << dlerror() << endl;
        quick_exit(0);
    }
    return handle;
}

// Note: the libc_func##_f variables checked below are not declared in this
// excerpt; they are assumed to be defined elsewhere in the full source.
#define HOOK_FUNC(libc_func)                                        \
    do {                                                            \
        void *f = dlsym(getGlibc(), #libc_func);                    \
        assert(f);                                                  \
        simple_hook(f, (void*)tc_##libc_func);                      \
        if (!libc_func##_f) {                                       \
            cout << "hook " #libc_func " failed" << std::endl;      \
            std::quick_exit(0);                                     \
        }                                                           \
    } while (0)

// Same as HOOK_FUNC, for functions whose tcmalloc name does not follow tc_<name>
#define HOOK_FUNC_RENAME(libc_func, tcmalloc_func)                  \
    do {                                                            \
        void *f = dlsym(getGlibc(), #libc_func);                    \
        assert(f);                                                  \
        simple_hook(f, (void*)tcmalloc_func);                       \
        if (!libc_func##_f) {                                       \
            cout << "hook " #libc_func " failed" << std::endl;      \
            std::quick_exit(0);                                     \
        }                                                           \
    } while (0)

void hookAll() {
    HOOK_FUNC(malloc);
    HOOK_FUNC(free);
    HOOK_FUNC(realloc);
    HOOK_FUNC(calloc);
    HOOK_FUNC(cfree);
    HOOK_FUNC(memalign);
    HOOK_FUNC(posix_memalign);
    HOOK_FUNC(valloc);
    HOOK_FUNC(pvalloc);
    HOOK_FUNC(malloc_stats);
    HOOK_FUNC(mallopt);
    HOOK_FUNC_RENAME(malloc_usable_size, tc_malloc_size);
}
```
Incomplete memory statistics
But I had celebrated too early: in some scenarios part of the memory was not being captured by the profiler
I reduced it to the following case:
First, the shared library allocates some memory with mmap
https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_miss_mmap_hook/dynamic_lib.cpp
```cpp
#include <cstdlib>
#include <sys/mman.h>

extern "C" void* test() {
    return mmap(NULL, 4 * 1024 * 1024, PROT_READ | PROT_WRITE,
                MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
}
```
Then the main program loads it with dlopen, calls it, and dumps the memory usage
```cpp
#include <iostream>
#include <fstream>
#include <dlfcn.h>
#include <assert.h>

using namespace std;

#include <gperftools/heap-profiler.h>

int main() {
    void* handle = dlopen("./libdynamic.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        return 1;
    }

    using testFuncT = void* (*)();
    testFuncT testFunc = (testFuncT)dlsym(handle, "test");
    if (!testFunc) {
        return 1;
    }

    HeapProfilerStart("");

    // 10 calls x 4 MB each = 40 MB requested through mmap
    for (int i = 0; i < 10; i++) {
        testFunc();
    }

    string s = GetHeapProfile();

    HeapProfilerStop();

    fstream f;
    f.open("./allbin.hprof", ios_base::out);
    f << s;
    f.close();
    return 0;
}
```
According to the documentation at https://gperftools.github.io/gperftools/heapprofile.html
set the environment variable HEAP_PROFILE_MMAP to turn on mmap profiling manually, then run
```text
~ HEAP_PROFILE_MMAP=1 ./main
```
Parse the resulting allbin.hprof with pprof (the ignore option is the one mentioned in the HEAP_PROFILE_MMAP documentation)
```text
~ pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc' --text --lines ./main allbin.hprof
Using local file ./main.
Using local file allbin.hprof.
Total: 44.6 MB
    40.0  97.6%  97.6%     40.0  97.6% test /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/dynamic_lib.cpp:7
     1.0   2.4% 100.0%      1.0   2.4% base::subtle::NoBarrier_CompareAndSwap (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/atomicops-internals-x86.h:81
     0.0   0.0% 100.0%      1.0   2.4% GetHeapProfile /root/tcmalloc_hook_debug/gperftools/build/../src/heap-profiler.cc:213
     0.0   0.0% 100.0%      1.0   2.4% SpinLock::Lock (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/spinlock.h:69
     0.0   0.0% 100.0%      1.0   2.4% SpinLockHolder::SpinLockHolder (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/spinlock.h:133
     0.0   0.0% 100.0%     41.0 100.0% __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
     0.0   0.0% 100.0%     41.0 100.0% _start ??:0
     0.0   0.0% 100.0%      1.0   2.4% base::subtle::Acquire_CompareAndSwap (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/atomicops-internals-x86.h:109
     0.0   0.0% 100.0%     40.0  97.6% main /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:24
     0.0   0.0% 100.0%      1.0   2.4% main /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:27
```
The profile shows that dynamic_lib.cpp requested 40MB through mmap, which is what we expect
Now take dlopen("./libdynamic.so", RTLD_NOW | RTLD_LOCAL);
add RTLD_DEEPBIND, turning it into dlopen("./libdynamic.so", RTLD_NOW | RTLD_DEEPBIND | RTLD_LOCAL); and try again
```text
~ pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc' --text --lines ./main allbin.hprof
Using local file ./main.
Using local file allbin.hprof.
Total: 4.6 MB
     1.0 100.0% 100.0%      1.0 100.0% base::subtle::NoBarrier_CompareAndSwap (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/atomicops-internals-x86.h:81
     0.0   0.0% 100.0%      1.0 100.0% GetHeapProfile /root/tcmalloc_hook_debug/gperftools/build/../src/heap-profiler.cc:213
     0.0   0.0% 100.0%      1.0 100.0% SpinLock::Lock (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/spinlock.h:69
     0.0   0.0% 100.0%      1.0 100.0% SpinLockHolder::SpinLockHolder (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/spinlock.h:133
     0.0   0.0% 100.0%      1.0 100.0% __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
     0.0   0.0% 100.0%      1.0 100.0% _start ??:0
     0.0   0.0% 100.0%      1.0 100.0% base::subtle::Acquire_CompareAndSwap (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/atomicops-internals-x86.h:109
     0.0   0.0% 100.0%      1.0 100.0% main /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:27
```
Reproduced: the mmap calls in dynamic_lib.cpp are no longer captured
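The fix
Possible problems with simple_hook
So we still need a hook, but this time things are a little different
For the functions hooked in the previous section, tcmalloc reimplements the whole glibc API, so glibc's free can simply be patched to jump to tcmalloc's free, and glibc's free becomes dead code
mmap, however, is the channel through which tcmalloc itself obtains memory. If it were simply disabled, tcmalloc could no longer work either
The approach: write a small program to find out which symbols are used when mmap and sbrk are called normally, then check whether tcmalloc itself uses them
That program is https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_mmap_sbrk/main.cpp
```cpp
#include <unistd.h>
#include <sys/mman.h>
int main() {
    sbrk(10);
    mmap(NULL, 4 * 1024 * 1024, PROT_READ | PROT_WRITE,
         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    return 0;
}
```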
First, the symbol used for sbrk
```text
(gdb) b main
Breakpoint 1 at 0x40116a: file main.cpp, line 4.
(gdb) r
Starting program: /root/tcmalloc_hook_debug/tcmalloc_fix_mmap_hook/tmp/main

Breakpoint 1, main () at main.cpp:4
4           sbrk(10);
(gdb) n
6                MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
(gdb) n
7           return 0;
(gdb) info func sbrk\@
All functions matching regular expression "sbrk\@":

Non-debugging symbols:
0x0000000000401050  sbrk@plt
(gdb) disassemble 0x0000000000401050
Dump of assembler code for function sbrk@plt:
   0x0000000000401050 <+0>:     jmpq   *0x2fba(%rip)        # 0x404010
   0x0000000000401056 <+6>:     pushq  $0x2
   0x000000000040105b <+11>:    jmpq   0x401020
End of assembler dump.
(gdb) x/gx 0x404010
0x404010:       0x00007ffff7b09e80
(gdb) disassemble 0x00007ffff7b09e80
Dump of assembler code for function __GI___sbrk:
   0x00007ffff7b09e80 <+0>:     push   %r12
   0x00007ffff7b09e82 <+2>:     mov    0x2c703f(%rip),%r12        # 0x7ffff7dd0ec8
   0x00007ffff7b09e89 <+9>:     push   %rbp
```
Then the one for mmap
```text
(gdb) info func mmap\@
All functions matching regular expression "mmap\@":

Non-debugging symbols:
0x0000000000401030  mmap@plt
(gdb) disassemble 0x0000000000401030
Dump of assembler code for function mmap@plt:
   0x0000000000401030 <+0>:     jmpq   *0x2fca(%rip)        # 0x404000
   0x0000000000401036 <+6>:     pushq  $0x0
   0x000000000040103b <+11>:    jmpq   0x401020
End of assembler dump.
(gdb) x/gx 0x404000
0x404000:       0x00007ffff7b0e680
(gdb) disassemble 0x00007ffff7b0e680
Dump of assembler code for function __mmap:
   0x00007ffff7b0e680 <+0>:     test   %rdi,%rdi
   0x00007ffff7b0e683 <+3>:     push   %r15
   0x00007ffff7b0e685 <+5>:     mov    %r9,%r15
```
Both symbols belong to glibc
```text
(gdb) info symbol __mmap
mmap64 in section .text of /lib/x86_64-linux-gnu/libc.so.6
(gdb) info symbol __GI___sbrk
sbrk in section .text of /lib/x86_64-linux-gnu/libc.so.6
```
Now let's look at the source to see how tcmalloc implements sbrk and mmap
tcmalloc's sbrk implementation
https://github.com/gperftools/gperftools/blob/gperftools-2.7/src/malloc_hook_mmap_linux.h#L211-L218
```cpp
extern "C" void* __sbrk(intptr_t increment);

extern "C" void* sbrk(intptr_t increment) __THROW {
  MallocHook::InvokePreSbrkHook(increment);
  void *result = __sbrk(increment);
  MallocHook::InvokeSbrkHook(result, increment);
  return result;
}
```
So it calls the glibc symbol directly, which means glibc's sbrk cannot simply be overwritten
tcmalloc's mmap implementation
https://github.com/gperftools/gperftools/blob/gperftools-2.7/src/malloc_hook_mmap_linux.h#L173-L184
```cpp
extern "C" void* mmap(void *start, size_t length, int prot, int flags,
                      int fd, off_t offset) __THROW {
  MallocHook::InvokePreMmapHook(start, length, prot, flags, fd, offset);
  void *result;
  if (!MallocHook::InvokeMmapReplacement(
          start, length, prot, flags, fd, offset, &result)) {
    result = do_mmap64(start, length, prot, flags, fd,
                       static_cast<size_t>(offset));
  }
  MallocHook::InvokeMmapHook(result, start, length, prot, flags, fd, offset);
  return result;
}
```
Next, do_mmap64
https://github.com/gperftools/gperftools/blob/gperftools-2.7/src/malloc_hook_mmap_linux.h#L61C1-L65C2
```cpp
static inline void* do_mmap64(void *start, size_t length,
                              int prot, int flags,
                              int fd, __off64_t offset) __THROW {
  return sys_mmap(start, length, prot, flags, fd, offset);
}
```
Ah, this one calls the Linux interface directly
https://github.com/gperftools/gperftools/blob/gperftools-2.7/src/base/linux_syscall_support.h#L2800-L2805
```cpp
#if defined(__x86_64__)
LSS_INLINE void* LSS_NAME(mmap)(void *s, size_t l, int p, int f, int d,
                                int64_t o) {
  LSS_BODY(6, void*, mmap, LSS_SYSCALL_ARG(s), LSS_SYSCALL_ARG(l),
                           LSS_SYSCALL_ARG(p), LSS_SYSCALL_ARG(f),
                           LSS_SYSCALL_ARG(d), (uint64_t)(o));
}
```
So it does not go through glibc's mmap wrapper and enters the system call directly
Summary
mmap can therefore still be handled with simple_hook, whereas sbrk needs a more powerful hooking tool, because tcmalloc's sbrk still has to call the original glibc implementation
PFishHook
That tool is https://github.com/Menooker/PFishHook
PFishHook copies a few bytes at the head of the target function to a new "shadow function". Then it replace the head of the target function with a jump to the function specified by the user. And it returns the address of the "shadow function" to users.
The "shadow function" has the same functionality of the original function.
Take hooking free as an example
free is hooked to my_free, and my_free in turn calls the shadow function created by PFishHook
```cpp
HookStatus HookIt(void* oldfunc, void** poutold, void* newfunc);

using free_t = decltype(&free);
static free_t free_f = nullptr;

void my_free(void* ptr) {
    free_f(ptr);  // the shadow function, i.e. the original glibc free
}

int main() {
    // oldfunc: glibc's free; poutold: receives the shadow function; newfunc: my_free
    HookIt(dlsym(getGlibc(), "free"), (void**)&free_f, (void*)&my_free);
}
```
Before the hook
```text
(gdb) x/10i __GI___libc_free
   0x7ffff6dcf4f0 <__GI___libc_free>:       push   %r13
   0x7ffff6dcf4f2 <__GI___libc_free+2>:     push   %r12
   0x7ffff6dcf4f4 <__GI___libc_free+4>:     push   %rbp
   0x7ffff6dcf4f5 <__GI___libc_free+5>:     push   %rbx
   0x7ffff6dcf4f6 <__GI___libc_free+6>:     sub    $0x28,%rsp
   0x7ffff6dcf4fa <__GI___libc_free+10>:    mov    0x33f9f7(%rip),%rax        # 0x7ffff710eef8
   0x7ffff6dcf501 <__GI___libc_free+17>:    mov    (%rax),%rax
   0x7ffff6dcf504 <__GI___libc_free+20>:    test   %rax,%rax
   0x7ffff6dcf507 <__GI___libc_free+23>:    jne    0x7ffff6dcf5e0 <__GI___libc_free+240>
   0x7ffff6dcf50d <__GI___libc_free+29>:    test   %rdi,%rdi
```
After the hook
```text
(gdb) x/10i __GI___libc_free
   0x7ffff6dcf4f0 <__GI___libc_free>:       jmpq   *0x0(%rip)        # 0x7ffff6dcf4f6 <__GI___libc_free+6>
   0x7ffff6dcf4f6 <__GI___libc_free+6>:     xchg   %eax,%esi
   0x7ffff6dcf4f7 <__GI___libc_free+7>:     push   %rbp
   0x7ffff6dcf4f8 <__GI___libc_free+8>:     add    %al,(%rax)
   0x7ffff6dcf4fb <__GI___libc_free+11>:    add    %al,(%rax)
   0x7ffff6dcf4fd <__GI___libc_free+13>:    add    %cl,%ah
   0x7ffff6dcf4ff <__GI___libc_free+15>:    int3
   0x7ffff6dcf500 <__GI___libc_free+16>:    int3
   0x7ffff6dcf501 <__GI___libc_free+17>:    mov    (%rax),%rax
   0x7ffff6dcf504 <__GI___libc_free+20>:    test   %rax,%rax
```
As you can see, it also uses the 0xff25 jump, to the address stored at 0x7ffff6dcf4f6 (the instructions gdb prints after the jmpq are just that 8-byte address and padding misread as code)
The value stored at that address is
```text
(gdb) x/gx 0x7ffff6dcf4f6
0x7ffff6dcf4f6 <__GI___libc_free+6>:    0x0000000000405596
```
and the function it points to is indeed my_free
```text
(gdb) x/x 0x405596
0x405596 <my_free(void*)>:      0x55
```
Now look at the shadow function that my_free calls
```text
(gdb) x/10i free_f
   0x7ffff654b018:      push   %r13
   0x7ffff654b01a:      push   %r12
   0x7ffff654b01c:      push   %rbp
   0x7ffff654b01d:      push   %rbx
   0x7ffff654b01e:      sub    $0x28,%rsp
   0x7ffff654b022:      mov    0xbc3ecf(%rip),%rax        # 0x7ffff710eef8
   0x7ffff654b029:      jmpq   *0x0(%rip)        # 0x7ffff654b02f
   0x7ffff654b02f:      add    %esi,%ebp
   0x7ffff654b031:      fdiv   %st,%st(6)
   0x7ffff654b033:      (bad)
```
Indeed, it starts with exactly the same instructions as the original free, followed by a jump (presumably back into the rest of the original function)
Hooking sbrk
The sbrk code in tcmalloc is hard-coded and cannot be changed, so the only option is to copy the code out and make glibc's sbrk point to the copy
```mermaid
graph LR
    new_glibc_sbrk["new_glibc_sbrk"]
    my_sbrk["my_sbrk (copied from tcmalloc's sbrk, with the call into glibc replaced)"]
    old_glibc_sbrk["old_glibc_sbrk (PFishHook's shadow function)"]
    new_glibc_sbrk --> my_sbrk
    my_sbrk --> old_glibc_sbrk
```
The code is as follows
```cpp
#include <gperftools/malloc_hook.h>

void MallocHook::InvokePreSbrkHook(ptrdiff_t increment) {
    InvokePreSbrkHookSlow(increment);
}
void MallocHook::InvokeSbrkHook(const void* result, ptrdiff_t increment) {
    InvokeSbrkHookSlow(result, increment);
}

// sbrk_f is the shadow function created by PFishHook, i.e. the original glibc sbrk
extern "C" void* my_sbrk(intptr_t increment) __THROW {
    MallocHook::InvokePreSbrkHook(increment);
    void* result = sbrk_f(increment);
    MallocHook::InvokeSbrkHook(result, increment);
    return result;
}
```
Verification
The complete example is here
https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_fix_mmap_hook/main.cpp
Compile and run
```text
~ ./main
Starting tracking the heap
~ pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc' --text --lines ./main allbin.hprof
Using local file ./main.
Using local file allbin.hprof.
Total: 166.8 MB
    40.0  49.4%  49.4%     40.0  49.4% test /root/tcmalloc_hook_debug/tcmalloc_fix_mmap_hook/dynamic_lib.cpp:5
    40.0  49.4%  98.8%     40.0  49.4% test /root/tcmalloc_hook_debug/tcmalloc_fix_mmap_hook/dynamic_lib.cpp:8
     1.0   1.2% 100.0%      1.0   1.2% base::subtle::NoBarrier_CompareAndSwap (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/atomicops-internals-x86.h:81
     0.0   0.0% 100.0%      1.0   1.2% GetHeapProfile /root/tcmalloc_hook_debug/gperftools/build/../src/heap-profiler.cc:213
     0.0   0.0% 100.0%      1.0   1.2% SpinLock::Lock (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/spinlock.h:69
     0.0   0.0% 100.0%      1.0   1.2% SpinLockHolder::SpinLockHolder (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/spinlock.h:133
     0.0   0.0% 100.0%     81.0 100.0% __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:291
     0.0   0.0% 100.0%     81.0 100.0% _start ??:0
     0.0   0.0% 100.0%      1.0   1.2% base::subtle::Acquire_CompareAndSwap (inline) /root/tcmalloc_hook_debug/gperftools/build/../src/base/atomicops-internals-x86.h:109
     0.0   0.0% 100.0%     80.0  98.8% main /root/tcmalloc_hook_debug/tcmalloc_fix_mmap_hook/main.cpp:128
     0.0   0.0% 100.0%      1.0   1.2% main /root/tcmalloc_hook_debug/tcmalloc_fix_mmap_hook/main.cpp:133
```
The mmap usage in dynamic_lib.cpp is now captured as well
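Summary
This post dealt with a problem where, once dlopen is used with RTLD_DEEPBIND, tcmalloc's hooks stop taking effect, which leads to coredumps and incomplete memory statistics
So the hooks have to be implemented on tcmalloc's behalf: glibc's malloc-family functions and mmap are patched with simple_hook to jump to tcmalloc's versions, and sbrk is hooked with PFishHook so that the original glibc implementation stays callable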
Appendix
GetHeapProfile and GetHeapSample
There are two ways to dump a profile for pprof
```cpp
HeapProfilerStart("");
for (int i = 0; i < 10; i++) {
    testFunc();
}
string s = GetHeapProfile();
HeapProfilerStop();
```
or
```cpp
for (int i = 0; i < 10; i++) {
    testFunc();
}
string s;
MallocExtension::instance()->GetHeapSample(&s);
```
What is the difference between the two?
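A quick look at the source
Here is a quick walk through the source, based on gperftools-2.7, in https://github.com/gperftools/gperftools/blob/gperftools-2.7/src/tcmalloc.cc
The call chain from the entry function tc_malloc is roughly as follows:
GetHeapSample
Whether a sample is taken is decided by the following logic:
```cpp
inline bool ThreadCache::SampleAllocation(size_t k) {
  return !sampler_.RecordAllocation(k);
}

inline bool Sampler::RecordAllocation(size_t k) {
  if (static_cast<size_t>(bytes_until_sample_) < k) {
    bool result = RecordAllocationSlow(k);
    return result;
  } else {
    bytes_until_sample_ -= k;
    return true;
  }
}
```
As the code shows, a counter bytes_until_sample_ is maintained (sampler_ is a member of ThreadCache, so the counter is per thread rather than global). The sampling interval is controlled by the environment variable TCMALLOC_SAMPLE_PARAMETER, with a suggested value of 512KB
In other words, a sample is taken roughly once for every 512KB allocated (tcmalloc randomizes the exact interval around that average). For example:
Every 1MB allocation is sampled (it is larger than 512KB)
Function A allocates 1KB 511 times, then function B allocates 1KB: the allocation that ends up being sampled is B's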
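The countdown logic can be tried out with a deliberately simplified model. Everything in the sketch below is hypothetical (FixedIntervalSampler is not a gperftools class): it uses a fixed interval where the real sampler picks a randomized next sampling point, and it only mirrors the comparison shown in RecordAllocation above.

```cpp
#include <cstddef>

// Simplified, deterministic stand-in for tcmalloc's per-thread sampler.
class FixedIntervalSampler {
  public:
    explicit FixedIntervalSampler(std::size_t interval)
        : interval_(interval), bytes_until_sample_(interval) {}

    // Returns true when this allocation of k bytes is the one that gets sampled.
    bool ShouldSample(std::size_t k) {
        if (bytes_until_sample_ < k) {
            // The real implementation picks a randomized next sampling point here.
            bytes_until_sample_ = interval_;
            return true;
        }
        bytes_until_sample_ -= k;
        return false;
    }

  private:
    std::size_t interval_;
    std::size_t bytes_until_sample_;
};
```

With a 512KB interval, a single 1MB allocation is sampled immediately, while a long run of small allocations is not sampled until the countdown is used up. Whichever call site happens to cross the threshold is the one recorded, so small allocations show up in GetHeapSample only statistically.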
GetHeapProfile
Before GetHeapProfile can be used, HeapProfilerStart has to be called
```cpp
extern "C" void HeapProfilerStart(const char* prefix) {
  ...
  MallocHook::AddNewHook(&NewHook);
  MallocHook::AddDeleteHook(&DeleteHook);
}
```
At this point base::internal::new_hooks_ is populated with the hook functions
so every allocation and free is forced onto the "slow path"
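The effect of a non-empty hook list can be pictured with a toy allocator like the one below. All names here (g_new_hooks, Allocate, RecordingHook and the counters) are invented for the illustration and are not gperftools APIs; the sketch only shows the general pattern of a fast path that is skipped as soon as any hook is registered.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

using NewHook = void (*)(const void* ptr, std::size_t size);

static std::vector<NewHook> g_new_hooks;  // stand-in for base::internal::new_hooks_
static int g_slow_path_calls = 0;         // how often the slow path was taken
static std::size_t g_recorded_bytes = 0;  // what the example hook has seen

// Example hook: a real profiler would also capture a stack trace here.
void RecordingHook(const void* /*ptr*/, std::size_t size) {
    g_recorded_bytes += size;
}

void* Allocate(std::size_t size) {
    if (g_new_hooks.empty()) {
        return std::malloc(size);  // fast path: no bookkeeping at all
    }
    ++g_slow_path_calls;           // slow path: every hook runs on every call
    void* p = std::malloc(size);
    for (NewHook hook : g_new_hooks) {
        hook(p, size);
    }
    return p;
}
```

Because the hook runs on every single allocation rather than once per sampling interval, its cost is paid on every call.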
Performance test
```cpp
#include <iostream>
#include <vector>
#include <chrono>
#include <gperftools/heap-profiler.h>

constexpr int kAllocCount = 5000000;
constexpr int kSize = 64;

// Allocates and then frees kAllocCount blocks, returning the elapsed seconds
double test_alloc_free() {
    std::vector<void*> ptrs;
    ptrs.reserve(kAllocCount);

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < kAllocCount; ++i) {
        ptrs.push_back(::operator new(kSize));
    }

    for (int i = 0; i < kAllocCount; ++i) {
        ::operator delete(ptrs[i]);
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff = end - start;
    return diff.count();
}

int main() {
    std::cout << "Running allocation benchmark with "
              << kAllocCount << " x " << kSize << " bytes...\n\n";

    double t1 = test_alloc_free();
    std::cout << "[No profiler] Time = " << t1 << " s" << std::endl;

    HeapProfilerStart("");
    double t2 = test_alloc_free();
    HeapProfilerStop();

    std::cout << "[With HeapProfiler] Time = " << t2 << " s" << std::endl;

    std::cout << "\nOverhead: " << (t2 / t1 - 1.0) * 100.0 << " % slower\n";
    return 0;
}
```
On an old second-hand server with an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, the result is
```text
~ HEAP_PROFILE_ALLOCATION_INTERVAL=0 HEAP_PROFILE_INUSE_INTERVAL=0 ./main
Running allocation benchmark with 5000000 x 64 bytes...

[No profiler] Time = 0.748285 s
Starting tracking the heap
[With HeapProfiler] Time = 201.31 s

Overhead: 26802.9 % slower
```
That is a gap of 2 to 3 orders of magnitude. After HeapProfilerStart, each iteration (one allocation plus its matching free) takes on average
201.31/5000000*1000*1000=40.262us
which is roughly equivalent to 4 random SSD reads/writes, or 400 memory accesses
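Conclusion
HeapProfilerStart turns on "record-every-call heap profiling based on MallocHook". It does not change tcmalloc's own "sampled allocation" logic (ThreadCache::SampleAllocation and DoSampledAllocation). The two do not replace each other and can coexist.
The performance cost is substantial, so use it with care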