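pprof cannot symbolize code generated by MCJIT. To see why, start with how pprof works.
How pprof works
pprof typically reads a profile file with the .hprof extension.
A typical example is https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_miss_mmap_hook/main.cpp
The allbin.hprof file it dumps looks like this: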
    heap profile: 1: 1048576 [ 1: 1048576] @ heapprofile
    1: 1441792 [ 1: 1441792] @ 0x00407135 0x0040765e 0x0042ae86 0x0041ae5f 0x0040643a 0x004059e4 0x7f86e47ad830 0x00405899
    10: 41943040 [ 10: 41943040] @ 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899
    1: 1048576 [ 1: 1048576] @ 0x004215f6 0x00421576 0x00421a66 0x0041f65e 0x0041f9c3 0x004105a6 0x0042cfeb 0x0040624f 0x00405a1d 0x7f86e47ad830 0x00405899
    1: 1114112 [ 1: 1114112] @ 0x00407135 0x0040765e 0x0042ae86 0x004063ff 0x004059e4 0x7f86e47ad830 0x00405899
    1: 131072 [ 1: 131072] @ 0x00407135 0x0040765e 0x0042ae86 0x0041cf07 0x0041c0b8 0x00406100 0x00409f82 0x0042cf86 0x0040624f 0x00405a1d 0x7f86e47ad830 0x00405899
    1: 1048576 [ 1: 1048576] @ 0x0040624f 0x00405a1d 0x7f86e47ad830 0x00405899

    MAPPED_LIBRARIES:
    00400000-00404000 r--p 00000000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
    00404000-0042e000 r-xp 00004000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
    0042e000-0043d000 r--p 0002e000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
    0043d000-0043e000 r--p 0003d000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
    0043e000-0043f000 rw-p 0003e000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
    0043f000-005f6000 rw-p 00000000 00:00 0
    01347000-01d48000 rw-p 00000000 00:00 0 [heap]
    7f86e1d1d000-7f86e478d000 rw-p 00000000 00:00 0
    7f86e478d000-7f86e494d000 r-xp 00000000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7f86e494d000-7f86e4b4d000 ---p 001c0000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7f86e4b4d000-7f86e4b51000 r--p 001c0000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7f86e4b51000-7f86e4b53000 rw-p 001c4000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7f86e4b53000-7f86e4b57000 rw-p 00000000 00:00 0
    7f86e4b57000-7f86e4b6d000 r-xp 00000000 00:00 1237850 /lib/x86_64-linux-gnu/libgcc_s.so.1
    7f86e4b6d000-7f86e4d6c000 ---p 00016000 00:00 1237850 /lib/x86_64-linux-gnu/libgcc_s.so.1
    7f86e4d6c000-7f86e4d6d000 rw-p 00015000 00:00 1237850 /lib/x86_64-linux-gnu/libgcc_s.so.1
    7f86e4d6d000-7f86e4e75000 r-xp 00000000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
    7f86e4e75000-7f86e5074000 ---p 00108000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
    7f86e5074000-7f86e5075000 r--p 00107000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
    7f86e5075000-7f86e5076000 rw-p 00108000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
    7f86e5076000-7f86e51e8000 r-xp 00000000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
    7f86e51e8000-7f86e53e8000 ---p 00172000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
    7f86e53e8000-7f86e53f2000 r--p 00172000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
    7f86e53f2000-7f86e53f4000 rw-p 0017c000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
    7f86e53f4000-7f86e53f8000 rw-p 00000000 00:00 0
    7f86e53f8000-7f86e53fb000 r-xp 00000000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
    7f86e53fb000-7f86e55fa000 ---p 00003000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
    7f86e55fa000-7f86e55fb000 r--p 00002000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
    7f86e55fb000-7f86e55fc000 rw-p 00003000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
    7f86e55fc000-7f86e5614000 r-xp 00000000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
    7f86e5614000-7f86e5813000 ---p 00018000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
    7f86e5813000-7f86e5814000 r--p 00017000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
    7f86e5814000-7f86e5815000 rw-p 00018000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
    7f86e5815000-7f86e5819000 rw-p 00000000 00:00 0
    7f86e5819000-7f86e583f000 r-xp 00000000 00:00 1237745 /lib/x86_64-linux-gnu/ld-2.23.so
    7f86e588e000-7f86e5a35000 rw-p 00000000 00:00 0
    7f86e5a37000-7f86e5a38000 r--p 00000000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
    7f86e5a38000-7f86e5a39000 r-xp 00001000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
    7f86e5a39000-7f86e5a3a000 r--p 00002000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
    7f86e5a3a000-7f86e5a3b000 r--p 00002000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
    7f86e5a3b000-7f86e5a3c000 rw-p 00003000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
    7f86e5a3c000-7f86e5a3e000 rw-p 00000000 00:00 0
    7f86e5a3e000-7f86e5a3f000 r--p 00025000 00:00 1237745 /lib/x86_64-linux-gnu/ld-2.23.so
    7f86e5a3f000-7f86e5a40000 rw-p 00026000 00:00 1237745 /lib/x86_64-linux-gnu/ld-2.23.so
    7f86e5a40000-7f86e5a41000 rw-p 00000000 00:00 0
    7ffd52f83000-7ffd52fa5000 rw-p 00000000 00:00 0 [stack]
    7ffd52fbf000-7ffd52fc2000 r--p 00000000 00:00 0 [vvar]
    7ffd52fc2000-7ffd52fc3000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
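The first line declares the heap profile format.
From the second line on, each line is one sampled call stack.
Everything after MAPPED_LIBRARIES is simply the contents of /proc/{pid}/maps. For comparison, here is the same file dumped for PID 1: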
    ~ cat /proc/1/maps|head
    5600e0612000-5600e0636000 r-xp 00000000 08:c1 1612585013 /bin/dash
    5600e0835000-5600e0837000 r--p 00023000 08:c1 1612585013 /bin/dash
    5600e0837000-5600e0838000 rw-p 00025000 08:c1 1612585013 /bin/dash
    5600e0838000-5600e083a000 rw-p 00000000 00:00 0
    5600e0e51000-5600e0e72000 rw-p 00000000 00:00 0 [heap]
    7fcc8832e000-7fcc884ee000 r-xp 00000000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7fcc884ee000-7fcc886ee000 ---p 001c0000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7fcc886ee000-7fcc886f2000 r--p 001c0000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7fcc886f2000-7fcc886f4000 rw-p 001c4000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
    7fcc886f4000-7fcc886f8000 rw-p 00000000 00:00 0
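So pprof takes the virtual addresses in each stack line, looks them up in the maps data to find which main binary or shared library they belong to, and then resolves them with addr2line.
Take this record from above: 10: 41943040 [ 10: 41943040] @ 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899
It says this stack was recorded 10 times and accounts for 41943040 bytes in total (40 MiB).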
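To make the record layout concrete, here is a minimal Go sketch that splits one such line into its counters and its stack addresses. The type and function names (heapRecord, parseHeapRecord) are hypothetical, and the field meanings (in-use values before the bracket, allocation totals inside it) follow the usual tcmalloc text format as I understand it. This is not pprof's own parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// heapRecord is a hypothetical holder for one stack line of the text heap profile.
type heapRecord struct {
	InUseObjects, InUseBytes uint64
	AllocObjects, AllocBytes uint64
	Stack                    []uint64 // program counters, innermost frame first
}

// recordRE matches "<n>: <bytes> [<n>: <bytes>] @ <addr> <addr> ...".
var recordRE = regexp.MustCompile(`^\s*(\d+)\s*:\s*(\d+)\s*\[\s*(\d+)\s*:\s*(\d+)\s*\]\s*@\s*(.*)$`)

func parseHeapRecord(line string) (heapRecord, error) {
	m := recordRE.FindStringSubmatch(line)
	if m == nil {
		return heapRecord{}, fmt.Errorf("not a heap record: %q", line)
	}
	var r heapRecord
	counters := []*uint64{&r.InUseObjects, &r.InUseBytes, &r.AllocObjects, &r.AllocBytes}
	for i, dst := range counters {
		v, err := strconv.ParseUint(m[i+1], 10, 64)
		if err != nil {
			return heapRecord{}, err
		}
		*dst = v
	}
	// Everything after '@' is a space-separated list of hex addresses.
	for _, f := range strings.Fields(m[5]) {
		pc, err := strconv.ParseUint(strings.TrimPrefix(f, "0x"), 16, 64)
		if err != nil {
			return heapRecord{}, err
		}
		r.Stack = append(r.Stack, pc)
	}
	return r, nil
}

func main() {
	r, err := parseHeapRecord("10: 41943040 [ 10: 41943040] @ 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d objects, %d MiB, %d frames\n", r.InUseObjects, r.InUseBytes>>20, len(r.Stack))
}
```

The addresses in Stack are what pprof then has to look up in the MAPPED_LIBRARIES section.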
    ~ addr2line -fe main 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899
    ??:0
    /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:23
    ??:0
    ??:?
Only one frame resolves, because the other symbols live in shared libraries.
For example, 0x7f86e5a38179 falls inside 7f86e5a38000-7f86e5a39000 r-xp 00001000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
The x in r-xp marks this as an executable (code) mapping, which fits. Relative to the start of the mapping, the address is 0x7f86e5a38179 - 0x7f86e5a38000 = 0x179
The mapping itself begins at offset 0x1000 inside the library, so the full address is 0x1179
    ~ addr2line -fe ./libdynamic.so 0x1179
    test
    /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/dynamic_lib.cpp:7
Likewise, 0x7f86e47ad830 belongs to 7f86e478d000-7f86e494d000 r-xp 00000000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so, which gives 0x20830
    ~ addr2line -fe /lib/x86_64-linux-gnu/libc-2.23.so 0x20830
    __libc_start_main
    /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:325
0x00405a00 and 0x00405899 belong to the main binary, 00404000-0042e000 r-xp 00004000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main, and are passed as they are:
    ~ addr2line -fe main 0x00405a00 0x00405899
    main
    /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:23
    _start
    ??:?
So the complete call stack is _start->__libc_start_main->main->test
The pprof-go mis-symbolization bug
Because MCJIT links at run time, no ELF file on disk backs its code mapping. Its maps dump looks roughly like this:
    7ff4d5415000-7ff4d5418000 r-xp 00000000 00:00 156754 /lib/x86_64-linux-gnu/libdl-2.23.so
    7ff4d5418000-7ff4d5617000 ---p 00003000 00:00 156754 /lib/x86_64-linux-gnu/libdl-2.23.so
    7ff4d5617000-7ff4d5618000 r-xp 00002000 00:00 156754 /lib/x86_64-linux-gnu/libdl-2.23.so
    7ff4d5618000-7ff4d5619000 rwxp 00003000 00:00 156754 /lib/x86_64-linux-gnu/libdl-2.23.so
    7ff4d5619000-7ff4d563f000 r-xp 00000000 00:00 156749 /lib/x86_64-linux-gnu/ld-2.23.so
    7ff4d5647000-7ff4d5662000 r-xp 00000000 00:00 0
    7ff4d5662000-7ff4d5736000 rwxp 00000000 00:00 0
    7ff4d5736000-7ff4d5753000 r-xp 00000000 00:00 3277415 /data/app/taf/tafnode/data/libleafInterface.so
    7ff4d5753000-7ff4d5754000 r-xp 0001c000 00:00 3277415 /data/app/taf/tafnode/data/libleafInterface.so
    7ff4d5754000-7ff4d5755000 rwxp 0001d000 00:00 3277415 /data/app/taf/tafnode/data/libleafInterface.so
Here 7ff4d5647000-7ff4d5662000 r-xp 00000000 00:00 0 is MCJIT's code segment.
About ten years ago pprof-go added a workaround for a quirk of some tools: https://github.com/google/pprof/commit/5509abdceb968dbea918e8f6c83e14cffb81230d
The commit message reads:
Combine adjacent mappings even if offsets are unavailable
Some profile handlers will split mappings into multiple adjacent entries. This interferes with pprof when overriding the main binary.
Currently there is code that attempts to combine these mappings, but it fails if the mapping offsets aren't available. Add a check to combine them in that situation.
This merging folds MCJIT's anonymous code segment into the adjacent mapping for /data/app/taf/tafnode/data/libleafInterface.so, so MCJIT addresses are resolved against the wrong file and come out as bizarre symbols.
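The effect can be reproduced with a simplified Go sketch of such an adjacency merge. This is not pprof's code: the names (mapping, adjacent, merge) are hypothetical and the rules are reduced to what the commit message describes, namely that two touching mappings are combined unless their file names or their known offsets contradict each other.

```go
package main

import "fmt"

// mapping is a hypothetical stand-in for an executable entry parsed from maps.
type mapping struct {
	Start, Limit, Offset uint64
	File                 string
}

// adjacent reports whether b may be folded into a. A missing file name or a
// zero offset is treated as "unknown", so it never blocks the merge.
func adjacent(a, b mapping) bool {
	if a.File != "" && b.File != "" && a.File != b.File {
		return false
	}
	if a.Limit != b.Start {
		return false
	}
	if a.Offset != 0 && b.Offset != 0 && a.Offset+(a.Limit-a.Start) != b.Offset {
		return false
	}
	return true
}

// merge combines runs of adjacent mappings; the merged entry inherits the
// first non-empty file name it meets.
func merge(maps []mapping) []mapping {
	var out []mapping
	for _, m := range maps {
		if n := len(out); n > 0 && adjacent(out[n-1], m) {
			out[n-1].Limit = m.Limit
			if m.File != "" {
				out[n-1].File = m.File
			}
			continue
		}
		out = append(out, m)
	}
	return out
}

func main() {
	maps := []mapping{
		{Start: 0x7ff4d5647000, Limit: 0x7ff4d5662000},                                                       // MCJIT code, anonymous
		{Start: 0x7ff4d5662000, Limit: 0x7ff4d5736000},                                                       // anonymous
		{Start: 0x7ff4d5736000, Limit: 0x7ff4d5753000, File: "/data/app/taf/tafnode/data/libleafInterface.so"}, // real library
	}
	for _, m := range merge(maps) {
		fmt.Printf("%x-%x %s\n", m.Start, m.Limit, m.File)
	}
}
```

Under these rules the three entries collapse into one range that starts at the MCJIT code segment but is labelled libleafInterface.so, which is the misattribution described above.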
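Fixing this requires deleting https://github.com/google/pprof/blob/main/profile/legacy_profile.go#L229-L238 and https://github.com/google/pprof/blob/main/profile/profile.go#L258-L277
Making pprof-go symbolize MCJIT correctly
The fix in the previous section stops the mis-symbolization. The next step is to provide usable symbols so that pprof-go can resolve MCJIT addresses correctly.
When an MCJIT-based process dumps core, gdb can print the correct script call stack, so I looked into how gdb finds the binary behind a virtual address.
The main flow: LLVM writes the in-memory object that backs each compiled address range into the __jit_debug_descriptor structure defined by gdb, and gdb resolves symbols from there.
How gdb does it
https://github.com/llvm/llvm-project/blob/llvmorg-7.1.0/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L24-L62
LLVM first declares gdb's C data structures and the global variable __jit_debug_descriptor.
The core of this structure is the jit_code_entry linked list. Each entry holds the address (symfile_addr) and size (symfile_size) of one in-memory ELF object. With that information gdb can map the memory ranges in /proc/{pid}/maps that have no backing ELF file to the right place.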
    extern "C" {

      // Debuggers puts a breakpoint in this function.
      typedef enum {
        JIT_NOACTION = 0,
        JIT_REGISTER_FN,
        JIT_UNREGISTER_FN
      } jit_actions_t;

      struct jit_code_entry {
        struct jit_code_entry *next_entry;
        struct jit_code_entry *prev_entry;
        const char *symfile_addr;
        uint64_t symfile_size;
      };

      struct jit_descriptor {
        uint32_t version;
        // This should be jit_actions_t, but we want to be specific about the
        // bit-width.
        uint32_t action_flag;
        struct jit_code_entry *relevant_entry;
        struct jit_code_entry *first_entry;
      };

      // We put information about the JITed function in this global, which the
      // debugger reads.
      struct jit_descriptor __jit_debug_descriptor = { 1, 0, nullptr, nullptr };

    }
NotifyDebugger then links the new JITCodeEntry into the list held at this gdb-defined address:
    void NotifyDebugger(jit_code_entry* JITCodeEntry) {
      __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;

      // Insert this entry at the head of the list.
      JITCodeEntry->prev_entry = nullptr;
      jit_code_entry* NextEntry = __jit_debug_descriptor.first_entry;
      JITCodeEntry->next_entry = NextEntry;
      if (NextEntry) {
        NextEntry->prev_entry = JITCodeEntry;
      }
      __jit_debug_descriptor.first_entry = JITCodeEntry;
      __jit_debug_descriptor.relevant_entry = JITCodeEntry;
      __jit_debug_register_code();
    }
NotifyDebugger is called from GDBJITRegistrationListener::NotifyObjectEmitted:
    void GDBJITRegistrationListener::NotifyObjectEmitted(
                                           const ObjectFile &Object,
                                           const RuntimeDyld::LoadedObjectInfo &L) {

      OwningBinary<ObjectFile> DebugObj = L.getObjectForDebug(Object);

      // Bail out if debug objects aren't supported.
      if (!DebugObj.getBinary())
        return;

      const char *Buffer = DebugObj.getBinary()->getMemoryBufferRef().getBufferStart();
      size_t      Size = DebugObj.getBinary()->getMemoryBufferRef().getBufferSize();

      const char *Key = Object.getMemoryBufferRef().getBufferStart();

      assert(Key && "Attempt to register a null object with a debugger.");
      llvm::MutexGuard locked(*JITDebugLock);
      assert(ObjectBufferMap.find(Key) == ObjectBufferMap.end() &&
             "Second attempt to perform debug registration.");
      jit_code_entry* JITCodeEntry = new jit_code_entry();

      if (!JITCodeEntry) {
        llvm::report_fatal_error(
          "Allocation failed when registering a JIT entry!\n");
      } else {
        JITCodeEntry->symfile_addr = Buffer;
        JITCodeEntry->symfile_size = Size;

        ObjectBufferMap[Key] = RegisteredObjectInfo(Size, JITCodeEntry,
                                                    std::move(DebugObj));
        NotifyDebugger(JITCodeEntry);
      }
    }
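Once MCJIT finishes relocation, it calls back every registered listener and hands over the finished in-memory ELF object through RuntimeDyld::LoadedObjectInfo.
My pprof implementation for MCJIT
A similar flow works here: have MCJIT emit some extra data, then read that data when resolving virtual addresses.
Emitting extra data from MCJIT
Two kinds of data need to be written out:
the ELF image itself
the start and size of every section, to be passed to addr2line for symbol resolution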
    class PPROFJITRegistrationListener : public llvm::JITEventListener {
    public:
        // Per-thread list of path prefixes; each gets a ".elf" and a ".meta" file.
        static vector<string>& dumpPaths() {
            static thread_local vector<string> vs;
            return vs;
        }

    private:
        void NotifyObjectEmitted(
            const llvm::object::ObjectFile& Obj,
            const llvm::RuntimeDyld::LoadedObjectInfo& L) override {
            llvm::object::OwningBinary<llvm::object::ObjectFile> DebugO =
                L.getObjectForDebug(Obj);
            if (!DebugO.getBinary()) return;

            auto DebugObj = DebugO.getBinary();

            // 1. Dump the relocated in-memory ELF image.
            const char* Buffer =
                DebugObj->getMemoryBufferRef().getBufferStart();
            size_t Size = DebugObj->getMemoryBufferRef().getBufferSize();

            for (auto& dumpPath : dumpPaths()) {
                taf::TC_File::save2file(dumpPath + ".elf",
                                        string(Buffer, Size));
            }

            // 2. Dump name, start and size of every section as JSON.
            internal_jce::PPROFSections pprofSections;
            for (auto si = DebugObj->section_begin(),
                      se = DebugObj->section_end();
                 si != se; ++si) {
                internal_jce::PPROFSection pprofSection;
                auto& section = *si;
                llvm::StringRef name;
                section.getName(name);
                pprofSection.name = name;
                pprofSection.start = section.getAddress();
                pprofSection.size = section.getSize();
                pprofSections.sections.push_back(move(pprofSection));
            }
            for (auto& dumpPath : dumpPaths()) {
                taf::TC_File::save2file(dumpPath + ".meta",
                                        pprofSections.writeToJsonString());
            }
        }
    };
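The emitted file looks like this. It contains a huge number of sections: each C++ template instantiation gets its own group section, which a static linker would merge into a single section, but MCJIT does not do that.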
    ~ readelf -S /pprof_meta/P00332.module.media.test.MediaRenderRun.elf |head -n 100
    There are 43626 section headers, starting at offset 0x1c8a7e8:

    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      [ 0]                   NULL             0000000000000000  00000000
           0000000000000000  0000000000000000           0     0     0
      [ 1] .strtab           STRTAB           0000000000000000  01a67cf8
           0000000000222ae9  0000000000000000           0     0     1
      [ 2] .text             PROGBITS         00007ff1abd5f000  00000040
           00000000000ffc52  0000000000000000  AX       0     0     16
      [ 3] .rela.text        RELA             0000000000000000  011b0010
           0000000000102cc0  0000000000000018       43625     2     8
      [ 4] .text.startup     PROGBITS         00007ff1c526a000  000ffca0
           000000000000541a  0000000000000000  AX       0     0     16
      [ 5] .rela.text.startu RELA             0000000000000000  012b2cd0
           0000000000007830  0000000000000018       43625     4     8
      [ 6] .gcc_except_table PROGBITS         00007ff1abe82000  001050bc
           000000000002a840  0000000000000000   A       0     0     4
      [ 7] .rela.gcc_except_ RELA             0000000000000000  012ba500
           0000000000001668  0000000000000018       43625     6     8
      [ 8] .group            GROUP            0000000000000000  010a6da0
           000000000000000c  0000000000000004       43625  34890     4
      [ 9] .text._ZNSt9excep PROGBITS         00007ff1abbcefd0  0012f900
           000000000000001f  0000000000000000 AXG       0     0     16
      [10] .rela.text._ZNSt9 RELA             0000000000000000  012bbb68
           0000000000000018  0000000000000018   G   43625     9     8
      [11] .group            GROUP            0000000000000000  010a6dac
           000000000000000c  0000000000000004       43625  37011     4
      [12] .text._ZStplIcSt1 PROGBITS         00007ff1abbaf4c0  0012f920
           0000000000000053  0000000000000000 AXG       0     0     16
      [13] .rela.text._ZStpl RELA             0000000000000000  012bbb80
           0000000000000048  0000000000000018   G   43625    12     8
      [14] .group            GROUP            0000000000000000  010a6db8
           000000000000000c  0000000000000004       43625  37017     4
      [15] .text._ZStplIcSt1 PROGBITS         00007ff1abbaf820  0012f980
           000000000000008d  0000000000000000 AXG       0     0     16
pprof-go implementation
pprof-go source walkthrough
Data structures
pprof-go wraps a pprof-format file in a Profile.
Stack information is wrapped in Location.
The /proc/{pid}/maps data is wrapped in Mapping.
    type Mapping struct {
        ID     uint64
        Start  uint64
        Limit  uint64
        Offset uint64
        File   string
    }

    type Location struct {
        ID      uint64
        Mapping *Mapping
        Address uint64
    }

    type Profile struct {
        ...
        Mapping  []*Mapping
        Location []*Location
        ...
    }
Mapping function addresses to ELF files
Profile parsing is deeply nested. The call stack is:
main()
driver.PProf()
internaldriver.PProf()
fetchProfiles()
grabSourcesAndBases()
chunkedGrab()
concurrentGrab()
grabProfile()
fetch()
profile.Parse()
ParseData()
parseLegacy()
parseHeap()
parseAdditionalSections()
ParseMemoryMapFromScanner()
ParseMemoryMapFromScanner assigns each parsed Location to its Mapping.
https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/profile/legacy_profile.go#L1031-L1042
    func (p *Profile) ParseMemoryMapFromScanner(s *bufio.Scanner) error {
        mapping, err := parseProcMapsFromScanner(s)
        if err != nil {
            return err
        }
        p.Mapping = append(p.Mapping, mapping...)
        p.massageMappings()
        p.remapLocationIDs()
        p.remapFunctionIDs()
        p.remapMappingIDs()
        return nil
    }
The core mapping logic is remapMappingIDs:
    func (p *Profile) remapMappingIDs() {
        ...

    nextLocation:
        for _, l := range p.Location {
            a := l.Address
            if l.Mapping != nil || a == 0 {
                continue
            }
            // Attach the location to the first mapping that contains its address.
            for _, m := range p.Mapping {
                if m.Start <= a && a < m.Limit {
                    l.Mapping = m
                    continue nextLocation
                }
            }
            // Nothing matched: fall back to one fake mapping covering everything.
            if fake == nil {
                fake = &Mapping{
                    ID:    1,
                    Limit: ^uint64(0),
                }
                p.Mapping = append(p.Mapping, fake)
            }
            l.Mapping = fake
        }
    }
Passing function addresses and the ELF file to addr2line
https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/internal/symbolizer/symbolizer.go#L207-L244
    func symbolizeOneMapping(m *profile.Mapping, locs []*profile.Location, obj plugin.ObjFile, addFunction func(*profile.Function) *profile.Function) {
        for _, l := range locs {
            stack, err := obj.SourceLine(l.Address)
            if err != nil || len(stack) == 0 {
                continue
            }

            l.Line = make([]profile.Line, len(stack))
            l.IsFolded = false
            for i, frame := range stack {
                ...omitted: process the stack frames
            }

            if len(stack) > 0 {
                m.HasInlineFrames = true
            }
        }
    }
Computing the base from the maps data
By default pprof-go invokes addr2line like this:
https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/internal/binutils/addr2liner.go#L90-L120
    const (
        defaultAddr2line = "addr2line"
    )

    func newAddr2Liner(cmd, file string, base uint64) (*addr2Liner, error) {
        if cmd == "" {
            cmd = defaultAddr2line
        }

        j := &addr2LinerJob{
            cmd: exec.Command(cmd, "-aif", "-e", file),
        }

        var err error
        if j.in, err = j.cmd.StdinPipe(); err != nil {
            return nil, err
        }

        outPipe, err := j.cmd.StdoutPipe()
        if err != nil {
            return nil, err
        }

        j.out = bufio.NewReader(outPipe)
        if err := j.cmd.Start(); err != nil {
            return nil, err
        }

        a := &addr2Liner{
            rw:   j,
            base: base,
        }

        return a, nil
    }
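No -j flag is passed to select a section. I read the addr2line source: without -j it searches every section for the given address and returns the result from the first section that matches (usually .text).
For the ELF file that MCJIT emits, however, pprof ends up passing addresses relative to the start of the mapping.
For example, with a Mapping whose Start = 0x100 and Limit = 0x200, resolving address 0x101 actually writes 0x101 - 0x100 = 0x1 to addr2line.
Without a section specified, that lookup is bound to be wrong. Why does this happen?
fileAddr2Line::SourceLine
The value subtracted is the per-file base. Each time addr2line is set up for a file, the base for that file is computed once. The core of that computation is: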
https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/internal/binutils/binutils.go#L592-L617
    func (f *file) computeBase(addr uint64) error {
        ef, err := elfOpen(f.name)
        ...omitted
        ph, err := f.m.findProgramHeader(ef, addr)
        ...omitted
        base, err := elfexec.GetBase(&ef.FileHeader, ph, f.m.kernelOffset, f.m.start, f.m.limit, f.m.offset)
        ...omitted
        f.base = base
    }
GetBase itself is here:
https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/internal/elfexec/elfexec.go#L216-L283
    func GetBase(fh *elf.FileHeader, loadSegment *elf.ProgHeader, stextOffset *uint64, start, limit, offset uint64) (uint64, error) {
        ...omitted
        switch fh.Type {
        case elf.ET_EXEC:
            ...omitted
            return start - offset + loadSegment.Off - loadSegment.Vaddr, nil
        case elf.ET_REL:
            ...omitted
            return start, nil
        case elf.ET_DYN:
            ...omitted
            return start - offset + loadSegment.Off - loadSegment.Vaddr, nil
        }
        return 0, fmt.Errorf("don't know how to handle FileHeader.Type %v", fh.Type)
    }
ET_EXEC
This is the main executable, for example:
    MAPPED_LIBRARIES:
    00400000-04b62000 r-xp 00001000 00:00 20054728 /data/app/taf/tafnode/data/LEAFGZ.ServerLessNode/data/ScriptEngineServerBin/899766/ScriptEngineServer
What is the base for this address range?
First look at the program headers. Only LOAD segments show up in /proc/{pid}/maps.
    ~ readelf -lW /data/app/taf/tafnode/data/LEAFGZ.ServerLessNode/data/ScriptEngineServerBin/899766/ScriptEngineServer
    Program Headers:
      Type           Offset    VirtAddr           PhysAddr           FileSiz   MemSiz    Flg Align
      PHDR           0x000040  0x00000000003ff040 0x00000000003ff040 0x000310  0x000310  R   0x8
      GNU_STACK      0x001000  0x0000000000000000 0x0000000000000000 0x000000  0x000000  RWE 0x10
      LOAD           0x000000  0x00000000003ff000 0x00000000003ff000 0x001000  0x001000  RW  0x1000
      DYNAMIC        0x000350  0x00000000003ff350 0x00000000003ff350 0x0002d0  0x0002d0  RW  0x8
      LOAD           0x001000  0x0000000000400000 0x0000000000400000 0x03d620  0x03d620  R   0x1000
      INTERP         0x00eba8  0x000000000040dba8 0x000000000040dba8 0x00001c  0x00001c  R   0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      NOTE           0x00ebc8  0x000000000040dbc8 0x000000000040dbc8 0x000020  0x000020  R   0x4
      NOTE           0x00ebe8  0x000000000040dbe8 0x000000000040dbe8 0x000024  0x000024  R   0x4
      LOAD           0x03f000  0x000000000043e000 0x000000000043e000 0x36c9321 0x36c9321 R E 0x1000
      LOAD           0x3709000 0x0000000003b08000 0x0000000003b08000 0x1059f9d 0x1059f9d R   0x1000
      GNU_EH_FRAME   0x3f998b0 0x00000000043988b0 0x00000000043988b0 0x0a2f14  0x0a2f14  R   0x4
      LOAD           0x4763b80 0x0000000004b62b80 0x0000000004b62b80 0x2d3d08  0x5cdae0  RW  0x1000
The maps entry 00400000-04b62000 has offset 0x1000, so it corresponds to the second LOAD entry:
      Type           Offset    VirtAddr           PhysAddr           FileSiz   MemSiz    Flg Align
      LOAD           0x001000  0x0000000000400000 0x0000000000400000 0x03d620  0x03d620  R   0x1000
So Base = 0x400000 (start from maps) - 0x1000 (offset from maps) + 0x1000 (Offset of the LOAD) - 0x400000 (VirtAddr of the LOAD)
How should this formula be read? Put simply, the offset in maps and the Offset of the LOAD are normally equal and cancel out,
which leaves Base = 0x400000 (start from maps) - 0x400000 (VirtAddr of the LOAD)
For an ET_EXEC main executable, Base = 0: the addresses are absolute and can be handed to addr2line unchanged.
ET_DYN
This is a shared library, for example:
    MAPPED_LIBRARIES:
    7fe49f210000-7fe49f211000 r-xp 00000000 00:00 777981 /usr/lib/x86_64-linux-gnu/libdl-2.31.so
    7fe49f211000-7fe49f213000 r-xp 00001000 00:00 777981 /usr/lib/x86_64-linux-gnu/libdl-2.31.so
    7fe49f213000-7fe49f214000 r-xp 00003000 00:00 777981 /usr/lib/x86_64-linux-gnu/libdl-2.31.so
    7fe49f214000-7fe49f215000 r-xp 00003000 00:00 777981 /usr/lib/x86_64-linux-gnu/libdl-2.31.so
    7fe49f215000-7fe49f216000 rwxp 00004000 00:00 777981 /usr/lib/x86_64-linux-gnu/libdl-2.31.so
What is the base for the address 0x7fe49f21134b?
It falls in 7fe49f211000-7fe49f213000, whose offset is 0x1000. The matching LOAD entry is:
    ~ readelf -lW /usr/lib/x86_64-linux-gnu/libdl-2.31.so
    Program Headers:
      Type           Offset    VirtAddr           PhysAddr           FileSiz   MemSiz    Flg Align
      LOAD           0x000000  0x0000000000000000 0x0000000000000000 0x000e48  0x000e48  R   0x1000
      LOAD           0x001000  0x0000000000001000 0x0000000000001000 0x001189  0x001189  R E 0x1000
That is the second one. After the two offsets cancel, Base = 0x7fe49f211000 (start from maps) - 0x1000 (VirtAddr of the LOAD)
For an ET_DYN shared library, Base = 0x7fe49f210000, which means the address 0x7fe49f21134b is handed to addr2line as 0x134b.
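The two worked examples can be checked with a few lines of Go. This is only the return expression quoted from GetBase, wrapped in a hypothetical helper (base, loadSegment are illustrative names), with the start, offset, Off and Vaddr values copied from the maps and readelf output above. Note that the ET_DYN base comes out as start minus Vaddr, that is 0x7fe49f210000.

```go
package main

import "fmt"

// loadSegment holds the two PT_LOAD fields the formula needs
// (the Offset and VirtAddr columns of readelf -lW).
type loadSegment struct {
	Off, Vaddr uint64
}

// base mirrors the expression used for ET_EXEC and ET_DYN:
// start - offset + loadSegment.Off - loadSegment.Vaddr.
func base(start, offset uint64, ph loadSegment) uint64 {
	return start - offset + ph.Off - ph.Vaddr
}

func main() {
	// ET_EXEC: maps entry 00400000-04b62000, offset 0x1000.
	execBase := base(0x400000, 0x1000, loadSegment{Off: 0x1000, Vaddr: 0x400000})

	// ET_DYN: maps entry 7fe49f211000-7fe49f213000, offset 0x1000.
	dynBase := base(0x7fe49f211000, 0x1000, loadSegment{Off: 0x1000, Vaddr: 0x1000})

	pc := uint64(0x7fe49f21134b)
	fmt.Printf("ET_EXEC base = %#x\n", execBase)
	fmt.Printf("ET_DYN  base = %#x, address for addr2line = %#x\n", dynBase, pc-dynBase)
}
```

The address written to addr2line is always the runtime address minus this base, which is why a base of 0 means "pass the address through unchanged".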
ET_REL
Inspecting the MCJIT output shows that it is of this type:
    ~ readelf -hW P00332.module.media.test.MediaRenderRun.elf
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              REL (Relocatable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x0
      Start of program headers:          0 (bytes into file)
      Start of section headers:          29846648 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           0 (bytes)
      Number of program headers:         0
      Size of section headers:           64 (bytes)
      Number of section headers:         43484
      Section header string table index: 1
This type has no LOAD entries:
    ~ readelf -lW P00332.module.media.test.MediaRenderRun.elf

    There are no program headers in this file.
That is because such a file is normally never loaded into memory: it is just an object file. After MCJIT loads the object file, it does not rewrite its ELF header.
There is no need to. The LOAD information that a linker would produce does not have to be communicated through the ELF format, because MCJIT links and loads in a single step.
For pprof, the base for this type is:
    case elf.ET_REL:
        ...omitted
        return start, nil
So every address is resolved with the start of its mapping as the base.
In reality the addresses here should be treated like ET_EXEC, that is with base 0, because the section headers of the dumped ELF already carry absolute runtime addresses (see the Address column in the readelf -S output above).
Implementation
Old approach
The idea is to read the emitted files into an ExtendMapping structure and, when matching symbols, try ExtendMapping first.
    func (p *Profile) ParseMemoryMapFromScanner(s *bufio.Scanner) error {
        mapping, err := parseProcMapsFromScanner(s)
        if err != nil {
            return err
        }
        // Load the section ranges that MCJIT dumped next to the ELF image.
        p.ExtendMapping = p.parseExtProcMapsFromScanner()
        p.Mapping = append(p.Mapping, mapping...)
        p.massageMappings()
        p.remapLocationIDs()
        p.remapFunctionIDs()
        p.remapMappingIDs()
        return nil
    }

    func (p *Profile) parseExtProcMapsFromScanner() []*Mapping {
        var mappings []*Mapping

        // Look for .meta files in the current directory first.
        files, err := filepath.Glob("*.meta")
        if err != nil {
            fmt.Println("Error finding .meta files:", err)
            return nil
        }

        // Fall back to /pprof_meta.
        if len(files) == 0 {
            files, err = filepath.Glob("/pprof_meta/*.meta")
            if err != nil {
                fmt.Println("Error finding .meta files in /pprof_meta:", err)
                return nil
            }
        }

        for _, metaFile := range files {
            data, err := ioutil.ReadFile(metaFile)
            if err != nil {
                fmt.Println("Error reading file:", metaFile, err)
                return nil
            }

            var sections PPROFSections
            if err := json.Unmarshal(data, &sections); err != nil {
                fmt.Println("Error parsing JSON in file:", metaFile, err)
                return nil
            }

            // xxx.meta describes the sections of xxx.elf.
            elfFile := strings.TrimSuffix(metaFile, ".meta") + ".elf"

            for _, sec := range sections.Sections {
                // Only code sections are interesting for symbolization.
                if !strings.HasPrefix(sec.Name, ".text") {
                    continue
                }

                if sec.Start == 0 {
                    fmt.Fprintf(os.Stderr, "Skipping section with start address 0 in %s, section: %s\n", metaFile, sec.Name)
                    continue
                }

                m := &Mapping{
                    Start:   sec.Start,
                    Limit:   sec.Start + sec.Size,
                    Section: sec.Name,
                    File:    elfFile,
                }
                mappings = append(mappings, m)
            }
        }

        return mappings
    }
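The following code matches against ExtendMapping first.
For example, p.Mapping may contain Start = 0, Limit = 100, File = ""
while p.ExtendMapping contains Start = 0, Limit = 100, File = "xxx.elf"
Matching ExtendMapping first therefore intercepts the VMAs that previously had no file to resolve against.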
    func (p *Profile) remapMappingIDs() {
        ...

    nextLocation:
        for _, l := range p.Location {
            a := l.Address
            if l.Mapping != nil || a == 0 {
                continue
            }

            // Try the MCJIT section ranges first.
            for _, m := range p.ExtendMapping {
                if m.Start <= a && a < m.Limit {
                    l.Mapping = m
                    continue nextLocation
                }
            }

            for _, m := range p.Mapping {
                if m.Start <= a && a < m.Limit {
                    l.Mapping = m
                    continue nextLocation
                }
            }

            if fake == nil {
                fake = &Mapping{
                    ID:    1,
                    Limit: ^uint64(0),
                }
                p.Mapping = append(p.Mapping, fake)
            }
            l.Mapping = fake
        }

        p.Mapping = append(p.Mapping, p.ExtendMapping...)
    }
In doLocalSymbolize, if a map has a Section field (meaning it came from ExtendMapping), a special addr2line path resolves it:
    func symbolizeOneMappingWithSection(m *profile.Mapping, locs []*profile.Location, addFunction func(*profile.Function) *profile.Function) {
        for _, l := range locs {
            // -j selects the section; the address is relative to the section start.
            cmd := fmt.Sprintf("addr2line -if -j %s -e %s %x", m.Section, m.File, l.Address-m.Start)
            out, err := exec.Command("sh", "-c", cmd).Output()
            if err != nil {
                fmt.Println("Error executing command:", cmd, err)
                continue
            }
            ...omitted
        }
    }
Because addr2line has to be launched separately for every section (and reloads the ELF file each time), a large project needs 15 minutes to produce a result.
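New approach
According to the comments in GetBase, some tools produce mappings whose start, offset and limit are all 0. That means the addresses have already been adjusted, and base can simply be 0.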
    func GetBase(fh *elf.FileHeader, loadSegment *elf.ProgHeader, stextOffset *uint64, start, limit, offset uint64) (uint64, error) {

        if start == 0 && offset == 0 && (limit == ^uint64(0) || limit == 0) {
            return 0, nil
        }

        ...omitted
    }
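The code that scans and parses the emitted files stays the same, but the matching code now points every hit at one shared map whose start, offset and limit are all 0. With a single shared map, addr2line loads the data only once. Otherwise each map would load its ELF file again.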
    // One shared fake mapping per ELF file.
    var extendFakeMappings map[string]*Mapping
    extendFakeMappings = make(map[string]*Mapping)

    nextLocation:
    for _, l := range p.Location {
        a := l.Address
        if l.Mapping != nil || a == 0 {
            continue
        }

        for _, m := range p.ExtendMapping {
            if m.Start <= a && a < m.Limit {
                var extendFakeMapping *Mapping
                var ok bool
                if extendFakeMapping, ok = extendFakeMappings[m.File]; !ok {
                    // Start, Limit and Offset stay 0, so GetBase returns 0.
                    extendFakeMapping = &Mapping{
                        File: m.File,
                    }
                    extendFakeMappings[m.File] = extendFakeMapping
                }
                l.Mapping = extendFakeMapping
                continue nextLocation
            }
        }

        for _, m := range p.Mapping {
            if m.Start <= a && a < m.Limit {
                l.Mapping = m
                continue nextLocation
            }
        }
    }

    // Register the fake mappings once, after all locations are assigned.
    for _, v := range extendFakeMappings {
        p.Mapping = append(p.Mapping, v)
    }
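Attentive readers may notice that after this change, ExtendMapping matching no longer works per section. It behaves more like the per-VMA matching of an ordinary Mapping.
So why not find the min and max over all sections up front and merge them into a single VMA to scan against?
For example, if the 40,000 sections are [0, 1], [1, 2], [2, 3], ..., [999, 1000], merging them into [0, 1000] before matching would be far more efficient. And within a single Mapping there would be no need to redirect everything to a shared extendFakeMapping.
That was my first thought too. In practice, though, the memory MCJIT allocates is not contiguous. It looks like this:
    [0, 1000]    --> this VMA belongs to MCJIT
    [1000, 2000] --> this VMA belongs to /usr/lib/x86_64-linux-gnu/libc-2.31.so
    [2000, 3000] --> this VMA belongs to MCJIT
Taking the min and max of the MCJIT addresses gives [0, 3000], which wrongly sends addresses from other shared libraries to the MCJIT ELF file as well.
Since MCJIT does not report its VMA ranges, the only option is to match against the 40,000 sections one by one and then fold the hits into the single extendFakeMapping VMA.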
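The difference between the two strategies is easy to demonstrate with the toy ranges from the text. The Go sketch below uses hypothetical names (span, hull, inAnySection) and toy addresses. It only illustrates the reasoning and is not the patched pprof code.

```go
package main

import "fmt"

// span is a hypothetical half-open address range [Start, Limit).
type span struct {
	Start, Limit uint64
}

func (s span) contains(a uint64) bool { return s.Start <= a && a < s.Limit }

// hull returns the single min/max range covering all sections.
// It assumes secs is non-empty.
func hull(secs []span) span {
	h := secs[0]
	for _, s := range secs[1:] {
		if s.Start < h.Start {
			h.Start = s.Start
		}
		if s.Limit > h.Limit {
			h.Limit = s.Limit
		}
	}
	return h
}

// inAnySection checks the address against every section individually.
func inAnySection(secs []span, a uint64) bool {
	for _, s := range secs {
		if s.contains(a) {
			return true
		}
	}
	return false
}

func main() {
	// Two MCJIT ranges with a libc VMA ([1000, 2000)) sitting between them.
	jit := []span{{0, 1000}, {2000, 3000}}
	libcAddr := uint64(1500)

	fmt.Println("min/max hull claims the libc address:", hull(jit).contains(libcAddr))
	fmt.Println("per-section match claims it:         ", inAnySection(jit, libcAddr))
}
```

With tens of thousands of sections the linear scan in inAnySection is the price paid for correctness. Sorting the ranges and using a binary search would be one possible way to reduce that cost, though the post does not do so.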
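Summary
This post walks through the problem I hit with MCJIT and pprof, and the fix: a JITRegistrationListener written for MCJIT plus changes to the pprof-go source.
MCJIT emits some extra data, and pprof-go reads that data when it resolves virtual addresses.
Two approaches were used to read the data. The new one parses each ELF file only once, which cut the dump time for a complex project from 15 minutes to 18 seconds.
Appendix
Exporting flame graphs and node graphs from pprof-go
Node graphs
Unlike pprof-perl, pprof-go relies on Graphviz to generate PDFs, no longer on Ghostscript.
The source can be downloaded from https://graphviz.org/download/ and compiled.
PDF output depends on the gd package. The complete build script is: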
    cur=`pwd`

    apt install -y libgd-dev fontconfig libcairo2-dev libpango1.0-dev libgts-dev

    tar xvf graphviz-12.0.0.tar.gz

    cd graphviz-12.0.0
    ./configure --prefix=/tmp/gv

    make -j && make install

    cd /tmp
    apt download fontconfig-config

    mkdir -pv gv/deb/
    cp *.deb gv/deb/
    ls -lh gv/deb/

    # Copy the non-system shared libraries the Graphviz plugins depend on into gv/lib/
    ldd gv/lib/graphviz/*so*|\
    grep -oP '/[^ )]+' |grep -v ':' |grep -v 'gv/lib' |\
    grep -v libc.so|grep -v libm.so|grep -v librt.so|\
    grep -v libdl.so|grep -v libglib|grep -v libpthread.so|\
    xargs -r -d '\n' -I{} cp -L '{}' gv/lib/
Usage script
    export PATH=$tooldir/gv/bin:$PATH
    export LD_LIBRARY_PATH=$tooldir/gv/lib:$LD_LIBRARY_PATH

    dpkg -x gv/deb/fontconfig* /opt/fontconfig

    export FONTCONFIG_PATH=/opt/fontconfig/etc/fonts

    pprof --pdf --lines $binary 1.pprof > 1.pdf
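Flame graphs
pprof-go can render flame graphs only through the web UI it serves on an HTTP port. It offers no way to export one directly.
Frankly, the flame graph it produces also does not look as good as FlameGraph's.
After some experimenting, I found that stackcollapse-go.pl, the Go-specific tool in FlameGraph, can be combined with pprof-go to produce a flame graph: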
    pprof --raw --lines $binary 1.pprof > tmp/out.data

    stackcollapse-go.pl tmp/out.data > tmp/out.collapse

    flamegraph.pl tmp/out.collapse > 1.svg
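However, stackcollapse-go.pl still has a small problem.
First, a quick look at what stackcollapse-go.pl works on.
Each line under Samples is split at the colon: the numbers before it are metric values, and the numbers after it refer to stack entries under Locations.
For example, 1 10000000: 1 2 means that scanblock, called from GC, was recorded once and took 10000000 ns.
The Mappings data is not needed, because Locations already carries the function names.
The samples are parsed by this code: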
https://github.com/brendangregg/FlameGraph/blob/v1.0/stackcollapse-go.pl#L120-L125
    if ($state eq "sample") {
        if (/^\s*([0-9]+)\s*[0-9]+: ([0-9 ]+)/) {
            my $samples = $1;
            my $stack = $2;
            remember_stack($stack, $samples);
        }
    }
Real-world data looks different, though. This is what tcmalloc's MallocExtension::instance()->GetHeapSample(&s) produces:
    PeriodType: space bytes
    Period: 524288
    Samples:
    objects/count space/bytes
           1522  205791384: 2 3 4 5 6 7
                    bytes:[135168]
              1    1557983: 10 11 12 13 14 15 16 17 18 19 20 21
                    bytes:[1462185]
           1524     524460: 23 4 5 6 7
                    bytes:[344
For this input, the parser picks up only objects/count, not space/bytes, which is the value we care about.
The data can also look like this, as produced by tcmalloc's GetHeapProfile():
    PeriodType: space bytes
    Period: 1
    Samples:
    alloc_objects/count alloc_space/bytes inuse_objects/count inuse_space/bytes
             10   10485760         10   10485760: 1 2 3 4 5 6
                    bytes:[1048576]
              1    1048576          1    1048576: 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
                    bytes:[1048576]
              1      16920          0          0: 23 24 25 26 27 28 29 30
                    bytes:[16920]
Here four columns precede the colon, so the regex does not match at all. The final fix is:
    if ($state eq "sample") {
        # Accept any number of value columns before the colon.
        if (/^\s*((?:\d+\s+)*\d+)\s*:\s*([0-9 ]+)/) {
            my @v = split /\s+/, $1;
            # Use the last column as the sample value.
            my $samples = $v[-1];
            my $stack = $2;
            remember_stack($stack, $samples);
        }
    }
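For readers who want to try the pattern outside Perl, the same idea can be sketched in Go. The function name parseSample is hypothetical; the regular expression is the one from the patched script, and Go's regexp package accepts it unchanged. The sketch takes the last value column before the colon, as the patch does.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sampleRE accepts one or more value columns, a colon, then the location ids.
var sampleRE = regexp.MustCompile(`^\s*((?:\d+\s+)*\d+)\s*:\s*([0-9 ]+)`)

// parseSample returns the last value column and the stack (location ids)
// of one line from the Samples section of `pprof --raw` output.
func parseSample(line string) (value uint64, stack string, ok bool) {
	m := sampleRE.FindStringSubmatch(line)
	if m == nil {
		return 0, "", false
	}
	cols := strings.Fields(m[1])
	v, err := strconv.ParseUint(cols[len(cols)-1], 10, 64)
	if err != nil {
		return 0, "", false
	}
	return v, strings.TrimSpace(m[2]), true
}

func main() {
	lines := []string{
		"       1522  205791384: 2 3 4 5 6 7",
		"         10   10485760         10   10485760: 1 2 3 4 5 6",
		"                bytes:[1048576]", // label line, not a sample
	}
	for _, l := range lines {
		if v, stack, ok := parseSample(l); ok {
			fmt.Printf("value=%d stack=%q\n", v, stack)
		}
	}
}
```

Picking the last column selects space/bytes in the two-column case and inuse_space/bytes in the four-column case. A different metric would need a different column index.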