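Pitfalls of using pprof with MCJIT

Out of the box, code JIT-compiled by MCJIT cannot be symbolized by pprof. To see why, we first need to understand how pprof works.

How pprof works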

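pprof typically reads a heap profile dump. In the example here, the file has an .hprof suffix.

A typical example is https://github.com/tedcy/tcmalloc_hook_debug/blob/master/tcmalloc_miss_mmap_hook/main.cpp

The dumped allbin.hprof looks like this: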

heap profile:      1:  1048576 [     1:  1048576] @ heapprofile
1: 1441792 [ 1: 1441792] @ 0x00407135 0x0040765e 0x0042ae86 0x0041ae5f 0x0040643a 0x004059e4 0x7f86e47ad830 0x00405899
10: 41943040 [ 10: 41943040] @ 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899
1: 1048576 [ 1: 1048576] @ 0x004215f6 0x00421576 0x00421a66 0x0041f65e 0x0041f9c3 0x004105a6 0x0042cfeb 0x0040624f 0x00405a1d 0x7f86e47ad830 0x00405899
1: 1114112 [ 1: 1114112] @ 0x00407135 0x0040765e 0x0042ae86 0x004063ff 0x004059e4 0x7f86e47ad830 0x00405899
1: 131072 [ 1: 131072] @ 0x00407135 0x0040765e 0x0042ae86 0x0041cf07 0x0041c0b8 0x00406100 0x00409f82 0x0042cf86 0x0040624f 0x00405a1d 0x7f86e47ad830 0x00405899
1: 1048576 [ 1: 1048576] @ 0x0040624f 0x00405a1d 0x7f86e47ad830 0x00405899

MAPPED_LIBRARIES:
00400000-00404000 r--p 00000000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
00404000-0042e000 r-xp 00004000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
0042e000-0043d000 r--p 0002e000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
0043d000-0043e000 r--p 0003d000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
0043e000-0043f000 rw-p 0003e000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main
0043f000-005f6000 rw-p 00000000 00:00 0
01347000-01d48000 rw-p 00000000 00:00 0 [heap]
7f86e1d1d000-7f86e478d000 rw-p 00000000 00:00 0
7f86e478d000-7f86e494d000 r-xp 00000000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7f86e494d000-7f86e4b4d000 ---p 001c0000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7f86e4b4d000-7f86e4b51000 r--p 001c0000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7f86e4b51000-7f86e4b53000 rw-p 001c4000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7f86e4b53000-7f86e4b57000 rw-p 00000000 00:00 0
7f86e4b57000-7f86e4b6d000 r-xp 00000000 00:00 1237850 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f86e4b6d000-7f86e4d6c000 ---p 00016000 00:00 1237850 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f86e4d6c000-7f86e4d6d000 rw-p 00015000 00:00 1237850 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f86e4d6d000-7f86e4e75000 r-xp 00000000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
7f86e4e75000-7f86e5074000 ---p 00108000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
7f86e5074000-7f86e5075000 r--p 00107000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
7f86e5075000-7f86e5076000 rw-p 00108000 00:00 1237861 /lib/x86_64-linux-gnu/libm-2.23.so
7f86e5076000-7f86e51e8000 r-xp 00000000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f86e51e8000-7f86e53e8000 ---p 00172000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f86e53e8000-7f86e53f2000 r--p 00172000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f86e53f2000-7f86e53f4000 rw-p 0017c000 00:00 1082942627 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f86e53f4000-7f86e53f8000 rw-p 00000000 00:00 0
7f86e53f8000-7f86e53fb000 r-xp 00000000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
7f86e53fb000-7f86e55fa000 ---p 00003000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
7f86e55fa000-7f86e55fb000 r--p 00002000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
7f86e55fb000-7f86e55fc000 rw-p 00003000 00:00 1237842 /lib/x86_64-linux-gnu/libdl-2.23.so
7f86e55fc000-7f86e5614000 r-xp 00000000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f86e5614000-7f86e5813000 ---p 00018000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f86e5813000-7f86e5814000 r--p 00017000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f86e5814000-7f86e5815000 rw-p 00018000 00:00 1237897 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f86e5815000-7f86e5819000 rw-p 00000000 00:00 0
7f86e5819000-7f86e583f000 r-xp 00000000 00:00 1237745 /lib/x86_64-linux-gnu/ld-2.23.so
7f86e588e000-7f86e5a35000 rw-p 00000000 00:00 0
7f86e5a37000-7f86e5a38000 r--p 00000000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
7f86e5a38000-7f86e5a39000 r-xp 00001000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
7f86e5a39000-7f86e5a3a000 r--p 00002000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
7f86e5a3a000-7f86e5a3b000 r--p 00002000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
7f86e5a3b000-7f86e5a3c000 rw-p 00003000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so
7f86e5a3c000-7f86e5a3e000 rw-p 00000000 00:00 0
7f86e5a3e000-7f86e5a3f000 r--p 00025000 00:00 1237745 /lib/x86_64-linux-gnu/ld-2.23.so
7f86e5a3f000-7f86e5a40000 rw-p 00026000 00:00 1237745 /lib/x86_64-linux-gnu/ld-2.23.so
7f86e5a40000-7f86e5a41000 rw-p 00000000 00:00 0
7ffd52f83000-7ffd52fa5000 rw-p 00000000 00:00 0 [stack]
7ffd52fbf000-7ffd52fc2000 r--p 00000000 00:00 0 [vvar]
7ffd52fc2000-7ffd52fc3000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]

The first line declares the heap profile format.

Starting from the second line, each line is one sampled call stack.
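A stack line has the form "<in-use objects>: <in-use bytes> [<allocated objects>: <allocated bytes>] @ <addresses>". That is how I read the gperftools legacy heap format. The Go sketch below parses one such line. The type and function names are my own and serve only as illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// heapSample is one stack line of a legacy heap profile:
// "<inuse objects>: <inuse bytes> [<alloc objects>: <alloc bytes>] @ <addr> ..."
type heapSample struct {
	InuseObjects, InuseBytes int64
	AllocObjects, AllocBytes int64
	Addrs                    []uint64 // leaf frame first
}

var heapSampleRE = regexp.MustCompile(`^\s*(\d+):\s*(\d+)\s*\[\s*(\d+):\s*(\d+)\s*\]\s*@(.*)$`)

// parseHeapSample parses one stack line. The "heap profile:" header and the
// MAPPED_LIBRARIES lines do not match and return an error.
func parseHeapSample(line string) (*heapSample, error) {
	m := heapSampleRE.FindStringSubmatch(line)
	if m == nil {
		return nil, fmt.Errorf("not a heap sample line: %q", line)
	}
	var n [4]int64
	for i := range n {
		v, err := strconv.ParseInt(m[i+1], 10, 64)
		if err != nil {
			return nil, err
		}
		n[i] = v
	}
	s := &heapSample{InuseObjects: n[0], InuseBytes: n[1], AllocObjects: n[2], AllocBytes: n[3]}
	for _, f := range strings.Fields(m[5]) {
		a, err := strconv.ParseUint(strings.TrimPrefix(f, "0x"), 16, 64)
		if err != nil {
			return nil, err
		}
		s.Addrs = append(s.Addrs, a)
	}
	return s, nil
}

func main() {
	s, err := parseHeapSample("10: 41943040 [ 10: 41943040] @ 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d objects, %d MiB, %d frames, leaf %#x\n", s.InuseObjects, s.InuseBytes>>20, len(s.Addrs), s.Addrs[0])
}
```

For the 40 MiB line discussed below, this prints "10 objects, 40 MiB, 4 frames, leaf 0x7f86e5a38179".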

Everything from MAPPED_LIBRARIES onward is really the content of /proc/{pid}/maps. For comparison, here is the start of it for PID 1:

~ cat /proc/1/maps|head
5600e0612000-5600e0636000 r-xp 00000000 08:c1 1612585013 /bin/dash
5600e0835000-5600e0837000 r--p 00023000 08:c1 1612585013 /bin/dash
5600e0837000-5600e0838000 rw-p 00025000 08:c1 1612585013 /bin/dash
5600e0838000-5600e083a000 rw-p 00000000 00:00 0
5600e0e51000-5600e0e72000 rw-p 00000000 00:00 0 [heap]
7fcc8832e000-7fcc884ee000 r-xp 00000000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7fcc884ee000-7fcc886ee000 ---p 001c0000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7fcc886ee000-7fcc886f2000 r--p 001c0000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7fcc886f2000-7fcc886f4000 rw-p 001c4000 08:c1 1237829 /lib/x86_64-linux-gnu/libc-2.23.so
7fcc886f4000-7fcc886f8000 rw-p 00000000 00:00 0

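So pprof looks up each virtual address from the stack lines in the maps data. This tells it whether the address belongs to the main program or to a shared library, and it then resolves the address with addr2line.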

Take the line 10: 41943040 [ 10: 41943040] @ 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899 from above as an example.

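It means this stack holds 10 live allocations, totaling 41943040 bytes (40 MiB). Per the gperftools heap profile format, the pair in brackets is the cumulative allocation count and bytes. Resolving the addresses against the main binary gives: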

~ addr2line -fe main 0x7f86e5a38179 0x00405a00 0x7f86e47ad830 0x00405899
??:0
/root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:23
??:0
??:?

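Only one frame gets a source line. Most of the other addresses live in shared libraries, and 0x00405899 (_start) has no line information.

For example, 0x7f86e5a38179 falls in 7f86e5a38000-7f86e5a39000 r-xp 00001000 00:00 51726046999 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/libdynamic.so

The x in r-xp marks the mapping as executable, which matches a code segment. The address relative to the start of the mapping is 0x7f86e5a38179 - 0x7f86e5a38000 = 0x179.

Note that this mapping also has a file offset of 0x1000 (the third column), so the address to pass to addr2line is 0x179 + 0x1000 = 0x1179: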

~ addr2line -fe ./libdynamic.so 0x1179
test
/root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/dynamic_lib.cpp:7

Similarly, 0x7f86e47ad830 falls in 7f86e478d000-7f86e494d000 r-xp 00000000 00:00 1237829 /lib/x86_64-linux-gnu/libc-2.23.so. That gives 0x7f86e47ad830 - 0x7f86e478d000 + 0 = 0x20830:

~ addr2line -fe /lib/x86_64-linux-gnu/libc-2.23.so 0x20830
__libc_start_main
/build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:325

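0x00405a00 and 0x00405899 fall in the main program's mapping 00404000-0042e000 r-xp 00004000 00:00 51726047003 /root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main. main is not position-independent, so its runtime addresses equal its link-time addresses and can be passed to addr2line directly: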

~ addr2line -fe main 0x00405a00 0x00405899
main
/root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:23
_start
??:?

So the complete call stack is _start->__libc_start_main->main->test.

The pprof-go mis-symbolization bug

MCJIT links code at run time, so there is no ELF file on disk backing its mapping. Its maps dump looks roughly like this:

7ff4d5415000-7ff4d5418000 r-xp 00000000 00:00 156754      /lib/x86_64-linux-gnu/libdl-2.23.so
7ff4d5418000-7ff4d5617000 ---p 00003000 00:00 156754 /lib/x86_64-linux-gnu/libdl-2.23.so
7ff4d5617000-7ff4d5618000 r-xp 00002000 00:00 156754 /lib/x86_64-linux-gnu/libdl-2.23.so
7ff4d5618000-7ff4d5619000 rwxp 00003000 00:00 156754 /lib/x86_64-linux-gnu/libdl-2.23.so
7ff4d5619000-7ff4d563f000 r-xp 00000000 00:00 156749 /lib/x86_64-linux-gnu/ld-2.23.so
7ff4d5647000-7ff4d5662000 r-xp 00000000 00:00 0
7ff4d5662000-7ff4d5736000 rwxp 00000000 00:00 0
7ff4d5736000-7ff4d5753000 r-xp 00000000 00:00 3277415 /data/app/taf/tafnode/data/libleafInterface.so
7ff4d5753000-7ff4d5754000 r-xp 0001c000 00:00 3277415 /data/app/taf/tafnode/data/libleafInterface.so
7ff4d5754000-7ff4d5755000 rwxp 0001d000 00:00 3277415 /data/app/taf/tafnode/data/libleafInterface.so

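Here 7ff4d5647000-7ff4d5662000 r-xp 00000000 00:00 0 is MCJIT's code segment.

About ten years ago, pprof-go added a workaround for a quirk of some profiling tools in https://github.com/google/pprof/commit/5509abdceb968dbea918e8f6c83e14cffb81230d. Its commit message reads: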

Combine adjacent mappings even if offsets are unavailable

Some profile handlers will split mappings into multiple adjacent entries. This interferes with pprof when overriding the main binary.

Currently there is code that attempts to combine these mappings, but it fails if the mapping offsets aren't available. Add a check to combine them in that situation.

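This merge logic folds MCJIT's anonymous code segment into the mapping of /data/app/taf/tafnode/data/libleafInterface.so. The code segment has no file and offset 0, and it sits directly next to the anonymous rwxp region, which in turn sits next to the library. As a result, MCJIT addresses are symbolized against that library and produce nonsense symbols.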

The fix is to delete https://github.com/google/pprof/blob/main/profile/legacy_profile.go#L229-L238 and https://github.com/google/pprof/blob/main/profile/profile.go#L258-L277.
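The misattribution can be reproduced with a simplified model of the adjacency rule described in the commit message. In this model, a missing file name or a zero offset counts as "unknown" and does not block a merge. This is my own sketch, not pprof's code. The allowUnknown=false mode shows a stricter rule that refuses to merge anonymous mappings. I am assuming that rule as an alternative; it is not the author's fix.

```go
package main

import "fmt"

type Mapping struct {
	Start, Limit, Offset uint64
	File                 string
}

const leaf = "/data/app/taf/tafnode/data/libleafInterface.so"

// mcjitMaps holds the executable lines of the maps dump above.
func mcjitMaps() []*Mapping {
	return []*Mapping{
		{0x7ff4d5647000, 0x7ff4d5662000, 0, ""}, // MCJIT code, no backing file
		{0x7ff4d5662000, 0x7ff4d5736000, 0, ""},
		{0x7ff4d5736000, 0x7ff4d5753000, 0, leaf},
		{0x7ff4d5753000, 0x7ff4d5754000, 0x1c000, leaf},
		{0x7ff4d5754000, 0x7ff4d5755000, 0x1d000, leaf},
	}
}

// adjacent: an empty File (when allowUnknown) or a zero Offset is treated as
// unknown and does not prevent a merge.
func adjacent(m1, m2 *Mapping, allowUnknown bool) bool {
	if m1.Limit != m2.Start {
		return false
	}
	if m1.File == "" || m2.File == "" {
		if !allowUnknown {
			return false
		}
	} else if m1.File != m2.File {
		return false
	}
	// Offsets are compared only when both are known.
	if m1.Offset != 0 && m2.Offset != 0 && m1.Offset+(m1.Limit-m1.Start) != m2.Offset {
		return false
	}
	return true
}

// merge folds runs of adjacent mappings together, taking the file name from
// whichever side has one. The input mappings are copied, not modified.
func merge(ms []*Mapping, allowUnknown bool) []*Mapping {
	var out []*Mapping
	for _, m := range ms {
		c := *m
		if n := len(out); n > 0 && adjacent(out[n-1], &c, allowUnknown) {
			last := out[n-1]
			last.Limit = c.Limit
			if c.File != "" {
				last.File = c.File
			}
			continue
		}
		out = append(out, &c)
	}
	return out
}

func main() {
	for _, allow := range []bool{true, false} {
		fmt.Println("allowUnknown =", allow)
		for _, m := range merge(mcjitMaps(), allow) {
			fmt.Printf("  %x-%x %q\n", m.Start, m.Limit, m.File)
		}
	}
}
```

With allowUnknown=true, everything collapses into a single mapping 7ff4d5647000-7ff4d5755000 attributed to libleafInterface.so, which is exactly the bogus attribution described above. With the stricter rule, the two anonymous regions stay separate.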

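Getting pprof-go to symbolize MCJIT code correctly

The fix in the previous part stops the misattribution. The next step is to supply usable symbols so that pprof-go can resolve MCJIT addresses.

When a process running MCJIT code dumps core, gdb can still print the correct script-level stack. So I looked into how gdb finds the binary that corresponds to a virtual address.

In short, LLVM writes an in-memory ELF object describing the compiled code into __jit_debug_descriptor, a structure defined by gdb's JIT interface. gdb then reads it from there.

How gdb does it: the GDB JIT interface

https://github.com/llvm/llvm-project/blob/llvmorg-7.1.0/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L24-L62

The code first declares gdb's C data structures and the global variable __jit_debug_descriptor.

The heart of the structure is a linked list of jit_code_entry. Each entry records the address (symfile_addr) and size (symfile_size) of an in-memory ELF object for one piece of JIT code. With that information, gdb can attribute the regions in /proc/{pid}/maps that have no backing ELF file to the right code.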

extern "C" {

typedef enum {
  JIT_NOACTION = 0,
  JIT_REGISTER_FN,
  JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr;
  uint64_t symfile_size;
};

struct jit_descriptor {
  uint32_t version;
  // This should be jit_actions_t, but we want to be specific about the
  // bit-width.
  uint32_t action_flag;
  struct jit_code_entry *relevant_entry;
  struct jit_code_entry *first_entry;
};

// We put information about the JITed function in this global, which the
// debugger reads. Make sure to specify the version statically, because the
// debugger checks the version before we can set it during runtime.
struct jit_descriptor __jit_debug_descriptor = { 1, 0, nullptr, nullptr };
}

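NotifyDebugger then inserts each new JITCodeEntry at the head of the list in __jit_debug_descriptor. It then calls __jit_debug_register_code(), the function on which gdb sets a breakpoint to learn about new entries.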

void NotifyDebugger(jit_code_entry* JITCodeEntry) {
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;

  // Insert this entry at the head of the list.
  JITCodeEntry->prev_entry = nullptr;
  jit_code_entry* NextEntry = __jit_debug_descriptor.first_entry;
  JITCodeEntry->next_entry = NextEntry;
  if (NextEntry) {
    NextEntry->prev_entry = JITCodeEntry;
  }
  __jit_debug_descriptor.first_entry = JITCodeEntry;
  __jit_debug_descriptor.relevant_entry = JITCodeEntry;
  __jit_debug_register_code();
}

NotifyDebugger itself is called from GDBJITRegistrationListener::NotifyObjectEmitted:

void GDBJITRegistrationListener::NotifyObjectEmitted(
    const ObjectFile &Object,
    const RuntimeDyld::LoadedObjectInfo &L) {

  OwningBinary<ObjectFile> DebugObj = L.getObjectForDebug(Object);

  // Bail out if debug objects aren't supported.
  if (!DebugObj.getBinary())
    return;

  const char *Buffer = DebugObj.getBinary()->getMemoryBufferRef().getBufferStart();
  size_t Size = DebugObj.getBinary()->getMemoryBufferRef().getBufferSize();

  const char *Key = Object.getMemoryBufferRef().getBufferStart();

  assert(Key && "Attempt to register a null object with a debugger.");
  llvm::MutexGuard locked(*JITDebugLock);
  assert(ObjectBufferMap.find(Key) == ObjectBufferMap.end() &&
         "Second attempt to perform debug registration.");
  jit_code_entry* JITCodeEntry = new jit_code_entry();

  if (!JITCodeEntry) {
    llvm::report_fatal_error(
      "Allocation failed when registering a JIT entry!\n");
  } else {
    JITCodeEntry->symfile_addr = Buffer;
    JITCodeEntry->symfile_size = Size;

    ObjectBufferMap[Key] = RegisteredObjectInfo(Size, JITCodeEntry,
                                                std::move(DebugObj));
    NotifyDebugger(JITCodeEntry);
  }
}

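Once MCJIT finishes relocation, it calls back every registered listener, passing the object file together with a RuntimeDyld::LoadedObjectInfo. From the latter, getObjectForDebug() builds a copy of the ELF object whose section addresses are set to where the sections were actually loaded in memory. That copy is what gets registered. The call chain:

  • Initialization, MCJIT::MCJIT()

    • RegisterJITEventListener(JITEventListener::createGDBRegistrationListener())

      creates gdb's GDBJITRegistrationListener and registers it with MCJIT

      • EventListeners.push_back(L);
  • Relocation, MCJIT::generateCodeForModule()

    • Dyld.loadObject(*LoadedObject.get())

      relocates the object file

    • NotifyObjectEmitted(LoadedObject.get(), L);

      • for (unsigned I = 0, S = EventListeners.size(); I < S; ++I)

        • EventListeners[I]->NotifyObjectEmitted(Obj, L);

          here EventListeners contains the GDBJITRegistrationListener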

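My pprof implementation for MCJIT

The same approach works here. MCJIT writes out some extra data, and that data is read when resolving virtual addresses.

MCJIT: emitting the extra data

Two kinds of data need to be written out:

  • the ELF image
  • the start address and size of each section, later passed to addr2line for symbolization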
class PPROFJITRegistrationListener : public llvm::JITEventListener {
 public:
  // Per-thread list of output path prefixes; each gets a .elf and a .meta file.
  static vector<string>& dumpPaths() {
    static thread_local vector<string> vs;
    return vs;
  }

 private:
  void NotifyObjectEmitted(
      const llvm::object::ObjectFile& Obj,
      const llvm::RuntimeDyld::LoadedObjectInfo& L) override {
    // Same as the gdb listener: the debug copy has the load addresses filled in.
    llvm::object::OwningBinary<llvm::object::ObjectFile> DebugO =
        L.getObjectForDebug(Obj);

    if (!DebugO.getBinary()) return;

    auto DebugObj = DebugO.getBinary();

    const char* Buffer = DebugObj->getMemoryBufferRef().getBufferStart();
    size_t Size = DebugObj->getMemoryBufferRef().getBufferSize();

    // 1. The ELF image itself
    for (auto& dumpPath : dumpPaths()) {
      taf::TC_File::save2file(dumpPath + ".elf", string(Buffer, Size));
    }

    // 2. Name, load address and size of every section
    internal_jce::PPROFSections pprofSections;
    for (auto si = DebugObj->section_begin(), se = DebugObj->section_end(); si != se;
         ++si) {
      internal_jce::PPROFSection pprofSection;
      auto& section = *si;
      llvm::StringRef name;
      section.getName(name);
      pprofSection.name = name;
      pprofSection.start = section.getAddress();
      pprofSection.size = section.getSize();
      pprofSections.sections.push_back(move(pprofSection));
    }
    for (auto& dumpPath : dumpPaths()) {
      taf::TC_File::save2file(dumpPath + ".meta",
                              pprofSections.writeToJsonString());
    }
  }
};

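The resulting ELF contains a huge number of sections. In C++, every template instantiation gets its own section inside a group (COMDAT) section. The static linker would merge these into a single section, but MCJIT does not: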

~ readelf -S /pprof_meta/P00332.module.media.test.MediaRenderRun.elf |head -n 100
There are 43626 section headers, starting at offset 0x1c8a7e8:

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .strtab STRTAB 0000000000000000 01a67cf8
0000000000222ae9 0000000000000000 0 0 1
[ 2] .text PROGBITS 00007ff1abd5f000 00000040
00000000000ffc52 0000000000000000 AX 0 0 16
[ 3] .rela.text RELA 0000000000000000 011b0010
0000000000102cc0 0000000000000018 43625 2 8
[ 4] .text.startup PROGBITS 00007ff1c526a000 000ffca0
000000000000541a 0000000000000000 AX 0 0 16
[ 5] .rela.text.startu RELA 0000000000000000 012b2cd0
0000000000007830 0000000000000018 43625 4 8
[ 6] .gcc_except_table PROGBITS 00007ff1abe82000 001050bc
000000000002a840 0000000000000000 A 0 0 4
[ 7] .rela.gcc_except_ RELA 0000000000000000 012ba500
0000000000001668 0000000000000018 43625 6 8
[ 8] .group GROUP 0000000000000000 010a6da0
000000000000000c 0000000000000004 43625 34890 4
[ 9] .text._ZNSt9excep PROGBITS 00007ff1abbcefd0 0012f900
000000000000001f 0000000000000000 AXG 0 0 16
[10] .rela.text._ZNSt9 RELA 0000000000000000 012bbb68
0000000000000018 0000000000000018 G 43625 9 8
[11] .group GROUP 0000000000000000 010a6dac
000000000000000c 0000000000000004 43625 37011 4
[12] .text._ZStplIcSt1 PROGBITS 00007ff1abbaf4c0 0012f920
0000000000000053 0000000000000000 AXG 0 0 16
[13] .rela.text._ZStpl RELA 0000000000000000 012bbb80
0000000000000048 0000000000000018 G 43625 12 8
[14] .group GROUP 0000000000000000 010a6db8
000000000000000c 0000000000000004 43625 37017 4
[15] .text._ZStplIcSt1 PROGBITS 00007ff1abbaf820 0012f980
000000000000008d 0000000000000000 AXG 0 0 16

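pprof-go changes

Reading the pprof-go source

Data structures

pprof-go wraps a parsed profile file in a Profile. Stack addresses become Locations, and the /proc/{pid}/maps data becomes Mappings. The fields are simplified below: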

type Mapping struct {
	ID     uint64
	Start  uint64
	Limit  uint64
	Offset uint64
	File   string
}

type Location struct {
	ID      uint64
	Mapping *Mapping
	Address uint64
}

type Profile struct {
	// ...
	Mapping  []*Mapping
	Location []*Location
	// ...
}
Mapping function addresses to ELF files

Profile parsing runs through a deep call chain:

  • main()
    • driver.PProf()
      • internaldriver.PProf()
        • fetchProfiles()
          • grabSourcesAndBases()
            • chunkedGrab()
              • concurrentGrab()
                • grabProfile()
                  • fetch()
                    • profile.Parse()
                      • ParseData()
                        • parseLegacy()
                          • parseHeap()
                            • parseAdditionalSections()
                              • ParseMemoryMapFromScanner()

ParseMemoryMapFromScanner attaches each parsed Location to its Mapping:

https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/profile/legacy_profile.go#L1031-L1042

func (p *Profile) ParseMemoryMapFromScanner(s *bufio.Scanner) error {
	mapping, err := parseProcMapsFromScanner(s) // read the mappings from the file
	if err != nil {
		return err
	}
	p.Mapping = append(p.Mapping, mapping...)
	p.massageMappings()   // merge adjacent mappings, move the main binary to Mapping[0], assign IDs
	p.remapLocationIDs()  // deduplicate Locations and assign IDs
	p.remapFunctionIDs()  // not used by tcmalloc profiles
	p.remapMappingIDs()   // for each Location, find the Mapping that contains it
	return nil
}

The core of the mapping logic is remapMappingIDs:

func (p *Profile) remapMappingIDs() {
	// ...
nextLocation:
	for _, l := range p.Location {
		a := l.Address
		if l.Mapping != nil || a == 0 {
			continue
		}
		for _, m := range p.Mapping {
			if m.Start <= a && a < m.Limit {
				// The address lies in [Start, Limit): attribute it to this Mapping
				l.Mapping = m
				// Found it; skip the rest and move on to the next Location
				continue nextLocation
			}
		}
		// Fall back to a fake Mapping covering the whole address space
		if fake == nil {
			fake = &Mapping{
				ID:    1,
				Limit: ^uint64(0),
			}
			p.Mapping = append(p.Mapping, fake)
		}
		l.Mapping = fake
	}
}
Passing function addresses and ELF files to addr2line
  • main()
    • driver.PProf()
      • internaldriver.PProf()
        • fetchProfiles()
          • Symbolize()
            • localSymbolize()
              • doLocalSymbolize()
                • symbolizeOneMapping()

https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/internal/symbolizer/symbolizer.go#L207-L244

func symbolizeOneMapping(m *profile.Mapping, locs []*profile.Location, obj plugin.ObjFile, addFunction func(*profile.Function) *profile.Function) {
	for _, l := range locs {
		// SourceLine's default implementation is addr2line, run as a pipe:
		// write an address in, read the resolved frames back.
		stack, err := obj.SourceLine(l.Address)
		if err != nil || len(stack) == 0 {
			// No answers from addr2line.
			continue
		}

		l.Line = make([]profile.Line, len(stack))
		l.IsFolded = false
		for i, frame := range stack {
			// ... (omitted: turn each frame into a profile.Line)
		}

		if len(stack) > 0 {
			m.HasInlineFrames = true
		}
	}
}

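Implementation

Data structures

As the readelf output earlier shows, many MCJIT functions sit at address 0x0 within their own section. They can only be told apart by section.

So I added a Section field to Mapping to record which section a range belongs to: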

type Mapping struct {
	ID      uint64
	Start   uint64
	Limit   uint64
	Offset  uint64
	File    string
	Section string // new: name of the ELF section this range came from
}
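Mapping function addresses to ELF files

The idea is to read the emitted files into an ExtendMapping list and, during symbol matching, try ExtendMapping before the ordinary mappings.

The code below builds ExtendMapping. The main difference from ordinary Mappings is that these entries have a non-empty Section field: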

// Needs imports: encoding/json, fmt, os, path/filepath, strings
func (p *Profile) ParseMemoryMapFromScanner(s *bufio.Scanner) error {
	mapping, err := parseProcMapsFromScanner(s)
	if err != nil {
		return err
	}
	// New: load the MCJIT section ranges (ExtendMapping []*Mapping is a field added to Profile)
	p.ExtendMapping = p.parseExtProcMapsFromScanner()
	p.Mapping = append(p.Mapping, mapping...)
	p.massageMappings()
	p.remapLocationIDs()
	p.remapFunctionIDs()
	p.remapMappingIDs()
	return nil
}

// Diagnostics go to stderr so they do not corrupt output such as "pprof --raw > file".
func (p *Profile) parseExtProcMapsFromScanner() []*Mapping {
	var mappings []*Mapping

	// Look for .meta files in the current directory first
	files, err := filepath.Glob("*.meta")
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error finding .meta files:", err)
		return nil
	}

	// If there are none, fall back to /pprof_meta
	if len(files) == 0 {
		files, err = filepath.Glob("/pprof_meta/*.meta")
		if err != nil {
			fmt.Fprintln(os.Stderr, "Error finding .meta files in /pprof_meta:", err)
			return nil
		}
	}

	for _, metaFile := range files {
		data, err := os.ReadFile(metaFile)
		if err != nil {
			// Skip this file instead of dropping every mapping
			fmt.Fprintln(os.Stderr, "Error reading file:", metaFile, err)
			continue
		}

		var sections PPROFSections
		if err := json.Unmarshal(data, &sections); err != nil {
			fmt.Fprintln(os.Stderr, "Error parsing JSON in file:", metaFile, err)
			continue
		}

		// foo.meta -> foo.elf, the ELF image dumped alongside it
		elfFile := strings.TrimSuffix(metaFile, ".meta") + ".elf"

		for _, sec := range sections.Sections {
			mappings = append(mappings, &Mapping{
				Start:   sec.Start,
				Limit:   sec.Start + sec.Size,
				Section: sec.Name,
				File:    elfFile,
			})
		}
	}

	return mappings
}
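The code above uses PPROFSections without showing its definition. Its JSON comes from the JCE struct's writeToJsonString(), and the article does not show the field names. The Go definitions below therefore assume the keys name, start and size; encoding/json also matches keys case-insensitively. The sketch adds a filter that the original lacks. Sections that were never loaded have Address 0 in the readelf listing (.strtab, .rela.*, .group). Kept as they are, they become ranges starting at 0, and because ExtendMapping is searched first, a large one could swallow low addresses such as those of a non-PIE main binary.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Assumed JSON shape of a .meta file (field names are an assumption).
type PPROFSection struct {
	Name  string `json:"name"`
	Start uint64 `json:"start"`
	Size  uint64 `json:"size"`
}

type PPROFSections struct {
	Sections []PPROFSection `json:"sections"`
}

// extMapping stands in for the patched profile.Mapping.
type extMapping struct {
	Start, Limit  uint64
	Section, File string
}

// parseMeta turns one .meta file into mappings, skipping sections that were
// not loaded (start 0) or are empty.
func parseMeta(data []byte, elfFile string) ([]extMapping, error) {
	var secs PPROFSections
	if err := json.Unmarshal(data, &secs); err != nil {
		return nil, err
	}
	var out []extMapping
	for _, s := range secs.Sections {
		if s.Start == 0 || s.Size == 0 {
			continue
		}
		out = append(out, extMapping{Start: s.Start, Limit: s.Start + s.Size, Section: s.Name, File: elfFile})
	}
	return out, nil
}

func main() {
	// Made-up sample for illustration; real files hold tens of thousands of sections.
	meta := []byte(`{"sections":[{"name":".strtab","start":0,"size":2272489},{"name":".text","start":4096,"size":16}]}`)
	ms, err := parseMeta(meta, "demo.elf")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ms)
}
```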

This code gives ExtendMapping priority during matching:

func (p *Profile) remapMappingIDs() {
	// ...
nextLocation:
	for _, l := range p.Location {
		a := l.Address
		if l.Mapping != nil || a == 0 {
			continue
		}
		// New: try the MCJIT section ranges first
		for _, m := range p.ExtendMapping {
			if m.Start <= a && a < m.Limit {
				// The address lies in [Start, Limit): attribute it to this Mapping
				l.Mapping = m
				// Found it; move on to the next Location
				continue nextLocation
			}
		}
		for _, m := range p.Mapping {
			if m.Start <= a && a < m.Limit {
				l.Mapping = m
				continue nextLocation
			}
		}
		// Fall back to a fake Mapping covering the whole address space
		if fake == nil {
			fake = &Mapping{
				ID:    1,
				Limit: ^uint64(0),
			}
			p.Mapping = append(p.Mapping, fake)
		}
		l.Mapping = fake
	}

	// Finally, append ExtendMapping to Mapping so the rest of pprof treats these like any other mapping.
	// If the full function renumbers Mapping IDs at its end, place this append before that
	// renumbering so the new mappings receive IDs too.
	p.Mapping = append(p.Mapping, p.ExtendMapping...)
}
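There is one .meta entry per section, 43626 in the readelf example. As a result, the linear ExtendMapping scan above costs O(locations × sections). A possible refinement, not part of the author's patch, is to sort the ranges once and binary-search them. The sketch below assumes that loaded sections do not overlap, and sectionIndex is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

type extMapping struct {
	Start, Limit uint64
	Section      string
}

// sectionIndex is sorted by Start; ranges are assumed not to overlap.
type sectionIndex []extMapping

func newSectionIndex(ms []extMapping) sectionIndex {
	idx := append(sectionIndex(nil), ms...)
	sort.Slice(idx, func(i, j int) bool { return idx[i].Start < idx[j].Start })
	return idx
}

// lookup returns the section whose [Start, Limit) contains addr, in O(log n).
func (idx sectionIndex) lookup(addr uint64) (extMapping, bool) {
	// i is the first section starting after addr, so idx[i-1] is the only candidate.
	i := sort.Search(len(idx), func(j int) bool { return idx[j].Start > addr })
	if i > 0 && addr < idx[i-1].Limit {
		return idx[i-1], true
	}
	return extMapping{}, false
}

func main() {
	// Address/size pairs from the readelf listing (names truncated as readelf prints them).
	idx := newSectionIndex([]extMapping{
		{0x7ff1abd5f000, 0x7ff1abd5f000 + 0xffc52, ".text"},
		{0x7ff1abbcefd0, 0x7ff1abbcefd0 + 0x1f, ".text._ZNSt9excep"},
		{0x7ff1abbaf4c0, 0x7ff1abbaf4c0 + 0x53, ".text._ZStplIcSt1"},
		{0x7ff1abbaf820, 0x7ff1abbaf820 + 0x8d, ".text._ZStplIcSt1"},
	})
	for _, a := range []uint64{0x7ff1abbaf4d0, 0x7ff1abbaf600, 0x7ff1abd5f100} {
		if m, ok := idx.lookup(a); ok {
			fmt.Printf("%#x -> %s +%#x\n", a, m.Section, a-m.Start)
		} else {
			fmt.Printf("%#x -> not in any section\n", a)
		}
	}
}
```

The first and third addresses resolve to a section plus offset. The second falls into the gap between two sections and is reported as unmatched, so it would continue on to the ordinary Mapping scan.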
Passing function addresses and ELF files to addr2line

pprof-go's default addr2line invocation looks like this:

https://github.com/google/pprof/blob/9e5a51aed1e8fb135a04db444b671cd8256cccf4/internal/binutils/addr2liner.go#L90-L120

const (
	defaultAddr2line = "addr2line"
)

func newAddr2Liner(cmd, file string, base uint64) (*addr2Liner, error) {
	if cmd == "" {
		cmd = defaultAddr2line
	}

	j := &addr2LinerJob{
		cmd: exec.Command(cmd, "-aif", "-e", file),
	}

	var err error
	if j.in, err = j.cmd.StdinPipe(); err != nil {
		return nil, err
	}

	outPipe, err := j.cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}

	j.out = bufio.NewReader(outPipe)
	if err := j.cmd.Start(); err != nil {
		return nil, err
	}

	a := &addr2Liner{
		rw:   j,
		base: base,
	}

	return a, nil
}

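It does not use -j to select a section. From my reading of the addr2line source, without -j it searches every section for the given address and returns the result from the first matching section (usually .text).

That works for an ordinary binary with a single .text section, but not for MCJIT.

In the MCJIT ELF shown earlier, many functions sit at address 0x0 inside sections other than .text, so a lookup without -j simply matches .text.

So the code had to be changed. In doLocalSymbolize, a mapping with a non-empty Section field came from ExtendMapping. Such mappings are symbolized by a dedicated routine that passes -j <section> and the section-relative offset to addr2line: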

func doLocalSymbolize(prof *profile.Profile, fast, force bool, obj plugin.ObjTool, ui plugin.UI) error {
	// ... (omitted)
	for midx, m := range prof.Mapping {
		locs := mappingLocs[m]
		if len(locs) == 0 {
			// The mapping is dangling and has no locations pointing to it.
			continue
		}
		// New: a non-empty Section means the mapping came from ExtendMapping
		if m.Section != "" {
			symbolizeOneMappingWithSection(m, locs, addFunction)
			continue
		}
		// ... (omitted)
	}
}

func symbolizeOneMappingWithSection(m *profile.Mapping, locs []*profile.Location, addFunction func(*profile.Function) *profile.Function) {
	for _, l := range locs {
		// With -j, addr2line reads the address as an offset inside m.Section
		offset := fmt.Sprintf("%x", l.Address-m.Start)
		// Pass the arguments directly rather than through "sh -c", avoiding shell quoting issues
		cmd := exec.Command("addr2line", "-if", "-j", m.Section, "-e", m.File, offset)
		out, err := cmd.Output()
		if err != nil {
			fmt.Fprintln(os.Stderr, "Error executing command:", cmd.Args, err)
			continue
		}
		lines := strings.Split(strings.TrimSpace(string(out)), "\n")
		// -f prints two lines per frame: the function name, then file:line
		if len(lines)%2 != 0 {
			fmt.Fprintln(os.Stderr, "Unexpected output from addr2line:", string(out))
			continue
		}
		var stack []plugin.Frame
		for i := 0; i < len(lines); i += 2 {
			funcname := strings.TrimSpace(lines[i])
			fileline := strings.TrimSpace(lines[i+1])
			fmt.Fprintf(os.Stderr, "addr2line: args=%v\n Address=%x, SectionStart=%x, Func=%s, Fileline=%s\n", cmd.Args, l.Address, m.Start, funcname, fileline)
			if strings.HasPrefix(funcname, "0x") {
				continue
			}
			frame := parseFrame(funcname, fileline)
			stack = append(stack, frame)
		}
		if len(stack) > 0 {
			// Build the location's lines and functions from the frames (implementation omitted)
			symbolizeFrames(m, l, stack, addFunction)
		}
	}
}
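parseFrame and symbolizeFrames are not shown. As a hypothetical stand-in, the sketch below parses what addr2line -i -f prints for one address into frames. That output is one function-name line followed by one file:line line, with the innermost inlined frame first. The frame type is my own, not pprof's plugin.Frame.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// frame is a hypothetical stand-in for pprof's plugin.Frame.
type frame struct {
	Func string
	File string
	Line int
}

// parseAddr2line parses `addr2line -i -f` output for one address: pairs of
// lines "function" then "file:line", innermost inlined frame first.
func parseAddr2line(out string) []frame {
	lines := strings.Split(strings.TrimSpace(out), "\n")
	var frames []frame
	for i := 0; i+1 < len(lines); i += 2 {
		fn := strings.TrimSpace(lines[i])
		loc := strings.TrimSpace(lines[i+1])
		if fn == "??" && strings.HasPrefix(loc, "??") {
			continue // nothing is known about this frame
		}
		f := frame{Func: fn, File: loc}
		if j := strings.LastIndex(loc, ":"); j >= 0 {
			f.File = loc[:j]
			// The line may be "?" or followed by " (discriminator N)".
			if fields := strings.Fields(loc[j+1:]); len(fields) > 0 {
				if n, err := strconv.Atoi(fields[0]); err == nil {
					f.Line = n
				}
			}
		}
		if f.File == "??" {
			f.File = ""
		}
		frames = append(frames, f)
	}
	return frames
}

func main() {
	out := "main\n/root/tcmalloc_hook_debug/tcmalloc_miss_mmap_hook/main.cpp:23\n_start\n??:?\n"
	for _, f := range parseAddr2line(out) {
		fmt.Printf("%s %s:%d\n", f.Func, f.File, f.Line)
	}
}
```

Run on the main-binary output from the first section, this yields a main frame at main.cpp line 23 and a _start frame with no file or line.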

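Summary

This article covered why MCJIT code could not be used with pprof and how I solved it. A custom JITRegistrationListener makes MCJIT write out extra data: the ELF image plus its section ranges. A patched pprof-go then reads that data when resolving virtual addresses.

Appendix

Exporting node graphs and flame graphs with pprof-go

Node graphs

Unlike pprof-perl, which relied on Ghostscript, pprof-go generates PDFs with Graphviz.

Graphviz can be built from the source available at https://graphviz.org/download/

PDF output depends on the gd package and the other libraries installed below. The complete build script: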

cur=`pwd`

# dependencies for pprof-go's PDF output (dot -Tpdf)
# without this line, the Graphviz binaries built here cannot produce PDF
apt install -y libgd-dev fontconfig libcairo2-dev libpango1.0-dev libgts-dev

# the last version that supports C++11
tar xvf graphviz-12.0.0.tar.gz
cd graphviz-12.0.0
./configure --prefix=/tmp/gv
make -j && make install
cd /tmp

# graphviz needs the configuration files from fontconfig-config
apt download fontconfig-config
mkdir -pv gv/deb/
cp *.deb gv/deb/
ls -lh gv/deb/

# copy every shared library the graphviz libraries depend on into gv/lib, so the bundle has no external dependencies
ldd gv/lib/graphviz/*so*|\
grep -oP '/[^ )]+'|grep -v ':'|grep -v 'gv/lib'|\
grep -v libc.so|grep -v libm.so|grep -v librt.so|\
grep -v libdl.so|grep -v libglib|grep -v libpthread.so|\
xargs -r -d '\n' -I{} cp -L '{}' gv/lib/

Usage script:

# $tooldir is the directory where the graphviz bundle was unpacked
export PATH=$tooldir/gv/bin:$PATH
export LD_LIBRARY_PATH=$tooldir/gv/lib:$LD_LIBRARY_PATH

# unpack the fontconfig-config configuration and point fontconfig at it
dpkg -x $tooldir/gv/deb/fontconfig* /opt/fontconfig
export FONTCONFIG_PATH=/opt/fontconfig/etc/fonts

pprof --pdf --lines $binary 1.pprof > 1.pdf

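Flame graphs

pprof-go can render a flame graph through the web UI it serves on an HTTP port, but it offers no way to export one directly.

Frankly, its flame graphs also don't look as good as FlameGraph's.

After some experimenting, I found that FlameGraph's stackcollapse-go.pl, a collapser written for Go profiles, can be combined with pprof-go to produce flame graphs: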

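# have pprof-go print the raw profile data
pprof --raw --lines $binary 1.pprof > tmp/out.data

# collapse the raw stacks with FlameGraph's collapser
stackcollapse-go.pl tmp/out.data > tmp/out.collapse

# render the collapsed stacks as a flame graph (flamegraph.pl writes the SVG to stdout)
flamegraph.pl tmp/out.collapse > 1.svg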

However, stackcollapse-go.pl has a problem with this data.

First, a quick look at what stackcollapse-go.pl works with. Take the following example:

# Example Input:
# ...
# Samples:
# samples/count cpu/nanoseconds
# 1 10000000: 1 2
# 2 10000000: 3 2
# 1 10000000: 4 2
# ...
# Locations
# 1: 0x58b265 scanblock :0 s=0
# 2: 0x599530 GC :0 s=0
# 3: 0x58a999 flushptrbuf :0 s=0
# 4: 0x58d6a8 runtime.MSpan_Sweep :0 s=0
# ...
# Mappings
# ...

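Each line under Samples is split at the colon. The numbers before it are the metric values. The numbers after it are IDs of entries under Locations, listed leaf first.

For example, 1 10000000: 1 2 means the stack GC -> scanblock (locations 2 and 1) was sampled once and cost 10000000 ns.

The Mappings data is not needed, because Locations already carry the function names.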
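What stackcollapse-go.pl does with this input can be sketched in Go. The sketch reverses each sample's location list so the root comes first, joins the names with ";", and sums the values of identical stacks into FlameGraph's folded format. The sample type and the fold function are illustrative names, and the data is the example above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one "Samples" line: a value and its location IDs, leaf first.
type sample struct {
	Value int64
	Locs  []int
}

// fold produces FlameGraph "folded" lines ("root;...;leaf value"), summing
// identical stacks and sorting the result for stable output.
func fold(samples []sample, names map[int]string) []string {
	sums := map[string]int64{}
	for _, s := range samples {
		parts := make([]string, 0, len(s.Locs))
		for i := len(s.Locs) - 1; i >= 0; i-- {
			parts = append(parts, names[s.Locs[i]])
		}
		sums[strings.Join(parts, ";")] += s.Value
	}
	out := make([]string, 0, len(sums))
	for stack, v := range sums {
		out = append(out, fmt.Sprintf("%s %d", stack, v))
	}
	sort.Strings(out)
	return out
}

func main() {
	names := map[int]string{1: "scanblock", 2: "GC", 3: "flushptrbuf", 4: "runtime.MSpan_Sweep"}
	samples := []sample{{1, []int{1, 2}}, {2, []int{3, 2}}, {1, []int{4, 2}}}
	for _, line := range fold(samples, names) {
		fmt.Println(line)
	}
}
```

It prints "GC;flushptrbuf 2", "GC;runtime.MSpan_Sweep 1" and "GC;scanblock 1", which is the kind of input flamegraph.pl consumes.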

The samples are parsed by this code:

https://github.com/brendangregg/FlameGraph/blob/v1.0/stackcollapse-go.pl#L120-L125

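if ($state eq "sample") {
    if (/^\s*([0-9]+)\s*[0-9]+: ([0-9 ]+)/) {
        # ^\s*        zero or more leading whitespace characters
        # ([0-9]+)    one or more digits, captured as group 1 ($samples below)
        # \s*         zero or more whitespace characters
        # [0-9]+      one or more digits (the second column, not captured)
        # :           the colon separator
        # ([0-9 ]+)   digits and spaces, captured as group 2 ($stack below)
        my $samples = $1;
        my $stack = $2;
        remember_stack($stack, $samples);
    }
}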

Real input looks different, though. Here is data generated by tcmalloc's MallocExtension::instance()->GetHeapSample(&s):

PeriodType: space bytes
Period: 524288
Samples:
objects/count space/bytes
1522 205791384: 2 3 4 5 6 7
bytes:[135168]
1 1557983: 10 11 12 13 14 15 16 17 18 19 20 21
bytes:[1462185]
1524 524460: 23 4 5 6 7
bytes:[344

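Here the regex captures objects/count (the first column) rather than the space/bytes we actually care about.

The input can also look like this, for data generated by tcmalloc's GetHeapProfile():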

PeriodType: space bytes
Period: 1
Samples:
alloc_objects/count alloc_space/bytes inuse_objects/count inuse_space/bytes
10 10485760 10 10485760: 1 2 3 4 5 6
bytes:[1048576]
1 1048576 1 1048576: 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
bytes:[1048576]
1 16920 0 0: 23 24 25 26 27 28 29 30
bytes:[16920]

Here there are four columns before the colon, so the regex does not match at all. The final patched code:

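if ($state eq "sample") {
    if (/^\s*((?:\d+\s+)*\d+)\s*:\s*([0-9 ]+)/) {
        # e.g. "2 2097152 2 2097152: 11 12 13 14 15 16 17 18"
        # ^\s*            optional whitespace at the start of the line
        # (               group 1: every numeric column before the colon (at least one)
        #   (?:\d+\s+)*   non-capturing, zero or more times: digits followed by whitespace
        #   \d+           a final number, so the group ends in a digit, not whitespace
        # )
        # \s*:\s*         optional whitespace, the colon, optional whitespace
        # ([0-9 ]+)       group 2: the location IDs after the colon
        # This accepts sample lines with several value columns before the colon
        # (such as the four heap columns), matching the text output for heap profiles.
        my @v = split /\s+/, $1;   # the value columns before the colon
        my $samples = $v[-1];      # use the last column as the count (use $v[0] for the first)
        my $stack = $2;            # location IDs after the colon
        remember_stack($stack, $samples);
    }
}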
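The same pattern works unchanged with Go's RE2-based regexp package. That makes it easy to check the pattern against both input shapes shown above. This is a sketch: parseSampleLine is a hypothetical helper, and like the patched Perl, it leaves the choice of column to the caller.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Same pattern as the patched Perl: numeric columns, a colon, then location IDs.
var sampleLineRE = regexp.MustCompile(`^\s*((?:\d+\s+)*\d+)\s*:\s*([0-9 ]+)`)

// parseSampleLine returns the value columns before ':' and the location IDs after it.
func parseSampleLine(line string) (vals []int64, stack []int, ok bool) {
	m := sampleLineRE.FindStringSubmatch(line)
	if m == nil {
		return nil, nil, false
	}
	for _, f := range strings.Fields(m[1]) {
		v, err := strconv.ParseInt(f, 10, 64)
		if err != nil {
			return nil, nil, false
		}
		vals = append(vals, v)
	}
	for _, f := range strings.Fields(m[2]) {
		id, err := strconv.Atoi(f)
		if err != nil {
			return nil, nil, false
		}
		stack = append(stack, id)
	}
	return vals, stack, true
}

func main() {
	for _, line := range []string{
		"1522 205791384: 2 3 4 5 6 7",
		"10 10485760 10 10485760: 1 2 3 4 5 6",
		"bytes:[135168]",
	} {
		vals, stack, ok := parseSampleLine(line)
		if !ok {
			fmt.Printf("%q: skipped\n", line)
			continue
		}
		// Like the patched Perl, take the last value column as the sample count.
		fmt.Printf("%q: value=%d stack=%v\n", line, vals[len(vals)-1], stack)
	}
}
```

Both sample shapes match, with the last column giving space/bytes and inuse_space/bytes respectively, while the "bytes:[...]" lines are skipped.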