bus error #1

Open
brendangregg opened this issue Apr 6, 2017 · 10 comments
@brendangregg

Just mucking around with an existing Xen server and Linux guests:

# ./uniprof out.prof 27
Bus error (core dumped)
# gdb ./uniprof core
[...]
Program terminated with signal SIGBUS, Bus error.
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/../multiarch/memmove-vec-unaligned-erms.S:292
292	../sysdeps/x86_64/multiarch/../multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/../multiarch/memmove-vec-unaligned-erms.S:292
#1  0x000055df7f056024 in walk_stack_fp ()
#2  0x000055df7f056175 in do_stack_trace_fp ()
#3  0x000055df7f0554f0 in main ()

Of course, I also had to unpause the domain to let it continue:

# xl list
Name                                        ID   Mem VCPUs	State	Time(s)
Domain-0                                     0  2438     4     r-----    9222.0
vm0                                          1   512     1     r-----     167.5
vm2hvm                                      27   512     3     --p---     157.2
vm1hvm                                      28   512     1     r-----     277.9
# xl unpause vm2hvm

If it's crashing when trying to walk frame pointers, then sure, Linux is likely running things that aren't using them...

It doesn't crash all the time. It does sometimes work.

@fajs
Contributor

fajs commented Apr 6, 2017

Thanks for the bug report!

I see you're running a Linux VM, not a unikernel, and guessing from the name, an HVM one. Since the tool is primarily designed as a unikernel profiler, I'm not surprised it isn't working all that well. Especially when you sample userspace application stacks, I don't expect uniprof to produce much useful output because of the missing symbol resolution.

Obviously, crashing isn't the best behavior though. ;-) I'll try to see how to gracefully handle that situation. You're probably right that the crash is due to going into stacks that don't have frame pointers, and consequently trying to map or read bogus memory addresses.

I could try to add an option, too, that allows you to only walk kernel stacks (since those are the ones one would likely have symbol information for). Have to think of the best way to do that, though...

@fajs fajs self-assigned this Apr 6, 2017
@fajs
Contributor

fajs commented Apr 6, 2017

So, after thinking about this:

  • This happens when uniprof assumes a stack with frame pointers, but the code whose stack is being walked was compiled with -fomit-frame-pointer (or lacks frame pointers for some other reason).
  • From the outside, I don't think it's possible to check whether a program has been compiled with frame pointers; it's effectively an application-internal ABI. It's also not really possible to look at a value taken from the stack and decide whether it's a memory address, save maybe for heuristics.
  • So the best we can do is probably make sure that we don't crash from trying to map and access bad memory.
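To illustrate the last point, here is a minimal sketch of a frame-pointer walk that stops instead of faulting when it reads a bad address. It is a toy model in Python, not uniprof's actual code: `GuestMemory`, `read_word`, and this `walk_stack_fp` are hypothetical names, guest memory is simulated with a dict, and the sketch assumes the outermost frame stores a saved frame pointer of 0.

```python
WORD = 8  # bytes per stack slot on a 64-bit target


class GuestMemory:
    """Hypothetical stand-in for guest memory: maps address -> 64-bit word."""

    def __init__(self, words):
        self.words = dict(words)

    def read_word(self, addr):
        # Return None for unmapped addresses instead of faulting.
        return self.words.get(addr)


def walk_stack_fp(mem, ip, fp, max_depth=64):
    """Follow the saved-frame-pointer chain starting at fp.

    Returns (trace, ok). ok is False if the walk hit an unreadable
    address, a frame pointer that did not move upward, or the depth limit.
    """
    trace = [ip]
    for _ in range(max_depth):
        if fp == 0:
            return trace, True          # assumed end-of-chain marker
        saved_fp = mem.read_word(fp)
        ret_addr = mem.read_word(fp + WORD)
        if saved_fp is None or ret_addr is None:
            return trace, False         # bogus address: give up, don't crash
        trace.append(ret_addr)
        if saved_fp != 0 and saved_fp <= fp:
            return trace, False         # chain must move toward higher addresses
        fp = saved_fp
    return trace, False                 # suspiciously deep: treat as failed


# Two well-formed frames, then a walk from a garbage frame pointer.
mem = GuestMemory({0x1000: 0x1020, 0x1008: 0x400100,
                   0x1020: 0x0,    0x1028: 0x400200})
good_trace, good_ok = walk_stack_fp(mem, 0x400050, 0x1000)
bad_trace, bad_ok = walk_stack_fp(mem, 0x400050, 0xDEADBEEF)
```

The caller can still print a failed trace for debugging, but the `ok` flag lets it mark the trace as untrustworthy.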

I have a patch for this ready, which seems to work well on x86. The patch also touches the ARM version, though, and I don't have a test platform for that right now. I hope I'll have time today or tomorrow to test it on ARM before pushing it out.

@fajs
Contributor

fajs commented Apr 8, 2017

Could you try commit fab6d1e? It should fix the bus error, and instead show warnings whenever it tries to walk a stack and ends up with invalid addresses.

@brendangregg
Author

Thanks; it doesn't crash anymore. It prints this:

# ./uniprof -F 10 -T 5 out.prof2 2
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
failed to allocate memory mapping page.
[...]

and

# cat out.prof2 
#unikernel stack tracer using libxencall hypercall interface
#tracing domid 2 on 2017-04-09 12:13:12 PDT (-0700)

0x55ffd74e2828
0x7f7876cf93f1
0x56e258d4c544155
0

0xffffffff96e64236
0

0xffffffff96e64236
0

0x55ffd74e2828
0x7f7876cf93f1
0x56e258d4c544155
0

0xffffffff96e64236
0

0xffffffff96e64236
0

0x55ffd74e2828
0x7f7876cf93f1
0x56e258d4c544155
0

0xffffffff96e64236
0

0xffffffff96e64236
0
[...]

What's the meaning of the 0 after the stack trace?

@fajs
Contributor

fajs commented Apr 9, 2017

The 0 effectively invalidates the printed stack trace, because it says "this trace occurred 0 times". The idea is that, since this stack walk didn't finish successfully, it's not trustworthy and most likely contains bogus addresses. So it still prints the trace (in case the user is interested in it, for example for debugging), but further processing or aggregation tools (like your stackcollapse.pl) should ignore such traces.
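As a rough sketch of how a post-processing tool could honor that convention, the following Python reads the trace format as it appears in the excerpt above and drops every trace whose count is 0. The format details are inferred from that excerpt only (comment lines start with `#`, traces are separated by blank lines, the last line of each trace is a decimal count, and frames are listed innermost first); `fold_traces` is a hypothetical helper, not part of uniprof or FlameGraph.

```python
from collections import Counter


def fold_traces(text):
    """Aggregate traces into 'root;...;leaf' keys, skipping count-0 traces.

    Returns (folded, skipped), where folded maps a folded stack to its
    total count and skipped is the number of invalidated traces.
    """
    folded = Counter()
    skipped = 0
    block = []
    # The extra "" guarantees the final block is flushed.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line.startswith("#"):
            continue                    # header/comment line
        if line:
            block.append(line)
            continue
        if not block:
            continue                    # consecutive blank lines
        *frames, count = block          # last line is the occurrence count
        block = []
        n = int(count)
        if n == 0:
            skipped += 1                # walk failed: do not aggregate
            continue
        folded[";".join(reversed(frames))] += n
    return folded, skipped


sample = "#header\n\n0xb\n0xa\n0\n\n0xd\n0xc\n3\n"
folded, skipped = fold_traces(sample)
```

With the sample input, the first trace is discarded because its count is 0, and the second is folded root-first with a count of 3.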

The good thing is that it now works as intended when it can't successfully walk a stack, and doesn't crash any more. The bad thing is that the result is a bit underwhelming: I would have expected it to at least pick up stack traces when sampling hits a stack for code that was compiled with frame pointers. Are all of them invalidated with a 0?

@brendangregg
Author

ok, thanks. All 0:

# cat out.prof2 | ../FlameGraph/stackcollapse.pl 
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e2818 0
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e281b 0
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e281e 0
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e2825 0
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e2828 0
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e282f 0
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e2832 0
0x56e258d4c544155;0x7f7876cf93f1;0x55ffd74e2837 0
0xffffffff96e64236 0

@fajs
Contributor

fajs commented Apr 9, 2017

So it seems to hit only two different stacks. I wonder whether both of them happen to lack frame pointers?

@brendangregg
Author

I was running a microbenchmark I'd compiled with -fno-omit-frame-pointer.

@fajs
Contributor

fajs commented Apr 9, 2017

Alright, then I'll have to dig deeper. It sounds like there might be a problem with translating the addresses, or with traversing the stack correctly. The tool has mostly been used as a unikernel profiler so far, so I have done little testing on "real" OSs. At least the crash bug seems resolved.
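One cheap sanity check that may help narrow this down is testing whether a value read from the stack is a canonical x86-64 address. The sketch below assumes 48-bit virtual addresses (4-level paging), which is an assumption about the guest rather than something stated in this thread; `is_canonical` is a hypothetical helper. Under that assumption, the third value in the traces above, 0x56e258d4c544155, cannot be a valid pointer, which would be consistent with the walker reading something other than a saved return address.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of addr are all 0 or all 1."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Addresses taken from the out.prof2 excerpt in this thread.
user_ip = is_canonical(0x55ffd74e2828)
lib_ret = is_canonical(0x7f7876cf93f1)
odd_val = is_canonical(0x56e258d4c544155)
kernel_ip = is_canonical(0xffffffff96e64236)
```

A check like this is only a heuristic: a canonical value can still be unmapped or be plain data, so it can reject some bogus frames but cannot confirm a good one.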

Out of curiosity, did you test uniprof on a Linux VM because that's what you had most conveniently available? I've been wondering how useful the tool actually is for full-blown VMs, where you can profile your applications from inside the VM with the many profiling tools that already exist. That's one of the reasons I never looked much into it for uniprof. For unikernels, the need is much more obvious because there's a lack of tools, and you generally can't simply log into the VM and run perf.

@brendangregg
Author

I just happened to have a Linux VM handy, and tried it out. You're right in that this is not the main use case.

A number of us have speculated about what it would take for a dom0 profiler to work on all OSes, without needing to log in and run a profiler in the guest (although we would still need its symbol tables). But that's secondary.
