
Discussion: Integrate our fork #11

Open
jonashaag opened this issue Aug 4, 2016 · 28 comments

Comments

@jonashaag (Contributor)

Intro and motivation

As already noted in vmprof/vmprof-python#90 (comment), we have implemented our own vmprof server, for the following reasons:

  • At that time, vmprof-server was very slow on large profiles (multiple hours of runtime) as it stored the profiles in the SQL database. (I'm not sure how the current implementation compares to ours.)
  • We wanted to have a good memory profile viewer in the server
  • We don't really need user accounts etc.

Features of our implementation

I can't share the source code of our server just yet, for bureaucratic reasons, but I can share some information and a few screenshots here.

Properties of, and differences from, vmprof-server:

  • About 1000 LOC
  • Uses vmprof-server CPU viewer (no jitlog integration yet)
  • Much improved memory viewer based on Plotly:
    • Shows time and date on X axis
    • Data points aren't simply subsampled from the complete data, but binned (mean), so that you don't miss spikes because of a too-coarse sampling interval
    • Shows memory usage mean, max, or both
    • Nice interaction with the graph due to Plotly
    • Shows absolute or relative runtime on X axis
  • Allows searching for projects and functions/callables
  • Stores profiles as gzipped msgpack files, no decoding done in the server whatsoever: Data is encoded to .msgpack.gz in the client once, and delivered to the browser UI as-is. The only exception to this is for memory profile resampling.
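The binning scheme described above can be sketched roughly as follows (a hypothetical `bin_samples` helper written for illustration, not the actual server code): memory samples are grouped into fixed-width time bins, and each bin keeps its mean and max, so a single-sample spike still shows up in its bin's max even after downsampling.

```python
def bin_samples(samples, n_bins):
    """samples: list of (timestamp, mem_bytes), sorted by timestamp.
    Returns a list of (bin_start_ts, mean_mem, max_mem), one entry per
    non-empty bin. Illustrative sketch, not the server's implementation."""
    if not samples:
        return []
    t0, t1 = samples[0][0], samples[-1][0]
    width = (t1 - t0) / n_bins or 1  # avoid zero-width bins for a single sample
    bins = {}
    for ts, mem in samples:
        i = min(int((ts - t0) / width), n_bins - 1)  # clamp last sample into last bin
        bins.setdefault(i, []).append(mem)
    return [
        (t0 + i * width, sum(v) / len(v), max(v))
        for i, v in sorted(bins.items())
    ]
```

A spike contained in a single sample still appears in that bin's max, whereas naive every-Nth subsampling could drop it entirely.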

We have also implemented a new client:

  • About 100 LOC

  • The interface isn't a script runner like python -m vmprof yourscript.py but a decorator that is applied to the callables to be profiled, like

    @profile
    def somefunc():
        ...
  • Allows tagging your submissions with a project name

  • Automatically tags your submissions with the top-level function/callable name (somefunc)

  • Client can upload normal vmprof profile files

  • Client protocol not compatible with the vmprof-python protocol (but very similar)
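Since the client itself isn't public, here is an illustrative sketch of what such a decorator-based interface looks like. cProfile stands in for vmprof so the example is self-contained, and the tagging/upload behavior in the comments is an assumption based on the description above, not the actual client API.

```python
import cProfile
import functools
import io
import pstats

def profile(func):
    """Illustrative stand-in for the decorator-based client described above.
    The real client would use vmprof and upload the result; cProfile is
    used here only so the sketch runs anywhere."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        prof = cProfile.Profile()
        prof.enable()
        try:
            return func(*args, **kwargs)
        finally:
            prof.disable()
            # The real client would upload the profile here, tagged with a
            # project name and the top-level callable name (func.__name__).
            stats = pstats.Stats(prof, stream=io.StringIO())
            print("profiled:", func.__name__, "total calls:", stats.total_calls)
    return wrapper

@profile
def somefunc():
    return sum(range(1000))
```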

Screenshots

[Screenshot from 2016-08-04 12:40:08]
Landing page with project names and top-level function names (in red). Search filters can be shared using the arrow on the right.

[Screenshot from 2016-08-04 12:41:36]
Integration of vmprof-server CPU viewer

[Screenshot from 2016-08-04 12:41:04]
The memory viewer, showing max memory usage for each bin (no hidden spikes!). On the right: the upper stack trace shows the part common to all stack traces of the bin; the lower stack trace extends the upper one with the bin's most common stack trace (28% of the stack traces in the bin were equal to the "concatenation" of the two stack trace parts).

[Screenshot from 2016-08-04 12:41:19]
Memory viewer showing mean + max of each bin.
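The stack-trace summary described in the captions above (the part shared by all stack traces in a bin, plus the most common continuation and its share) can be sketched with two hypothetical helpers; this is written for illustration and is not the server's actual code.

```python
from collections import Counter
from os.path import commonprefix  # works on any sequence of sequences, not just paths

def common_stack(traces):
    """Longest call-stack prefix shared by every trace in the bin."""
    return list(commonprefix(traces))

def most_common_extension(traces, prefix):
    """Most frequent full trace in the bin minus the shared prefix,
    plus the fraction of traces that match it."""
    trace, count = Counter(map(tuple, traces)).most_common(1)[0]
    return list(trace[len(prefix):]), count / len(traces)
```

For a bin whose traces all start with main -> run, `common_stack` yields that shared part, and `most_common_extension` yields the continuation shown in the lower stack trace together with its percentage.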

Future of our server, integration into vmprof-server

I think our server has some nice properties, mainly the memory viewer and the storage system (although I'm not sure how it compares to the current vmprof-server JSON/gzip storage system in terms of performance). We'd love to contribute most of it back to vmprof-server proper.

Possibility A: Integrate vmprof-server into our server

  • Integrate jitlog into our server
  • Maybe integrate user accounts into our server
  • Make our server the new official server

Possibility B: Integrate our memory viewer into vmprof-server

  • Add memory view to vmprof-server
  • Change protocol accordingly

What do you guys think?

@jonashaag (Contributor, Author)

cc @StephanErb

@planrich (Contributor) commented Aug 4, 2016

Hi, this looks like a solid enhancement to vmprof-server. As I see things now, I would opt for possibility B. We already maintain and run the service, and that will continue.

How did you solve the SQL storage issue? Do you store the gzipped profiles on the file system? If you ask me, we should not continue to store JSON in the relational database. For jitviewer the gzipped file is stored on the file system, which is good enough as far as I can tell.
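For illustration, the file-system blob storage under discussion might look like the following minimal sketch. The real server stores .msgpack.gz files and never decodes them; gzip+JSON stands in here to keep the example stdlib-only, and all names (`STORE`, `save_profile`, `load_profile`) are hypothetical.

```python
import gzip
import json
import os
import tempfile

STORE = tempfile.mkdtemp()  # stand-in for the server's profile directory

def save_profile(profile_id, payload):
    """Compress and write the profile blob; the server treats it as opaque."""
    path = os.path.join(STORE, profile_id + ".json.gz")
    with gzip.open(path, "wt") as f:
        json.dump(payload, f)
    return path

def load_profile(profile_id):
    """Read the blob back as-is, e.g. to deliver it to the browser UI."""
    with gzip.open(os.path.join(STORE, profile_id + ".json.gz"), "rt") as f:
        return json.load(f)
```

The point of the design is that the database (if any) only needs to hold metadata; the potentially large profile payload lives on disk as a compressed file.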

@planrich (Contributor) commented Aug 4, 2016

In any case, is the source available on the web? I would like to look at it for a bit to estimate how much work it would be to merge. Or would you just open a pull request?

@jonashaag (Contributor, Author)

> How did you solve the SQL storage issue? Store the gzipped profiles on the file system?

Yes.

> In any case, is the source available on the web? I would like to look at it for a bit to estimate how much work it would be to merge. Or would you just open a pull request?

As I said, I can't make it available at the moment, but I'll send you a private, confidential copy in a few seconds.

@planrich (Contributor)

Any news? I'm planning to release a bug-fix version today, so we could aim for the next major release.

@jonashaag (Contributor, Author)

Have you had a look at the code? I haven't put in the effort of making our server open source yet; I expected your assessment of the integration complexity first. But we can also make it open source first.

@planrich (Contributor) commented Sep 7, 2016

Yes, I did; nothing that requires a tremendous amount of re-engineering. If you can make it open source, we will integrate it into the public service. After all, this is a feature many people want to have!

@jonashaag (Contributor, Author)

Cool, we are through with the internal open source process, I'll release the source by the end of the week.

@jonashaag (Contributor, Author)

@planrich (Contributor)

Great! I assume that the license of both is compatible with MIT? I'll probably find some time soon to pull in your fork!

@jonashaag (Contributor, Author)

Awesome! I'm happy to assist. The license is MIT, yes. The features most important for us are memory profiles with good stack traces and higher-performance profile storage (not in the SQL database).

@jonashaag (Contributor, Author)

What's the current state of this? Is there anything I can do to help get the merge done?

@planrich (Contributor)

There are a few commits I have made; the migrations are ready, but I need to apply some changes here and there (model names have changed, ...). We also need to copy and modify the client in the vmprof package. That would be a small project on its own.

@jonashaag (Contributor, Author) commented Oct 26, 2016

OK, let me know if/when we should help out.

@planrich (Contributor)

I'm planning to work on vmprof this Friday (Munich, PyCon.DE) and maybe I'll find time to push this forward. It would be great if you could have a look at integrating the client-side code into github.com/vmprof/vmprof-python.

@rongekuta

@jonashaag, yesterday I set up vmprof-viewer-server following https://github.com/blue-yonder/vmprof-viewer-server

but ran into an error:
(env) [root@localhost vmprof-viewer-server]# vmprof_viewer/manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).
November 22, 2016 - 03:03:11
Django version 1.10.3, using settings 'vmprof_viewer.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Unhandled exception in thread started by <function wrapper at 0x31c92a8>
Traceback (most recent call last):
File "/root/vmprof-viewer-server/env/lib/python2.7/site-packages/django/utils/autoreload.py", line 226, in wrapper
fn(*args, **kwargs)
File "/root/vmprof-viewer-server/env/lib/python2.7/site-packages/django/core/management/commands/runserver.py", line 142, in inner_run
handler = self.get_handler(*args, **options)
File "/root/vmprof-viewer-server/env/lib/python2.7/site-packages/django/contrib/staticfiles/management/commands/runserver.py", line 27, in get_handler
handler = super(Command, self).get_handler(*args, **options)
File "/root/vmprof-viewer-server/env/lib/python2.7/site-packages/django/core/management/commands/runserver.py", line 64, in get_handler
return get_internal_wsgi_application()
File "/root/vmprof-viewer-server/env/lib/python2.7/site-packages/django/core/servers/basehttp.py", line 59, in get_internal_wsgi_application
sys.exc_info()[2])
File "/root/vmprof-viewer-server/env/lib/python2.7/site-packages/django/core/servers/basehttp.py", line 49, in get_internal_wsgi_application
return import_string(app_path)
File "/root/vmprof-viewer-server/env/lib/python2.7/site-packages/django/utils/module_loading.py", line 20, in import_string
module = import_module(module_path)
File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
django.core.exceptions.ImproperlyConfigured: WSGI application 'vmprof_viewer.wsgi.application' could not be loaded; Error importing module: 'No module named wsgi'

What can I do to fix it?

@jonashaag (Contributor, Author) commented Nov 22, 2016

Create a file vmprof_viewer/wsgi.py with the following contents:

    """
    WSGI config for myproject project.
    It exposes the WSGI callable as a module-level variable named ``application``.
    For more information on this file, see
    https://docs.djangoproject.com/en/1.7/howto/deployment/wsgi/
    """

    import os
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "vmprof_viewer.settings")

    from django.core.wsgi import get_wsgi_application
    application = get_wsgi_application()

@planrich (Contributor)

I have been working on the server to display the memory graph. Since we agreed not to have separate files, I wonder what the file format of mem.msgpack.gz and addr_name_map.msgpack.gz is.

What I'm currently trying to do is reconstruct mem and addr_name_map on the server from the vmprof profile.

@jonashaag (Contributor, Author)

@planrich (Contributor) commented Dec 22, 2016

The resampling is now done on the server again (as briefly discussed in the commit comment); I think I got most of it working. To summarize, this is what is still missing on master:

  • peak memory is not displayed (top right)
  • duration is not displayed (top right)
  • relative and absolute time is not yet working
  • profiles cannot be grouped to projects
  • function profiling

For the last point (function profiling) I think there is some special decorator; can you maybe point me to an API or an example of how it should be used? I'm unsure how the top_level_function name is sent to the server.

@jonashaag (Contributor, Author)

Cool! We can leave out the function profiling stuff and grouping for now. Function profiling is something that can also live in a separate project; it's a mere convenience wrapper.

@criemen (Contributor) commented Apr 28, 2017

Hi,
as Jonas is no longer with us, I was assigned to this project.
I think #11 (comment) is still an accurate summary as of now.
What can we do to get the missing features into the server? Maybe without function profiling; I guess we could provide that externally and, if you're interested, integrate it later.

@planrich (Contributor) commented May 3, 2017

Sorry for the late response! I think as a first step it would be good to test the current setup as it is. One problem was, and still is, that, as with the flame graph visualisation, there is no documentation about what it means. E.g. what is the difference between absolute and relative in the memory view (what is the origin of 'relative')?

Here are some issues I remember:

  • We use PyPy to generate the output sent to the browser (which contains the memory view data); it sometimes happens that a numpy array is resized, which is not supported on PyPy
  • The Absolute/Relative buttons are not working

I have been thinking of extending vmprof.com with a short tutorial shown as an overlay on the flame graph that explains the essential details (as is done in the jitlog). I think that would be a good addition to the memory view as well.

@jonashaag (Contributor, Author)

Hi guys, if you have any questions, I'm happy to help.

Flamegraph: Not sure what you mean. The memory graph should be pretty obvious.

@Corni most of the differences between our internal vmprof frontend and the vmprof.com frontend should be easy to add, except for the grouping/project structure. Not sure if you guys @planrich are interested in that at all?

@planrich (Contributor) commented May 4, 2017

'Pretty obvious' is a stretchable term; at least I have experienced that some people have no clue what the flame graph means. Usually you get no feedback at all, and from time to time you find out that they are guessing how it "should" be (profiling is not about how it should be, but how it is, IMHO). So my idea is to make the docs easily accessible (preferably in the same view as the profiling visualisation).

@jonashaag (Contributor, Author)

Flame graph = the blue line? I'm confused by the term "flame graph" here... it's a completely different kind of visualisation than the CPU flame graph.

If flame graph = blue line, do you recall what confuses people about it?

@planrich (Contributor) commented May 4, 2017

No, I was talking about the CPU visualisation (=> CPU flame graph view). What confused me about the memory view is relative/absolute. Does relative mean "relative" to the minimum heap size (= subtract the minimum heap size)?

@jonashaag (Contributor, Author)

Um, it was relative/absolute TIME. The reasoning here was that some people know "this must have been sometime around 11:30 yesterday" while other people look for "about 20 minutes into the program run".
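The two x-axis modes can be illustrated with a tiny sketch (the helper names are hypothetical): the same sample timestamps are rendered either as wall-clock datetimes or as offsets from program start.

```python
from datetime import datetime, timezone

def absolute_axis(timestamps):
    """Unix timestamps -> wall-clock datetimes ("around 11:30 yesterday")."""
    return [datetime.fromtimestamp(t, tz=timezone.utc) for t in timestamps]

def relative_axis(timestamps):
    """Unix timestamps -> seconds since program start ("20 minutes in")."""
    t0 = timestamps[0]
    return [t - t0 for t in timestamps]
```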
