
http/web.py performance issues #305

Closed
tleb opened this issue Jul 24, 2024 · 6 comments
tleb (Member) commented Jul 24, 2024

Issue for tracking http/web.py performance work.

On the official instance, a directory load (without caching) takes about 500ms. A small file load (without caching) takes about 750ms. That is a lot of time spent just reading data. I expect low-hanging fruit.

  1. See whether these sorts of timescales are also seen in a local environment (a timing sketch follows this list).
  2. Profile http/web.py to see where time is spent.
  3. Make decisions based on that.
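For step 1, something like the following could sample uncached response times against a local instance (a minimal sketch; the URL is a placeholder, not taken from this issue):

```python
import time
import urllib.request

# Placeholder URL: point it at a local Elixir instance and an uncached page.
URL = "http://localhost/linux/latest/source/Makefile"

def time_request(url: str) -> float:
    """Return wall-clock seconds for one full request, body included."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.perf_counter() - start

samples = [time_request(URL) for _ in range(5)]
print(f"min={min(samples) * 1000:.0f}ms "
      f"avg={sum(samples) / len(samples) * 1000:.0f}ms")
```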
@tleb tleb changed the title Performance issues http/web.py performance issues Jul 24, 2024
@tleb tleb added the bug label Jul 24, 2024
fstachura (Collaborator) commented
I decided to investigate on #304. I know this may be controversial, but a) the major culprits seem to be in code that didn't change much, and b) since that revision is split into more functions, the flamegraph contains more interesting information. It's also easier to choose what to measure manually, since the code is refactored into more functions.

Workstation: 16GB DDR4 (~50% free), NVMe, i5-8250U (most cores were idle).

I used two methods:

cProfile and flamegraphs

The Elixir modification that uses cProfile for profiling can be seen here.

Note that the /usr/local/elixir/profiles/ directory needs to be writable by www-data for this to work.

Flamegraphs can be created from prof files using flameprof and FlameGraph.
Example: flameprof --format=log 30931.prof | perl ~/apps/src/FlameGraph/flamegraph.pl > 30931.svg.
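The linked modification isn't reproduced here, but the core idea can be sketched as follows (a sketch, assuming the handle_request entry point mentioned below; the run_profiled wrapper name is illustrative):

```python
import cProfile
import os

PROFILE_DIR = "/usr/local/elixir/profiles/"  # must be writable by www-data

def run_profiled(handle_request, *args, **kwargs):
    """Run one CGI request under cProfile and dump a per-process .prof file."""
    profiler = cProfile.Profile()
    result = profiler.runcall(handle_request, *args, **kwargs)
    # One file per PID, e.g. 30931.prof, ready to feed into flameprof/FlameGraph.
    profiler.dump_stats(os.path.join(PROFILE_DIR, f"{os.getpid()}.prof"))
    return result
```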

I created three flamegraphs, one for the biggest identifier, one for a relatively big source file and one for a big directory tree:
https://fbstc.org/scrap/uboot-u32-ident.svg
https://fbstc.org/scrap/uboot-lib-kconfig.svg
https://fbstc.org/scrap/uboot-include-tree.svg

In all three, a lot happens before web.py - I think these are imports. In the source flamegraph, guess_lexer_for_filename takes the majority of the time spent in web.py.
Of course these are just three random flamegraph samples - not enough information to make a fully informed decision. Still, some patterns can be seen. I guess one could ask whether the flamegraphs would look similar on production.

My own profiler

I wrote my own simple profiler that logs the execution time of selected blocks and functions. The statistical methodology is probably dubious (results are averages of percentages of total request time), but I mainly needed this as a sanity check for the flamegraphs (some information in there was suspicious).

The Elixir version that uses this profiler can be seen here.

Results from a short, pseudo-random ("clicking around") request sample can be seen here.

Results are divided into categories based on route (and on file type in the case of the source route).
See the calls to measure_function and measure_block in web.py for the meaning of the measure point names.
Numbers represent the average percentage of total script execution time (measured from the first import to the end of handle_request) over all requests in that category.
The report was generated using analyze_simple_profiler_logs.py.
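The profiler itself is only linked above, but as a rough idea of what measure_block and measure_function could look like (a sketch under assumptions, not the actual implementation):

```python
import functools
import time
from contextlib import contextmanager

# Per-request accumulator: measure point name -> total seconds.
timings: dict[str, float] = {}

@contextmanager
def measure_block(name):
    """Time a named block: `with measure_block("lexer"): ...`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

def measure_function(func):
    """Decorator variant: time every call of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with measure_block(func.__qualname__):
            return func(*args, **kwargs)
    return wrapper
```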

Conclusions

We could of course dig deeper and make more accurate measurements, but I think the conclusions are clear and actionable:

  1. CGI forces us to redo all imports for each request. Not only is this redundant, the imports take a significant share of execution time. AFAIK this shouldn't be a problem with WSGI/uWSGI/ASGI because the architecture is different: the application is initialized once in a long-running process, which then just listens for and handles requests.
  2. guess_lexer_for_filename takes more time than the code formatting itself. It seems this is because Pygments imports lexers that don't match the filename until it finds the matching one (see the sketch after this list). Maybe this could be reported and fixed if we investigate a bit more. But again, imports are cached, so this shouldn't be a problem once we move to WSGI.
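A rough way to observe this cost (run in a fresh interpreter each time, since lexer modules imported during guessing stay cached in the process; the file name and code sample are arbitrary):

```python
import time
from pygments.lexers import get_lexer_for_filename, guess_lexer_for_filename

SAMPLE = 'config FOO\n\tbool "foo"\n\tdefault y\n'

def timed(label, fn):
    start = time.perf_counter()
    lexer = fn()
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f}ms -> {lexer.name}")

# guess_lexer_for_filename loads and scores every candidate lexer whose
# filename patterns match, triggering their module imports along the way;
# get_lexer_for_filename resolves a single lexer by filename. Only the
# first call in a process pays the import cost.
timed("guess_lexer_for_filename", lambda: guess_lexer_for_filename("Kconfig", SAMPLE))
timed("get_lexer_for_filename", lambda: get_lexer_for_filename("Kconfig"))
```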

tleb (Member, Author) commented Jul 24, 2024

Ah, so even guess_lexer_for_filename, which looks like slow code, is actually slow imports as well. Let's wait for the switch to WSGI, then we'll do a second round of profiling. Cool insight you found here, thanks!

tpetazzoni commented
Thanks for all the investigation. What is keeping us from switching over to WSGI?

fstachura (Collaborator) commented
What is keeping us from switching over to WSGI?

Elixir still uses global state, and we need to get rid of that first. Most, if not all, of it should be gone with the recent web.py refactorings.
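For context on why global state matters here, a minimal WSGI sketch (illustrative, not Elixir's actual code) showing where the one-time vs. per-request split falls:

```python
# Module-level imports and setup run once per long-lived worker process,
# unlike CGI where every request pays for them again.
import pygments  # noqa: F401  # heavy import, paid only at worker startup

def application(environ, start_response):
    # Per-request state must be local (or passed down), never global:
    # one worker serves many requests, so globals leak across them.
    path = environ.get("PATH_INFO", "/")
    body = f"requested: {path}\n".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

Any WSGI server (e.g. gunicorn or uWSGI) can then be pointed at the `application` callable.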

tpetazzoni commented
Amazing! Then I'd say keep up the good work in this direction, so that we can move to WSGI in the near future.

tleb (Member, Author) commented Nov 8, 2024

I'd say WSGI and the follow-up perf improvements (00a160a, d946818, 7df3543) solved this issue. The prod server hasn't crashed since then, and response times are much lower than before (to the point that it can be hard to tell whether pages come from the cache or are generated live when using the website).
