
conduit-lwt-unix: use accept_n on the server #387

Closed
wants to merge 11 commits
Conversation

@mseri commented Apr 22, 2021

In benchmarks of cohttp-lwt-unix's latency we see evidence of connection starvation.
Using accept_n seems to improve the situation. You can see here an example of the artificial benchmarks that I used to check this.

Ping @avsm
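
As a rough, hedged sketch of what this means in practice (not the PR's actual diff): the server loop asks Lwt_unix.accept_n for a batch of already-pending connections instead of calling Lwt_unix.accept once per iteration. The names accept_loop, max_batch and handle below are placeholders for illustration.

    let rec accept_loop socket ~max_batch ~handle =
      let open Lwt.Syntax in
      (* accept_n returns up to [max_batch] connections that are already
         pending on [socket], plus an optional exception if accepting
         stopped early. *)
      let* (connections, _early_exn) = Lwt_unix.accept_n socket max_batch in
      (* Start handling the whole batch concurrently before accepting more. *)
      let* () = Lwt_list.iter_p (fun (fd, _sockaddr) -> handle fd) connections in
      accept_loop socket ~max_batch ~handle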

@@ -10,6 +10,7 @@ val process_accept :
unit Lwt.t

val init :
Author (@mseri):

I should probably document this; I was first waiting to see whether you are interested at all, and what the review says about the implementation.

@mseri commented Apr 22, 2021

Adding the yield call greatly improves the latency, but apparently at the cost of lost connections.
In the picture below, the green line is accept_n + yield from e490592, the red one is the current conduit, the blue one is accept_n with all the code related to set_max_active removed, and the orange one is accept_n without the extra yielding (cdeda3c).

[latency plot]
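
For clarity, here is a sketch of where the "extra yielding" sits relative to the batch, assuming the accept_n loop sketched in the description above; Lwt.pause stands in for the yield and hands control back to the scheduler so the freshly accepted connections get serviced before the next accept_n call.

    let rec accept_loop_with_yield socket ~max_batch ~handle =
      let open Lwt.Syntax in
      let* (connections, _early_exn) = Lwt_unix.accept_n socket max_batch in
      let* () = Lwt_list.iter_p (fun (fd, _addr) -> handle fd) connections in
      (* The extra yield: let already-accepted work run before accepting again. *)
      let* () = Lwt.pause () in
      accept_loop_with_yield socket ~max_batch ~handle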

| `Accepted connections ->
    Lwt_list.iter_p
      (fun v ->
        connected ();
Author (@mseri):

There may also be a race when updating connected here, since I am calling iter_p, right?

Member:

I don't quite see the race you're referring to here -- where is the iter_p?

Author (@mseri):

If I am not mistaken, iter_p runs the iterations in parallel, and connected () increments the active-connection count without any protection (let connected () = incr active). My fear was that multiple connections handled in parallel could try to increment the variable simultaneously (and I don't want to put a mutex there).
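
For reference, a minimal standalone sketch (not conduit code) of the pattern under discussion: a shared counter incremented inside Lwt_list.iter_p. Standard Lwt is single-threaded and cooperative, so callbacks only interleave at bind/yield points; a plain incr is therefore not torn, and a lost update could only occur if the read and the write of the counter were separated by such a point.

    let () =
      let active = ref 0 in
      let connected () = incr active in
      Lwt_main.run
        (Lwt_list.iter_p
           (fun _conn ->
              connected ();
              Lwt.pause ())
           [ (); (); () ]);
      (* All three increments are observed: prints "active = 3". *)
      Printf.printf "active = %d\n" !active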

@mseri commented May 17, 2021

I have updated the test, using the test server suggested in inhabitedtype/httpaf#200 and the wrk2 call used there.
The new results are in https://gist.github.com/mseri/d40130fef7c8673e55b9e337f411c56d and summarized below

[latency plot]

In practice, it seems that using accept_n does improve the latency with respect to both the current implementation and the one with the extra yield call.

I would be curious to see if this makes any difference on Linux (I am testing on a MacBook Pro).

@avsm (Member) left a comment

This looks like an improvement to me for sure. I'd like to run tests on Linux before merging it, just to quantify the win.

@mseri commented May 21, 2021

If somebody can benchmark it on Linux, can you also check what difference it makes with and without cf7c62f?

@anuragsoni

@mseri If somebody can benchmark it on Linux, can you also check what difference it makes with and without cf7c62f?

I gave this a run on Linux today. I did a set of different tests. I first tried running the tests through cohttp + conduit and took the example server from your gist. I didn't notice a big difference between the currently released conduit and the version with accept_n. I then tried to use conduit for my own WIP HTTP/1.1 server implementation and tried the following combinations:

All servers run a service of the form req -> res Lwt.t (both handler shapes are sketched right after the list below):

  • Server with upstream conduit where handler returns ASAP

  • Server with accept_n where handler returns ASAP
    With these two servers there wasn't a big difference between accept_n and accept.

  • Server with upstream conduit where handler uses Lwt.pause before responding

  • Server with accept_n where handler uses Lwt.pause before responding
    In these two servers I noticed an improvement in the accept_n branch vs the currently released conduit.
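
A hypothetical sketch of the two handler shapes compared in the list above, assuming a service of type req -> res Lwt.t; the req and res types and the response value are placeholders, not the actual server code.

    type req = unit
    type res = string

    (* Variant 1: the handler returns as soon as possible. *)
    let immediate_handler (_ : req) : res Lwt.t = Lwt.return "hello"

    (* Variant 2: the handler yields to the scheduler once before responding. *)
    let pausing_handler (_ : req) : res Lwt.t =
      let open Lwt.Syntax in
      let* () = Lwt.pause () in
      Lwt.return "hello"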

All of the data can be found at https://gist.github.com/anuragsoni/33ae4a750dd7e110cbe94ed69614b7d5
Do let me know if something looks missing there and I'd be happy to run more tests.

As an aside (and this is most likely off-topic for conduit, and maybe more a discussion about how the Lwt scheduler works?), another potential data point came from a test that simulates handlers that do a little bit of work. I added a sleep of 1 millisecond in the handler before it responds. With this change I noticed a big spike in latency, and the numbers went up from milliseconds to seconds. https://gist.github.com/anuragsoni/33ae4a750dd7e110cbe94ed69614b7d5#file-h1_conduit_accept_n_1ms_delay-txt
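
A sketch of that "little bit of work" variant, with the same placeholder request/response types as above: the handler sleeps for one millisecond before responding.

    let delayed_handler (_req : unit) : string Lwt.t =
      let open Lwt.Syntax in
      (* Simulate ~1 ms of work before producing the response. *)
      let* () = Lwt_unix.sleep 0.001 in
      Lwt.return "hello"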

I've been running some benchmarks against servers in other languages, and I don't see a similar spike when I add a millisecond delay in a handler for a Tokio-based Rust web server, nor when I run my h1 server on Async instead of Lwt. The Rust and h1-async results can be seen at https://github.com/anuragsoni/h1/tree/3c16f5eccf4a17c1009814a5178537fb50e5aaad/wrk_bench
I'm still early in my experiments, so I'm not 100% sure about what causes these spikes for Lwt, but I'll spend some more time to see if I can get more information about what's happening!

@mseri commented May 31, 2021

Dear @anuragsoni, thanks a lot for this!

The idea of adding the millisecond pause is brilliant.

There are a number of tickets on this latency issue; it would be great if we could figure out how to improve/fix it.

@anuragsoni

@mseri There are a number of tickets on this latency issue; it would be great if we could figure out how to improve/fix it.

For sure! I remember seeing some tickets on cohttp and httpaf; are those the tickets you are referring to?

I'm wondering if some of the latency-related investigation/work will be needed at a layer below that? I've only recently started using Lwt seriously, but the fact that I'm seeing the large latency spike only on Lwt and not on Async leads me to believe that it'd be worthwhile to come up with a small reproduction case that's independent of cohttp/conduit etc., to see if the issue lies somewhere in Lwt. When I tested the 1ms delay on Async with my h1 library, I saw a similar spike in latency when I was using the higher-level Reader/Writer modules that do their own buffering and scheduling for writes. I noticed pretty significant improvements when I switched to using Async's file_descr/socket APIs directly to orchestrate my own reads/writes and do my own buffering. I was hoping to see similar improvements on my Lwt backend, and I do see improvements when the handler doesn't do any work, but with the 1ms delay, going down to read/write calls on Lwt_unix.file_descr still shows latency spikes similar to when I was testing with the Lwt_io managed buffered channels.
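
A minimal sketch of the "do my own buffering" approach mentioned above: reading directly from a Lwt_unix.file_descr into a caller-managed buffer rather than going through Lwt_io's buffered channels. The buffer size and the return type are arbitrary choices for illustration, not the h1 implementation.

    let read_some (fd : Lwt_unix.file_descr) : string option Lwt.t =
      let open Lwt.Syntax in
      let buf = Bytes.create 4096 in
      let* n = Lwt_unix.read fd buf 0 (Bytes.length buf) in
      (* A zero-length read means the peer closed the connection. *)
      if n = 0 then Lwt.return_none
      else Lwt.return_some (Bytes.sub_string buf 0 n)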

Do note that everything I said might very well be wrong, as I'm still exploring the Lwt API and learning more about it :D

@mseri commented Jun 1, 2021

What you see is reasonable and in line with some of the comments in the tickets I am referring to. Give me a day and I will link them here.

@mseri commented Jun 1, 2021

Here we go, the latency issue is discussed in

as well as httpaf#200 and eliom#270 -- ping also @smorimoto, who made some tests there --

It can be mitigated a bit by adjusting

    Conduit_lwt_unix.set_max_active 100;
    Lwt_io.set_default_buffer_size 0x10000

and, as you observed, by using accept_n and yielding there.
There may also be a relation to "Possible memory leak in client.read_response" (cohttp#688), which, in fact, goes hand in hand with another issue in Lwt, closed for lack of manpower.

@mseri commented Jun 30, 2021

There is a new benchmark showing how httpaf crushes cohttp on unikernels: robur-coop/unipi#4 (comment)

@anuragsoni

There is a new benchmark showing how httpaf crushes cohttp on unikernels: roburio/unipi#4 (comment)

Some more benchmarks comparing various HTTP server implementations can be found at https://github.com/ocaml-multicore/retro-httpaf-bench/

This pull request was closed.