Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Workaround: avoid sleep(1) call by crypt_r
Using an off-cpu flamegraph I identified that concurrent PAM calls are slow due to a call to `sleep(1)`. `pam_authenticate` calls `crypt_r` which calls `NSSLOW_Init` which on first use will try to initialize the just `dlopen`-ed library. If it encounters a race condition it does a `sleep(1)`. This race condition can be quite reliably reproduced when performing a lot of PAM authentications from multiple threads in parallel. GDB can also be used to confirm this by putting a breakpoint on `sleep`: ``` #0 __sleep (seconds=seconds@entry=1) at ../sysdeps/unix/sysv/linux/sleep.c:42 #1 0x00007ffff1548e22 in freebl_RunLoaderOnce () at lowhash_vector.c:122 #2 0x00007ffff1548f31 in freebl_InitVector () at lowhash_vector.c:131 #3 NSSLOW_Init () at lowhash_vector.c:148 #4 0x00007ffff1b8f09a in __sha512_crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=0x7ffff31e17b8 "dIJbsXKc0", xapi-project#5 0x00007ffff1b8d070 in __crypt_r (key=key@entry=0x7fffd8005a60 "pamtest-edvint", salt=<optimized out>, xapi-project#6 0x00007ffff1dc9abc in verify_pwd_hash (p=p@entry=0x7fffd8005a60 "pamtest-edvint", hash=<optimized out>, nullok=nullok@entry=0) at passverify.c:111 xapi-project#7 0x00007ffff1dc9139 in _unix_verify_password (pamh=pamh@entry=0x7fffd8002910, name=0x7fffd8002ab0 "pamtest-edvint", p=0x7fffd8005a60 "pamtest-edvint", ctrl=ctrl@entry=8389156) at support.c:777 xapi-project#8 0x00007ffff1dc6556 in pam_sm_authenticate (pamh=0x7fffd8002910, flags=<optimized out>, argc=<optimized out>, argv=<optimized out>) at pam_unix_auth.c:178 xapi-project#9 0x00007ffff7bcef1a in _pam_dispatch_aux (use_cached_chain=<optimized out>, resumed=<optimized out>, h=<optimized out>, flags=1, pamh=0x7fffd8002910) at pam_dispatch.c:110 xapi-project#10 _pam_dispatch (pamh=pamh@entry=0x7fffd8002910, flags=1, choice=choice@entry=1) at pam_dispatch.c:426 xapi-project#11 0x00007ffff7bce7e0 in pam_authenticate (pamh=0x7fffd8002910, flags=flags@entry=1) at pam_auth.c:34 xapi-project#12 0x00000000005ae567 in XA_mh_authorize (username=username@entry=0x7fffd80028d0 "pamtest-edvint", password=password@entry=0x7fffd80028f0 "pamtest-edvint", error=error@entry=0x7ffff31e1be8) at xa_auth.c:83 xapi-project#13 0x00000000005adf20 in stub_XA_mh_authorize (username=<optimized out>, password=<optimized out>) at xa_auth_stubs.c:42 xapi-project#14 0x00000000004a0a6a in camlDune__exe__Bench_pam__pam_authenticate$27_320 () at ocaml/tests/bench/pam/bench_pam.ml:63 xapi-project#15 0x00000000004a1113 in camlEzbechamel_concurrent__worker_loop_359 () at ocaml/tests/bench/lib/concurrent/ezbechamel_concurrent.ml:36 xapi-project#16 0x00000000005935b9 in camlStdlib__Fun__protect_317 () xapi-project#17 0x00000000004a1955 in camlThread__fun_850 () xapi-project#18 0x00000000005d6401 in caml_start_program () xapi-project#19 0x00000000005cd0fd in caml_callback_exn () xapi-project#20 0x00000000005af810 in caml_thread_start () xapi-project#21 0x00007ffff79b7e25 in start_thread (arg=0x7ffff31e2700) at pthread_create.c:308 xapi-project#22 0x00007ffff71dbbad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 ``` `pam_start` and `pam_end` doesn't help here, because on `pam_end` the library is `dlclose`-ed, so on next `pam_authenticate` it will have to go through the initialization code again. (This initialization code would've belonged into `pam_start`, not `pam_authenticate`, but there are several layers here including a call to `crypt_r`). To avoid this link with `libcrypt` and call `crypt_r` once ourselves (and ensure it loads `libfreeblpriv3` by using the sha512 prefix). That way the library will stay loaded (we'll hold a reference count on it), and the `dlclose` done by PAM won't unload it. Confirmed that there are no `sleep` calls now, and the results are also visible when running the benchmark targeted to the with and without fix code: ``` ╭─────────────────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ concurrent authenticate (sleep fix, actual):8 │ 0.0000 mjw/run│ 50.0000 mnw/run│ 27043467.0000 ns/run│ ╰─────────────────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ ╭────────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├────────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ concurrent authenticate (no reuse):8 │ 0.0000 mjw/run│ 50.0000 mnw/run│ 1029831372.0000 ns/run│ ╰────────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ ``` Without this fix using 2 threads to perform PAM authentication would result in a 38x slowdown compared to using no threads at all (which is what XAPI currently does). Signed-off-by: Edwin Török <[email protected]>
- Loading branch information