Benchmark: add a concurrent fixed work benchmark
As long as the number of threads is <= the number of CPUs, the time taken by 'fixed work' should stay the same.
However, it may be higher if there is overhead in dispatching work from the OCaml side.

Results on `Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz` (8 CPUs):
```
│  fixedwork/concurrent fixedwork:1             │             0.0000 mjw/run│             3.6585 mnw/run│       12831863.4328 ns/run│
│  fixedwork/concurrent fixedwork:16            │             0.0000 mjw/run│             6.5217 mnw/run│       45015232.3024 ns/run│
│  fixedwork/concurrent fixedwork:2             │             0.0000 mjw/run│             3.8462 mnw/run│       14234923.1372 ns/run│
│  fixedwork/concurrent fixedwork:4             │             0.0000 mjw/run│             4.2857 mnw/run│       16573979.6790 ns/run│
│  fixedwork/concurrent fixedwork:8             │             0.0000 mjw/run│             4.8387 mnw/run│       21940491.7677 ns/run│
│  fixedwork/fixedwork                          │             0.0000 mjw/run│             2.6316 mnw/run│       12746205.5882 ns/run│
```

The overhead with 8 threads is already significant (~70%), and even 4 threads show ~30% overhead.
This machine had turbo enabled.
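
The percentages quoted here can be re-derived from the `ns/run` column of the turbo-enabled table above. The following is a minimal sketch, not part of the benchmark itself: the helper name `overhead_percent` is hypothetical, and it assumes that `fixedwork/fixedwork` is the right baseline to compare against.

```ocaml
(* ns/run values copied from the turbo-enabled table above. *)
let baseline = 12746205.5882 (* fixedwork/fixedwork *)

(* (number of threads, ns/run) for each "concurrent fixedwork" row. *)
let concurrent =
  [ (1, 12831863.4328)
  ; (2, 14234923.1372)
  ; (4, 16573979.6790)
  ; (8, 21940491.7677)
  ; (16, 45015232.3024) ]

(* Hypothetical helper: how much longer a run takes than the baseline,
   expressed as a percentage of the baseline. *)
let overhead_percent ~baseline t = (t -. baseline) /. baseline *. 100.

let () =
  List.iter
    (fun (threads, t) ->
      Printf.printf "%2d threads: %+.1f%%\n" threads
        (overhead_percent ~baseline t))
    concurrent
```

The same helper can be applied to the other tables by swapping in their `ns/run` values.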

After disabling turbo (and working around a bug in `xenpm` that requires rerunning `set-scaling-governor` after `disable-turbo-mode`):
```
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  fixedwork/concurrent fixedwork:1   │             0.0000 mjw/run│             3.8462 mnw/run│       13525498.9640 ns/run│
│  fixedwork/concurrent fixedwork:16  │             0.0000 mjw/run│             7.1429 mnw/run│       49291752.0987 ns/run│
│  fixedwork/concurrent fixedwork:2   │             0.0000 mjw/run│             3.8462 mnw/run│       14284943.0644 ns/run│
│  fixedwork/concurrent fixedwork:4   │             0.0000 mjw/run│             4.5455 mnw/run│       19029750.6638 ns/run│
│  fixedwork/concurrent fixedwork:8   │             0.0000 mjw/run│             5.1724 mnw/run│       24823535.6315 ns/run│
│  fixedwork/fixedwork                │             0.0000 mjw/run│             2.7273 mnw/run│       13468350.0593 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯
```

This machine isn't really suitable for benchmarking how XAPI scales: its thread switching overhead is too high and does not match what is seen on the other machine.

Using 16 workers results in a massive slowdown, as expected on a machine with only 8 CPUs.

Results on `Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz`:
```
│  fixedwork/concurrent fixedwork:1             │             0.0000 mjw/run│             4.2857 mnw/run│       18232376.7910 ns/run│
│  fixedwork/concurrent fixedwork:16            │             0.0000 mjw/run│             4.5455 mnw/run│       19158780.9472 ns/run│
│  fixedwork/concurrent fixedwork:2             │             0.0000 mjw/run│             4.2857 mnw/run│       18165547.7630 ns/run│
│  fixedwork/concurrent fixedwork:4             │             0.0000 mjw/run│             4.5455 mnw/run│       18204930.7199 ns/run│
│  fixedwork/concurrent fixedwork:8             │             0.0000 mjw/run│             4.5455 mnw/run│       18293562.4699 ns/run│
│  fixedwork/fixedwork                          │             0.0000 mjw/run│             3.0612 mnw/run│       18095615.8910 ns/run│
```

Using 16 workers is fine here: this Dom0 has 16 vCPUs, and the overhead is ~5% compared to the single-threaded case.
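
One way to read the 16-worker rows on both machines is to compare them against an idealised model. The sketch below is an assumption made for illustration, not something the benchmark computes: it supposes every thread performs the same CPU-bound fixed work, that threads run in `ceil(threads / cpus)` rounds, and that dispatch costs nothing. The names `ideal_ns` and `ratio_to_ideal` are hypothetical.

```ocaml
(* Hypothetical model: with [threads] equal CPU-bound jobs on [cpus] CPUs,
   the jobs need ceil(threads / cpus) rounds, each as long as the
   single-threaded baseline. Dispatch and switching costs are ignored. *)
let ideal_ns ~cpus ~threads ~baseline =
  let rounds = (threads + cpus - 1) / cpus in
  float_of_int rounds *. baseline

(* Ratio of measured time to the idealised time; 1.0 means no overhead
   beyond what the model already predicts. *)
let ratio_to_ideal ~cpus ~threads ~baseline ~measured =
  measured /. ideal_ns ~cpus ~threads ~baseline

let () =
  (* ns/run values copied from the turbo-enabled E3-1230 v6 table
     and from the Gold 6126 table. *)
  Printf.printf "E3-1230 v6, 16 threads on 8 CPUs: %.2fx ideal\n"
    (ratio_to_ideal ~cpus:8 ~threads:16 ~baseline:12746205.5882
       ~measured:45015232.3024) ;
  Printf.printf "Gold 6126, 16 threads on 16 vCPUs: %.2fx ideal\n"
    (ratio_to_ideal ~cpus:16 ~threads:16 ~baseline:18095615.8910
       ~measured:19158780.9472)
```

Under this model a ratio well above 1 points at overhead outside the fixed work itself, which is what the benchmark is trying to expose.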

Signed-off-by: Edwin Török <[email protected]>
edwintorok committed Oct 9, 2023
1 parent e4a5004 commit d303123
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions ocaml/tests/bench/test_basics/ezbechamel_basics.ml
@@ -32,11 +32,19 @@ let parallel_c_work () =

 let args = [1; 2; 4; 8; 16]

+open Ezbechamel_concurrent
+
 let () =
   Ezbechamel_alcotest_notty.run
     [
       Test.make ~name:"overhead" (Staged.stage ignore)
-    ; Test.make ~name:"fixedwork" (Staged.stage parallel_c_work)
+    ; Test.make_grouped ~name:"fixedwork"
+        [
+          Test.make ~name:"fixedwork" (Staged.stage parallel_c_work)
+        ; test_concurrently ~allocate:ignore ~free:ignore
+            ~name:"concurrent fixedwork"
+            Staged.(stage parallel_c_work)
+        ]
     ; Test.make_indexed ~name:"Thread create/join" ~args (fun n ->
           Staged.stage @@ fun () ->
           let threads = Array.init n @@ Thread.create ignore in
@@ -60,7 +68,9 @@ let () =
       ; test_barrier (module BarrierBinary)
       ; test_barrier (module BarrierCounting)
       ; test_barrier (module BarrierBinaryArray)
-      ; Ezbechamel_concurrent.test_concurrently ~allocate:ignore ~free:ignore ~name:"concurrent workers" Staged.(stage ignore)
+      ; test_concurrently ~allocate:ignore ~free:ignore
+          ~name:"concurrent workers"
+          Staged.(stage ignore)
       ; test_barrier (module BarrierYield)
       ]
     )
