Benchmark: add a concurrent fixed work benchmark · edwintorok/xen-api@d303123

Benchmark: add a concurrent fixed work benchmark

As long as the number of threads is <= the number of CPUs, the time taken by 'fixed work' should stay the same.
It may take longer, however, if dispatching work from the OCaml side adds overhead.
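
The idea can be sketched as follows. This is only an illustration, not the benchmark added by this commit: `fixed_work` and `time_concurrently` are hypothetical names, `Thread.delay` stands in for the C function (it releases the OCaml runtime lock while waiting, so threads can overlap), and thread creation is included in the measured time. It assumes the program is built with the `unix` and `threads.posix` libraries.

```ocaml
(* Hypothetical stand-in for the fixed-work C function. Thread.delay
   releases the runtime lock, so several threads can wait in parallel. *)
let fixed_work () = Thread.delay 0.01

(* Run [fixed_work] once on each of [n] threads and return the wall-clock
   time, in seconds, until all of them have finished. *)
let time_concurrently n =
  let t0 = Unix.gettimeofday () in
  let threads = Array.init n (fun _ -> Thread.create fixed_work ()) in
  Array.iter Thread.join threads;
  Unix.gettimeofday () -. t0

let () =
  List.iter
    (fun n -> Printf.printf "%2d threads: %.4f s\n" n (time_concurrently n))
    [1; 2; 4; 8; 16]
```

If dispatch were free, every line would report roughly the same time; any growth with the thread count is overhead of the kind this benchmark is meant to expose.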

Results on `Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz` (8 CPUs):
```
│  fixedwork/concurrent fixedwork:1             │             0.0000 mjw/run│             3.6585 mnw/run│       12831863.4328 ns/run│
│  fixedwork/concurrent fixedwork:16            │             0.0000 mjw/run│             6.5217 mnw/run│       45015232.3024 ns/run│
│  fixedwork/concurrent fixedwork:2             │             0.0000 mjw/run│             3.8462 mnw/run│       14234923.1372 ns/run│
│  fixedwork/concurrent fixedwork:4             │             0.0000 mjw/run│             4.2857 mnw/run│       16573979.6790 ns/run│
│  fixedwork/concurrent fixedwork:8             │             0.0000 mjw/run│             4.8387 mnw/run│       21940491.7677 ns/run│
│  fixedwork/fixedwork                          │             0.0000 mjw/run│             2.6316 mnw/run│       12746205.5882 ns/run│
```

Overhead with 8 threads is already quite significant (~70%), and even 4 threads has ~30% overhead.
This machine had turbo enabled.
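
The percentages can be reproduced from the `ns/run` column. The sketch below assumes the baseline is the `concurrent fixedwork:1` row (the plain `fixedwork` row gives nearly the same figures); `ns_per_run` and `overhead_pct` are names introduced here for illustration.

```ocaml
(* ns/run for the turbo-enabled "concurrent fixedwork" rows,
   keyed by thread count. *)
let ns_per_run =
  [ (1, 12831863.4328)
  ; (2, 14234923.1372)
  ; (4, 16573979.6790)
  ; (8, 21940491.7677)
  ; (16, 45015232.3024) ]

(* Relative overhead of [t] over [baseline], in percent. *)
let overhead_pct ~baseline t = ((t /. baseline) -. 1.) *. 100.

let () =
  (* Assumption: the 1-thread concurrent run is the reference point. *)
  let baseline = List.assoc 1 ns_per_run in
  List.iter
    (fun (threads, t) ->
      Printf.printf "%2d threads: %+.1f%%\n" threads (overhead_pct ~baseline t))
    ns_per_run
```

With these numbers the 4-thread row comes out at about 29% and the 8-thread row at about 71%, matching the rounded figures quoted above.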

After disabling turbo (and working around the bug in `xenpm` which requires rerunning `set-scaling-governor` after `disable-turbo-mode`):
```
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  fixedwork/concurrent fixedwork:1   │             0.0000 mjw/run│             3.8462 mnw/run│       13525498.9640 ns/run│
│  fixedwork/concurrent fixedwork:16  │             0.0000 mjw/run│             7.1429 mnw/run│       49291752.0987 ns/run│
│  fixedwork/concurrent fixedwork:2   │             0.0000 mjw/run│             3.8462 mnw/run│       14284943.0644 ns/run│
│  fixedwork/concurrent fixedwork:4   │             0.0000 mjw/run│             4.5455 mnw/run│       19029750.6638 ns/run│
│  fixedwork/concurrent fixedwork:8   │             0.0000 mjw/run│             5.1724 mnw/run│       24823535.6315 ns/run│
│  fixedwork/fixedwork                │             0.0000 mjw/run│             2.7273 mnw/run│       13468350.0593 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯
```

This machine isn't really suitable for benchmarking how XAPI scales: its thread switching overhead is too high and does not match what is seen on the other machine.

Using 16 workers results in a massive slowdown, as expected on a machine with only 8 CPUs.

Results on `Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz`:
```
│  fixedwork/concurrent fixedwork:1             │             0.0000 mjw/run│             4.2857 mnw/run│       18232376.7910 ns/run│
│  fixedwork/concurrent fixedwork:16            │             0.0000 mjw/run│             4.5455 mnw/run│       19158780.9472 ns/run│
│  fixedwork/concurrent fixedwork:2             │             0.0000 mjw/run│             4.2857 mnw/run│       18165547.7630 ns/run│
│  fixedwork/concurrent fixedwork:4             │             0.0000 mjw/run│             4.5455 mnw/run│       18204930.7199 ns/run│
│  fixedwork/concurrent fixedwork:8             │             0.0000 mjw/run│             4.5455 mnw/run│       18293562.4699 ns/run│
│  fixedwork/fixedwork                          │             0.0000 mjw/run│             3.0612 mnw/run│       18095615.8910 ns/run│
```

Using 16 workers is fine here: this Dom0 has 16 vCPUs, and the overhead is ~5% compared to the single-threaded case.

Signed-off-by: Edwin Török <[email protected]>

edwintorok committed Oct 9, 2023

1 parent e4a5004 commit d303123

ocaml/tests/bench/test_basics/ezbechamel_basics.ml

@@ -32,11 +32,19 @@ let parallel_c_work () =
     let args = [1; 2; 4; 8; 16]
+    open Ezbechamel_concurrent
     let () =
       Ezbechamel_alcotest_notty.run
         [
           Test.make ~name:"overhead" (Staged.stage ignore)
-        ; Test.make ~name:"fixedwork" (Staged.stage parallel_c_work)
+        ; Test.make_grouped ~name:"fixedwork"
+            [
+              Test.make ~name:"fixedwork" (Staged.stage parallel_c_work)
+            ; test_concurrently ~allocate:ignore ~free:ignore
+                ~name:"concurrent fixedwork"
+                Staged.(stage parallel_c_work)
+            ]
         ; Test.make_indexed ~name:"Thread create/join" ~args (fun n ->
               Staged.stage @@ fun () ->
               let threads = Array.init n @@ Thread.create ignore in
@@ -60,7 +68,9 @@ let () =
             ; test_barrier (module BarrierBinary)
             ; test_barrier (module BarrierCounting)
             ; test_barrier (module BarrierBinaryArray)
-            ; Ezbechamel_concurrent.test_concurrently ~allocate:ignore ~free:ignore ~name:"concurrent workers" Staged.(stage ignore)
+            ; test_concurrently ~allocate:ignore ~free:ignore
+                ~name:"concurrent workers"
+                Staged.(stage ignore)
             ; test_barrier (module BarrierYield)
             ]
           )