worker: Provide a sensible default for "critical load threshold"
Based on our experience, various test issues can appear if the system
load exceeds sane levels, for example typing issues such as mistyping,
hanging keys, and lost characters, but also various timeouts and even
network outages.

This commit introduces a default value that enables the recently
introduced "critical load" check, which prevents an openQA worker from
starting new jobs until the system load is below the threshold again.
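
A minimal sketch of such a check, assuming the 15-minute load average is
taken from the third field of a /proc/loadavg-style line. The helper name
is_load_critical is hypothetical and is not the openQA implementation:

```perl
use strict;
use warnings;

# Hypothetical helper, not the actual openQA code: returns 1 if the worker
# should hold off on new jobs, 0 otherwise.
sub is_load_critical {
    my ($loadavg_line, $threshold) = @_;

    # A threshold of 0 (or no threshold at all) disables the check
    return 0 unless $threshold;

    # /proc/loadavg starts with the 1, 5 and 15 minute load averages
    my @fields = split ' ', $loadavg_line;
    my $load15 = $fields[2];
    return 0 unless defined $load15;

    # "Exceeds" is read as strictly greater than the threshold
    return $load15 > $threshold ? 1 : 0;
}

# Sample input in the /proc/loadavg format (assumed values)
my $sample = '41.20 43.10 45.07 3/1024 12345';
print is_load_critical($sample, 40) ? "hold new jobs\n" : "accept new jobs\n";
```

With the sample line the 15-minute value is 45.07, so a threshold of 40
holds new jobs while a threshold of 0 never does.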

The value is of course based on the empirical data gathered so far and
can and should be tweaked once we know more.

Related progress issue: https://progress.opensuse.org/issues/158125
okurz committed Apr 8, 2024
1 parent 4dff9b3 commit 34ed70d
Showing 3 changed files with 11 additions and 1 deletion.
5 changes: 4 additions & 1 deletion etc/openqa/workers.ini
@@ -76,7 +76,10 @@
# Specifies the threshold to consider the load on the machine critical. If the
# average load (over the last 15 minutes) exceeds the specified value the worker
# will not accept new jobs (until the load decreases again).
-#CRITICAL_LOAD_AVG_THRESHOLD = 10
+# The default value is 40 to prevent system overload based on experiences with
+# system stability so far.
+# Set to 0 to disable.
+#CRITICAL_LOAD_AVG_THRESHOLD = 40

# The section ids are the instance of the workers.
# The key/value pairs will appear in vars.json
4 changes: 4 additions & 0 deletions lib/OpenQA/Worker/Settings.pm
@@ -65,6 +65,10 @@ sub new ($class, $instance_number = undef, $cli_options = {}) {
}
}

+# Select sensible system CPU load15 threshold to prevent system overload
+# based on experiences with system stability so far
+$global_settings{CRITICAL_LOAD_AVG_THRESHOLD} //= 40;
+
# set some environment variables
# TODO: This should be sent to the scheduler to be included in the worker's table.
if (defined $instance_number) {
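
The `//=` (defined-or) assignment in the hunk above only fills in the
default when no value was configured, so an explicit 0 ("disable") is kept.
A minimal sketch of that behaviour, where apply_default and its plain hash
are hypothetical stand-ins for the settings read from workers.ini:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the global settings hash; only the defaulting
# line mirrors the commit.
sub apply_default {
    my (%settings) = @_;
    # Assign 40 only if the key is missing or undef; a defined 0 survives
    $settings{CRITICAL_LOAD_AVG_THRESHOLD} //= 40;
    return \%settings;
}

my $unset    = apply_default();
my $disabled = apply_default(CRITICAL_LOAD_AVG_THRESHOLD => 0);
my $custom   = apply_default(CRITICAL_LOAD_AVG_THRESHOLD => 10);

printf "%s %s %s\n",
  $unset->{CRITICAL_LOAD_AVG_THRESHOLD},
  $disabled->{CRITICAL_LOAD_AVG_THRESHOLD},
  $custom->{CRITICAL_LOAD_AVG_THRESHOLD};    # prints "40 0 10"
```

Had `||=` been used instead, the explicit 0 would be treated as false and
replaced by 40, which would make it impossible to disable the check.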
3 changes: 3 additions & 0 deletions t/24-worker-settings.t
@@ -20,6 +20,7 @@ my $settings = OpenQA::Worker::Settings->new;
is_deeply(
$settings->global_settings,
{
+CRITICAL_LOAD_AVG_THRESHOLD => 40,
GLOBAL => 'setting',
WORKER_HOSTNAME => '127.0.0.1',
LOG_LEVEL => 'test',
@@ -82,6 +83,7 @@ subtest 'instance-specific settings' => sub {
is_deeply(
$settings1->global_settings,
{
+CRITICAL_LOAD_AVG_THRESHOLD => 40,
GLOBAL => 'setting',
WORKER_HOSTNAME => '127.0.0.1',
WORKER_CLASS => 'qemu_i386,qemu_x86_64',
@@ -96,6 +98,7 @@ subtest 'instance-specific settings' => sub {
is_deeply(
$settings2->global_settings,
{
+CRITICAL_LOAD_AVG_THRESHOLD => 40,
GLOBAL => 'setting',
WORKER_HOSTNAME => '127.0.0.1',
WORKER_CLASS => 'qemu_aarch64',