Avoid repeated os.SetEnv calls in container init. #1983

Closed

Conversation

brandon-mabey

In a Kubernetes cluster with a large number of services (>5000, which results in a large number of environment variables for the container) and guaranteed pods, the runc init process can take a long time to run. This is caused by two issues:

  1. runc init calls os.Setenv for every environment variable in the bundle, which can be expensive due to the underlying mutex locking and syscall logic.
  2. The cgroup is limited before the runc init process runs. I believe this is outside the purview of runc.

This PR targets the former by removing the os.Setenv calls, which makes the runc init process run much faster when the bundle configures a large number of environment variables.
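
As a rough illustration of the idea (not the code from this PR), the environment can be handed to the child as a plain slice instead of being written into the current process first. The helper names `makeEnv` and `commandWithEnv`, the synthetic variable names, and the `/bin/true` path are assumptions made up for this sketch.

```go
package main

import (
	"fmt"
	"os/exec"
)

// makeEnv builds n synthetic "NAME=value" entries, standing in for the
// environment list found in a bundle. The naming scheme is made up.
func makeEnv(n int) []string {
	env := make([]string, 0, n)
	for i := 0; i < n; i++ {
		env = append(env, fmt.Sprintf("SVC_%d_HOST=10.0.0.1", i))
	}
	return env
}

// commandWithEnv prepares a command whose environment is exactly env.
// Nothing is written to the current process's environment, so no
// os.Setenv call is involved. Note that a nil env would make the child
// inherit the current environment instead.
func commandWithEnv(env []string, path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.Env = env
	return cmd
}

func main() {
	// The command is only prepared here, not started.
	cmd := commandWithEnv(makeEnv(5000), "/bin/true")
	fmt.Println(len(cmd.Env)) // number of entries the child would receive
}
```

The cost of preparing the environment this way is a slice assignment, regardless of how many variables the bundle defines.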

For containers with a large number of configured environment variables
(>5000), os.Setenv can make container creation take much longer
than in normal circumstances. This is due to os.Setenv locking and
unlocking a mutex on every environment variable added.

By removing the os.Setenv calls during
container creation, and instead controlling the list of environment
variables manually, this overhead can be removed.

Signed-off-by: Brandon Mabey <[email protected]>

// Certain functions that are used later, such as exec.LookPath, require that the PATH
// environment variable is setup with the container's PATH.
os.Setenv("PATH", mappedEnv["PATH"])
Member

Could you be more specific? I'm wondering whether it may cause security issues.

Contributor

  1. This does not cause any security issues because this is what was done before the patch, too.
  2. This appears to be an error because mappedEnv["PATH"] contains, for example, PATH=/usr/bin:/usr/sbin (rather than /usr/bin:/usr/sbin). As a result, the first PATH element will be wrong.
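
One way to avoid this kind of mistake, sketched here under the assumption that the map is built from "NAME=value" strings, is to split each entry once and store only the value. The function name `envValues` is hypothetical and is not taken from the patch.

```go
package main

import (
	"fmt"
	"strings"
)

// envValues maps each variable name to its value only, without the
// "NAME=" prefix. Entries without "=" or with an empty name are skipped.
func envValues(env []string) map[string]string {
	m := make(map[string]string, len(env))
	for _, kv := range env {
		name, value, ok := strings.Cut(kv, "=")
		if !ok || name == "" {
			continue
		}
		m[name] = value
	}
	return m
}

func main() {
	m := envValues([]string{"PATH=/usr/bin:/usr/sbin", "TERM=xterm"})
	// The stored value can be passed to os.Setenv("PATH", ...) as-is.
	fmt.Println(m["PATH"]) // prints /usr/bin:/usr/sbin
}
```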

Comment on lines +352 to +353
if _, ok := environmentMap["HOME"]; !ok {
environmentMap["HOME"] = "HOME" + "=" + execUser.Home
Contributor

Instead, it's easier to just set HOME the old way (like it's done with PATH).

@kolyshkin
Contributor

This looks interesting. Would be nice to add a benchmark.

@kolyshkin
Contributor

Did a quick benchmark with 5000 env vars, comparing os.Setenv with passing env via cmd.Env. The latter is indeed faster!

name     old time/op    new time/op    delta
Exec-20    4.71ms ±33%    2.08ms ±13%  -55.88%  (p=0.000 n=10+9)

name     old alloc/op   new alloc/op   delta
Exec-20    1.43MB ± 0%    0.46MB ± 0%  -67.83%  (p=0.000 n=10+9)

name     old allocs/op  new allocs/op  delta
Exec-20     5.19k ± 0%     0.03k ± 0%  -99.46%  (p=0.000 n=10+10)

Will carry this one.

kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jun 24, 2024
Current implementation sets all the environment variables passed in
Process.Env in the current process, then uses os.Environ to read
those back.

As pointed out in [1], this is slow, as runc calls os.Setenv for
every variable, and there may be a few thousand of those.

Looking into why it was implemented, I found commit 9744d72 and traced
it to [2], which discusses the actual reasons. At the time, they were:

 - HOME is not passed into container as it is set in setupUser by
   os.Setenv and has no effect on config.Env;
 - there is no deduplication of environment variables.

Yet it was decided to not go ahead with this patch, but later [3] was
merged with the carry of this patch.

Now, from what I see:

1. Passing environment to exec is way faster than using os.Setenv and
   os.Environ (tests show ~20x faster in simple Go test, and 2x
   faster in real-world test, see below).
2. Setting environment variables in the runc context can result in ugly
   side effects (think GODEBUG).
3. Nothing in runtime spec says that the environment needs to be
   deduplicated, or the order of preference (whether the first or the
   last value of a variable with the same name is to be used). In C
   (Linux/glibc), the first value is used. In Go, it's the last one.
   We should probably stick to what we have in order to maintain
   backward compatibility.

This patch:
 - switches to passing env directly to exec;
 - adds deduplication mechanism to retain backward compatibility;
 - sets PATH from process.Env in the current process;
 - adds HOME to process.Env if not set;
 - removes os.Clearenv call as it's no longer needed.
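
A deduplication step that keeps the last value for each name (the result that repeated os.Setenv calls would have produced) could look roughly like the following. This is a minimal sketch, not the code from the patch; `dedupEnv` is a hypothetical name, and dropping entries without "=" is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// dedupEnv removes duplicate variable names from env. When a name occurs
// more than once, only its last entry is kept. Entries without "=" are
// dropped (an assumption of this sketch).
func dedupEnv(env []string) []string {
	seen := make(map[string]struct{}, len(env))
	out := make([]string, 0, len(env))
	// Walk backwards so the first entry seen for a name is the last one set.
	for i := len(env) - 1; i >= 0; i-- {
		name, _, ok := strings.Cut(env[i], "=")
		if !ok {
			continue
		}
		if _, dup := seen[name]; dup {
			continue
		}
		seen[name] = struct{}{}
		out = append(out, env[i])
	}
	// Restore the original relative order of the surviving entries.
	for l, r := 0, len(out)-1; l < r; l, r = l+1, r-1 {
		out[l], out[r] = out[r], out[l]
	}
	return out
}

func main() {
	fmt.Println(dedupEnv([]string{"A=1", "B=2", "A=3"})) // prints [B=2 A=3]
}
```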

The benchmark added by the previous commit shows 2x improvement:

name             old time/op    new time/op    delta
ExecInBigEnv-20    60.2ms ± 2%    27.4ms ±20%  -54.42%  (p=0.000 n=8+9)

The remaining questions are:
 - are there any potential regressions (for example, from not setting
   values from process.Env to the current process);
 - should deduplication show warnings (maybe promoted to errors later);
 - whether a default for PATH (e.g. "/bin:/usr/bin") should be added
   when PATH is not set.

[1] opencontainers#1983
[2] docker-archive/libcontainer#418
[3] docker-archive/libcontainer#432

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jun 24, 2024
The current implementation sets all the environment variables passed in
Process.Env in the current process, one by one, then uses os.Environ to
read those back.

As pointed out in [1], this is slow, as runc calls os.Setenv for every
variable, and there may be a few thousand of those. Looking into how
os.Setenv is implemented, it is indeed slow, especially when cgo is
enabled.

Looking into why it was implemented, I found commit 9744d72 and traced
it to [2], which discusses the actual reasons. At the time, they were:

 - HOME is not passed into container as it is set in setupUser by
   os.Setenv and has no effect on config.Env;
 - there is no deduplication of environment variables.

Yet it was decided to not go ahead with this patch, but later [3] was
merged with the carry of this patch.

Now, from what I see:

1. Passing environment to exec is way faster than using os.Setenv and
   os.Environ (tests show ~20x faster in simple Go test, and 2x
   faster in real-world test, see below).
2. Setting environment variables in the runc context can result in ugly
   side effects (think GODEBUG).
3. Nothing in runtime spec says that the environment needs to be
   deduplicated, or the order of preference (whether the first or the
   last value of a variable with the same name is to be used). In C
   (Linux/glibc), the first value is used. In Go, it's the last one.
   We should probably stick to what we have in order to maintain
   backward compatibility.

This patch:
 - switches to passing env directly to exec;
 - adds deduplication mechanism to retain backward compatibility;
 - sets PATH from process.Env in the current process;
 - adds HOME to process.Env if not set;
 - removes os.Clearenv call as it's no longer needed.

The benchmark added by the previous commit shows 2x improvement:

name             old time/op    new time/op    delta
ExecInBigEnv-20    60.2ms ± 2%    27.4ms ±20%  -54.42%  (p=0.000 n=8+9)

The remaining questions are:
 - are there any potential regressions (for example, from not setting
   values from process.Env to the current process);
 - should deduplication show warnings (maybe promoted to errors later);
 - whether a default for PATH (e.g. "/bin:/usr/bin") should be added
   when PATH is not set.

[1]: opencontainers#1983
[2]: docker-archive/libcontainer#418
[3]: docker-archive/libcontainer#432

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Contributor

Closing in favor of #4325

@kolyshkin kolyshkin closed this Jun 24, 2024
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jun 27, 2024
The current implementation sets all the environment variables passed in
Process.Env in the current process, one by one, then uses os.Environ to
read those back.

As pointed out in [1], this is slow, as runc calls os.Setenv for every
variable, and there may be a few thousand of those. Looking into how
os.Setenv is implemented, it is indeed slow, especially when cgo is
enabled.

Looking into why it was implemented, I found commit 9744d72 and traced
it to [2], which discusses the actual reasons. At the time, they were:

 - HOME is not passed into container as it is set in setupUser by
   os.Setenv and has no effect on config.Env;
 - there is no deduplication of environment variables.

Yet it was decided to not go ahead with this patch, but later [3] was
merged with the carry of this patch.

Now, from what I see:

1. Passing environment to exec is way faster than using os.Setenv and
   os.Environ (tests show ~20x faster in simple Go test, and 2x
   faster in real-world test, see below).
2. Setting environment variables in the runc context can result in ugly
   side effects (think GODEBUG).
3. Nothing in runtime spec says that the environment needs to be
   deduplicated, or the order of preference (whether the first or the
   last value of a variable with the same name is to be used). In C
   (Linux/glibc), the first value is used. In Go, it's the last one.
   We should probably stick to what we have in order to maintain
   backward compatibility.

This patch:
 - switches to passing env directly to exec;
 - adds deduplication mechanism to retain backward compatibility;
 - sets PATH from process.Env in the current process;
 - adds HOME to process.Env if not set;
 - removes os.Clearenv call as it's no longer needed.

The benchmark added by the previous commit shows 2x improvement:

> name             old time/op    new time/op    delta
> ExecInBigEnv-20    61.7ms ± 4%    24.9ms ±14%  -59.73%  (p=0.000 n=10+10)

The remaining questions are:
 - are there any potential regressions (for example, from not setting
   values from process.Env to the current process);
 - should deduplication show warnings (maybe promoted to errors later);
 - whether a default for PATH (e.g. "/bin:/usr/bin") should be added
   when PATH is not set.

[1]: opencontainers#1983
[2]: docker-archive/libcontainer#418
[3]: docker-archive/libcontainer#432

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 3, 2024
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 12, 2024
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 10, 2024
The current implementation sets all the environment variables passed in
Process.Env in the current process, one by one, then uses os.Environ to
read those back.

As pointed out in [1], this is slow, as runc calls os.Setenv for every
variable, and there may be a few thousand of those. Looking into how
os.Setenv is implemented, it is indeed slow, especially when cgo is
enabled.

Looking into why it was implemented, I found commit 9744d72 and traced
it to [2], which discusses the actual reasons. At the time, they were:

 - HOME is not passed into container as it is set in setupUser by
   os.Setenv and has no effect on config.Env;
 - there is no deduplication of environment variables.

Yet it was decided to not go ahead with this patch, but later [3] was
merged with the carry of this patch.

Now, from what I see:

1. Passing environment to exec is way faster than using os.Setenv and
   os.Environ (tests show ~20x faster in simple Go test, and 2x faster
   in real-world test, see below).
2. Setting environment variables in the runc context can result in ugly
   side effects (think GODEBUG, LD_PRELOAD, or _LIBCONTAINER_*).
3. Nothing in runtime spec says that the environment needs to be
   deduplicated, or the order of preference (whether the first or the
   last value of a variable with the same name is to be used). We should
   stick to what we have in order to maintain backward compatibility.

This patch:
 - switches to passing env directly to exec;
 - adds deduplication mechanism to retain backward compatibility;
 - sets PATH from process.Env in the current process;
 - adds HOME to process.Env if not set;
 - removes os.Clearenv call as it's no longer needed.
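
The PATH and HOME handling listed above might be sketched as follows. This is an illustration under stated assumptions rather than the patch itself: `lookup` and `withHome` are hypothetical helpers, and HOME is only added when no HOME entry exists at all.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup returns the value of name in env, using the last matching entry.
func lookup(env []string, name string) (string, bool) {
	for i := len(env) - 1; i >= 0; i-- {
		if k, v, ok := strings.Cut(env[i], "="); ok && k == name {
			return v, true
		}
	}
	return "", false
}

// withHome returns env unchanged if it already defines HOME; otherwise it
// appends a HOME entry pointing at the given home directory.
func withHome(env []string, home string) []string {
	if _, ok := lookup(env, "HOME"); ok {
		return env
	}
	return append(env, "HOME="+home)
}

func main() {
	env := withHome([]string{"PATH=/usr/bin:/bin"}, "/root")
	fmt.Println(env) // prints [PATH=/usr/bin:/bin HOME=/root]

	if path, ok := lookup(env, "PATH"); ok {
		// An init process would call os.Setenv("PATH", path) once here,
		// so that later path lookups see the container's PATH.
		fmt.Println(path) // prints /usr/bin:/bin
	}
}
```

Only PATH needs to reach the current process in this scheme; every other variable stays in the slice that is passed to exec.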

The benchmark added by the previous commit shows 2x improvement:

> name             old time/op    new time/op    delta
> ExecInBigEnv-20    61.7ms ± 4%    24.9ms ±14%  -59.73%  (p=0.000 n=10+10)

The remaining questions are:
 - are there any potential regressions (for example, from not setting
   values from process.Env to the current process);
 - should deduplication show warnings (maybe promoted to errors later);
 - whether a default for PATH (e.g. "/bin:/usr/bin") should be added
   when PATH is not set (most software does that).

[1]: opencontainers#1983
[2]: docker-archive/libcontainer#418
[3]: docker-archive/libcontainer#432

Signed-off-by: Kir Kolyshkin <[email protected]>