This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

RFE: specify emptyDir sizeLimit in Job Config #170

Open
3 tasks
christian-kreuzberger-dtx opened this issue Feb 9, 2022 · 3 comments · May be fixed by #311
Labels
type:feature New feature or request that provides value to the stakeholders/end-users

Comments

christian-kreuzberger-dtx (Contributor) commented Feb 9, 2022

As a user, I want to specify the size limit of the emptyDir for a Kubernetes Job, so that I can download files that exceed the current default limit on demand (e.g., test data).

Technical Details

Limiting the size of an emptyDir is good practice, especially since multiple jobs can run in parallel, and it prevents a single host's memory or disk from (temporarily) filling up.

However, the SizeMemoryBackedVolumes feature gate, already in beta in K8s 1.22, allows memory-backed volumes to use up to half of the Linux host's memory when no sizeLimit is specified. Therefore, we could even consider not setting a sizeLimit by default.

See https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

Note: If the SizeMemoryBackedVolumes feature-gate is enabled, you can specify a size for memory backed volumes. If no size is specified, memory backed volumes are sized to 50% of the memory on a Linux host.
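To make the trade-off concrete, here is a simplified Go model of the sizing rule quoted above. It assumes that a memory-backed emptyDir without a sizeLimit may grow to 50% of host memory (as the note states) and that every parallel job gets its own volume. The helpers `volumeCap` and `worstCase` and the 16Gi host are hypothetical, and actual kubelet behaviour may differ.

```go
package main

import "fmt"

const (
	mi = int64(1) << 20 // 1 MiB in bytes
	gi = int64(1) << 30 // 1 GiB in bytes
)

// volumeCap returns how large a single memory-backed emptyDir may grow:
// the sizeLimit if one is set, otherwise half of the host memory.
// This is a simplified model of the SizeMemoryBackedVolumes note, not kubelet code.
func volumeCap(sizeLimit *int64, hostMemory int64) int64 {
	if sizeLimit != nil {
		return *sizeLimit
	}
	return hostMemory / 2
}

// worstCase returns the memory that parallel jobs could claim in total
// if every job fills its own volume up to the cap.
func worstCase(parallelJobs int, sizeLimit *int64, hostMemory int64) int64 {
	return int64(parallelJobs) * volumeCap(sizeLimit, hostMemory)
}

func main() {
	host := 16 * gi
	limit := 20 * mi

	fmt.Println(worstCase(4, &limit, host) / mi) // 80 (MiB) with a 20Mi limit
	fmt.Println(worstCase(4, nil, host) / gi)    // 32 (GiB) without a limit: twice the host
}
```

Under these assumptions, only two unlimited jobs on the same host are enough to exhaust its memory in the worst case, which is the main argument for keeping some default limit.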

Proposed config.yaml change (option 1)

apiVersion: v2
actions:
  - name: "Print files"
    events:
      - name: "sh.keptn.event.sample.triggered"
    files:
      - my-32-megabyte-file.txt
    emptyDirSizeLimit: "64Mi" # <----- new field; defaults to 20Mi
    tasks:
      - name: "Show files"
        image: "alpine"
        workingDir: "/keptn"
        cmd:
          - ls -lh
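Under option 1, the service only needs to fall back to the current default when the field is absent. A minimal Go sketch, assuming a hypothetical `Action` struct that carries the proposed `emptyDirSizeLimit` field (the real job-executor config types may be named and structured differently):

```go
package main

import "fmt"

// Action is a hypothetical, trimmed-down model of an action in config.yaml;
// only the fields relevant to this proposal are included.
type Action struct {
	Name              string `yaml:"name"`
	EmptyDirSizeLimit string `yaml:"emptyDirSizeLimit"` // proposed new field
}

// defaultEmptyDirSizeLimit is the value currently hard-coded in the service.
const defaultEmptyDirSizeLimit = "20Mi"

// sizeLimitFor returns the configured limit, falling back to the default
// when the field is missing from config.yaml.
func sizeLimitFor(a Action) string {
	if a.EmptyDirSizeLimit == "" {
		return defaultEmptyDirSizeLimit
	}
	return a.EmptyDirSizeLimit
}

func main() {
	withLimit := Action{Name: "Print files", EmptyDirSizeLimit: "64Mi"}
	withoutLimit := Action{Name: "Print files"}
	fmt.Println(sizeLimitFor(withLimit))    // 64Mi
	fmt.Println(sizeLimitFor(withoutLimit)) // 20Mi
}
```

The returned string could then be handed to resource.MustParse in place of the hard-coded "20Mi", so existing configs without the new field keep their current behaviour.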

Alternative definition (option 2), which provides more flexibility, especially for future use cases where we might want to share volumes between jobs:

apiVersion: v2
actions:
  - name: "Print files"
    events:
      - name: "sh.keptn.event.sample.triggered"
    files:
      - my-32-megabyte-file.txt
    volumes: # <---- new field
      - emptyDir:
          sizeLimit: 64Mi
    tasks:
      - name: "Show files"
        image: "alpine"
        workingDir: "/keptn"
        cmd:
          - ls -lh

To be decided: what config.yaml should look like

Code

It should be enough to make this part of the code configurable:

// TODO configure from outside:
quantity := resource.MustParse("20Mi")
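The hard-coded value is a Kubernetes quantity string, where the Mi suffix means mebibytes (2^20 bytes). To illustrate why the 32-megabyte file from the example config does not fit under the current default, here is a simplified Go parser that handles only binary suffixes. `parseBinaryQuantity` is a hypothetical helper, not part of the apimachinery `resource` package (which accepts many more formats); the sketch also assumes "32 megabytes" means 32,000,000 bytes and ignores filesystem overhead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binarySuffixes maps the binary suffixes used by Kubernetes quantities to
// their multipliers. Decimal suffixes (k, M, G, ...) and exponent forms,
// which resource.MustParse also accepts, are intentionally left out.
var binarySuffixes = []struct {
	suffix string
	factor int64
}{
	{"Ki", 1 << 10},
	{"Mi", 1 << 20},
	{"Gi", 1 << 30},
	{"Ti", 1 << 40},
}

// parseBinaryQuantity converts strings such as "20Mi" into bytes.
// Strings without a recognised suffix are parsed as plain byte counts,
// so unsupported suffixes like "20M" return an error. Overflow is not checked.
func parseBinaryQuantity(s string) (int64, error) {
	for _, b := range binarySuffixes {
		if strings.HasSuffix(s, b.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, b.suffix), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid quantity %q: %w", s, err)
			}
			return n * b.factor, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	fileSize := int64(32_000_000) // my-32-megabyte-file.txt, assuming 1 MB = 10^6 bytes
	for _, limit := range []string{"20Mi", "64Mi"} {
		bytes, err := parseBinaryQuantity(limit)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %d bytes, fits: %v\n", limit, bytes, fileSize <= bytes)
	}
}
```

Run as-is, it prints `20Mi = 20971520 bytes, fits: false` and `64Mi = 67108864 bytes, fits: true`. Because resource.MustParse panics on malformed input, a user-supplied limit would likely be better validated with resource.ParseQuantity, which returns an error instead.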

Definition of Done

  • Docs / FEATURES.md updated
  • New field(s) in job config (defaults to 20Mi)
  • Unit test added
christian-kreuzberger-dtx added the type:feature label Feb 9, 2022
christian-kreuzberger-dtx changed the title from "specify emptyDir sizeLimit in Job Config" to "RFE: specify emptyDir sizeLimit in Job Config" Feb 9, 2022
pchila (Contributor) commented Feb 9, 2022

The alternative definition looks better and leaves open the possibility to define multiple volumes with different implementations in the future.

For now, the init container could use the first volume in the definition to copy the Keptn files over when initializing the job.

In the future, if we support multiple volumes that are used by other (non-init) containers, we could introduce a name property and a naming convention to determine which volumes the init container needs to populate, for example:

apiVersion: v2
actions:
  - name: "Print files"
    events:
      - name: "sh.keptn.event.sample.triggered"
    files:
      - my-32-megabyte-file.txt
    volumes: # <---- new field
      - name: keptn-files  # this could be the convention for the init container volume name if we have multiple
        emptyDir:
          sizeLimit: 64Mi
      - name: persisted-project-files # some funky name chosen by the user for their own uses
        gcePersistentDisk: # some other (persistent) disk to be mounted and used by the job itself
          pdName: my-data-disk
          fsType: ext4
    tasks:
      - name: "Show files"
        image: "alpine"
        workingDir: "/keptn"
        cmd:
          - ls -lh
        volumeMounts:
          - mountPath: /test-pd
            name: persisted-project-files
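The selection rule described in this comment (use the volume named keptn-files if present, otherwise the first volume) can be sketched in Go as follows. The `Volume` and `EmptyDir` types and `initContainerVolume` are hypothetical, trimmed-down models of the proposed `volumes` list, not existing job-executor code.

```go
package main

import "fmt"

// Volume is a hypothetical model of an entry in the proposed `volumes` list;
// only the fields needed for the lookup are included.
type Volume struct {
	Name     string
	EmptyDir *EmptyDir // nil for other volume types (e.g. gcePersistentDisk)
}

// EmptyDir holds the emptyDir settings of a volume.
type EmptyDir struct {
	SizeLimit string
}

// initVolumeName is the naming convention suggested in the comment above.
const initVolumeName = "keptn-files"

// initContainerVolume picks the volume the init container should copy the
// Keptn files into: the one named "keptn-files" if present, otherwise the
// first volume in the list. It reports false if no volumes are defined.
func initContainerVolume(volumes []Volume) (Volume, bool) {
	for _, v := range volumes {
		if v.Name == initVolumeName {
			return v, true
		}
	}
	if len(volumes) > 0 {
		return volumes[0], true
	}
	return Volume{}, false
}

func main() {
	volumes := []Volume{
		{Name: "persisted-project-files"},
		{Name: "keptn-files", EmptyDir: &EmptyDir{SizeLimit: "64Mi"}},
	}
	v, ok := initContainerVolume(volumes)
	fmt.Println(v.Name, ok) // keptn-files true
}
```

Falling back to the first volume keeps single-volume configs working without a name, while the convention only matters once several volumes are defined.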

christian-kreuzberger-dtx (Contributor, Author) commented Feb 15, 2022

After further consultation, we concluded that it might not be beneficial to offer the option to mount Kubernetes volumes, because they

  • might be slow, depending on the Kubernetes setup
  • could cause problems (e.g., when multiple jobs run at the same time)
  • do not always solve the use case properly (e.g., caching needs performance, while Kubernetes volumes might live somewhere in the cloud and take time to load). It might be better to set up an external caching service (e.g., Artifactory, a Go proxy, ...)

The primary focus should be the emptyDir size, which can then be configured using option 1 (the emptyDirSizeLimit field).

christian-kreuzberger-dtx (Contributor, Author) commented:

A PR is available in #311, but it received review comments about what needs to be improved, and unfortunately I don't have time to work on them. Unassigning myself from this issue.

christian-kreuzberger-dtx removed their assignment Nov 11, 2022