Context / Motivation
Delivery Gear is expected to be used with varying loads and load patterns, ranging from small instances with few artefacts to process and few concurrent users, up to larger installations. In either case, periodic scans / version updates will cause load bursts.
Therefore, there are some optimisations we should pursue:
Caching
Limit (local) in-memory caching to a configurable maximum memory size. Depending on the cached data, decide whether an in-memory cache or a local filesystem cache is more adequate. When in doubt, prefer the filesystem cache (possibly in conjunction with pickle, rather than the more expensive serialisation roundtrip via yaml/json + dacite); see the sketch after this list.
Use a centralised/shared cache to avoid cache loss / redundant caching between multiple pods.
Implement means for explicit cache invalidation (probably via an API route).
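A minimal sketch of such a filesystem cache, assuming pickle-based serialisation and hash-derived file names (the class and its methods are illustrative, not part of the actual codebase):

```python
import hashlib
import os
import pickle
import tempfile


class FilesystemCache:
    '''pickle-backed filesystem cache (illustrative sketch).

    pickle avoids the more expensive yaml/json + dacite roundtrip for
    cache entries that never leave the service.
    '''
    def __init__(self, cache_dir: str | None=None):
        self.cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), 'dg-cache')
        os.makedirs(self.cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # hash keys so arbitrary strings map to safe file names
        return os.path.join(self.cache_dir, hashlib.sha256(key.encode()).hexdigest())

    def get(self, key: str, default=None):
        try:
            with open(self._path(key), 'rb') as f:
                return pickle.load(f)
        except FileNotFoundError:
            return default

    def set(self, key: str, value):
        with open(self._path(key), 'wb') as f:
            pickle.dump(value, f)

    def invalidate(self, key: str):
        # explicit invalidation; this is what an API route could call
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass
```

The filesystem cache is naturally bounded by the pod's volume rather than its memory; a hard size cap would additionally require an eviction policy, which is omitted here.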
Resource Allocation / Auto-Scaling
Monitor actual load (number of parallel requests, request waiting time, CPU consumption / machine load) and use these load metrics for autoscaling via k8s means (within configurable boundaries). Consider the Delivery-Service separately from extensions.
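A hypothetical HorizontalPodAutoscaler for the delivery-service deployment could look as follows (resource names and thresholds are assumptions, not taken from the actual charts):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: delivery-service  # extensions would get their own HPAs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: delivery-service
  minReplicas: 1   # configurable lower boundary
  maxReplicas: 5   # configurable upper boundary
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```

Request-based metrics (parallel requests, waiting time) would have to be exported first (see Monitoring / Metric-Export below) and fed in as custom metrics.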
async (ASGI)
Thoroughly investigate switching to async/ASGI. Specifically analyse:
bottlenecks (CPU-bound parts in plain Python code -> GIL); async might be worse than multithreading in such cases (see the sketch after this list)
identify IO-bound code that should also be switched to async (esp. the oci package); cf. Add async OCI and OCM packages (gardener/cc-utils#1053)
Use async web server (#202)
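To illustrate the distinction (all function names here are hypothetical, none exist in the codebase): CPU-bound work holds the GIL and blocks the event loop if run as a coroutine, so it belongs in a worker pool, while IO-bound work benefits directly from await.

```python
import asyncio
import concurrent.futures


def cpu_bound_scan(artefact: bytes) -> dict:
    # plain-Python, CPU-bound work; run as a coroutine it would hold
    # the GIL and block the event loop for its entire duration
    return {'size': len(artefact)}


async def fetch_manifest(image_ref: str) -> bytes:
    # IO-bound work is where async shines: the event loop serves other
    # requests while this one awaits the registry
    await asyncio.sleep(0.1)  # stands in for an async HTTP call
    return b'{}'


async def handle_request(image_ref: str) -> dict:
    manifest = await fetch_manifest(image_ref)
    loop = asyncio.get_running_loop()
    # offload CPU-bound parts to a process pool so they neither block
    # the loop nor contend for the GIL (a real service would reuse a
    # single pool created at startup)
    with concurrent.futures.ProcessPoolExecutor() as pool:
        return await loop.run_in_executor(pool, cpu_bound_scan, manifest)


if __name__ == '__main__':
    print(asyncio.run(handle_request('example.org/repo/image:1.0')))
```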
Monitoring / Metric-Export
Configure metric export to determine current workloads and bottlenecks. This information can and should be used afterwards to properly configure caching (see the sketch below).
/metrics endpoint (#220)
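A minimal metric-export sketch, assuming prometheus_client as the metrics library (metric names are illustrative):

```python
import time

import prometheus_client

REQUEST_LATENCY = prometheus_client.Histogram(
    'request_latency_seconds',
    'request waiting/processing time',
    ['route'],
)
IN_FLIGHT = prometheus_client.Gauge(
    'requests_in_flight',
    'number of parallel requests',
)


def handle(route: str):
    IN_FLIGHT.inc()
    start = time.monotonic()
    try:
        ...  # actual request handling
    finally:
        REQUEST_LATENCY.labels(route=route).observe(time.monotonic() - start)
        IN_FLIGHT.dec()


# expose a /metrics endpoint (cf. #220) for Prometheus to scrape
prometheus_client.start_http_server(8000)
```

A latency histogram and an in-flight gauge map directly onto the load indicators named under Resource Allocation / Auto-Scaling.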