Context / Motivation
Delivery Gear is expected to be used with varying loads and load patterns, ranging from small instances with few artefacts to process and few concurrent users, up to larger installations. In either case, periodic scans / version updates will cause load bursts.
Therefore, there are some optimisations we should pursue:
Caching
Limit (local) in-memory caching to a configurable maximum memory size. Depending on the cached data, decide whether an in-memory cache or a local filesystem cache is more adequate. When in doubt, prefer the filesystem cache (possibly in conjunction with pickle, rather than the more expensive serialisation roundtrip via yaml/json + dacite); see the sketch after this list.
Use a centralised/shared cache to avoid cache loss / redundant caching between multiple pods.
Implement means for explicit cache invalidation (probably via an API route).
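A minimal sketch of such a filesystem cache, assuming pickle-based serialisation and hash-derived file names (the class and its methods are illustrative, not part of the actual codebase):

```python
import hashlib
import os
import pickle
import tempfile


class FilesystemCache:
    '''pickle-backed filesystem cache (illustrative sketch).

    pickle avoids the more expensive yaml/json + dacite roundtrip for
    cache entries that never leave the service.
    '''
    def __init__(self, cache_dir: str | None=None):
        self.cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), 'dg-cache')
        os.makedirs(self.cache_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # hash keys so arbitrary strings map to safe file names
        return os.path.join(self.cache_dir, hashlib.sha256(key.encode()).hexdigest())

    def get(self, key: str, default=None):
        try:
            with open(self._path(key), 'rb') as f:
                return pickle.load(f)
        except FileNotFoundError:
            return default

    def set(self, key: str, value):
        with open(self._path(key), 'wb') as f:
            pickle.dump(value, f)

    def invalidate(self, key: str):
        # explicit invalidation; this is what an API route could call
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass
```

The filesystem cache is naturally bounded by the pod's volume rather than its memory; a hard size cap would additionally require an eviction policy, which is omitted here.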
Resource Allocation / Auto-Scaling
Monitor actual load (number of parallel requests, request waiting time, CPU consumption / machine load) and use these load metrics for autoscaling via k8s means (within configurable boundaries). Consider the Delivery-Service separately from extensions.
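A hypothetical HorizontalPodAutoscaler for the delivery-service deployment could look as follows (resource names and thresholds are assumptions, not taken from the actual charts):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: delivery-service  # extensions would get their own HPAs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: delivery-service
  minReplicas: 1   # configurable lower boundary
  maxReplicas: 5   # configurable upper boundary
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```

Request-based metrics (parallel requests, waiting time) would have to be exported first (see Monitoring / Metric-Export below) and fed in as custom metrics.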
async (ASGI)
Thoroughly investigate switching to async/ASGI. Specifically analyse:
bottlenecks (CPU-bound parts in plain Python code -> GIL); async might be worse than multithreading in such cases (see the sketch after this list)
identify IO-bound code that should also be switched to async (esp. the oci package); cf. Add async OCI and OCM packages (gardener/cc-utils#1053)
Use async web server (#202)
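To illustrate the distinction (all function names here are hypothetical, none exist in the codebase): CPU-bound work holds the GIL and blocks the event loop if run as a coroutine, so it belongs in a worker pool, while IO-bound work benefits directly from await.

```python
import asyncio
import concurrent.futures


def cpu_bound_scan(artefact: bytes) -> dict:
    # plain-Python, CPU-bound work; run as a coroutine it would hold
    # the GIL and block the event loop for its entire duration
    return {'size': len(artefact)}


async def fetch_manifest(image_ref: str) -> bytes:
    # IO-bound work is where async shines: the event loop serves other
    # requests while this one awaits the registry
    await asyncio.sleep(0.1)  # stands in for an async HTTP call
    return b'{}'


async def handle_request(image_ref: str) -> dict:
    manifest = await fetch_manifest(image_ref)
    loop = asyncio.get_running_loop()
    # offload CPU-bound parts to a process pool so they neither block
    # the loop nor contend for the GIL (a real service would reuse a
    # single pool created at startup)
    with concurrent.futures.ProcessPoolExecutor() as pool:
        return await loop.run_in_executor(pool, cpu_bound_scan, manifest)


if __name__ == '__main__':
    print(asyncio.run(handle_request('example.org/repo/image:1.0')))
```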
Monitoring / Metric-Export
Configure metric export to determine current workloads and bottlenecks. This information can and should be used afterwards to properly configure caching (see the sketch below).
/metrics endpoint (#220)
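A minimal metric-export sketch, assuming prometheus_client as the metrics library (metric names are illustrative):

```python
import time

import prometheus_client

REQUEST_LATENCY = prometheus_client.Histogram(
    'request_latency_seconds',
    'request waiting/processing time',
    ['route'],
)
IN_FLIGHT = prometheus_client.Gauge(
    'requests_in_flight',
    'number of parallel requests',
)


def handle(route: str):
    IN_FLIGHT.inc()
    start = time.monotonic()
    try:
        ...  # actual request handling
    finally:
        REQUEST_LATENCY.labels(route=route).observe(time.monotonic() - start)
        IN_FLIGHT.dec()


# expose a /metrics endpoint (cf. #220) for Prometheus to scrape
prometheus_client.start_http_server(8000)
```

A latency histogram and an in-flight gauge map directly onto the load indicators named under Resource Allocation / Auto-Scaling.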