Released on: 2024-11-07 Pinned to datadog-agent v7.59.0: CHANGELOG.
Released on: 2024-11-04 Pinned to datadog-agent v7.58.2: CHANGELOG.
Released on: 2024-10-24 Pinned to datadog-agent v7.58.1: CHANGELOG.
Released on: 2024-10-21 Pinned to datadog-agent v7.58.0: CHANGELOG.
- Added capability to tag any Kubernetes resource based on labels and annotations. This feature can be configured with kubernetes_resources_annotations_as_tags and kubernetes_resources_labels_as_tags. These options associate group resources with an annotations-to-tags (or labels-to-tags) map. For example, deployments.apps can be associated with an annotations-to-tags map to configure annotations as tags for deployments. Example: {deployments.apps: {annotationKey1: tag1, annotationKey2: tag2}}
- The Kubernetes State Metrics (KSM) check can now be configured to collect pods from the Kubelet in node agents instead of collecting them from the API Server in the Cluster Agent or the Cluster check runners. This is useful in clusters with a large number of pods where emitting pod metrics from a single check instance can cause performance issues due to the large number of metrics emitted.
- Added a new option for the Cluster Agent ("admission_controller.inject_config.type_socket_volumes") to specify that injected volumes should be of type "Socket". This option is disabled by default. When set to true, injected pods will not start until the Agent creates the DogstatsD and trace-agent sockets. This ensures no traces or DogstatsD metrics are lost, but it can cause the pod to wait if the Agent has issues creating the sockets.
- Fixed an issue that prevented the Kubernetes autoscaler from evicting pods injected by the Admission Controller.
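To illustrate the kubernetes_resources_annotations_as_tags entry above, the sketch below applies an annotations-to-tags map to the annotations of one resource. The function name and the in-memory dict layout are hypothetical and only mirror the example given in the entry; this is not the Agent's implementation.

```python
from typing import Dict, List

# Hypothetical in-memory form of kubernetes_resources_annotations_as_tags:
# group resource -> {annotation key -> tag name}.
ANNOTATIONS_AS_TAGS: Dict[str, Dict[str, str]] = {
    "deployments.apps": {"annotationKey1": "tag1", "annotationKey2": "tag2"},
}


def tags_for_resource(
    group_resource: str,
    annotations: Dict[str, str],
    mapping: Dict[str, Dict[str, str]] = ANNOTATIONS_AS_TAGS,
) -> List[str]:
    """Return sorted "tag:value" strings for annotations listed in the map."""
    rules = mapping.get(group_resource, {})
    return sorted(
        f"{tag}:{annotations[key]}"
        for key, tag in rules.items()
        if key in annotations
    )


print(tags_for_resource("deployments.apps", {"annotationKey1": "blue", "other": "x"}))
```

Annotations that are not listed in the map (such as "other" here) are ignored, and a group resource with no entry yields no tags.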
Released on: 2024-09-17 Pinned to datadog-agent v7.57.1: CHANGELOG.
Released on: 2024-09-09 Pinned to datadog-agent v7.57.0: CHANGELOG.
- The Cluster Agent now supports activating Continuous Profiling using Admission Controller.
- LimitRange and StorageClass resources are now collected by the orchestrator check.
- The auto-instrumentation webhook (beta) uses a new injector library.
- Fixes a rare bug where some Kubernetes events would be emitted without a timestamp and would be dropped upstream as a result.
- Library package versions for auto-instrumentation are now set to the latest major
version of the library-package instead of latest.
- java:v1
- dotnet:v2
- python:v2
- ruby:v2
- js:v5
- Fix APIServer error logs generated when external metrics endpoint is activated
Released on: 2024-09-02 Pinned to datadog-agent v7.56.2: CHANGELOG.
Released on: 2024-08-29 Pinned to datadog-agent v7.56.1: CHANGELOG.
Released on: 2024-08-16 Pinned to datadog-agent v7.56.0: CHANGELOG.
- Disables default injection of the .NET profiler dependency for Kubernetes auto_instrumentation.
- Mark the NetworkPolicy collector as stable in the Cluster Agent
- Enabled language detection automatically in the injected agent sidecar on EKS Fargate when APM SSI is enabled. This is only available for users using the admission controller to automatically inject the agent sidecar.
- The orchestrator check can now scrub sensitive data from probes in pod specifications.
- Fixes issue where the external metrics server would sometimes return metrics which had not been updated for longer than the configured external_metrics_provider.max_age as valid. In connection with this fix, a new config (external_metrics_provider.query_validity_period) has been added to account for the delay between when metrics are resolved and when they are queried by the various autoscaling controllers. It is set to 30 seconds by default.
Released on: 2024-08-01 Pinned to datadog-agent v7.55.3: CHANGELOG.
Released on: 2024-07-25 Pinned to datadog-agent v7.55.2: CHANGELOG.
Released on: 2024-07-12 Pinned to datadog-agent v7.55.1: CHANGELOG.
Released on: 2024-07-11 Pinned to datadog-agent v7.55.0: CHANGELOG.
- Add support for kubernetes_namespace_annotations_as_tags. This new option is equivalent to the existing kubernetes_namespace_labels_as_tags, but it considers namespace annotations instead of namespace labels. With this new option, users can enrich tagging based on namespace annotations.
- Support namespace labels as tags on kubernetes events.
- Add reason:backofflimitexceeded,deadlineexceeded to the kubernetes_state.job.failed metric to help users understand why a job failed.
- Reduced the memory used to store the tags.
- The Datadog cluster-agent container image is now using Ubuntu 24.04 noble as the base image.
- Fixes an issue with large clusters where the Cluster Agent fails to collect all tags when cluster_agent.collect_kubernetes_tags is enabled.
Released on: 2024-06-18 Pinned to datadog-agent v7.54.1: CHANGELOG.
Released on: 2024-05-29 Pinned to datadog-agent v7.54.0: CHANGELOG.
- Add LimitRange and StorageClass collection in the orchestrator check.
- Added retry mechanism to language detection patcher in order to retry failed patching operations.
- Fix collection of numeric rolling update options in Kubernetes deployments and daemonsets.
- Fixed initialization of language expiration time for detected languages.
Released on: 2024-04-30 Pinned to datadog-agent v7.53.0: CHANGELOG.
- APM library injection now works on EKS Fargate when the admission controller is configured to add an Agent sidecar in EKS Fargate.
- Cluster Agent now supports activating Application Security Management, Code Vulnerabilities, and Software Composition Analysis via Helm charts.
- Add the mutation_webhook tag to admission_webhooks.webhooks_received and admission_webhooks.response_duration Cluster Agent telemetry.
- When using the admission controller to inject an Agent sidecar on EKS Fargate, shareProcessNamespace is now set to true automatically. This is to ensure that the process collection feature works.
- Add agent sidecar injection webhook in cluster-agent Kubernetes admission controller. This new webhook adds the Agent as sidecar container in applicative Pods when it is required by the environment. For example with the EKS Fargate environment.
- Introduces a new config option in the Cluster Agent to set the rebalance period when advanced dispatching is enabled: cluster_checks.rebalance_period. The default value is 10 min.
- Fix an issue where the admission controller would remove the field restartPolicy from native sidecar containers, preventing pod creation on Kubernetes 1.29+.
- Fix missing kube_api_version tag on HPA and VPA resources.
Released on: 2024-02-19 Pinned to datadog-agent v7.51.0: CHANGELOG.
- Enable Horizontal Pod Autoscaler collection for the Orchestrator by default
- Add isolate command to clusterchecks to make it easier to pinpoint a check that is causing high CPU/memory usage. The command can be run in the cluster agent with: datadog-cluster-agent clusterchecks isolate --checkID=<checkID>
- Enable CRD collection by default in the orchestrator check.
- Fixes a bug that would trigger unnecessary APIServer List requests from the Cluster Agent or Cluster Checks Runner.
- Fixes a bug introduced in 7.50.0 that prevented DD_TAGS from being added to kubernetes_state.* metrics.
- Add language detection API handler to the cluster-agent.
- Report rate_limit_queries_remaining_min telemetry from external-metrics server.
- Added a new --force option to the datadog-cluster-agent clusterchecks rebalance command that allows you to force clustercheck rebalancing with utilization.
- [Beta] Enable APM library injection in cluster-agent admission controller based on automatic language detection annotations.
- Show Autodiscovery information in the output of datadog-cluster-agent status.
- Added CreateContainerConfigError wait reason to the kubernetes_state.container.status_report.count.waiting metric reported by the kubernetes_state_core check.
- Release the Leader Election Lock on shutdown to make the initialization of future cluster-agents faster.
- The Datadog cluster-agent container image is now using Ubuntu 23.10 mantic as the base image.
- Fixed a bug in the kubernetes_state_core check that caused tag corruption when telemetry was set to true.
- Fix stale metrics being reported by kubernetes_state_core check in some rare cases.
- Fixed a bug in the rebalancing of cluster checks. Checks that contained secrets were never rebalanced when the Cluster Agent was configured to not resolve check secrets (option secret_backend_skip_checks set to true).
- Added option to attach profiling data to a flare.
- Increment cluster agent admission controller mutation attempts metric when library is auto-injected.
- Added the check_name tag to the cluster_checks.configs_info metric emitted by the Cluster Agent telemetry.
- Sensitive information is now scrubbed from pod annotations.
- Skip collections for resources missing RBACs in orchestrator check
- Remove openmetrics endpoint default value from containerd check default configuration.
- Resolved a conflict between the admission controller and the AKS admissions enforcer that previously led to a loop in reconciling the webhook.
- Fixes a panic in the Cluster Agent that happens when trying to unschedule a check that has not been dispatched to any runner.
- Added the kubernetes_state.pod.tolerations metric to the KSM core check
- Add HorizontalPodAutoscaler collection in the orchestrator check.
- Add safeguards for orchestrator CRD collection.
- The Datadog cluster-agent container image is now using Ubuntu 23.04 lunar as the base image.
- Fixed an error in the calculations performed by the algorithm that rebalances cluster checks. Cluster checks are now more evenly distributed when advanced dispatching is enabled (cluster_checks.advanced_dispatching_enabled is set to true).
- Service checks are no longer excluded from rebalancing decisions when advanced dispatching is enabled (cluster_checks.advanced_dispatching_enabled is set to true).
- Fixes a rare bug in the Kubernetes State check that causes the Agent to incorrectly tag the kubernetes_state.job.complete service check.
- Removes an incorrect warning log message that mentions that the DD_POD_NAME env var is unknown.
- Fixes the KSM check to support HPA v2beta2 again. This stopped working in Agent v7.44.0.
- Adds the kube_cluster_name tag as a static global tag to the cluster agent when the DD_CLUSTER_NAME config option is set. This should fix an issue where the tag is not being attached to metrics in certain environments, such as EKS Fargate.
- Fixed a bug in the advanced dispatching of cluster checks. All the checks scheduled since the last rebalance were being scheduled in the same node. Now they should be distributed among the available nodes.
- Add support for leases in leader election which can be enabled by setting leader_election_default_resource to leases, available since Kubernetes version 1.14. If this parameter is empty, leader election automatically detects if leases are available and uses them. Set leader_election_default_resource to configmap on clusters running Kubernetes versions previous to 1.14.
- Auto-instrumentation admission controller now automatically activates crash tracking for Java applications
- Expose to cluster-agent HistogramBuckets and Events check stats. It should help the cluster-agent to define a better cluster-checks dispatching.
- The Cluster Agent Admission Controller now injects DD_DOGSTATSD_URL when used in socket mode (default), allowing DogStatsD clients to work without configuration.
- Fix persistent volume type for local volumes.
- Enable collection of Vertical Pod Autoscalers by default in the orchestrator check.
- Collect conditions for a variety of Kubernetes resources.
- Collect persistent volume source in the orchestrator check.
- Fix the timeout for idle HTTP connections.
- When the cluster-agent is started with hostNetwork: true, the leader election mechanism was using a node name instead of the pod name. This was breaking the “follower to leader” forwarding mechanism. This change introduces the DD_POD_NAME environment variable as a more reliable way to set the cluster-agent pod name. It is supposed to be filled by the Kubernetes downward API.
- Add "active" tag on the telemetry datadog.cluster_agent.external_metrics.datadog_metrics metric. The label active is true if DatadogMetrics CR is used, false otherwise.
- Library injection via Admission Controller: Allow configuring the CPU and Memory requests/limits for library init containers.
- Validate the orchestration config provided by the user.
- Fix the admission controller in socket mode for pods with init containers.
- Fix resource requirements detection for containers without any request and limit set.
- The KSM core check now correctly handles labels and annotations with uppercase letters defined in the "labels_as_tags" and "annotations_as_tags" config attributes.
- Add conditions to Vertical Pod Autoscalers
- Experimental: Support Ruby library injection through the Admission Controller on Kubernetes.
- Add new metrics for the KSM Core check for extended resources:
  - Pod requests and limits of the network bandwidth extended resource: kubernetes_state.container.network_bandwidth_limit, kubernetes_state.container.network_bandwidth_requested
  - The capacity and allocatable network bandwidth extended resource of a node: kubernetes_state.node.network_bandwidth_allocatable, kubernetes_state.node.network_bandwidth_capacity
- Admission Controller: Add telemetry around auto-instrumentation via remote config.
- The UDS socket volume when using the Admission Controller is now mounted in readOnly mode.
- Starts the collecting of Vertical Pod Autoscalers within Kubernetes clusters.
- Enable orchestrator manifest collection by default
- Make the cluster-agent admission controller able to inject libraries for several languages in a single pod.
- Supports the collection of custom resource definition and custom resource manifests for the orchestrator explorer.
- Collects Unified Service Tags for the orchestrator explorer product.
- Add Namespace collection in the orchestrator check and enable it by default.
- Improves performance of the Cluster Agent admission controller on large pods.
- Experimental: The Datadog Admission Controller can inject the Python APM library into Kubernetes containers for auto-instrumentation.
- The orchestrator check is now able to discover resources to collect based on API groups available in the Kubernetes cluster.
- The admission controller now injects variables and volume mounts to init containers in addition to regular containers.
- Chunk orchestrator payloads by size and weight
- KSM Core check: Add the helm_chart tag automatically from the standard helm label helm.sh/chart.
- Helm check: Add a helm_chart tag, equivalent to the standard helm label helm.sh/chart (see https://helm.sh/docs/chart_best_practices/labels/).
- Fixed an edge case in the Admission Controller when mutateUnlabelled is enabled and configMode is set to socket. This combination could prevent the creation of new DaemonSet Agent pods.
- Fixed a resource leak in the helm check.
- Experimental: The Datadog Admission Controller can inject the Node and Java APM libraries into Kubernetes containers for auto-instrumentation.
- When injecting env vars with the admission controller, env vars are now prepended instead of appended, meaning that Kubernetes [dependent environment variables](https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/) can now depend on these injected vars.
- The helm check has new configuration parameters:
  - extra_sync_timeout_seconds (default 120)
  - informers_resync_interval_minutes (default 10)
- Improves the labelsAsTags feature of the Kubernetes State Metrics core check by performing the transformations of characters ['/' , '-' , '.'] to underscores ['_'] within the Datadog agent. Previously users had to perform these conversions manually in order to discover the labels on their resources.
- Fix the DCA leader_election_is_leader metric that could sometimes report is_leader="false" on the leader instance.
- Fixed an error when running datadog-cluster-agent status with DD_EXTERNAL_METRICS_PROVIDER_ENABLED=true and no app key set.
- The KSM Core check now handles cron job schedules with time zones.
- Align Cluster Agent version to Agent version. Cluster Agent will now be released with 7.x.y tags
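The character transformation described in the labelsAsTags entry above (mapping '/', '-', and '.' to '_') can be sketched as follows. The function name is hypothetical, and this is only an illustration of the substitution the entry describes, not the Agent's code.

```python
def normalize_label_key(key: str) -> str:
    """Replace '/', '-' and '.' with '_', as the labelsAsTags entry describes."""
    return key.translate(str.maketrans("/-.", "___"))


print(normalize_label_key("app.kubernetes.io/name"))  # app_kubernetes_io_name
```

With this conversion done inside the Agent, a user can write the label key as it appears on the resource instead of pre-converting it by hand.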
Released on: 2022-07-26 Pinned to datadog-agent v7.38.0: CHANGELOG.
- Enable collection of Ingresses by default in the orchestrator check.
Released on: 2022-06-28 Pinned to datadog-agent v7.37.0: CHANGELOG.
- The Cluster Agent followers now forward queries to the Cluster Agent leaders themselves. This allows a reduction in the overall number of connections to the Cluster Agent and better spreads the load between leader and forwarders.
- Make the name of the ConfigMap used by the Cluster Agent for its leader election configurable.
- The Datadog Cluster Agent exposes a new metric endpoint_checks_configs_dispatched.
- Fix a panic occurring during the invocation of the check command on the Cluster Agent if the Orchestrator Explorer feature is enabled.
- Fix the node count reported for Kubernetes clusters.
Released on: 2022-05-22 Pinned to datadog-agent v7.36.0: CHANGELOG.
- The Datadog Admission Controller supports multiple configuration injection modes through the admission_controller.inject_config.mode parameter or the DD_ADMISSION_CONTROLLER_INJECT_CONFIG_MODE environment variable:
  - hostip: Inject the host IP. (default)
  - service: Inject Datadog's local-service DNS name.
  - socket: Inject the Datadog socket path.
- Collect ResourceRequirements for jobs and cronjobs for kubernetes live containers.
- Added a configuration option to admission controller to allow configuration of the failure policy. Defaults to Ignore which was the previous default. The default of Ignore means that pods will still be admitted even if the webhook is unavailable to inject them. Setting to Fail will require the admission controller to be present and pods to be injected before they are allowed to run.
- The admission controller's reinvocation policy is now set to IfNeeded by default. It can be changed using the admission_controller.reinvocation_policy parameter.
- The Datadog Cluster Agent now supports internal profiling.
- KSM core check: add a new kubernetes_state.cronjob.complete service check that returns the status of the most recent job for a cronjob.
- Cluster Agent API (only used by Node Agents) is now only served with TLS >= 1.3 by default. Setting "cluster_agent.allow_legacy_tls" to true allows falling back to TLS 1.0.
- Fix the node count reported for Kubernetes clusters.
- Fixed an issue that created lots of log messages when the DCA admission controller was enabled on AKS.
- Time-based metrics (for example, kubernetes_state.pod.age, kubernetes_state.pod.uptime) are now comparable in the Kubernetes state core check.
- Fix a risk of panic when multiple KSM Core check instances run concurrently.
- Remove noisy Kubernetes API deprecation warnings in the Cluster Agent logs.
- Change the default value of the external metrics provider port from 443 to 8443. This allows running the cluster agent as a non-root user for better security. This was already the default value in the Helm chart and in the datadog operator.
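As a small illustration of the configuration injection modes listed in the v7.36.0 notes above, the sketch below resolves the effective mode from either the admission_controller.inject_config.mode parameter or the DD_ADMISSION_CONTROLLER_INJECT_CONFIG_MODE environment variable. The function name, the flat config dict, and the choice to let the environment variable win over the parameter are assumptions made for the example.

```python
from typing import Mapping

VALID_MODES = ("hostip", "service", "socket")
DEFAULT_MODE = "hostip"  # the release note marks hostip as the default


def resolve_inject_mode(config: Mapping[str, str], environ: Mapping[str, str]) -> str:
    """Pick the injection mode; hypothetical helper, not the Agent's code."""
    # Assumption: the environment variable takes precedence over the config key.
    mode = (
        environ.get("DD_ADMISSION_CONTROLLER_INJECT_CONFIG_MODE")
        or config.get("admission_controller.inject_config.mode")
        or DEFAULT_MODE
    )
    if mode not in VALID_MODES:
        raise ValueError(f"unknown inject config mode: {mode!r}")
    return mode


print(resolve_inject_mode({}, {}))
print(resolve_inject_mode({"admission_controller.inject_config.mode": "service"}, {}))
```

Rejecting unknown values early keeps a typo in the mode name from silently falling back to the default.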
Released on: 2022-04-12 Pinned to datadog-agent v7.35.0: CHANGELOG.
- Collect ResourceRequirements on other K8s workloads as well for live containers (Deployment, StatefulSet, ReplicaSet, DaemonSet)
- Enable collection of Roles/RoleBindings/ClusterRoles/ClusterRoleBindings/ServiceAccounts by default in the orchestrator check.
- Add Ingress collection in the orchestrator check.
- Fix a bug that prevents scrubbing sensitive content on the DaemonSet resource.
- Fix a bug that prevents scrubbing sensitive content on the StatefulSet resource.
- Adds a new histogram metric admission_webhooks_response_duration to monitor the admission-webhook's response time. The existing metric admission_webhooks_webhooks_received is now a counter.
- The cluster agent has an external metrics provider feature to allow using Datadog queries in Kubernetes HorizontalPodAutoscalers. It sometimes faces issues like:
  2022-01-01 01:01:01 UTC | CLUSTER | ERROR | (pkg/util/kubernetes/autoscalers/datadogexternal.go:79 in queryDatadogExternal) | Error while executing metric query ... truncated... API returned error: Query timed out
  To mitigate this problem, use the new external_metrics_provider.chunk_size parameter to reduce the number of queries that are batched by the Agent and sent together to Datadog.
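The effect of external_metrics_provider.chunk_size can be pictured as splitting the pending metric queries into batches of at most that size. The helper below is a hypothetical sketch of such batching, not the Agent's implementation, and the query strings are placeholders.

```python
from typing import Iterator, List, Sequence


def chunk_queries(queries: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive batches holding at most chunk_size queries each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(queries), chunk_size):
        yield list(queries[start:start + chunk_size])


pending = ["q1", "q2", "q3", "q4", "q5"]
for batch in chunk_queries(pending, chunk_size=2):
    print(batch)
```

A smaller chunk size means more requests, each carrying fewer queries, which is the trade-off the entry suggests when batched queries time out.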
Released on: 2022-03-01 Pinned to datadog-agent v7.34.0: CHANGELOG.
- Add an external_metrics_provider.endpoints parameter that allows specifying a list of external metrics provider endpoints. If the first one fails, the DCA will query the next ones.
- Support file-based endpoint checks.
- Enable collection of PV/PVCs by default in the orchestrator check.
- File-based cluster checks support Autodiscovery.
- Fix the Admission Controller/Webhooks info section of the cluster agent's agent status output on Kubernetes 1.22+. Although the cluster agent was able to register its webhook with both the v1beta1 and the v1 version of the Admissionregistration API, the agent status command was always using the v1beta1, which has been removed in Kubernetes 1.22.
- Improve error handling of deleted HPA objects.
- Fix an issue where scrubbing custom sensitive words would not work as intended for the orchestrator check.
- Fixed a bug that could prevent the Admission Controller from starting when the External Metrics Provider is enabled.
- Fix the calculation of orchestrator cache hits.
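The fallback behavior described for external_metrics_provider.endpoints in the v7.34.0 notes above (try the first endpoint, then the next ones on failure) can be sketched as below. The function, the callable-based query interface, and the example URLs are hypothetical; the sketch only shows the ordering logic.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def query_with_fallback(endpoints: Sequence[str], query: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        try:
            return query(endpoint)
        except Exception as err:  # any failure moves on to the next endpoint
            last_error = err
    raise RuntimeError("all external metrics endpoints failed") from last_error


def fake_query(endpoint: str) -> str:
    # Stand-in for a real request; the first endpoint is simulated as down.
    if endpoint == "https://primary.example":
        raise ConnectionError("down")
    return f"ok from {endpoint}"


print(query_with_fallback(["https://primary.example", "https://secondary.example"], fake_query))
```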
Released on: 2022-01-26 Pinned to datadog-agent v7.33.0: CHANGELOG.
- Collect PVC tag on pending pods
- Add the ability to filter for check names in the cluster checks output.
- Add reworked status output for orchestrator section on CLC setups.
- Fix the removal of the "kubectl.kubernetes.io/last-applied-configuration" annotation on new collected resources
- Add autoscaler resource kind (hpa,wpa) inside the DatadogMetrics status references.
Released on: 2021-11-10 Pinned to datadog-agent v7.32.0: CHANGELOG.
- Introduce the collection of the following resources: ClusterRole, ClusterRoleBinding, Role, RoleBinding, ServiceAccount.
- Fix tags for PV resources in the Orchestrator Explorer (type and phase).
- Fix an edge case in which the Cluster Agent's Admission Controller doesn't update the Webhook object according to specified configuration.
Released on: 2021-09-13 Pinned to datadog-agent v7.31.0: CHANGELOG.
- Enable StatefulSet collection by default in the orchestrator check.
- Add PV and PVC collection in the orchestrator check.
- Added possibility to use the maxAge attribute defined in the datadogMetric CRD overriding the global maxAge.
Released on: 2021-08-12 Pinned to datadog-agent v7.30.0: CHANGELOG.
- Enable DaemonSet collection by default in the orchestrator check. Add StatefulSet collection in the orchestrator check.
- The Cluster Agent's Admission Controller now uses the admissionregistration.k8s.io/v1 kubernetes API when available.
- The Cluster Agent can be instructed to dispatch cluster checks without decrypting secrets. The node Agent or the cluster check runner will fetch the secrets after receiving the configurations from the Cluster Agent. This can be enabled by setting DD_SECRET_BACKEND_SKIP_CHECKS to true in the Cluster Agent config.
- The Cluster Agent's external metrics provider now serves an OpenAPI endpoint.
- Add the ability to change log_level at runtime. To set the log_level to debug the following command should be used: agent config set log_level debug.
- Improve status and flare for the Cluster Check Runners.
- Show different orchestrator status collection information between follower and leader.
- Fix an edge case where the Admission Controller doesn't update the certificate according to the Cluster Agent configuration.
Released on: 2021-07-05 Pinned to datadog-agent v7.29.0: CHANGELOG.
- Fix the embedded security policy version to match the one from the agent.
Released on: 2021-06-22 Pinned to datadog-agent v7.29.0: CHANGELOG.
- Collect the DaemonSet resources for the orchestrator explorer.
- The Cluster Agent exposes a new metric external_metrics.datadog_metrics to track the validity of DatadogMetric objects.
- Add additional status information in orchestrator section output. Whether collection works and whether cluster name is set.
- Autodetect EC2 cluster name
- Decrease the Admission Controller timeout to avoid edge cases where high timeouts can cause ignoring the failurePolicy (see kubernetes/kubernetes#71508).
- The Cluster Agent's admission controller now requires the pod label admission.datadoghq.com/enabled=true to inject standard labels. This optimizes the number of mutation webhook requests.
Pinned to datadog-agent v7.28.0-rc.5
- The cluster-agent container now tries to remove any folder beginning by .. in paths of files mounted in /conf.d while copying them to the cluster-agent config folder.
- Collect cluster resource for orchestrator explorer.
- It's now possible to template the kube_cluster_name tag in DatadogMetric queries Example: avg:nginx.net.request_per_s{kube_container_name:nginx,kube_cluster_name:%%tag_kube_cluster_name%%}
- It's now possible to template any environment variable (as seen by the Datadog Cluster Agent) as tag in DatadogMetric queries Example: avg:nginx.net.request_per_s{kube_container_name:nginx,kube_cluster_name:%%env_DD_CLUSTER_NAME%%}
- It is now possible to configure a custom timeout for the MutatingWebhookConfigurations objects controlled by the Cluster Agent via DD_ADMISSION_CONTROLLER_TIMEOUT_SECONDS. (Default: 30 seconds)
- The Datadog Cluster Agent's Admission Controller now uses a namespaced secrets informer. It no longer needs permissions to watch secrets at the cluster scope.
- The cluster agent now uses the same configuration as the security agent for the logs endpoints configuration. The parameters (such as logs_dd_url) can either be specified in the compliance_config.endpoints section or through environment variables (such as DD_COMPLIANCE_CONFIG_ENDPOINTS_LOGS_DD_URL).
- Improve the resilience of the connection of controllers to the External Metrics Server by moving to a dynamic client for the WPA controller.
- Change base Docker image used to build the Cluster Agent images, moving from debian:bullseye to ubuntu:20.10. In the future the Cluster Agent will follow Ubuntu stable versions.
- Fix a potential file descriptors leak.
- The Cluster Agent can now be configured to use tls 1.2 via DD_FORCE_TLS_12=true
- Fix "Error creating expvar server" error log when running the Datadog Cluster Agent CLI commands.
- Fix a bug preventing the "DD_ORCHESTRATOR_EXPLORER_ORCHESTRATOR_ADDITIONAL_ENDPOINTS" environment variable from being read.
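The two DatadogMetric templating entries in this release (%%tag_kube_cluster_name%% and %%env_<VAR>%%) can be illustrated with a small substitution routine. The function name, the regular expression, and the choice to raise on an unknown variable are assumptions made for the sketch; the release notes only document the kube_cluster_name tag and environment variables as template sources.

```python
import re
from typing import Mapping

# Matches %%tag_<name>%% and %%env_<NAME>%% placeholders.
_TEMPLATE = re.compile(r"%%(tag|env)_([A-Za-z0-9_]+)%%")


def render_query(query: str, tags: Mapping[str, str], environ: Mapping[str, str]) -> str:
    """Replace template placeholders with tag or environment variable values."""

    def substitute(match: "re.Match[str]") -> str:
        kind, name = match.group(1), match.group(2)
        source = tags if kind == "tag" else environ
        if name not in source:
            raise KeyError(f"no value for template variable {match.group(0)}")
        return source[name]

    return _TEMPLATE.sub(substitute, query)


query = "avg:nginx.net.request_per_s{kube_container_name:nginx,kube_cluster_name:%%env_DD_CLUSTER_NAME%%}"
print(render_query(query, tags={}, environ={"DD_CLUSTER_NAME": "prod"}))
```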
Released on: 2021-03-02 Pinned to datadog-agent v7.26.0: CHANGELOG.
- Support Prometheus Autodiscovery for Kubernetes Services.
- Add external_metrics_provider.api_key and external_metrics_provider.app_key parameters overriding default api_key and app_key if set.
- Add a new external_metrics_provider.endpoint config in datadog-cluster.yaml and a DD_EXTERNAL_METRICS_PROVIDER_ENDPOINT environment variable to override the default Datadog API endpoint to query external metrics from, in place of the global DATADOG_HOST. It also makes the external metrics provider respect DD_SITE if DD_EXTERNAL_METRICS_PROVIDER_ENDPOINT is not set.
- Node schedulability is now a dedicated tag on kubernetes node resources.
- Fix dual shipping for orchestrator resources in the cluster agent.
Released on: 2021-03-02 Pinned to datadog-agent v7.24.0: CHANGELOG.
- Add a new command 'datadog-cluster-agent health' to show the cluster agent's health, similar to the already existing agent health.
- collect node information for the orchestrator explorer
- Fill DatadogMetric AutoscalerReferences field to ease usage/investigation of DatadogMetrics
- The Cluster Agent can now collect stats from Cluster Level Check runners to optimize its dispatching logic and rebalance the scheduled checks.
- Allow providing custom tags to orchestrator resources.
- Add new configuration parameter to allow 'GroupExec' permission on the secret-backend command. The new parameter ('secret_backend_command_allow_group_exec_perm') is now enabled by default in the cluster-agent image.
- Add resolve option to endpoint checks through new annotation ad.datadoghq.com/endpoints.resolve. With ip value, it allows endpoint checks to target static pods
- Expose metrics for the cluster level checks advanced dispatching.
- Fix 'readsecret.sh' permission in Cluster-Agent dockerfiles that removes other permission.
- Fix issue in Cluster Agent when using external metrics without DatadogMetrics where multiple HPAs using the same metricName + Labels would prevent all HPAs (except 1st one) to get values from Datadog
- Ensure that leader election runs if orchestrator_explorer and leader_election are enabled.
- Rename node role tag from "node_role" to "kube_node_role" in orchestrator_explorer collection.
Released on: 2020-10-21 Pinned to datadog-agent v7.23.1: CHANGELOG.
- Support of secrets in JSON environment variables, added in 7.23.0, is reverted due to a side effect (e.g. a string value of "-" would be loaded as a list). This feature will be fixed and added again in a future release.
Released on: 2020-10-13 Pinned to datadog-agent v7.23.0: CHANGELOG.
- Collect the node and cluster resource in Kubernetes for the Orchestrator Explorer (#6297).
- Add resolve option to the endpoint checks (#5918).
- Add health command (#6144).
- Add options to configure the External Metrics Server (#6406).
- Fill DatadogMetric AutoscalerReferences field to ease usage/investigation of DatadogMetrics (#6367).
- Only run compliance checks on the Cluster Agent leader (#6311).
- Add orchestrator_explorer configuration to enable the cluster-id ConfigMap creation and Orchestrator Explorer instantiation (#6189).
- Fix transformer for gibiBytes and gigaBytes (#6437).
- Fix cluster-agent commands to allow executing the readsecret.sh script for the secret backend feature (#6445).
- Fix issue with External Metrics when several HPAs use the same query (#6412).
Released on: 2020-08-07
- Add compliance check command to the DCA CLI (#5930)
- Add clusterchecks rebalance command (#5839)
- Add collection of additional Kubernetes resource types (deployments, replicaSets and services) for Live Containers (#6082, #5999)
- Support "ignore AD tags" parameter for cluster/endpoint checks (#6115)
- Use APIserver connection retrier (#6106)
Released on: 2020-07-20
This version contains the changes released with version 7.21.0 of the core agent. Please refer to the CHANGELOG.
- Add support of DatadogMetric CRD to allow autoscaling based on arbitrary queries (#5384)
- Add Admission Controller to inject Entity ID, standard tags and agent host (useful in serverless environments)
- Add leader_election_is_leader metric to allow label joins (#5819)
Released on: 2020-06-11
This version contains the changes released with version 7.20.0 of the core agent. Please refer to the CHANGELOG.
- Wait for client-go cache to sync for endpoints/services (#5291)
- Consider check failure in advanced rebalancing (#5441)
- Autodiscover standard tags for Cluster and Endpoint Checks (#5241)
- Adds a metric to monitor the advanced dispatching algorithm (#4970)
Released on: 2020-02-11
Minor release on 1.5 branch
- Fix agent commands in DCA (always start listener) (#4870)
Released on: 2020-02-06
Minor release on 1.5 branch
- [DCA] fix cluster-agent flare panic (#4838)
- Remove setcap NET_BIND_SERVICE as we cannot make it work with user namespaces used in the CI (#4846)
- Add service listener in endpoints to watch for newly annotated services (#4816)
- Fix typo (#4831)
Released on: 2020-01-28
This version contains the changes released with version 7.17.0 of the core agent. Please refer to the CHANGELOG.
- Adding logic to show DCA status for clc (#4738)
- Introduce Rate Limiting Stats in the /metrics of the Cluster Agent (#4669)
- MetricServer generates k8s event on HPA
- Add cluster-name tag in host tags (#4558)
- Add read-secret command in cluster-agent to use as secrets backend (#4639)
- Allow dots in cluster names (#4611)
- Check if CheckMetadata exist before iterating over it in cluster agent status page (#4728)
- Grant CAP_NET_BIND_SERVICE capability to the cluster_agent (#4439)
- Ignore invalid cluster names instead of panicking (#4549)
- Fix eventrecorder init (#4732)
- Handle NewHandler failure better in setupClusterCheck (#4447)
- Adding User-Agent to the DCA client
- Filter non-cluster-checks (#4566)
Released on: 2019-11-06
This version contains the changes released with version 6.15.0 of the core agent. Please refer to the CHANGELOG.
- Introducing the Advanced dispatching logic to rebalance Cluster Level Checks [#4068, #4226, #4344]
- Enable the Endpoint check logic [#3853, #3704]
- HTTP proxy support for the external metrics provider #4191
- Improve External Metrics Provider resiliency [#4285, #3727]
- Revamp the Kubernetes event collection check [#4259, #4346, #4342, #4337, #4314]
- Update Gopkg.lock with new import #3837
- Fix kubernetes_apiserver default config file #3854
- Fix registration of the External Metrics Server's API #4233
- Fixing status of the Cluster Agent if the External Metrics Provider is not enabled #4277
- Fix how the endpoints check source is displayed in agent command outputs #4357
- Fix how we invalidate changed Endpoints config #4363
- Get Cluster Level Checks runner IPs from headers #4386
- Fixing output of agent status #4352
2019-07-09
- Fix Cluster-agent failure with cluster-agent flare command.
2019-06-19
- Fix "Kube Services" service: kube service tags attached to pod are not consistent.
Released on: 2019-05-07
The Datadog Cluster Agent can now auto-discover config templates for Kubernetes endpoints checks and expose them to node Agents via its API. This feature is compatible with version 6.12.0 and later of the Datadog Agent.
Refer to the official documentation to read more about this feature.
2019-05-03
- Fix race condition: immutable MetaBundle stored in DCA cache.
2019-04-30
- Fix race condition in Cluster Agent's API handler.
2019-04-24
- The Cluster Agent can now auto-discover config templates for Kubernetes endpoints checks and expose them to node Agents via its API
- Add the config and configcheck commands to the cluster agent CLI
- Add the diagnose command to the cluster agent CLI and flare
- Add cluster_checks.extra_tags option to allow users to add tags globally to the cluster level checks.
- Improving Lifecycle of the External Metrics Provider
- Support milliquantities for the External Metrics Provider
- Move some logs from info to debug, in order to generate fewer noisy logs when running correctly.
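The milliquantity support listed above refers to Kubernetes-style quantities in which a trailing "m" means thousandths, so "500m" stands for 0.5. A minimal sketch of that conversion follows; `parse_milli_quantity` is a hypothetical helper written for illustration, not the Cluster Agent's implementation, and it handles only plain numbers and the "m" suffix.

```python
from decimal import Decimal


def parse_milli_quantity(quantity):
    """Parse a plain number or a milli-suffixed quantity such as "500m".

    Only the "m" (milli) suffix is handled; other Kubernetes suffixes
    (k, M, Ki, Mi, ...) are deliberately out of scope for this sketch.
    """
    text = quantity.strip()
    if text.endswith("m"):
        # "500m" means 500 thousandths, i.e. 0.5
        return Decimal(text[:-1]) / Decimal(1000)
    return Decimal(text)


if __name__ == "__main__":
    print(parse_milli_quantity("500m"))   # 0.5
    print(parse_milli_quantity("1500m"))  # 1.5
    print(parse_milli_quantity("3"))      # 3
```

`Decimal` is used here instead of `float` so that values such as "100m" convert exactly rather than picking up binary rounding error.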
Released on: 2019-02-25
The Datadog Agent now supports distributing Cluster Level Checks. This feature is compatible with the version 6.9.0 and up of the Datadog Agent.
Refer to the official documentation to read more about this feature.
2019-02-14
- Ensure dangling cluster checks can be re-scheduled
2019-02-12
- Fix re-scheduling of the same clusterchecks config on the same node
2019-02-11
- Sign docker images when pushing to Docker Hub
- Fix configcheck verbose output
- Fix AutoDiscovery rescheduling issue when no template variables
- Remove resolved configs when templates are removed
- Support adding/removing the AD annotation to an existing kube service
- Only expose cluster-check prometheus metrics when leading
- Fix support for custom metrics case sensitivity
2019-02-05
The External Metrics Provider is now case-insensitive, both for the metric name and for the labels extracted from HPAs.
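One common way to make such lookups independent of case is to normalize the metric name and labels before using them as a key. The sketch below illustrates that idea only; `normalize_metric` is a hypothetical function, and lowercasing both label keys and label values is an assumption, since the note does not say how the provider does it.

```python
def normalize_metric(name, labels):
    """Build a lookup key that ignores case in the metric name and labels.

    Assumption for this sketch: both label keys and label values are lowercased.
    """
    lowered = {key.lower(): value.lower() for key, value in labels.items()}
    # Sort the labels so that the key does not depend on insertion order.
    return name.lower(), tuple(sorted(lowered.items()))


if __name__ == "__main__":
    from_hpa = normalize_metric(
        "Nginx.Net.Request_Per_S", {"Kube_Container_Name": "Nginx"}
    )
    from_store = normalize_metric(
        "nginx.net.request_per_s", {"kube_container_name": "nginx"}
    )
    print(from_hpa == from_store)  # True
```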
- Cluster Agent HPA metrics case support
- Add GetLeaderIP method to LeaderEngine
- Add kube_service config provider
- Allow setting additional Autodiscovery sources via environment variables
- Add dispatching metrics in clusterchecks module
- Add a health probe in the ccheck dispatching logic
- Add kube-services AD listener
- Cluster-checks: handle leader election and follower->leader redirection
- Enable clusterchecks in DCA master
- Support /conf.d in cluster-agent image
- Fix clustercheck leader not starting its dispatching logic
- Use the appropriate port when redirecting node-agents to leader
- Cluster-checks: patch configurations on schedule
- Add configcheck/config cmd on the cluster agent
- Add clustercheck info to the cluster-agent's status and flare
- Make error in clusterchecks cmd clear when feature is disabled
2019-01-31
The release of the RC1 was dismissed to embed a fix for the CI runners used to build the image.
- Go 1.11.5 compliancy + 1.11.5 for every CI
The official release of the Datadog Cluster Agent 1.2.0 starts with the RC2.
The version 1.1.0 of the Cluster Agent introduces new features and enhancements around the External Metrics Provider.
2018-11-21
- Get goautoneg from github
- Fix datadog external metric query when no label is set
2018-11-20
- Migrating back to official custom metrics lib
- Change test to remove flakiness
- Disable cluster checks in cluster-agent 1.1.x
- Allow users to change the custom metric provider port, to run as non-root
- Adding rollup and fix to circumvent time aggregation
- clusterchecks: simple dispatching logic
- Honor external metrics provider settings in cluster-agent status
- Run cluster-agent as non-root, support read-only rootfs
- Only push cluster-agent-dev:master from master
- Fix folder permissions on containerd
- Adding fix for edge case in external metrics
- Fix flare if can't access APIServer
- DCA: fix custom metrics server
- Avoid panicking for missing fields in HPA
Released on: 2018-10-18
The Datadog Cluster Agent is compatible with versions 6.5.1 and up of the Datadog Agent.
- Please refer to the 6.5.0 tag on datadog-agent for the list of changes on the Datadog Agent.
It is only supported in containerized environments.
- Please find the image on our Docker Hub.
2018-10-17
- Expose telemetry metrics with the Open Metrics format instead of expvar
- Add mutex logic and safeguards to avoid race conditions in the Autoscalers Controller.
2018-10-15
- Leverage diff logic to only update the internal custom metrics store and Config Map with relevant changes.
- Better logging on the Autoscalers Controller
- Make sure only the leader syncs Autoscalers.
- Forget keys from the informer's queue to avoid borking the Autoscalers Controller.
2018-10-11
- Support agent and datadog-cluster-agent for the CLI of the Datadog Cluster Agent
- Retrieve hostname in GCE
2018-10-04
- Implement the External Metrics Interface to allow for the Horizontal Pod Autoscalers to be based off of Datadog metrics.
- Use informers to be up to date with the Horizontal Pod Autoscalers object in the cluster.
- Implement the metadata mapper.
- Use informers to be up to date with the Endpoints and Nodes objects in the cluster.
- Serve cluster level metadata on an external endpoint, kube_service tag is available.
- Serve node labels as tags.
- Run the kube_apiserver check to collect events and run a service check against each component of the Control Plane.
- Implement the flare, status and version commands similar to the node agent.