recommender: add metrics for recommender http actions #227

iksaif · 2024-10-31T16:00:00Z

# TYPE http_client_request_duration_seconds histogram
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.6"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1
http_client_request_duration_seconds_sum{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.000287125
http_client_request_duration_seconds_count{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1
100 32288    0 32288    0ect",le="0.6"} 1et{code="308",method="post",recommender="http://localhost:8089/autoscaling/redir
     0  14.5M   http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1
  http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1
 0http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1
 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1
http_client_request_duration_seconds_sum{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.003237708
http_client_request_duration_seconds_count{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
# HELP http_client_requests_inflight Tracks the number of client requests currently in progress.
# TYPE http_client_requests_inflight gauge
http_client_requests_inflight 0
# HELP http_client_requests_total Tracks the number of HTTP requests.
# TYPE http_client_requests_total counter
http_client_requests_total{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
http_client_requests_total{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
wpa_controller_value{metric_name="http://localhost:8089/autoscaling/redirect",resource_kind="Deployment",resource_name="metrics-server",resource_namespace="kube-system",wpa_name="example-external-recommender",wpa_namespace="kube-system"} 10```

github-actions

This pull request does not contain a valid label. Please add one of the following labels: bug, enhancement, refactoring, documentation, tooling, dependencies

datadog-datadog-prod-us1 · 2024-10-31T16:08:02Z

cmd/skeleton-recommender/main.go

@@ -25,6 +25,8 @@ func int32Ptr(i int32) *int32 {
 // Static autoscaler that always recommends to scale up by 1 replica, can be tried with
 // curl -X POST -d '{"state": {"currentReplicas": 1}, "targets": [{"type": "cpu", "targetValue": 0.5}]}' http://localhost:8089/autoscaling
 func autoscaling(w http.ResponseWriter, r *http.Request) {
+	log.Printf("Handling %s %s %s bytes\n", r.Method, r.URL.Path, r.Header.Get("Content-Length"))


🔵 Code Vulnerability

Avoid leaking errors and data (...read more)

Passing errors directly to log.Println or log.Printf functions can lead to data leakage because these functions typically write the output to a log file, console, or another output without considering the sensitivity of the error message. Error messages often contain sensitive information, such as system paths, user data, or internal application details, which should not be exposed in logs that are accessible to users or unauthorized individuals.

To avoid data leakage when logging errors, it is essential to properly handle error messages by redacting or obfuscating sensitive information before logging them. Instead of passing errors directly to log.Println or log.Printf, it is recommended to extract only the necessary information from the error and log a more generic and secure error message that does not expose sensitive details.

Additionally, consider implementing secure logging practices, such as logging error codes or identifiers rather than full error messages, implementing access controls to restrict access to logs containing sensitive information, and regularly reviewing and auditing the contents of logs to ensure they do not leak sensitive data. By following these best practices, you can improve the security and privacy of your application's logging mechanisms and prevent potential data leakage incidents.

vboulineau · 2024-11-04T13:04:21Z

cmd/skeleton-recommender/main.go

 	log.Printf("> %+v\n\n", response)
 }

+// Redirects /autoscaling/redirect to /autoscaling


What's the purpose of this redirect?

It's to test that redirects work, as they will be needed by most implementations (most of the time you end up having to redirect to the specific shard of the control plane that can give you the recommendation)

vboulineau · 2024-11-04T13:13:43Z

controllers/datadoghq/recommender.go

 	"google.golang.org/protobuf/encoding/protojson"
 	"google.golang.org/protobuf/types/known/structpb"

 	autoscaling "github.com/DataDog/agent-payload/v5/autoscaling/kubernetes"
 	"github.com/DataDog/watermarkpodautoscaler/apis/datadoghq/v1alpha1"
 )

+var (


Metrics are currently all aggregated in metrics.go.

vboulineau · 2024-11-04T13:54:41Z

controllers/datadoghq/recommender.go

+}
+
+func (r *RecommenderClientImpl) instrumentedClient(recommender string) *http.Client {
+	client := *r.client


This code can be misleading, it's definitely possible to change behaviour of original client still. In that case .Transport is only changed and you can hope that the wrapping code does create a new RT and not modify the existing one but there's no guarantee on that.

I think it warrants at least a comment.

vboulineau · 2024-11-04T13:56:15Z

controllers/datadoghq/watermarkpodautoscaler_controller.go

@@ -937,10 +938,10 @@ func updatePredicate(ev event.UpdateEvent) bool {
 }

 // SetupWithManager creates a new Watermarkpodautoscaler controller
-func (r *WatermarkPodAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager, workers int) error {
+func (r *WatermarkPodAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager, workers int, recoverPanic bool) error {


The change appears to be unrelated and I don't think it's a change we want to have as I don't think we can guarantee everything is working as it should after encountering a panic.

I split this out of this PR, but I think it's already the current behavior of the reconcilier.

``` http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.6"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1 http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1 http_client_request_duration_seconds_sum{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.000287125 http_client_request_duration_seconds_count{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1 100 32288 0 32288 0ect",le="0.6"} 1et{code="308",method="post",recommender="http://localhost:8089/autoscaling/redir 0 14.5M http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1 0http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1 http_client_request_duration_seconds_sum{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.003237708 --:--:-- --:--:-- --:--:-- 15.3M http_client_request_duration_seconds_count{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1 http_client_requests_inflight 0 http_client_requests_total{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1 http_client_requests_total{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1 ```

iksaif · 2024-11-06T15:55:16Z

/merge

dd-devflow · 2024-11-06T15:55:32Z

🚂 MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

dd-devflow · 2024-11-06T15:57:43Z

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 12m.

Use /merge -c to cancel this operation!

iksaif requested a review from a team as a code owner October 31, 2024 16:00

github-actions bot requested changes Oct 31, 2024

View reviewed changes

iksaif added this to the 0.8.0 milestone Oct 31, 2024

datadog-datadog-prod-us1 bot reviewed Oct 31, 2024

View reviewed changes

vboulineau reviewed Nov 4, 2024

View reviewed changes

iksaif added 5 commits November 6, 2024 09:39

Update licenses

87be6bd

Move metrics, restore main

14b9530

Cleanup a bit the recommender code

09a3942

Remove recoverPAnic

8f57c17

iksaif force-pushed the corentin.chary/replica-recommender-3 branch from 1b798bc to 8f57c17 Compare November 6, 2024 08:39

Update the doc

61e8a7a

vboulineau approved these changes Nov 6, 2024

View reviewed changes

iksaif added the enhancement New feature or request label Nov 6, 2024

Merge branch 'main' into corentin.chary/replica-recommender-3

014f825

github-actions bot approved these changes Nov 6, 2024

View reviewed changes

dd-devflow bot added mergequeue-status: waiting mergequeue-status: queued and removed mergequeue-status: waiting labels Nov 6, 2024

dd-devflow bot added mergequeue-status: in_progress and removed mergequeue-status: queued labels Nov 6, 2024

dd-mergequeue bot merged commit cf7a732 into main Nov 6, 2024
34 checks passed

dd-mergequeue bot deleted the corentin.chary/replica-recommender-3 branch November 6, 2024 16:12

dd-devflow bot added mergequeue-status: done and removed mergequeue-status: in_progress labels Nov 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recommender: add metrics for recommender http actions #227

recommender: add metrics for recommender http actions #227

iksaif commented Oct 31, 2024 •

edited

Loading

github-actions bot left a comment

datadog-datadog-prod-us1 bot Oct 31, 2024

vboulineau Nov 4, 2024

iksaif Nov 5, 2024

vboulineau Nov 4, 2024

iksaif Nov 5, 2024

vboulineau Nov 4, 2024

iksaif Nov 6, 2024

vboulineau Nov 4, 2024

iksaif Nov 5, 2024

iksaif commented Nov 6, 2024

dd-devflow bot commented Nov 6, 2024

dd-devflow bot commented Nov 6, 2024

recommender: add metrics for recommender http actions #227

recommender: add metrics for recommender http actions #227

Conversation

iksaif commented Oct 31, 2024 • edited Loading

github-actions bot left a comment

Choose a reason for hiding this comment

datadog-datadog-prod-us1 bot Oct 31, 2024

Choose a reason for hiding this comment

🔵 Code Vulnerability

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iksaif commented Nov 6, 2024

dd-devflow bot commented Nov 6, 2024

dd-devflow bot commented Nov 6, 2024

iksaif commented Oct 31, 2024 •

edited

Loading