Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recommender: add metrics for recommender http actions #227

Merged
merged 7 commits into from
Nov 6, 2024

Conversation

iksaif
Copy link
Collaborator

@iksaif iksaif commented Oct 31, 2024

# TYPE http_client_request_duration_seconds histogram
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.6"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1
http_client_request_duration_seconds_sum{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.000287125
http_client_request_duration_seconds_count{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1
100 32288    0 32288    0ect",le="0.6"} 1et{code="308",method="post",recommender="http://localhost:8089/autoscaling/redir
     0  14.5M   http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1
  http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1
 0http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1
 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1
http_client_request_duration_seconds_sum{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.003237708
http_client_request_duration_seconds_count{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
# HELP http_client_requests_inflight Tracks the number of client requests currently in progress.
# TYPE http_client_requests_inflight gauge
http_client_requests_inflight 0
# HELP http_client_requests_total Tracks the number of HTTP requests.
# TYPE http_client_requests_total counter
http_client_requests_total{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
http_client_requests_total{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
wpa_controller_value{metric_name="http://localhost:8089/autoscaling/redirect",resource_kind="Deployment",resource_name="metrics-server",resource_namespace="kube-system",wpa_name="example-external-recommender",wpa_namespace="kube-system"} 10```

@iksaif iksaif requested a review from a team as a code owner October 31, 2024 16:00
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request does not contain a valid label. Please add one of the following labels: bug, enhancement, refactoring, documentation, tooling, dependencies

@iksaif iksaif added this to the 0.8.0 milestone Oct 31, 2024
@@ -25,6 +25,8 @@ func int32Ptr(i int32) *int32 {
// Static autoscaler that always recommends to scale up by 1 replica, can be tried with
// curl -X POST -d '{"state": {"currentReplicas": 1}, "targets": [{"type": "cpu", "targetValue": 0.5}]}' http://localhost:8089/autoscaling
func autoscaling(w http.ResponseWriter, r *http.Request) {
log.Printf("Handling %s %s %s bytes\n", r.Method, r.URL.Path, r.Header.Get("Content-Length"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Code Vulnerability

Avoid leaking errors and data (...read more)

Passing errors directly to log.Println or log.Printf functions can lead to data leakage because these functions typically write the output to a log file, console, or another output without considering the sensitivity of the error message. Error messages often contain sensitive information, such as system paths, user data, or internal application details, which should not be exposed in logs that are accessible to users or unauthorized individuals.

To avoid data leakage when logging errors, it is essential to properly handle error messages by redacting or obfuscating sensitive information before logging them. Instead of passing errors directly to log.Println or log.Printf, it is recommended to extract only the necessary information from the error and log a more generic and secure error message that does not expose sensitive details.

Additionally, consider implementing secure logging practices, such as logging error codes or identifiers rather than full error messages, implementing access controls to restrict access to logs containing sensitive information, and regularly reviewing and auditing the contents of logs to ensure they do not leak sensitive data. By following these best practices, you can improve the security and privacy of your application's logging mechanisms and prevent potential data leakage incidents.

View in Datadog  Leave us feedback  Documentation

log.Printf("> %+v\n\n", response)
}

// Redirects /autoscaling/redirect to /autoscaling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this redirect?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to test that redirects work, as they will be needed by most implementations (most of the time you end up having to redirect to the specific shard of the control plane that can give you the recommendation)

"google.golang.org/protobuf/encoding/protojson"
"google.golang.org/protobuf/types/known/structpb"

autoscaling "github.com/DataDog/agent-payload/v5/autoscaling/kubernetes"
"github.com/DataDog/watermarkpodautoscaler/apis/datadoghq/v1alpha1"
)

var (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metrics are currently all aggregated in metrics.go.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving

}

func (r *RecommenderClientImpl) instrumentedClient(recommender string) *http.Client {
client := *r.client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code can be misleading, it's definitely possible to change behaviour of original client still. In that case .Transport is only changed and you can hope that the wrapping code does create a new RT and not modify the existing one but there's no guarantee on that.

I think it warrants at least a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -937,10 +938,10 @@ func updatePredicate(ev event.UpdateEvent) bool {
}

// SetupWithManager creates a new Watermarkpodautoscaler controller
func (r *WatermarkPodAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager, workers int) error {
func (r *WatermarkPodAutoscalerReconciler) SetupWithManager(mgr ctrl.Manager, workers int, recoverPanic bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change appears to be unrelated and I don't think it's a change we want to have as I don't think we can guarantee everything is working as it should after encountering a panic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split this out of this PR, but I think it's already the current behavior of the reconcilier.

```
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.6"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1
http_client_request_duration_seconds_bucket{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1
http_client_request_duration_seconds_sum{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.000287125
http_client_request_duration_seconds_count{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.1"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="0.3"} 1
100 32288    0 32288    0ect",le="0.6"} 1et{code="308",method="post",recommender="http://localhost:8089/autoscaling/redir
     0  14.5M   http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="1"} 1
  http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="3"} 1
 0http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="6"} 1
 http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="9"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="20"} 1
http_client_request_duration_seconds_bucket{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect",le="+Inf"} 1
http_client_request_duration_seconds_sum{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 0.003237708
--:--:-- --:--:-- --:--:-- 15.3M
http_client_request_duration_seconds_count{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
http_client_requests_inflight 0
http_client_requests_total{code="200",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
http_client_requests_total{code="308",method="post",recommender="http://localhost:8089/autoscaling/redirect"} 1
```
@iksaif iksaif force-pushed the corentin.chary/replica-recommender-3 branch from 1b798bc to 8f57c17 Compare November 6, 2024 08:39
@iksaif iksaif added the enhancement New feature or request label Nov 6, 2024
@iksaif
Copy link
Collaborator Author

iksaif commented Nov 6, 2024

/merge

@dd-devflow
Copy link

dd-devflow bot commented Nov 6, 2024

🚂 MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

@dd-devflow
Copy link

dd-devflow bot commented Nov 6, 2024

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 12m.

Use /merge -c to cancel this operation!

@dd-mergequeue dd-mergequeue bot merged commit cf7a732 into main Nov 6, 2024
34 checks passed
@dd-mergequeue dd-mergequeue bot deleted the corentin.chary/replica-recommender-3 branch November 6, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants