Batching the probes to spread the workload and more exact time readings #15

fmotrifork · 2021-06-21T08:53:23Z

When a single board instance has many probes, the current implementation makes the probes wait on each other rather than on the actual services.
As a result, many probes are marked as "slow" even though the real cause is too many probes open at once.

This PR runs the probes in batches of 8.
It also splits the HTTP probe time into two measurements:

Time from when the request is created until the first byte is received from the server
Time from when the request is created until the server finishes sending bytes

This allows us to better understand whether a slow reading comes from the infrastructure / load balancers or from a slow HTTP service.
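For readers who want to see what the two measurements look like, here is a minimal sketch using the standard `net/http/httptrace` package. It is not the code from this PR: the `timings` type and the `timeRequest` and `demo` names are hypothetical, and the local `httptest` server only stands in for a real service that sends its first byte early and finishes later.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// timings holds the two readings described above (hypothetical type).
type timings struct {
	FirstByte time.Duration // request created -> first response byte
	Total     time.Duration // request created -> body fully read
}

// timeRequest performs a GET and records both durations.
func timeRequest(client *http.Client, url string) (timings, error) {
	var t timings
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return t, err
	}
	start := time.Now()
	trace := &httptrace.ClientTrace{
		// Called by the transport when the first response byte arrives.
		GotFirstResponseByte: func() { t.FirstByte = time.Since(start) },
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := client.Do(req)
	if err != nil {
		return t, err
	}
	defer resp.Body.Close()
	// Drain the body so Total covers the whole transfer.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return t, err
	}
	t.Total = time.Since(start)
	return t, nil
}

// demo runs timeRequest against a local test server that flushes
// part of the response, pauses, then sends the rest.
func demo() (timings, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("first"))
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		time.Sleep(50 * time.Millisecond)
		w.Write([]byte("rest"))
	}))
	defer srv.Close()
	return timeRequest(srv.Client(), srv.URL)
}

func main() {
	t, err := demo()
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("first byte:", t.FirstByte, "total:", t.Total)
}
```

A large gap between the two values points at the service streaming its response slowly, while a large first-byte value points at whatever sits in front of it.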

Lesterpig

Thanks for this pull request!

It actually changes a lot of things in the project (Dockerfile, Makefile, logging, Prometheus...).
I would only consider the changes to the manager.go and probe/http.go files in this PR.
If needed, please open new pull requests for the other proposed changes.

I have also left a few comments in the modified files.
If you have some time, could you please update your pull request? Otherwise I can do it for you 😃

Lesterpig · 2021-06-25T07:48:18Z

manager.go

 	for category, services := range manager.Services {
 		for _, service := range services {
+			// Batching the probes to spread the workload
+			if math.Mod(i, 8) == 0 {


Question: is there a reason to use a float for i?
We could use integers and check with if i % 8 == 0.

Moreover, the i = 0 line might not be necessary (or the condition can just be i >= 8).
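To illustrate the integer version, here is a rough sketch of batching with `i % size` and a `sync.WaitGroup`. It assumes each batch should finish before the next one starts, which may not match what manager.go actually does; `runInBatches`, the `probe` callback and the plain string service names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// runInBatches calls probe once per service, running at most size
// probes concurrently. It waits for a batch to finish before starting
// the next one and returns the number of batches started.
func runInBatches(services []string, size int, probe func(string)) int {
	if size < 1 {
		size = 1 // guard against division by zero below
	}
	var wg sync.WaitGroup
	batches := 0
	for i, service := range services {
		// Integer counter and modulo: no float and no manual reset needed.
		if i%size == 0 {
			wg.Wait() // returns immediately for the very first batch
			batches++
		}
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			probe(name)
		}(service)
	}
	wg.Wait() // wait for the last, possibly partial, batch
	return batches
}

func main() {
	services := make([]string, 20)
	for i := range services {
		services[i] = fmt.Sprintf("service-%d", i)
	}
	n := runInBatches(services, 8, func(name string) {
		// A real probe would contact the service here.
	})
	fmt.Println("batches:", n)
}
```

With the loop index from `range` there is no counter to reset, so the `i = 0` line goes away as well.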

Lesterpig · 2021-06-25T07:50:36Z

probe/http.go

-		TLSClientConfig: &tls.Config{InsecureSkipVerify: !opts.VerifyCertificate},
-	}
+	tr := http.DefaultTransport.(*http.Transport).Clone()
+	tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}


It looks like the opts.VerifyCertificate option has been forgotten?
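Something along these lines would keep the cloned default transport while still honouring the option. This is only a sketch: `newTransport` is a hypothetical helper, and its `verifyCertificate` parameter stands in for `opts.VerifyCertificate`.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newTransport clones the default transport and skips certificate
// verification only when verifyCertificate is false.
func newTransport(verifyCertificate bool) *http.Transport {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.TLSClientConfig = &tls.Config{InsecureSkipVerify: !verifyCertificate}
	return tr
}

func main() {
	strict := newTransport(true)
	lax := newTransport(false)
	fmt.Println(strict.TLSClientConfig.InsecureSkipVerify, lax.TLSClientConfig.InsecureSkipVerify)
}
```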

fmotrifork added 2 commits April 1, 2020 14:27

Add /metrics endpoint

be145ce

Batching the probes to spread the workload and more exact time readings

3764518

Lesterpig requested changes Jun 25, 2021
