
RHINENG-11927 - New API for uploading tsdb blocks #748

Open · wants to merge 1 commit into base: main

Conversation

patilsuraj767

This PR adds a new /api/v1/upload API where users can upload TSDB blocks packed in a .tar.gz archive.

Below is a short summary of how the upload API works:

  1. The user uploads the TSDB blocks packed in a .tar.gz file. The user can also add an external_labels field in the multipart request; these external labels are then added to all time series present in the TSDB blocks. (A sketch of the request parsing follows this list.)
    curl -v -F "external_labels={\"cluster_id\":\"123\"}" -F "file=@/tmp/tmp/test.tar.gz" -H "Authorization: Bearer ${TOKEN}" \
      http://localhost:8080/api/metrics/v1/test-oidc/api/v1/upload
    
  2. Once obs-api receives the request, it checks the file size and then extracts the tar archive.
  3. After extraction, each TSDB block is read and a new block is created with the external_labels added.
  4. Finally, these new blocks are uploaded to object storage.
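For illustration, a minimal sketch (not the PR's exact code) of how the multipart parsing in steps 1-2 could look in Go; the handler name, route registration, and in-memory buffer size are assumptions:

    package main

    import (
        "encoding/json"
        "net/http"
    )

    // uploadHandler sketches steps 1-2 above: cap the request size, parse the
    // multipart form, and pull out the external_labels JSON and the .tar.gz part.
    func uploadHandler(w http.ResponseWriter, r *http.Request) {
        const maxUploadBytes = 200 << 20 // assumed 200 MB compressed-archive limit
        r.Body = http.MaxBytesReader(w, r.Body, maxUploadBytes)

        // Hold up to 32 MB in memory; larger parts spill to temporary files.
        if err := r.ParseMultipartForm(32 << 20); err != nil {
            http.Error(w, "request too large or malformed", http.StatusBadRequest)
            return
        }

        // external_labels arrives as a JSON object, e.g. {"cluster_id":"123"}.
        var externalLabels map[string]string
        if raw := r.FormValue("external_labels"); raw != "" {
            if err := json.Unmarshal([]byte(raw), &externalLabels); err != nil {
                http.Error(w, "invalid external_labels", http.StatusBadRequest)
                return
            }
        }

        // The uploaded archive containing the TSDB blocks.
        file, _, err := r.FormFile("file")
        if err != nil {
            http.Error(w, "missing file part", http.StatusBadRequest)
            return
        }
        defer file.Close()

        // Steps 3-4 (extract, relabel with externalLabels, upload to object
        // storage) would follow here.
        w.WriteHeader(http.StatusAccepted)
    }

    func main() {
        http.HandleFunc("/api/v1/upload", uploadHandler)
        http.ListenAndServe(":8080", nil)
    }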

@patilsuraj767 (Author)

All the PR checks are failing because the Go version has been updated. The Go version was auto-updated by go mod tidy because the new dependency github.com/thanos-io/thanos requires go 1.22.0.
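For context, this is roughly the kind of go.mod change being described; the Thanos version shown is hypothetical:

    module github.com/observatorium/api

    go 1.22.0

    require github.com/thanos-io/thanos v0.35.1 // hypothetical version pulled in by this PR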

@saswatamcode (Member) left a comment

Thanks for your work on this!

I think the use case we are trying to enable here is uploading TSDB blocks from "disconnected" environments. In this case, I don't think doing the entire flow through the Observatorium API would be correct.

Also, this does quite a lot of block relabeling during the request itself, which really should be a background job based on some queue. You've limited the compressed block size to 200 MB, but we might get larger blocks.

So I would re-evaluate the design here a bit, and consider whether we even need to do this in the API itself. cc: @moadz

    if bkt != nil {
        r.Group(func(r chi.Router) {
            r.Use(server.StripTenantPrefix("/api/metrics/v1"))
            r.Post(UploadRoute, func(w http.ResponseWriter, r *http.Request) {
Member

This route is doing a lot of processing that could be done in the background.

Author

Yes, I agree we should do this async. We will need to decide whether to use an external service like Kafka or RabbitMQ, where we can park these incoming requests and process them asynchronously, or a simple goroutine that processes the request in the background.

Member

The problem with doing that in a goroutine is that it's not durable. If the process dies for any reason (e.g. rolling out a new version), then the goroutine goes away and the blocks are never relabeled.

Member

Yup, I meant that doing this in some completely separate process instead is preferable.

Author

Can we do something like this:
When an upload request is received, we upload the .tar.gz archive to the object store and then push a JSON message (containing the archive link) to a Kafka topic.
Then we create a separate Kafka consumer service which consumes these JSON messages, downloads the .tar.gz file from the object store, relabels the TSDB blocks, and uploads them to the new object store bucket where the Thanos store gateway is configured.
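If it helps, here is a minimal sketch of the enqueue half of that idea, using the segmentio/kafka-go client; the broker address, topic name, and message schema are all assumptions:

    package main

    import (
        "context"
        "encoding/json"
        "log"

        "github.com/segmentio/kafka-go"
    )

    // uploadEvent is a hypothetical message schema: just enough for the consumer
    // to fetch the archive, relabel the blocks, and upload them to the query bucket.
    type uploadEvent struct {
        Tenant     string `json:"tenant"`
        ArchiveURL string `json:"archive_url"` // object-store link to the uploaded .tar.gz
    }

    func main() {
        w := &kafka.Writer{
            Addr:  kafka.TCP("localhost:9092"), // assumed broker address
            Topic: "block-uploads",             // assumed topic name
        }
        defer w.Close()

        payload, err := json.Marshal(uploadEvent{
            Tenant:     "test-oidc",
            ArchiveURL: "s3://uploads/test-oidc/test.tar.gz", // hypothetical link
        })
        if err != nil {
            log.Fatal(err)
        }

        // Park the request on the topic; the consumer service does the heavy work.
        if err := w.WriteMessages(context.Background(), kafka.Message{Value: payload}); err != nil {
            log.Fatal(err)
        }
    }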

@patilsuraj767 (Author)

Thanks for the review @saswatamcode.
Regarding the disconnected env: in the case of ROS, the OpenShift cluster/cost-operator will not directly upload the TSDB blocks to obs-api. It uploads them first to an intermediate service, which then uploads the blocks to obs-api. In the case of disconnected clusters, users upload TSDB blocks manually to the intermediate service, which then forwards them to Observatorium.

@moadz commented Aug 28, 2024

Hey @patilsuraj767 thanks for raising this PR!

I have a few potential contentions with this approach with respect to how it impacts observatorium-api's statelessness:

Generally, obs-api is a completely stateless authn/authz proxy. My biggest concern is that this process would make obs-api stateful, as it has to store the blocks on disk during the upload process and undertake mutations on the block before it is ready to query. These blocks can be gigabytes in size, and we would be paying for ingress to obs-api and then again for the upload, when ingress is generally free on direct upload.

My suggestion is as follows:

  • We expose a getSignedUrl endpoint under the tenant metrics blocks API, e.g. {tenant_id}/metrics/api/v1/blocks/getSignedUrl?block-id={SOME_BLOCK_ID}, that takes a block-id (which should be consistent with the uploaded block ID) and returns a single-use signed URL to object storage that permits the client to upload the block independently of obs-api. The signed URL would be tied to a request and scoped to the block-id, so a signed URL cannot be reused for uploading other blocks. (A presigned-URL sketch follows this comment.)
  • After the block is uploaded, we create an endpoint for marking a block as 'ready for querying', e.g. {tenant_id}/metrics/api/v1/blocks/mark?block-id={SOME_BLOCK_ID}, that takes the block-id (which is now already uploaded) and manipulates the manifest external labels to make it queryable by Thanos. We can then use HTTP error codes to reflect whether the block was not found, found but not relabeled, or relabeled successfully.

With respect to compression, my intuition is that we should not upload a compressed archive, but decompress on disk (wherever the upload is happening) and upload directly using the signed URL. Ingress is generally free for S3 providers, and we would simplify the marking later on. Let me know if you'd like to schedule a call to discuss this further.
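To make the getSignedUrl idea above concrete, here is a minimal sketch using minio-go; the endpoint, bucket, key layout, and expiry are assumptions. Note that a TSDB block is a directory of several objects (meta.json, index, chunks/...), so a real implementation would need one URL per object or an archive, and presigned URLs in most S3 implementations expire but are not strictly single-use, so one-time use would need extra bookkeeping in obs-api:

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "github.com/minio/minio-go/v7"
        "github.com/minio/minio-go/v7/pkg/credentials"
    )

    func main() {
        // Assumed object-store endpoint and credentials.
        client, err := minio.New("s3.example.com", &minio.Options{
            Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
            Secure: true,
        })
        if err != nil {
            log.Fatal(err)
        }

        blockID := "01HYPOTHETICALULID" // the block-id from the request
        // Scope the object key to tenant + block ID so the URL cannot be
        // reused to upload arbitrary objects.
        u, err := client.PresignedPutObject(context.Background(),
            "tenant-blocks", "test-oidc/"+blockID+"/meta.json", 15*time.Minute)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(u) // hand this URL back to the client for a direct PUT
    }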

@patilsuraj767 (Author)

Hello @moadz, thanks for reviewing the PR.
I got the first point of your suggestion regarding the getSignedUrl endpoint, but I didn't completely understand the second point.
In particular, I didn't understand what you mean by "manipulates the manifest external labels to make it queryable by Thanos". I have very limited knowledge of how Thanos works. Can you please elaborate on it for me?

Do you mean we will add external labels to the meta.json file leveraging this struct: https://github.com/thanos-io/thanos/blob/main/pkg/block/metadata/meta.go#L76?
If yes, then I have further questions:

  1. Suppose we add cluster_id = 123 to external_labels in the meta.json file; will the Querier consider those labels while querying metrics? If I query a time series inside that TSDB block, like metrics_name{cluster_id="123"}, will this work?
  2. If there are two TSDB blocks, say BLOCK-1 and BLOCK-2, where BLOCK-1's meta.json has meta.Thanos.Labels: cluster_id=111 and BLOCK-2's meta.json has meta.Thanos.Labels: cluster_id=222, will the compactor make sure that it won't merge these two blocks?

And yes, we can schedule a call to discuss this further.
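For reference, a minimal sketch of the relabeling being discussed, using the Thanos metadata package linked above; the block directory and label values are hypothetical:

    package main

    import (
        stdlog "log"

        gokitlog "github.com/go-kit/log"
        "github.com/thanos-io/thanos/pkg/block/metadata"
    )

    func main() {
        blockDir := "/tmp/blocks/01HYPOTHETICALULID" // hypothetical extracted block

        // Read the block's meta.json.
        meta, err := metadata.ReadFromDir(blockDir)
        if err != nil {
            stdlog.Fatal(err)
        }

        // thanos.labels in meta.json act as external labels for the whole block;
        // this is where cluster_id (and later tenant_id) would be set.
        if meta.Thanos.Labels == nil {
            meta.Thanos.Labels = map[string]string{}
        }
        meta.Thanos.Labels["cluster_id"] = "123"

        // Rewrite meta.json in place.
        if err := meta.WriteToDir(gokitlog.NewNopLogger(), blockDir); err != nil {
            stdlog.Fatal(err)
        }
    }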

@patilsuraj767 (Author)

Hello, I got the answers to the above questions; please ignore them.

I have a doubt regarding the {tenant_id}/metrics/api/v1/blocks/mark?block-id={SOME_BLOCK_ID} API. How do we mark a block 'ready for querying' or 'not ready for querying'?
I didn't find anything under BlockMeta that we can use to mark a block as ready or not ready for querying.

@moadz commented Sep 5, 2024

I have a doubt regarding the {tenant_id}/metrics/api/v1/blocks/mark?block-id={SOME_BLOCK_ID} API. How do we mark a block 'ready for querying' or 'not ready for querying'?

We deploy Thanos behind Observatorium (https://github.com/observatorium/api), where we manage tenancy based on the presence of the tenant_id label. Marking a block would simply add the tenant_id as an external label to the block, to identify those metrics as belonging to the ROS tenant. In terms of querying, it won't impact much, as we expect the tenant label to exist in the metrics. So perhaps we need to spike this and explore whether simply adding the external label as tenant is sufficient to make it queryable.

@patilsuraj767 (Author)

Thanks @moadz, got your point.
Regarding the Thanos Compactor: do you think the compactor could cause any issue here? In the time between when the block is uploaded and when the mark API is called, if the compactor runs and tries to compact blocks, it might combine two blocks from different tenants.

@moadz commented Sep 8, 2024

Thanks @moadz, got your point. Regarding the Thanos Compactor: do you think the compactor could cause any issue here? In the time between when the block is uploaded and when the mark API is called, if the compactor runs and tries to compact blocks, it might combine two blocks from different tenants.

As long as the external label is set on the block for the cluster-id before you upload, it should be fine. It's feasible that someone may want certain blocks not to be compacted together, but if you imagine they all belong to the same tenant, they will have the tenant label marked and will be compacted together independent of the cluster label. So it's on the uploader to determine the sharding they want for blocks at rest.
