pageserver deploy: .../location_config: heatmap upload fails with noisy WARN message #9575

Open
problame opened this issue Oct 29, 2024 · 1 comment
Labels: a/observability (Area: related to observability), c/storage/pageserver (Component: storage: pageserver), t/bug (Issue Type: Bug)

Comments

@problame
Contributor

Problem

During pageserver deploys, /location_config calls that request a heatmap flush (flush_ms=Some(...)) produce noisy WARN messages, because the secondary_controller shuts down before the mgmt API does:

2024-10-29T16:20:45.308290Z  WARN request{method=PUT path=/v1/tenant/TENANT_ID/location_config request_id=8dde6bcd-7e0d-4bc3-99cc-6757ffcfba98}: Failed to flush heatmap: Tenant TENANT_ID is not active

Code

if let Some(_flush_ms) = flush {
    match state
        .secondary_controller
        .upload_tenant(tenant_shard_id)
        .await
    {
        Ok(()) => {
            tracing::info!("Uploaded heatmap during flush");
        }
        Err(e) => {
            // Logged at WARN even when the failure is only due to shutdown.
            tracing::warn!("Failed to flush heatmap: {e}");
        }
    }
}

Solution

Introduce a distinguished error for this case and log it at INFO level.
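
A minimal sketch of what that could look like, using only the standard library. The names `UploadError`, `ShuttingDown`, `LogLevel`, and `flush_log_level` are hypothetical and are not taken from the pageserver code; the real `upload_tenant` error type may be shaped differently. The idea is only that the handler can match on a dedicated variant instead of formatting every error at WARN.

```rust
use std::fmt;

/// Hypothetical error type for the heatmap upload path (an assumption,
/// not the pageserver's actual type).
#[derive(Debug)]
enum UploadError {
    /// The distinguished case: the tenant or controller is already shutting down.
    ShuttingDown,
    /// Any other failure, which still deserves a WARN.
    Other(String),
}

impl fmt::Display for UploadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UploadError::ShuttingDown => write!(f, "shutting down"),
            UploadError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

/// Stand-in for the choice between `tracing::info!` and `tracing::warn!`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    Info,
    Warn,
}

/// Decide the log level for the outcome of a heatmap flush.
fn flush_log_level(result: &Result<(), UploadError>) -> LogLevel {
    match result {
        // Success and expected shutdown races are both informational.
        Ok(()) | Err(UploadError::ShuttingDown) => LogLevel::Info,
        // Everything else stays noisy on purpose.
        Err(UploadError::Other(_)) => LogLevel::Warn,
    }
}
```

In the handler, the `Err(e)` arm would then split into two arms, one per variant, so that only unexpected failures reach `tracing::warn!`.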

We can't do much better than this right now because we keep the whole mgmt API online during PS shutdown.
We could improve this fundamentally with a middleware that takes down most of the mgmt API early, except the status & healthcheck endpoints.
But such a change is bigger and should go through the RFC process.
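
For discussion purposes, the middleware idea could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposal for the actual design: `ShutdownGate`, `ALWAYS_ALLOWED`, and the two endpoint paths are hypothetical, and a real version would plug into the HTTP router rather than take a bare path string.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical list of endpoints that stay reachable during shutdown.
/// The exact paths are assumptions made for this sketch.
const ALWAYS_ALLOWED: &[&str] = &["/v1/status", "/v1/health"];

/// Gate consulted before dispatching a mgmt API request.
struct ShutdownGate {
    shutting_down: AtomicBool,
}

impl ShutdownGate {
    fn new() -> Self {
        ShutdownGate {
            shutting_down: AtomicBool::new(false),
        }
    }

    /// Called once, early in the shutdown sequence.
    fn begin_shutdown(&self) {
        self.shutting_down.store(true, Ordering::SeqCst);
    }

    /// Returns Ok(()) if the request may proceed, or Err with an HTTP
    /// status code (503 Service Unavailable) if it should be rejected.
    fn admit(&self, path: &str) -> Result<(), u16> {
        if !self.shutting_down.load(Ordering::SeqCst) {
            return Ok(());
        }
        if ALWAYS_ALLOWED.contains(&path) {
            return Ok(());
        }
        Err(503)
    }
}
```

With a gate like this, a /location_config call arriving after shutdown has begun would be rejected up front with a clear status, instead of reaching a secondary_controller that has already stopped.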

@jcsp
Collaborator

jcsp commented Nov 12, 2024

In #9574 we are hitting an issue while calling into the specific heatmap upload API, whereas this ticket is about the implicit upload during the transition to AttachedStale -- let's check that we aren't accidentally doing both in the controller's live migration code.
