Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/9.0] Make resource HealthStatus computed from HealthReports (#6368) #6458

Merged
merged 2 commits into from
Oct 24, 2024

Conversation

adamint
Copy link
Member

@adamint adamint commented Oct 23, 2024

Backport of #6368 to release/9.0

Customer Impact

  • Removes the ability to set HealthStatus directly on a resource, and removes the field from the resource service protocol
  • Fixes a bug where health reports will not be updated if the overall health of the resource does not change
  • Fixes a bug where the health status is not shown in resource details while waiting for health checks to initially run

Testing

Much manual testing, as well as additional unit tests for computing health status.

Risk

Medium. It is possible that this breaking change may impact existing projects that set HealthStatus, and may impact customer assumptions during resource updates - for example, resources now start as unhealthy if they are running and wait on a health check, instead of null, as it previously was.

Regression?

No

Microsoft Reviewers: Open in CodeFlow
Microsoft Reviewers: Open in CodeFlow

* Make resource HealthStatus computed from HealthReports
* Update health reports when they have changed but the status has not

---------

Co-authored-by: Adam Ratzman <[email protected]>
Co-authored-by: Mitch Denny <[email protected]>

(cherry picked from commit a4ef97a)
@davidfowl
Copy link
Member

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@joperezr
Copy link
Member

Thanks @adamint, I also understand this is something that ACA team needs. Before we mark it as approved though, do you mind expanding a bit on the risk? So as opposed to just saying Medium, elaborate a bit more on what the potential issues are with this?

[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Healthy, new string[]{})]
[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Healthy, new string?[] {"Healthy"})]
[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Unhealthy, new string?[] {null})]
[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Degraded, new string?[] {"Healthy", "Degraded"})]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need tests for other states like Exited, or FailedToStart, building etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t, all states other than Running are treated the same

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t, all states other than Running are treated the same

Maybe tests to check for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitionally true, but I can add these tests. Assuming they just need to go into main?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this in #6461

@radical
Copy link
Member

radical commented Oct 23, 2024

Fixes a bug where health reports will not be updated if the overall health of the resource does not change
Fixes a bug where the health status is not shown in resource details while waiting for health checks to initially run

Can we have tests for these?

@adamint
Copy link
Member Author

adamint commented Oct 23, 2024

Thanks @adamint, I also understand this is something that ACA team needs. Before we mark it as approved though, do you mind expanding a bit on the risk? So as opposed to just saying Medium, elaborate a bit more on what the potential issues are with this?

@joperezr sorry, the edit didn’t save. I’ve added this information now

@adamint
Copy link
Member Author

adamint commented Oct 23, 2024

Fixes a bug where health reports will not be updated if the overall health of the resource does not change

I've added this in #6461

Fixes a bug where the health status is not shown in resource details while waiting for health checks to initially run

There is already a test for this

@joperezr
Copy link
Member

Thanks for adding the additional info, @adamint . I'm trying to parse your risk assessment and specifically to understand who might be broken by this. Am I understanding correctly that in order to be in a broken state is if I am explicitly setting a status for a resource inside my AppHost code? If they wanted to continue to do that, how can they go about it? Also, assuming there is no way to do this in a non-breaking way?

@adamint
Copy link
Member Author

adamint commented Oct 23, 2024

Am I understanding correctly that in order to be in a broken state is if I am explicitly setting a status for a resource inside my AppHost code?

It's a breaking code change; previously, you were able to directly set the HealthStatus and HealthReports for a resource yourself, either through setting the initial state or publishing updates. Now, you are no longer able to do either. Instead, you must add health checks and then the AppHost will compute the health reports and status for you.

If they wanted to continue to do that, how can they go about it? Also, assuming there is no way to do this in a non-breaking way?

We have made the decision to disallow that behavior, so they cannot continue to do that. Now that the only component that can set the HealthStatus is the AppHost, there is only one source of truth.

@joperezr

@joperezr
Copy link
Member

Got it. Last question. Do we know how common is it for people to set their own Health status? Given this is a major version, it is technically ok to make this sort of breaking change, but it does feel like the type of change you don't want to take this late after snap as there will be less time (basically none) to react to feedback. I'm wondering if we considered quirking this, i.e. allowing the users to set an app context switch that allows them to continue setting the status and marking the API call as obsolete, and then fully removing it in the next major. This essentially is to reduce friction in adoption from 8 to 9.

@davidfowl
Copy link
Member

This change needs to go in as it has API changes to both the gRPC proto file and the app model. It also fixes bugs with the current health checks implementation that affect the new dashboard in ACA.

@joperezr
Copy link
Member

Any comments around whether or not we should quirk this? How concerned are we about breaking existing customers?

@davidfowl
Copy link
Member

No we're not quirking this. This should go in as is.

Copy link
Member

@joperezr joperezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Merging as is, but let's keep an eye out for feedback on this.

@joperezr joperezr merged commit b1e81d7 into release/9.0 Oct 24, 2024
9 checks passed
@joperezr joperezr deleted the adamint/backport-computed-healthstatus branch October 24, 2024 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants