-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync PVC's Status.Phase #351
Conversation
"the PV in the super cluster is Released", what does release mean? Quoted from k8s doc: |
I'd rather double check how PV status is changed during your experiment. Note that tenant pv controller handles all pvc/pv state changes. We should be really careful about asking syncer to join the interaction. |
/retest |
In our real case, a super cluster's PVC If everything works perfectly, we should never see vPVC is bound while pPVC is not bound. Unfortunately, we did see such cases and was able to reproduce it too. If the case never happens again, the change will be a no-op. So it should be a safe change. A question I do have is whether we should reconcile a PV's |
Please see my previous comment #351 (comment). The PV controller, which sets PVC Status.Phase and VolumeName based on the PV's claimRef field is insufficient in a virtual cluster setup and can result in inconsistent status between the tenant cluster and super cluster. |
/retest-required |
No idea why |
I was thinking the claimRef which contains a UUID should be different from the newly created PVC. I never expected PV phase can change from "Released" to "Bound". |
What is the k8s version of your tenant controlplane? btw. |
The following is a real case. We have not figured out why the super cluster PVC was marked as
The Tenant PV
Tenant PVC
|
Thanks for the clarification. This makes much more sense. I didn't think about this workflow while I was validating whether pvc/pv works before. I think "removing the PV's claimRef" has not been populated down to pPV if I recall. We only have uws for pv, no dws for pv. If we have populated |
Fei, in both cases above, the PVs in the super cluster had The PR does not intend to automatically resolve the issue, as this would necessitate substantial effort. Instead, the objective is to address situations where statuses are inconsistent, particularly when a tenant cluster shows a normal condition (Bound), while the underlying super cluster does not. In such cases, we will adjust the tenant cluster's status to 'not Bound'. This will allow users/administrators to investigate and fix the issue offline. Under no circumstances should there be a scenario where the tenant cluster's PVC is bound while the super cluster is not. This represents the primary goal of the PR. I hope it makes sense. it's a great discussion. Thanks! |
"The PR does not intend to automatically resolve the issue," I agree this is a subtle problem and you are trying to provide mitigation. In this case, would you please add a feature gate at least for now so this would not cause any side effect for other users? |
Good idea! I've added a feature gate
|
Can you please fix the lint error (may not due to your change though) and the test failure? |
2b295a4
to
7ddde5a
Compare
Fixed it. |
The latest commit does not seem to be the correct one (no feature gate). Can you double check? |
Thanks for catching it. My bad. I checked in a wrong version. |
/retest-required |
/retest |
Some strange errors. I'm looking into it. |
Fix issues in nameing and format Fix klog import issue Update go.mod and go.sum Revert "Update go.mod and go.sum" This reverts commit fd7604abcf3cb81e3bf75fbc6ca166e1bc9bc34a. Fix go.mod and go.sum Add featuregate Fix lint error Fix lint errors
@Fei-Guo , @christopherhein , all tests passed. PTAL. Thanks! |
/lgtm |
/lgtm 🎉🎉 |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: christopherhein, yuanchen8911 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
The current syncer doesn't sync
PVC.Status.Phase
in a tenant cluster. We have observed that a super cluster's PV and PVC are notBound
while the the tenant cluster's PV and PVC areBound
, which might be caused by bug/race condition in the underlying controllers. It should never happen.It can be reproduced by deleting a bound PVC in a tenant cluster, recreating the PVC and bound it again. The resulting PV and PVC in the tenant cluster will show
Bound
and the PV in the super cluster isReleased
and the PVC isPending
.The PR updates a tenant cluster's PVC
Status.Phase
with the super cluster'sStatus.Phase
if the tenant cluster PVC isBound
and the super cluster PVC is notBound
, e.g.,Pending
,Lost
. The other conditions are still handled and reconciled by the PV controller.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #