Revise locking strategy #4087
Conversation
Note - I ran this through the Nephio e2e tests and it passed functionally. There were some timeouts because everything took much longer. I think this is simply because we have reduced the "chatter" on the watches, so approvals are not happening as quickly. This is probably a bug in the Nephio approval controller rather than an issue with this PR.
@@ -65,6 +65,18 @@ func (r *watcherManager) WatchPackageRevisions(ctx context.Context, filter repos
r.mutex.Lock()
defer r.mutex.Unlock()

// reap any dead watchers
This change is needed because of the previous PR that avoided reloads when the repo is unchanged. Basically, since we are not reloading, we are sending fewer events. We were previously reaping dead watchers only when a send failed. This now reaps any dead watchers whenever we start a new watch, in addition to when a send fails.
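A minimal sketch of that idea, for illustration only: the type and field names below (watcher, watchers, IsDone, addWatcher) are hypothetical and do not reflect the actual Porch types, only the pattern of reaping at registration time rather than waiting for a failed send.

```go
// Hypothetical sketch: reap dead watchers when a new watch is registered.
package watch

import "sync"

type watcher struct {
	done chan struct{} // closed when the watch is finished
}

// IsDone reports whether the watcher's done channel has been closed.
func (w *watcher) IsDone() bool {
	select {
	case <-w.done:
		return true
	default:
		return false
	}
}

type watcherManager struct {
	mutex    sync.Mutex
	watchers []*watcher
}

// addWatcher registers a new watcher, first reaping any watchers that are
// already done. With fewer events flowing, send failures alone are no longer
// a reliable cleanup trigger, so we also clean up here.
func (m *watcherManager) addWatcher(w *watcher) {
	m.mutex.Lock()
	defer m.mutex.Unlock()

	alive := m.watchers[:0]
	for _, existing := range m.watchers {
		if !existing.IsDone() {
			alive = append(alive, existing)
		}
	}
	m.watchers = append(alive, w)
}
```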
Not sure these actually ended up being necessary. However, I did notice that the kpt-fn repo takes about 22s to load on my system, so I am not sure a 10s wait was enough, and thus am leaving this in.
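The actual timeout change is not shown in this excerpt; the following is only a hedged illustration of the kind of adjustment being discussed, using the k8s.io/apimachinery wait helpers. The 30s timeout, 2s interval, and the repoLoaded condition are assumptions, not values from this PR.

```go
// Hypothetical example: widen a poll timeout so a ~22s repo load fits.
package wait_example

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForRepoLoad polls until repoLoaded reports true. A 10s timeout was not
// reliably enough when the repo takes ~22s to load, so an assumed 30s is used.
func waitForRepoLoad(ctx context.Context, repoLoaded wait.ConditionWithContextFunc) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 30*time.Second, true, repoLoaded)
}
```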
}

return packages, nil
return nil, fmt.Errorf("not implemented")
We can probably delete the Package functions. These were added as part of a plan to support Package as a separate resource, but it was never completed.
Yes, that's my plan eventually but not in this PR.
I am unable to do partial updates of the cache; from what I can see, the way the git repo code works makes this difficult if not impossible. Instead, this PR allows reads to proceed while the cache is being reconciled, which should help with the concurrency issues we are seeing. To do that, this PR revises the locking strategy for Porch repositories so that List operations do not block while the cache is being reconciled with the repository.
- `reconcileMutex` is held during reconciliation of the repo into the cache, and during any operations that mutate the underlying repository (Delete, Update).
- The `mutex` read lock is held whenever the cache map is iterated over. It is not necessary to hold it when only checking whether the map is nil.
- The `mutex` write lock is held whenever the map is mutated, or when we change the map the cache points to.

A rough sketch of this strategy is shown below.
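This sketch is illustrative only and assumes a cache keyed by package revision name; the type and field names (cachedRepository, reconcileMutex, revisions, and the helper callbacks) are assumptions, not the actual Porch code.

```go
// Hypothetical sketch of the locking strategy described above.
package cache

import "sync"

type PackageRevision struct{ Name string }

type cachedRepository struct {
	// reconcileMutex serializes reconciliation of the repo into the cache
	// and any operations that mutate the underlying repository.
	reconcileMutex sync.Mutex

	// mutex guards the revisions map: read lock to iterate, write lock to
	// mutate the map or swap in a new map.
	mutex     sync.RWMutex
	revisions map[string]*PackageRevision
}

// ListPackageRevisions can proceed while reconciliation is running; it only
// takes the read lock while iterating over the cache map.
func (r *cachedRepository) ListPackageRevisions() []*PackageRevision {
	r.mutex.RLock()
	defer r.mutex.RUnlock()

	out := make([]*PackageRevision, 0, len(r.revisions))
	for _, pr := range r.revisions {
		out = append(out, pr)
	}
	return out
}

// reconcile rebuilds the cache from the repository. reconcileMutex is held
// for the whole reconciliation; the write lock is held only while the new
// map is swapped in.
func (r *cachedRepository) reconcile(load func() map[string]*PackageRevision) {
	r.reconcileMutex.Lock()
	defer r.reconcileMutex.Unlock()

	fresh := load() // may be slow; readers are not blocked during this

	r.mutex.Lock()
	r.revisions = fresh
	r.mutex.Unlock()
}

// deletePackageRevision mutates the underlying repository and then the cache,
// so it takes reconcileMutex first, then the write lock for the map update.
func (r *cachedRepository) deletePackageRevision(name string, deleteFromRepo func(string) error) error {
	r.reconcileMutex.Lock()
	defer r.reconcileMutex.Unlock()

	if err := deleteFromRepo(name); err != nil {
		return err
	}

	r.mutex.Lock()
	delete(r.revisions, name)
	r.mutex.Unlock()
	return nil
}
```

The point of the split is that reconciliation and repository mutations serialize against each other on `reconcileMutex`, while List only contends on the short critical section where the map is read or swapped, so reads see either the old or the new map atomically instead of blocking for the whole reconcile.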