Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise locking strategy #4087

Merged
merged 6 commits into from
Nov 21, 2023
Merged

Conversation

johnbelamaric
Copy link
Contributor

@johnbelamaric johnbelamaric commented Nov 20, 2023

I am unable to do partial updates of the cache. From what I can see, the way the git repo code works makes this difficult if not impossible. Instead, this PR allows reads to proceed when the cache is in the process of being reconciled, which should help with the concurrency issues we are seeing. To do that, this PR revises the locking strategy for Porch repositories, so that List operations do not block while the cache is being reconciled with the repository.

  • List no longer loads the cache if it is empty. Instead, it blocks until the cache is loaded via the reconciliation background go routine.
  • The reconcileMutex is held during reconciliation of the repo to the cache, and during any operations that mutate the underlying repository (Delete, Update)
  • The mutex read lock is held whenever the cache map is iterated over or checked for nil. It is not necessary to hold it when checking if that map is non-nil.
  • The mutex write lock is held whenever mutations are happening to the map, or we are change the map the cache points to.

@johnbelamaric
Copy link
Contributor Author

Note - I ran this through the Nephio e2e tests and it ran through functionally successfully. There were some timeouts because everything took much longer. I think is simply because we have reduced the "chatter" on the watches and so approvals are not happening as quickly. This is actually probably a bug in the Nephio approval controller, rather than an issue with this PR.

@@ -65,6 +65,18 @@ func (r *watcherManager) WatchPackageRevisions(ctx context.Context, filter repos
r.mutex.Lock()
defer r.mutex.Unlock()

// reap any dead watchers
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is needed because of the previous PR that avoided reloads when the repo is unchanged. Basically, since we are not reloading, we are sending fewer events. We were previously reaping only when a send fails. This now reaps any dead watchers whenever we start a new watch, in addition to when a send fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure these actually ended up being necessary. However, I did notice that the kpt-fn repo takes about 22s to load in my system, so I am not sure 10s wait was enough, and thus am leaving this in.

}

return packages, nil
return nil, fmt.Errorf("not implemented")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably delete the Package functions. This was added as part of a plan to support Package as a separate resource, but it was never completed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my plan eventually but not in this PR.

@johnbelamaric johnbelamaric merged commit 278f16a into kptdev:main Nov 21, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants