
Consistent hashing: consistently retry another host #119

Merged 4 commits from consistent-hash-retry into main on Dec 15, 2023

Conversation

philandstuff
Contributor

This implements a retry policy to consistently retry another host within the SRV set if we get a failure talking to the original host.

We still use the original fallback strategy if this retry fails. (Should we?)

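A caller-level sketch of the behavior described above — consistently hash to a host, retry exactly one other host on failure, and surface the original error so the regular fallback strategy still applies. All names here are hypothetical stand-ins for illustration; the PR wires this into the proxy's request path:

```go
package main

// pickHost stands in for the consistent-hashing package's bucket
// selection: it maps a request key and previously failed host indices
// to a host index. (Hypothetical signature, for illustration only.)
type pickHost func(key string, hosts int, previous ...int) int

// fetchWithRetry tries the consistently hashed host first and, on
// failure, retries exactly one other host from the same SRV set.
// If the retry also fails it returns the original error, so the
// caller's regular fallback strategy still applies.
func fetchWithRetry(hosts []string, key string, pick pickHost, get func(host string) error) error {
	first := pick(key, len(hosts))
	origErr := get(hosts[first])
	if origErr == nil {
		return nil
	}
	// Consistently choose a second host, avoiding the one that failed.
	second := pick(key, len(hosts), first)
	if err := get(hosts[second]); err != nil {
		return origErr // preserve the original failure for fallback
	}
	return nil
}
```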
@philandstuff philandstuff requested a review from a team December 14, 2023 18:46
Member

@tempusfrangit tempusfrangit left a comment


Approve with the caveat that I have not tested the code directly in the validation environment. It looks sane. I largely agree with @nickstenning's comments.

This extracts the consistent hashing code into a new package and writes more
tests for it.

This removes a bunch of duplication from ConsistentHashingMode.

As a consequence this jumbles the consistent hashing algorithm (because I've
embedded the Key into a top-level CacheKey struct which includes Attempt for
retry support). At this stage this is safe because nothing live is using it.
- rename method
- avoid implicit return
- remove unneeded conditional on m.DomainsToCache
Member

@tempusfrangit tempusfrangit left a comment


LGTM! I still have not tested in the validation environment. But I am willing to land this and iterate on further releases as we need for minor adjustments/critical oversights.

Comment on lines +21 to +45
func HashBucket(key any, buckets int, previousBuckets ...int) (int, error) {
	if len(previousBuckets) >= buckets {
		return -1, fmt.Errorf("no more buckets left: %d buckets available but %d already attempted", buckets, len(previousBuckets))
	}
	// we set IgnoreZeroValue so that we can add fields to the hash key
	// later without breaking things.
	// note that it's not safe to share a HashOptions so we create a fresh one each time.
	hashopts := &hashstructure.HashOptions{IgnoreZeroValue: true}
	hash, err := hashstructure.Hash(cacheKey{Key: key, Attempt: len(previousBuckets)}, hashstructure.FormatV2, hashopts)
	if err != nil {
		return -1, fmt.Errorf("error calculating hash of key: %w", err)
	}

	// jump is an implementation of Google's Jump Consistent Hash.
	//
	// See http://arxiv.org/abs/1406.2294 for details.
	bucket := int(jump.Hash(hash, buckets-len(previousBuckets)))
	slices.Sort(previousBuckets)
	for _, prev := range previousBuckets {
		if bucket >= prev {
			bucket++
		}
	}
	return bucket, nil
}
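To make the bucket-shifting trick above concrete, here is a self-contained, stdlib-only sketch of the same idea. It substitutes FNV-1a for the hashstructure library and inlines Jump Consistent Hash, so the exact bucket values differ from the PR's code; the structure (hash key plus attempt number, pick among the remaining buckets, shift past previously tried ones) is what it illustrates:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// jumpHash is Google's Jump Consistent Hash (see
// http://arxiv.org/abs/1406.2294): it maps a 64-bit key to one of
// numBuckets buckets with minimal remapping as numBuckets grows.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

// hashBucket mirrors the PR's HashBucket logic with stdlib hashing:
// hash (key, attempt), pick a bucket among those not yet tried, then
// shift past previously attempted buckets so they are never returned.
func hashBucket(key string, buckets int, previous ...int) int {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s/%d", key, len(previous)) // attempt number perturbs the hash
	bucket := jumpHash(h.Sum64(), buckets-len(previous))
	sort.Ints(previous)
	for _, prev := range previous {
		if bucket >= prev {
			bucket++
		}
	}
	return bucket
}
```

The shift loop is why the result can never collide with a previous bucket: jumpHash returns a value in [0, buckets-len(previous)), and each already-tried bucket at or below the candidate pushes it up by one.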
Member


This is so nice to have it extracted!

Comment on lines +279 to +281
if err != nil {
	// return origErr so that we can use our regular fallback strategy
	return nil, origErr
Member


I think we're going to want this for the foreseeable future. As reliability is the real goal, I want us to be durable when the cache disappears but still serve inference.

That said, I hear the earlier comments about "do we need this".

// retry, you can pass previousBuckets, which indicates buckets which must be
// avoided in the output. HashBucket will modify the previousBuckets slice by
// sorting it.
func HashBucket(key any, buckets int, previousBuckets ...int) (int, error) {
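One caveat worth noting from this doc comment: because a variadic parameter shares the caller's backing array, sorting it inside the function mutates the caller's slice. A stdlib-only illustration of that side effect (sortsInPlace is a hypothetical stand-in, not the PR's API):

```go
package main

import "sort"

// sortsInPlace mimics HashBucket's documented side effect: the variadic
// slice shares the caller's backing array, so sorting it reorders the
// caller's slice too.
func sortsInPlace(previous ...int) {
	sort.Ints(previous)
}
```

A caller that needs to preserve the original order can pass a copy instead: sortsInPlace(append([]int(nil), prev...)...).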
Member


This is really nice. I like the derived/wrapped key a lot.

@philandstuff philandstuff merged commit 3e641a5 into main Dec 15, 2023
4 checks passed
@philandstuff philandstuff deleted the consistent-hash-retry branch December 15, 2023 15:42