
KAFKA-17948: Potential issue during tryComplete and onComplete simultaneous calls to access global variables #17739

Merged
14 commits merged into apache:trunk on Nov 15, 2024

Conversation

@adixitconfluent (Contributor) commented Nov 10, 2024

About

This PR addresses the following issues:

  1. KAFKA-17948: Potential issue during tryComplete and onComplete simultaneous calls to access global variables
  2. Pending minor comments on AK PR KAFKA-17743: Add minBytes implementation to DelayedShareFetch #17539:
    a. KAFKA-17743: Add minBytes implementation to DelayedShareFetch #17539 (comment)
    b. KAFKA-17743: Add minBytes implementation to DelayedShareFetch #17539 (comment)
    c. KAFKA-17743: Add minBytes implementation to DelayedShareFetch #17539 (comment)

Testing

Testing has been done with new and existing unit tests and the existing integration tests.

@github-actions github-actions bot added core Kafka Broker KIP-932 Queues for Kafka labels Nov 10, 2024
@adixitconfluent adixitconfluent marked this pull request as ready for review November 10, 2024 14:41
@adixitconfluent adixitconfluent marked this pull request as draft November 10, 2024 17:30
@adixitconfluent adixitconfluent marked this pull request as ready for review November 10, 2024 18:24
@junrao (Contributor) left a comment

@adixitconfluent : Thanks for the PR. Left a comment.

@@ -1602,8 +1602,6 @@ protected void updateFetchOffsetMetadata(Optional<LogOffsetMetadata> fetchOffset
    protected Optional<LogOffsetMetadata> fetchOffsetMetadata() {
        lock.readLock().lock();
        try {
            if (findNextFetchOffset.get())
Contributor

> Since we update fetchOffsetMetadata every time we change endOffset, I don't think we should have a dependency on findNextFetchOffset when getting the value of fetchOffsetMetadata.

Hmm, the issue is that nextFetchOffset doesn't return endOffset if findNextFetchOffset is true. Currently we only reset fetchOffsetMetadata when updating the endOffset. It's possible that findNextFetchOffset stays on for multiple fetches without changing endOffset. In that case, we will set fetchOffsetMetadata for the first fetch and keep reusing it for subsequent fetches, which will be incorrect.

Contributor Author

Hi @junrao, how about this: when fetchOffsetMetadata() is called, we check whether findNextFetchOffset is true. If it is, we call nextFetchOffset(), which will update the endOffset if it needs to be updated. Finally, we just return fetchOffsetMetadata. Do you think that would work?
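
For illustration, a minimal sketch of what that suggestion could look like (hypothetical; it assumes SharePartition's nextFetchOffset() updates endOffset, and hence fetchOffsetMetadata, as a side effect, per the discussion above):

    protected Optional<LogOffsetMetadata> fetchOffsetMetadata() {
        // If the next fetch offset needs recomputation, do it first so that endOffset
        // (and therefore fetchOffsetMetadata) is up to date before we read it.
        if (findNextFetchOffset.get()) {
            nextFetchOffset();
        }
        lock.readLock().lock();
        try {
            return fetchOffsetMetadata;
        } finally {
            lock.readLock().unlock();
        }
    }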

@adixitconfluent (Contributor Author) commented Nov 13, 2024

Hi @junrao, now that I think more about it, IIUC, considering the common case when all fetched data is acquirable:

  1. Acknowledgements, acquisition lock timeouts, and the release of records on session close are the only places where we set findNextFetchOffset to true.
  2. In all three of those scenarios, if there is a change to the endOffset, we update the endOffset (and with our changes fetchOffsetMetadata is then updated automatically as well).

Hence, I feel that findNextFetchOffset shouldn't be considered when dealing with the common case. In the uncommon case, when the log start offset is later than the fetch offset and we need to archive records, we do set findNextFetchOffset to true. But we have implemented minBytes only for the common case right now, so I feel the current change is correct. Please correct me if I am wrong.
cc @apoorvmittal10

Contributor

Yes, I agree findNextFetchOffset=true is the uncommon case. It might be useful to at least have some kind of consistent behavior for the uncommon case. Since the minBytes estimation will be off anyway in this case, we could choose to consistently satisfy the request immediately or consistently wait for the timeout. With the logic in this PR, because fetchOffsetMetadata can be outdated in this uncommon case, it's not clear when the request will be satisfied.

@adixitconfluent (Contributor Author) commented Nov 13, 2024

@junrao, so should I do what I suggested in #17739 (comment): when fetchOffsetMetadata() is called we check whether findNextFetchOffset is true, and if it is we call nextFetchOffset(), which will update the endOffset if it needs to be updated, and then we simply return fetchOffsetMetadata? Or, for the uncommon case, should I update fetchOffsetMetadata to Optional.empty() and remove any dependency on findNextFetchOffset?
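
A rough sketch of that second option (hypothetical placement; the reset would happen at each site that flips findNextFetchOffset, and updateFetchOffsetMetadata is the existing setter visible in the diff above):

    // At every place that sets findNextFetchOffset to true (acknowledgement, acquisition
    // lock timeout, release of records on session close), also clear the cached metadata
    // so a delayed share fetch cannot keep reusing a stale value.
    findNextFetchOffset.set(true);
    updateFetchOffsetMetadata(Optional.empty());

    // The getter then has no dependency on findNextFetchOffset at all.
    protected Optional<LogOffsetMetadata> fetchOffsetMetadata() {
        lock.readLock().lock();
        try {
            return fetchOffsetMetadata;
        } finally {
            lock.readLock().unlock();
        }
    }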

Collaborator

Yeah, that works as well @junrao; we can structure SharePartition to accommodate it.

Collaborator

> but I wanted to confirm whether it's necessary

Yes, it's an optimization over the current implementation, where we might do unnecessary processing because of released records.

Contributor

@AndrewJSchofield, in theory, AdminClient.alterShareGroupOffsets() can initialize to an arbitrary offset, right?

Collaborator

Yes, indeed. Great point.

@adixitconfluent (Contributor Author) commented Nov 15, 2024

Hi everyone, thanks for your input. @junrao, I think this is considerably different from the objective of this PR, so I have created JIRA https://issues.apache.org/jira/browse/KAFKA-18022 to track the issue. I would like to address it in a new PR, if that is fine with you. Meanwhile, I am reverting the code change for this line and updating the PR description to reflect that.

@@ -90,39 +90,50 @@ public void onExpiration() {
     */
    @Override
    public void onComplete() {
        // We are utilizing lock so that onComplete doesn't do a dirty read for global variables -
        // partitionsAcquired and partitionsAlreadyFetched, since these variables can get updated in a different tryComplete thread.
        lock.lock();
Collaborator

So now, for share fetch, tryComplete and onComplete will both run under the lock. Seems fine, as the execution should be sequential anyway.
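
For context, a simplified and hypothetical sketch of the tryComplete counterpart of this pattern (the real DelayedShareFetch logic is more involved; the lock is assumed to be a ReentrantLock, so re-entering it from onComplete via forceComplete is safe):

    @Override
    public boolean tryComplete() {
        lock.lock();
        try {
            // Acquire records for the requested share partitions and record the outcome in
            // partitionsAcquired / partitionsAlreadyFetched. Because onComplete takes the
            // same lock, it can never observe these fields in a half-written state.
            // ...
            if (minBytesSatisfied()) {      // hypothetical predicate for the minBytes check
                return forceComplete();     // forceComplete invokes onComplete on this thread
            }
            return false;
        } finally {
            lock.unlock();
        }
    }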

Comment on lines 101 to 102

    if (shareFetchData.future().isDone())
        return;
Collaborator

We have this check here for share fetch future completion, but if locks were acquired for share partitions and the share fetch future is already completed at line 101, how will they be released? I don't think the code handles that.

Contributor Author

Yeah, I agree that's a very unlikely corner case, but it's definitely possible. I have pushed a fix for it. Thanks for pointing it out.
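
A hedged sketch of what that fix could look like (releasePartitionLocks and partitionsAcquired follow the PR's naming, but the exact call is an assumption; note that later in this review the isDone() early return itself is removed as unnecessary):

    @Override
    public void onComplete() {
        lock.lock();
        try {
            if (shareFetchData.future().isDone()) {
                // The request was already completed elsewhere, but any share-partition
                // locks acquired by this operation still have to be released.
                releasePartitionLocks(partitionsAcquired.keySet());  // assumed helper
                return;
            }
            // ... normal completion path ...
        } finally {
            lock.unlock();
        }
    }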

Contributor

Hmm, could shareFetchData.future().isDone() be true inside onComplete()? We complete the future only after DelayedOperation.completed is set to true. After that point, onComplete() is not expected to be called again.

Contributor Author

So, if we have two different keys corresponding to a ShareFetch request, it could be the case that for one of those keys we get a checkAndComplete call that completes the share fetch request. Then, when the purgatory entry for the other key times out or has checkAndComplete triggered, the share fetch request's future is already complete by the time the code reaches onComplete, so it would hit the shareFetchData.future().isDone() check and return.

Contributor

onComplete is always called through forceComplete, right? So, only one thread could ever call onComplete.

    public boolean forceComplete() {
        if (completed.compareAndSet(false, true)) {
            // cancel the timeout timer
            cancel();
            onComplete();
            return true;
        } else {
            return false;
        }
    }

@adixitconfluent (Contributor Author) commented Nov 14, 2024

Hi @junrao, you're right. There was a gap in my understanding of purgatory operations: I thought a copy of the operation went to each of the watch keys used for that operation, but this passage in the documentation cleared it up.

Note that a delayed operation can be watched on multiple keys. 
It is possible that an operation is completed after it has been added to the watch list for some, but not all the keys. 
In this case, the operation is considered completed and won't be added to the watch list of the remaining keys. 
The expiration reaper thread will remove this operation from any watcher list in which the operation exists.

Hence, I've removed the mentioned condition from the code now. Thanks!

@junrao (Contributor) left a comment

@adixitconfluent : Thanks for the updated PR. LGTM. I have a minor comment and we can address that in your followup PR.

@@ -91,39 +90,47 @@ public void onExpiration() {
     */
    @Override
    public void onComplete() {
        // We are utilizing lock so that onComplete doesn't do a dirty read for global variables -
Contributor

These are instance variables, not global ones.

@junrao junrao merged commit 77cc8ff into apache:trunk Nov 15, 2024
8 checks passed
Labels
ci-approved core Kafka Broker KIP-932 Queues for Kafka