Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect binlog purged and report to Drainer #1017

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

csuzhangxc
Copy link
Member

@csuzhangxc csuzhangxc commented Nov 27, 2020

What problem does this PR solve?

detect binlog purged and report to Drainer.

fix TOOL-2236 (internal only)

What is changed and how it works?

  • check requested TS with gcTC before send binlog to Drainer
  • report an error to Drainer if binlog purged

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch
  • Need to be included in the release note

Release note

  • detect binlog purged and report to Drainer

@sre-bot
Copy link

sre-bot commented Nov 27, 2020

@csuzhangxc csuzhangxc changed the title detect binlog purged and report to Drainer detect binlog purged and report errors to Drainer Nov 27, 2020
@csuzhangxc csuzhangxc changed the title detect binlog purged and report errors to Drainer detect binlog purged and report to Drainer Nov 27, 2020
@sre-bot
Copy link

sre-bot commented Nov 27, 2020

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

p.logger.Error("some binlogs have been purged in pump")
})

time.Sleep(time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not exit directly? since isBinlogPurged never set back to false.

Copy link
Member Author

@csuzhangxc csuzhangxc Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if exit directly, then systemd will restart it again (but that's meaningless). if not exist, log and metrics can still keep working.

for isBinlogPurged, the user should handle it manually and re-establish the replication.

return
}

select {
case values <- value:
log.Debug("send value success")
case <-time.After(gcMaxBlockTime):
// do not update `last` anymore.
log.Warn("can not send the binlog for a long time, will try to read again", zap.Duration("duration", gcMaxBlockTime), zap.Int64("current TS", decodeTSKey(iter.Key())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we send error to errs here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

send the binlog for a long time often caused by a slow drainer. if we send errors, the replication will be broken and try to send again, that's should make it worse.

Copy link
Contributor

@3pointer 3pointer Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, if drainer encounter the purge error , all pump will reach this warning 🤔️, but both drainer and pump still running. until the user manually fix it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? not all pump, but only this one. and pump should still be running because may other drainers can still work normally.

Copy link
Contributor

@3pointer 3pointer Dec 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one pump is blocked, it seems the merge source will blocked soon. and the drainer will block soon. so I think exit immeditately here is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I know what you mean. any advice?

Copy link
Contributor

@3pointer 3pointer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

drainer/pump.go Show resolved Hide resolved
@july2993 july2993 removed their request for review June 8, 2021 14:06
@ti-chi-bot
Copy link
Member

@csuzhangxc: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants