
ddl: pessimistic lock global id, alloc id & insert ddl job in one txn #54547

Merged · 17 commits · Jul 16, 2024

Conversation

@D3Hunter D3Hunter commented Jul 10, 2024

What problem does this PR solve?

Issue Number: ref #54436

Problem Summary:

What changed and how does it work?

  • pessimistic lock global ID key before alloc it to avoid write conflict
  • job ID allocation and job insertion happen in the same transaction, because we want DDL jobs to be inserted in ID order; we can then query from a minimum job ID when scheduling DDL jobs, to mitigate the slow select on an empty table after delete from xx (#52905).
  • 🟥 THIS PR hurts DDL performance; it requires two more PRs to outperform current master: one to combine table ID allocation with job ID allocation, and one to query starting from the minimum job ID. They will be filed later.
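The allocate-IDs-and-insert-in-one-transaction idea can be sketched with a toy in-memory model. This is an illustration only, not the TiDB implementation: mockStore, submitJobs, and the mutex standing in for the pessimistic lock on the global ID key are all invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// mockStore is a toy stand-in for the meta store; every name here is
// invented for illustration, none of it is TiDB code.
type mockStore struct {
	mu       sync.Mutex // plays the role of the pessimistic lock on the global ID key
	globalID int64
	jobs     []int64 // job IDs in table-insertion order
}

// submitJobs allocates count job IDs and inserts the jobs while holding the
// lock, so allocation and insertion behave like one transaction and jobs
// always land in the table in ID order.
func (s *mockStore) submitJobs(count int) []int64 {
	s.mu.Lock() // block concurrent allocators instead of conflicting at commit
	defer s.mu.Unlock()
	ids := make([]int64, 0, count)
	for i := 0; i < count; i++ {
		s.globalID++
		ids = append(ids, s.globalID)
	}
	s.jobs = append(s.jobs, ids...) // insert before releasing the lock
	return ids
}

func main() {
	st := &mockStore{}
	fmt.Println(st.submitJobs(2)) // [1 2]
	fmt.Println(st.submitJobs(3)) // [3 4 5]
}
```

Because the lock is held across both steps, no interleaving can produce a job inserted out of ID order, which is the property the scheduler's min-job-ID query relies on.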

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

When len(job_meta) < 8K, running TestGenIDAndInsertJobsWithRetryQPS with [1, 1000] goroutines on a 1 PD / 3 TiKV environment, QPS is about [300, 400], much higher than general DDL execution QPS.

  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 10, 2024

tiprow bot commented Jul 10, 2024

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


codecov bot commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 79.66102% with 24 lines in your changes missing coverage. Please review.

Project coverage is 55.9584%. Comparing base (0b9cd2f) to head (96ebfe6).
Report is 22 commits behind head on master.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #54547         +/-   ##
=================================================
- Coverage   74.7922%   55.9584%   -18.8339%     
=================================================
  Files          1549       1673        +124     
  Lines        362139     613652     +251513     
=================================================
+ Hits         270852     343390      +72538     
- Misses        71676     246719     +175043     
- Partials      19611      23543       +3932     
Flag Coverage Δ
integration 37.1709% <54.2372%> (?)
unit 71.7187% <79.6610%> (-2.0006%) ⬇️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 52.9656% <ø> (-2.2339%) ⬇️
parser ∅ <ø> (∅)
br 52.5976% <ø> (+4.9234%) ⬆️

@D3Hunter (Contributor Author)

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 10, 2024
@D3Hunter D3Hunter changed the title ddl: alloc id & insert ddl job in one txn [WIP]ddl: alloc id & insert ddl job in one txn Jul 10, 2024
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 10, 2024
@D3Hunter (Contributor Author)

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 10, 2024
@D3Hunter D3Hunter changed the title [WIP]ddl: alloc id & insert ddl job in one txn ddl: alloc id & insert ddl job in one txn Jul 12, 2024
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 12, 2024
@D3Hunter D3Hunter changed the title ddl: alloc id & insert ddl job in one txn ddl: pessimistic lock global id, alloc id & insert ddl job in one txn Jul 12, 2024
@D3Hunter D3Hunter mentioned this pull request Jul 14, 2024
54 tasks
@D3Hunter (Contributor Author)

/hold

waiting for another review from the txn team

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 15, 2024
@GMHDBJD (Contributor) left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 15, 2024

ti-chi-bot bot commented Jul 15, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-12 09:42:14.273163825 +0000 UTC m=+1356.264105308: ☑️ agreed by lcwangchao.
  • 2024-07-15 06:29:52.167155162 +0000 UTC m=+249014.158096633: ☑️ agreed by GMHDBJD.

// table with retry. job id allocation and job insertion are in the same transaction,
// as we want to make sure DDL jobs are inserted in id order, then we can query from
// a min job ID when scheduling DDL jobs to mitigate https://github.com/pingcap/tidb/issues/52905.
// so this function has side effect, it will set the job id of 'tasks'.
Contributor

Suggested change
// so this function has side effect, it will set the job id of 'tasks'.
// so this function has side effect, it will set the job id of 'jobs'.

// level in SQL executor, see doLockKeys.
// TODO maybe we can unify the lock mechanism with SQL executor in the future, or
// implement it inside TiKV client-go.
func lockGlobalIDKey(ctx context.Context, ddlSe *sess.Session, txn kv.Transaction) (uint64, error) {
Contributor

not a big problem. We can get txn from ddlSe

Contributor Author

I'll keep it; I don't want to fetch it again, and this method is internal anyway.

ver kv.Version
err error
)
waitTime := ddlSe.GetSessionVars().LockWaitTimeout
Contributor

this session (ddlSe) is taken from the session pool, not the user's session. Maybe get the lock wait value from the user's session.

Contributor Author

this is part of internal DDL execution; the internal setting, not the user setting, should be used

@lance6716 (Contributor) commented Jul 15, 2024

Not sure if it's more reasonable that the setting of DDL's SQL connection can affect the timeout of DDL.

@D3Hunter (Contributor Author) commented Jul 15, 2024

It seems to only affect DML transactions in MySQL. MySQL doesn't have a ddl_job table like ours, so it shouldn't affect DDL; a simple memory lock should be enough.
https://dev.mysql.com/doc/refman/8.4/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout
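The lock-wait-timeout behavior under discussion can be illustrated with a toy lock. Everything here (chanMutex, lockWithTimeout, errLockWaitTimeout, shortWait) is invented for the sketch; it is not client-go or TiDB API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// shortWait plays the role of the session's LockWaitTimeout in this toy.
const shortWait = 20 * time.Millisecond

var errLockWaitTimeout = errors.New("lock wait timeout exceeded")

// chanMutex is a toy lock supporting a bounded wait, standing in for a
// pessimistic lock request on the global ID key.
type chanMutex chan struct{}

func newChanMutex() chanMutex {
	m := make(chanMutex, 1)
	m <- struct{}{} // a token in the channel means "unlocked"
	return m
}

// lockWithTimeout waits at most d for the lock, then gives up, mirroring
// how a pessimistic lock request honors a lock-wait timeout.
func (m chanMutex) lockWithTimeout(d time.Duration) error {
	select {
	case <-m:
		return nil
	case <-time.After(d):
		return errLockWaitTimeout
	}
}

func (m chanMutex) unlock() { m <- struct{}{} }

func main() {
	m := newChanMutex()
	_ = m.lockWithTimeout(shortWait) // first locker acquires immediately
	err := m.lockWithTimeout(shortWait)
	fmt.Println("second locker:", err) // times out while the lock is held
	m.unlock()
}
```

The point of contention in the thread is only which timeout value to use (internal vs. the user session's), not the mechanism itself.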

@lance6716 (Contributor) left a comment

will review soon

return ddlSe.Commit(ctx)
}()

if resErr != nil && kv.IsTxnRetryableError(resErr) {
@lance6716 (Contributor) commented Jul 15, 2024

not sure about the meaning of ErrLockExpire. I just want to cover the case where the pessimistic lock is cleaned by other readers of GlobalIDKey, which should be retryable.

Contributor Author

see https://github.com/tikv/client-go/blob/d73cc1ed6503925dfc7226e8d5677ceb4c2fd6f1/txnkv/transaction/2pc.go#L1227-L1230; it's handled in TiDB as below, and there seems to be no special handling

tidb/pkg/session/session.go

Lines 1029 to 1040 in 06e0e17

func (s *session) checkTxnAborted(stmt sqlexec.Statement) error {
	if atomic.LoadUint32(&s.GetSessionVars().TxnCtx.LockExpire) == 0 {
		return nil
	}
	// If the transaction is aborted, the following statements do not need to execute, except `commit` and `rollback`,
	// because they are used to finish the aborted transaction.
	if ok, err := isEndTxnStmt(stmt.(*executor.ExecStmt).StmtNode, s.sessionVars); err == nil && ok {
		return nil
	} else if err != nil {
		return err
	}
	return kv.ErrLockExpire

I guess the reason is: T1 locks it so no one can change it before T1's forUpdateTS. After the lock expires and T1 hasn't committed, if some T2 locks it later, then T1's commitTS > T2's forUpdateTS > T1's forUpdateTS, and one of them will report a write conflict on commit.
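The one-of-them-conflicts outcome described above can be sketched with a toy commit-time check. keyState and its timestamps are illustrative; real TiKV conflict detection is far more involved.

```go
package main

import (
	"errors"
	"fmt"
)

var errWriteConflict = errors.New("write conflict on commit")

// keyState tracks the last commit timestamp of the contended key.
type keyState struct{ lastCommitTS uint64 }

// commit succeeds only if no one committed the key after this txn's
// forUpdateTS; otherwise it reports a write conflict, as in the scenario
// above where T1's lock expires and T2 locks the key later.
func (k *keyState) commit(forUpdateTS, commitTS uint64) error {
	if k.lastCommitTS > forUpdateTS {
		return errWriteConflict
	}
	k.lastCommitTS = commitTS
	return nil
}

func main() {
	k := &keyState{}
	// T1 locked at ts=10 and its lock expired; T2 locks at ts=20, commits at 25.
	_ = k.commit(20, 25)
	// T1 now tries to commit with its stale forUpdateTS=10: write conflict.
	fmt.Println("T1:", k.commit(10, 30))
}
```

So even if the expired lock lets two transactions proceed, correctness is preserved because at most one of them can commit cleanly.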

Contributor

I found this type of error: ERROR 1105 (HY000): tikv aborts txn: Error(Txn(Error(Mvcc(Error(PessimisticLockNotFound, which likely stands for the case where the pessimistic lock is cleaned by other transactions. But I don't know whether it's handled by kv.IsTxnRetryableError. Transactions are very complex 😂

Contributor Author

But I don't know if it's handled by kv.IsTxnRetryableError

tidb/pkg/kv/error.go

Lines 83 to 85 in 9044acb

if ErrTxnRetryable.Equal(err) || ErrWriteConflict.Equal(err) || ErrWriteConflictInTiDB.Equal(err) {
	return true
}

only ErrTxnRetryable, apart from write conflicts

Transaction is very complex 😂

yes, that's why I asked @lcwangchao @cfzjywxk to review this part

Contributor Author

The default MaxTxnTTL is 60 * 60 * 1000 (1 hour); it seems OK to let users retry it themselves in this case, since even a DML transaction would abort

@lance6716 (Contributor) commented Jul 15, 2024

I see ManagedLockTTL in client-go is 20s. Does that mean that if a pessimistic lock is left in TiKV when the node crashes, other nodes must wait up to 20s to clean up the lock? If the lock owner crashes, submitting DDL would be paused for 20s, which is not friendly.

@D3Hunter (Contributor Author) commented Jul 15, 2024

it seems we cannot avoid this on crash for pessimistic transactions now

  • create table t(id int primary key, v int); insert into t values(1,1);
  • run this
mysql -uroot -h 127.0.0.1 -P4000 test -e "select now(); begin; update t set v=2 where id=1; select sleep(100);";
+---------------------+
| now()               |
+---------------------+
| 2024-07-15 22:26:12 |
+---------------------+
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
  • kill -9 tidb, immediately after previous step
  • restart
  • run immediately after restart
mysql -uroot -h 127.0.0.1 -P4000 test -e "select now(); begin; update t set v=3 where id=1; commit; select now();"
+---------------------+
| now()               |
+---------------------+
| 2024-07-15 22:26:16 |
+---------------------+
+---------------------+
| now()               |
+---------------------+
| 2024-07-15 22:26:32 |
+---------------------+

ddlSe := sess.NewSession(se)
localMode := tasks[0].job.LocalMode
if localMode {
if err = fillJobIDs(ctx, ddlSe, tasks); err != nil {
Contributor

local-mode jobs are not written to the ddl_job table, so there is no need to lock the global ID? I'm afraid the performance of local jobs is more important.

Contributor Author

no need to lock the global ID?

The memory lock is used to reduce write conflicts. After the later PR to "combine table id allocation with job id", there will be no write conflict within a single node.

With the 2 PRs mentioned in the description, we can create 100k tables in about 13 minutes with very little speed degradation. I will test 1M tables later; if everything goes OK, I still suggest deprecating fast-create in a later version.

@lance6716 (Contributor) left a comment

rest lgtm

pkg/ddl/ddl_worker.go (resolved)
// generate ID and call function runs in the same transaction.
func genIDAndCallWithRetry(ctx context.Context, ddlSe *sess.Session, count int, fn func(ids []int64) error) error {
var resErr error
for i := uint(0); i < kv.MaxRetryCnt; i++ {
Contributor

Not that many transaction retries are needed, as conflict errors are already handled by the in-transaction statements or operations.

Contributor Author

it's reusing the retry count of kv.RunInNewTxn, in case of other retryable errors

Contributor

kv.RunInNewTxn runs in optimistic mode, so write conflict errors need to be handled when doing commit. With a pessimistic lock, the transaction commit should succeed in most cases. Not a big problem to retry more times.

return errors.Trace(err)
}
defer func() {
if err != nil {
Contributor

Maybe it's better to extract L549 to L576 into a standalone function; using a defer statement inside a for loop increases the risk of mistakes.

Contributor Author

it's inside a lambda function already; the defer runs when that function returns.
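A small sketch of why that defer is safe: since it sits inside the per-iteration closure, it runs when the closure returns, once per loop iteration, not at the end of the outer function. runOnce and the log slice are invented for this demo.

```go
package main

import "fmt"

// runOnce wraps one iteration's work in a closure, the way the retry loop
// above wraps each attempt; the defer belongs to the closure, not the caller.
func runOnce(i int, log *[]string) error {
	return func() error {
		defer func() {
			*log = append(*log, fmt.Sprintf("cleanup %d", i))
		}()
		*log = append(*log, fmt.Sprintf("work %d", i))
		return nil
	}()
}

func main() {
	var log []string
	for i := 0; i < 2; i++ {
		_ = runOnce(i, &log)
	}
	fmt.Println(log) // [work 0 cleanup 0 work 1 cleanup 1]
}
```

If the defer were instead declared directly in the loop body of the outer function, all the cleanups would pile up and run only at the outer function's return, which is the mistake the reviewer is warning about.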

@cfzjywxk (Contributor) left a comment

LGTM

@D3Hunter (Contributor Author)

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2024
var resErr error
for i := uint(0); i < kv.MaxRetryCnt; i++ {
resErr = func() (err error) {
if err := ddlSe.Begin(ctx); err != nil {
Contributor

If we Begin() here, previous statements executed by ddlSe will be committed implicitly. For example, the flashback job check may be outdated.

Contributor Author

If we Begin() here, previous statements executed by ddlSe will be committed implicitly.

What do you mean? We don't have a transaction before this one.

flashback jobs checking maybe outdated

It's the same behavior as our previous implementation. If such a job is submitted, the job scheduler will stop it from running if there is a flashback job before it.

Contributor Author

I once thought about removing this check, since the job scheduler calculates dependencies between jobs, but that would be a little different from current behavior.

Contributor

it's the same behavior as our previous impl.

OK.

job scheduler will stop it from running

I guess the purpose of checking for a flashback job before submitting is to prevent potentially wrong info in job.Args.

Contributor Author

Previously we could not ensure there was no flashback job before this job; they might be submitted on different nodes.

Now we could do it by checking inside the submit transaction, but I don't want to change this part now, and it might hurt performance.


ti-chi-bot bot commented Jul 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, GMHDBJD, lcwangchao, tangenta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jul 16, 2024
@D3Hunter (Contributor Author)

/retest


tiprow bot commented Jul 16, 2024

@D3Hunter: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest


@D3Hunter (Contributor Author)

/retest

@D3Hunter (Contributor Author)

/retest

@ti-chi-bot ti-chi-bot bot merged commit de54620 into pingcap:master Jul 16, 2024
21 checks passed
@D3Hunter D3Hunter deleted the ensure-job-id-order branch July 16, 2024 12:21