Update doc
mymeiyi committed Nov 2, 2023
1 parent 38a3111 commit d0c1e76
Showing 2 changed files with 12 additions and 3 deletions.
Expand Up @@ -30,15 +30,19 @@ Group commit load does not introduce a new import method, but an extension of `I

In Doris, every data load is an independent job that initiates a new transaction and generates a new data version. Under high-frequency writes, this puts both transactions and compactions under heavy pressure. Group commit load combines multiple small load tasks into one load job, which reduces the number of transactions and compactions and thus improves write performance.

-Note that a group commit load returns once the data has been written to the WAL. At that point the data is not yet visible to users; by default it becomes visible after a 10-second interval.
+The process is roughly as follows:
+1. The user starts a group commit load. BE puts the data into memory and the WAL, then returns immediately. The data is not visible to users at this point;
+2. BE periodically (every 10 seconds by default) commits the data held in memory, and the data becomes visible to users once it is committed;
+3. If BE restarts, the data is recovered from the WAL.
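
The three steps above can be illustrated with a toy model. This is a minimal sketch, not Doris code: the class `GroupCommitSketch`, its method names, the JSON-lines WAL file, and the truncation of the WAL after a commit are all assumptions made for illustration. It only shows that a load is acknowledged once its rows are in the WAL, that rows stay invisible until the periodic commit, and that pending rows can be rebuilt from the WAL after a restart.

```python
import json
import os
import tempfile


class GroupCommitSketch:
    """Toy model (hypothetical, not Doris code): rows are durable before they are visible."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.pending = []  # acknowledged rows held in memory, not yet visible
        self.visible = []  # committed rows that queries can read

    def load(self, rows):
        # Step 1: append the rows to the WAL and to memory, then return at once.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            for row in rows:
                wal.write(json.dumps(row) + "\n")
        self.pending.extend(rows)
        return "ok"

    def commit(self):
        # Step 2: run periodically (the doc's default interval is 10 seconds).
        self.visible.extend(self.pending)
        self.pending.clear()
        # Assumption: committed rows need no replay, so the WAL is truncated.
        open(self.wal_path, "w").close()

    def recover(self):
        # Step 3: after a restart, rebuild the pending rows from the WAL.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            self.pending = [json.loads(line) for line in wal if line.strip()]


# Usage: the load returns before the rows are visible, and a new instance
# (standing in for a restarted BE) recovers them from the WAL file.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
be = GroupCommitSketch(path)
be.load([{"id": 1}, {"id": 2}])
restarted = GroupCommitSketch(path)
restarted.recover()
restarted.commit()
```

In this sketch the acknowledgement depends only on the WAL append, which is why a caller cannot assume its rows are queryable as soon as `load` returns.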

## Fundamental

### Write process
1. The user starts a group commit load, and FE generates a plan fragment;
2. BE executes the plan. Unlike a non-group-commit load, the processed data is not sent to each tablet but is put into an in-memory queue shared by multiple group commit loads;
3. BE starts an internal load, which consumes the data in the queue, writes it to the WAL, and notifies the corresponding load that it has finished;
-4. After that, the data is processed in the same way as in a non-group-commit load: it is sent to each tablet, written to the memtable, and flushed to segment files.
+4. After that, the data is processed in the same way as in a non-group-commit load: it is sent to each tablet, written to the memtable, and flushed to segment files;
+5. The internal load finishes after a fixed time interval (10 seconds by default), and the data becomes visible to users once it is committed.
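
The write process above can be sketched as a single consumer draining a shared queue. This is a simplified illustration under stated assumptions, not the BE implementation: `internal_load`, `NUM_TABLETS`, the `(load_id, rows)` queue entries, and routing rows by `id % NUM_TABLETS` are all hypothetical. It shows several user loads sharing one queue, each being acknowledged once its rows reach the WAL, and all of them being covered by one internal load.

```python
from collections import deque

NUM_TABLETS = 2  # hypothetical tablet count, for illustration only


def internal_load(shared_queue, wal, tablets):
    """Drain the shared queue once and return the ids of the loads to acknowledge."""
    acked = []
    while shared_queue:
        load_id, rows = shared_queue.popleft()
        wal.append((load_id, rows))  # step 3: write the batch to the WAL
        acked.append(load_id)        # step 3: notify that this load has finished
        for row in rows:             # step 4: send each row to its tablet
            tablets[row["id"] % NUM_TABLETS].append(row)
    return acked


# Step 2: two user loads put their processed rows into the same queue.
shared_queue = deque()
shared_queue.append(("load-a", [{"id": 1}, {"id": 2}]))
shared_queue.append(("load-b", [{"id": 3}]))

wal = []
tablets = {0: [], 1: []}
acked = internal_load(shared_queue, wal, tablets)

# Step 5: both user loads are covered by one internal load, so one commit
# (one transaction) makes all three rows visible.
transactions = 1
```

The point of the design is visible in the last line: the number of transactions follows the number of internal loads, not the number of user loads.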

### WAL Introduction

Expand Up @@ -30,15 +30,20 @@ under the License.

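In Doris, every data write is an independent load job that starts a new transaction and produces a new data version. Under high-frequency writes, this puts considerable pressure on both transactions and compactions. Group commit merges multiple small writes into one load job, which reduces the number of transactions and compactions, relieves pressure inside the system, and improves write performance.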

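-Note that a group commit write returns as soon as the data has been written to the WAL. The data cannot be read immediately at that point; by default it can be read after 10 seconds.
+The process is roughly as follows:
+1. For a load started by the user, BE writes the processed data into memory and the WAL and then returns. The data cannot be queried at this point;
+2. Under normal conditions, BE periodically (every 10 seconds by default) commits the data held in memory, and the data becomes visible to users after the commit;
+3. If BE restarts or a similar event occurs, the data is recovered by replaying the WAL through the write path.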

## Fundamental

### Write process

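1. The user starts a group commit write, and FE generates an execution plan;
2. BE executes the plan. Unlike a non-group-commit load, the processed data is not sent to each tablet but is put into an in-memory queue shared by multiple group commit loads;
3. BE internally starts a load plan, which consumes the data in the queue, writes it to the WAL, and notifies the corresponding write that it has finished;
4. After that, the consumed data follows the same processing path as a normal write: it is sent to each tablet, written to the memtable, flushed to segment files, and so on;
+5. Once the fixed group commit interval (10 seconds by default) is reached, the load started internally by BE begins to commit, and the data becomes visible to users after the commit completes.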

### WAL Introduction
