[fix](cache) add sql cache conf cache_result_max_data_size (#22645)
Limiting only the maximum number of rows in the SQL cache (cache_result_max_row_count) is not enough. If a single row of data is too large, FE may run out of memory (OOM).
xinyiZzz authored and xiaokang committed Aug 11, 2023
1 parent 314755f commit 01c5056
Showing 9 changed files with 55 additions and 17 deletions.
14 changes: 12 additions & 2 deletions docs/en/docs/admin-manual/config/fe-config.md
@@ -818,7 +818,7 @@ IsMutable:true

MasterOnly:false

If this switch is turned on, the SQL query result set will be cached. If the interval between the last visit version time in all partitions of all tables in the query is greater than cache_last_version_interval_second, and the result set is less than cache_result_max_row_count, the result set will be cached, and the next same SQL will hit the cache
If this switch is turned on, the SQL query result set will be cached. The result set is cached when the interval since the last visit version time in all partitions of all tables in the query is greater than cache_last_version_interval_second, the number of rows in the result set does not exceed cache_result_max_row_count, and the data size does not exceed cache_result_max_data_size. The next identical SQL statement will then hit the cache

If set to true, fe will enable sql result caching. This option is suitable for offline data update scenarios

@@ -845,7 +845,17 @@ IsMutable:true

MasterOnly:false

In order to avoid occupying too much memory, the maximum number of rows that can be cached is 2000 by default. If this threshold is exceeded, the cache cannot be set
To avoid occupying too much memory, the maximum number of rows that can be cached is 3000 by default. A result set that exceeds this threshold is not cached

#### `cache_result_max_data_size`

Default: 31457280

IsMutable: true

MasterOnly: false

To avoid occupying too much memory, the maximum data size of a result set that can be cached is 30MB (31457280 bytes) by default. A result set that exceeds this threshold is not cached
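
As a rough illustration of how the two limits combine, the sketch below mirrors the check this commit adds to checkRowLimit: a result set is cacheable only if neither its row count nor its data size exceeds the configured limit. The class and method names (CacheLimitSketch, canCache) are hypothetical and are not part of Doris; only the two default values are taken from Config.java in this commit.

```java
public class CacheLimitSketch {
    // Defaults taken from Config.java in this commit.
    static final int CACHE_RESULT_MAX_ROW_COUNT = 3000;
    static final int CACHE_RESULT_MAX_DATA_SIZE = 30 * 1024 * 1024; // 31457280 bytes

    // Hypothetical helper: true when the result set is within both limits.
    // A value exactly equal to a limit is still cacheable, because the
    // commit rejects only values that are strictly greater than the limit.
    static boolean canCache(int rowCount, int dataSizeBytes) {
        if (rowCount > CACHE_RESULT_MAX_ROW_COUNT) {
            return false; // too many rows
        }
        return dataSizeBytes <= CACHE_RESULT_MAX_DATA_SIZE; // false when too many bytes
    }

    public static void main(String[] args) {
        System.out.println(CACHE_RESULT_MAX_DATA_SIZE); // 31457280
        System.out.println(canCache(3000, 31457280));   // true
        System.out.println(canCache(3001, 1024));       // false
        System.out.println(canCache(10, 31457281));     // false
    }
}
```

The last call shows the case the new limit targets: only 10 rows, well under the row limit, but still rejected because of their size.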

#### `cache_last_version_interval_second`

12 changes: 10 additions & 2 deletions docs/en/docs/advanced/cache/partition-cache.md
@@ -155,6 +155,14 @@ Partition fields can also be other fields, but need to ensure that only a small

## How to use

> NOTE:
>
> In the following scenarios, the cached result is wrong:
> 1. Session variables are used: default_order_by_limit, sql_select_limit
> 2. Functions whose result changes between executions are used, e.g. var = cur_date(), var = random()
>
> There may be other cases where the cached result is wrong, so it is recommended to enable it only in controllable scenarios such as reports.

### Enable SQLCache

Make sure cache_enable_sql_mode=true in fe.conf (default is true)
@@ -228,14 +236,14 @@ Other monitoring: You can view the CPU and memory indicators of the BE node, the

### Optimization parameters

The configuration item cache_result_max_row_count of FE, the maximum number of rows in the cache for the query result set, can be adjusted according to the actual situation, but it is recommended not to set it too large to avoid taking up too much memory, and the result set exceeding this size will not be cached.
The FE configuration item cache_result_max_row_count sets the maximum number of rows of a query result set that can be put into the cache, and the FE configuration item cache_result_max_data_size sets the maximum data size of a query result set that can be put into the cache. Both can be adjusted according to the actual situation, but it is recommended not to set them too large, to avoid taking up too much memory. A result set that exceeds either limit will not be cached.

```text
vim fe/conf/fe.conf
cache_result_max_row_count=3000
```
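
To see why a row limit alone is not enough, it can help to work out which limit is reached first for a given average row size. The sketch below is only arithmetic on the default values from this commit (3000 rows, 31457280 bytes). The class and method names are hypothetical, and the way FE actually accounts for bytes may differ from this simple rows × average size estimate.

```java
public class CacheLimitArithmetic {
    static final long MAX_ROWS = 3000L;      // default cache_result_max_row_count
    static final long MAX_BYTES = 31457280L; // default cache_result_max_data_size

    // Estimated size of a result set, assuming every row has the same size.
    static long estimatedBytes(long rows, long avgRowBytes) {
        return rows * avgRowBytes;
    }

    // Largest number of rows of the given average size that fits within both limits.
    static long maxCacheableRows(long avgRowBytes) {
        if (avgRowBytes <= 0) {
            throw new IllegalArgumentException("avgRowBytes must be positive");
        }
        return Math.min(MAX_ROWS, MAX_BYTES / avgRowBytes);
    }

    public static void main(String[] args) {
        long oneMiB = 1L << 20;
        // 3000 rows of 1 MiB each pass the row limit but would need about 3 GB.
        System.out.println(estimatedBytes(MAX_ROWS, oneMiB)); // 3145728000
        // With the size limit in place, only 30 such rows can be cached.
        System.out.println(maxCacheableRows(oneMiB));         // 30
        // Small rows are still bounded by the row limit.
        System.out.println(maxCacheableRows(100));            // 3000
        // Above roughly 10485 bytes per row, the size limit becomes the binding one.
        System.out.println(maxCacheableRows(10486));          // 2999
    }
}
```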

The maximum number of partitions in BE cache_max_partition_count refers to the maximum number of partitions corresponding to each SQL. If it is partitioned by date, it can cache data for more than 2 years. If you want to keep the cache for a longer time, please set this parameter to a larger value and modify it at the same time. Parameter of cache_result_max_row_count.
The maximum number of partitions in BE, cache_max_partition_count, is the maximum number of partitions corresponding to each SQL. If the table is partitioned by date, it can cache data for more than 2 years. If you want to keep the cache for a longer time, set this parameter to a larger value and adjust cache_result_max_row_count and cache_result_max_data_size at the same time.

```text
vim be/conf/be.conf
5 changes: 1 addition & 4 deletions docs/en/docs/query-acceleration/nereids.md
@@ -81,7 +81,4 @@ SET enable_fallback_to_original_planner=true;

### known issues

- Cannot use query cache and partition cache to accelerate query
- Not support MTMV
- Not support MV created after version 2.0.0
- Some unsupported subquery usage will produce an error result instead of an error
- Cannot use partition cache to accelerate query
12 changes: 11 additions & 1 deletion docs/zh-CN/docs/admin-manual/config/fe-config.md
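@@ -818,7 +818,7 @@ trace export to collector: `http://127.0.0.1:4318/v1/traces`

Whether it is a configuration item unique to the Master FE node: false

If set to true, the SQL query result set will be cached. If the interval since the last visit version time of all partitions of all tables in the query is greater than cache_last_version_interval_second, and the result set is smaller than cache_result_max_row_count, the result set will be cached, and the next identical SQL will hit the cache
If set to true, the SQL query result set will be cached. If the interval since the last visit version time of all partitions of all tables in the query is greater than cache_last_version_interval_second, the number of rows in the result set is less than cache_result_max_row_count, and the data size is less than cache_result_max_data_size, the result set will be cached, and the next identical SQL will hit the cache

If set to true, FE will enable SQL result caching. This option is suitable for offline data update scenarios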

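@@ -847,6 +847,16 @@ trace export to collector: `http://127.0.0.1:4318/v1/traces`

Sets the maximum number of rows that can be cached. For the detailed principle, refer to the official documentation: Operation Manual -> Partition Cache

#### `cache_result_max_data_size`

Default: 31457280

Whether it can be configured dynamically: true

Whether it is a configuration item unique to the Master FE node: false

Sets the maximum data size that can be cached, in bytes

#### `cache_last_version_interval_second`

Default: 900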
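4 changes: 2 additions & 2 deletions docs/zh-CN/docs/advanced/cache/partition-cache.md
@@ -228,14 +228,14 @@ Average partition data size = cache_memory_total / cache_partition_total

### Optimization parameters

The FE configuration item cache_result_max_row_count, the maximum number of rows of a query result set put into the cache, can be adjusted according to the actual situation, but it is recommended not to set it too large, to avoid taking up too much memory. A result set exceeding this size will not be cached.
The FE configuration item cache_result_max_row_count, the maximum number of rows of a query result set put into the cache, and the FE configuration item cache_result_max_data_size, the maximum data size of a query result set put into the cache, can be adjusted according to the actual situation, but it is recommended not to set them too large, to avoid taking up too much memory. A result set exceeding these limits will not be cached.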

```text
vim fe/conf/fe.conf
cache_result_max_row_count=3000
```

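The BE maximum partition count, cache_max_partition_count, is the maximum number of partitions corresponding to each SQL. If the table is partitioned by date, it can cache more than 2 years of data. To keep the cache for a longer time, set this parameter to a larger value and also adjust cache_result_max_row_count
The BE maximum partition count, cache_max_partition_count, is the maximum number of partitions corresponding to each SQL. If the table is partitioned by date, it can cache more than 2 years of data. To keep the cache for a longer time, set this parameter to a larger value and also adjust cache_result_max_row_count and cache_result_max_data_size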

```text
vim be/conf/be.conf
5 changes: 1 addition & 4 deletions docs/zh-CN/docs/query-acceleration/nereids.md
@@ -81,7 +81,4 @@ SET enable_fallback_to_original_planner=true;

### Known issues

- Hitting Query Cache and Partition Cache is not supported
- Selecting multi-table materialized views is not supported
- Selecting materialized views newly created in version 2.0 is not supported
- Some unsupported subquery usages produce wrong results instead of reporting an error
- Hitting Partition Cache is not supported
10 changes: 9 additions & 1 deletion fe/fe-common/src/main/java/org/apache/doris/common/Config.java
@@ -1174,9 +1174,17 @@ public class Config extends ConfigBase {
/**
* Set the maximum number of rows that can be cached
*/
@ConfField(mutable = true, masterOnly = false)
@ConfField(mutable = true, masterOnly = false, description = {"SQL/Partition Cache可以缓存的最大行数。",
"Maximum number of rows that can be cached in SQL/Partition Cache, is 3000 by default."})
public static int cache_result_max_row_count = 3000;

/**
* Set the maximum data size that can be cached
*/
@ConfField(mutable = true, masterOnly = false, description = {"SQL/Partition Cache可以缓存的最大数据大小。",
"Maximum data size of rows that can be cached in SQL/Partition Cache, is 30M by default."})
public static int cache_result_max_data_size = 31457280; // 30M

/**
* Used to limit element num of InPredicate in delete statement.
*/
@@ -85,9 +85,13 @@ protected boolean checkRowLimit() {
return false;
}
if (rowBatchBuilder.getRowSize() > Config.cache_result_max_row_count) {
LOG.info("can not be cached. rowbatch size {} is more than {}", rowBatchBuilder.getRowSize(),
LOG.debug("can not be cached. rowbatch size {} is more than {}", rowBatchBuilder.getRowSize(),
Config.cache_result_max_row_count);
return false;
} else if (rowBatchBuilder.getDataSize() > Config.cache_result_max_data_size) {
LOG.debug("can not be cached. rowbatch data size {} is more than {}", rowBatchBuilder.getDataSize(),
Config.cache_result_max_data_size);
return false;
} else {
return true;
}
@@ -55,6 +55,10 @@ public int getRowSize() {
return rowSize;
}

public int getDataSize() {
return dataSize;
}

public RowBatchBuilder(CacheAnalyzer.CacheMode model) {
cacheMode = model;
keyIndex = 0;