Add note for workload group when upgrade Doris (#30457)
wangbo authored Jan 29, 2024
1 parent ef85d0d commit b365be3
Showing 3 changed files with 64 additions and 14 deletions.
38 changes: 32 additions & 6 deletions docs/en/docs/admin-manual/workload-group.md

The workload group can limit the use of compute and memory resources on a single BE node for tasks within the group. Currently, binding queries to workload groups is supported.

## Version Description
Workload Group has been supported since version 2.0. The main difference between versions 2.0 and 2.1 is that the 2.0 Workload Group does not rely on CGroup, while the 2.1 Workload Group depends on CGroup; therefore, a CGroup environment must be configured before using the 2.1 Workload Group.

#### Upgrade to version 2.0
If you are upgrading from version 1.2 to 2.0, it is recommended to enable Workload Group only after the whole Doris cluster has been upgraded. If you upgrade only a single Follower FE and enable this feature while the Master FE is still running the old code, the cluster has no Workload Group metadata yet, and queries on the upgraded Follower nodes may fail. The recommended upgrade process is as follows:
* First, upgrade the whole Doris cluster to version 2.0.
* Then start using this feature according to the ***Workload group usage*** section below.

#### Upgrade to version 2.1
If the code version is upgraded from 2.0 to 2.1, there are two scenarios:

Scenario 1: If Workload Group was already in use in version 2.0, you only need to follow the cgroup v1 configuration process described below to use the new version of Workload Group.

Scenario 2: If Workload Group was not used in version 2.0, you also need to upgrade the whole Doris cluster to version 2.1 first, and then start using this feature according to the ***Workload group usage*** section below.

## Workload group properties

* cpu_share: Optional. The default value is 1024, and the value must be a positive integer. It sets how much CPU time the workload group can acquire and provides soft isolation of CPU resources. cpu_share is a relative value indicating the weight of CPU resources available to a running workload group. For example, if a user creates three workload groups rg-a, rg-b, and rg-c with cpu_share of 10, 30, and 40 respectively, and at a certain moment rg-a and rg-b are running tasks while rg-c is idle, then rg-a can get 25% (10 / (10 + 30)) of the CPU resources while rg-b can get 75%. If only one workload group is running, it gets all the CPU resources regardless of its cpu_share value (see the sketch below).
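
A minimal sketch of the weighting example above, assuming group names use underscores instead of dashes; the memory_limit values are placeholders and only cpu_share matters for the weighting:
```
create workload group if not exists rg_a properties ("cpu_share"="10", "memory_limit"="10%");
create workload group if not exists rg_b properties ("cpu_share"="30", "memory_limit"="10%");
create workload group if not exists rg_c properties ("cpu_share"="40", "memory_limit"="10%");
-- While rg_a and rg_b are running tasks and rg_c is idle:
-- rg_a gets 10 / (10 + 30) = 25% of the CPU, rg_b gets 30 / (10 + 30) = 75%.
```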

## Workload group usage

1. Manually create a workload group named normal, which is the default workload group in the system and cannot be deleted.
```
create workload group if not exists normal
properties (
'cpu_share'='1024',
'memory_limit'='30%',
'enable_memory_overcommit'='true'
);
```
The purpose of the normal group is that when you do not specify a Workload Group for a query, the query uses the normal group by default, which avoids query failures.

2. Enable the experimental_enable_workload_group configuration by setting the following in fe.conf:
```
experimental_enable_workload_group=true
```
The system will automatically create a default workload group named ``normal`` after this configuration is enabled.
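
As a quick check (a hedged sketch using the SHOW statement linked later in this document), you can list the workload groups and confirm that `normal` exists:
```
-- List the workload groups visible to the current user; normal should appear.
SHOW WORKLOAD GROUPS;
```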

3. If you want to use groups other than normal for testing, you can create a custom workload group:
```
create workload group if not exists g1
properties (
"cpu_share"="1024",
"memory_limit"="30%",
"enable_memory_overcommit"="true"
);
```

The CPU limit configured here is a soft limit.

For details on creating a workload group, see [CREATE-WORKLOAD-GROUP](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-WORKLOAD-GROUP.md), and to delete a workload group, refer to [DROP-WORKLOAD-GROUP](../sql-manual/sql-reference/Data-Definition-Statements/Drop/DROP-WORKLOAD-GROUP.md); to modify a workload group, refer to [ALTER-WORKLOAD-GROUP](../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-WORKLOAD-GROUP.md); to view the workload group, refer to: [WORKLOAD_GROUPS()](../sql-manual/sql-functions/table-functions/workload-group.md) and [SHOW-WORKLOAD-GROUPS](../sql-manual/sql-reference/Show-Statements/SHOW-WORKLOAD-GROUPS.md).
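
As a hedged illustration of the modify and delete statements linked above (the exact quoting and option syntax may vary by version; see the linked pages for the authoritative forms):
```
-- Raise g1's memory limit.
alter workload group g1 properties ("memory_limit"="40%");
-- Drop a group that is no longer needed; the built-in normal group cannot be dropped.
drop workload group g1;
```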


4. Turn on the pipeline execution engine. Workload group CPU isolation is implemented on top of the pipeline execution engine, so you need to enable the session variable:
```
set experimental_enable_pipeline_engine = true;
```

5. Bind the workload group.
* Bind the user to a workload group by setting the user property; the default value is ``normal``:
```
set property 'default_workload_group' = 'g1';
```

The session variable `workload_group` takes precedence over the user property `default_workload_group`.
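
The precedence note above refers to binding at the session level; a minimal sketch, assuming the standard session-variable syntax:
```
-- Bind only the current session to g1; this takes precedence over the user property.
set workload_group = 'g1';
```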

If you are a non-admin user, first execute [SHOW-WORKLOAD-GROUPS](../sql-manual/sql-reference/Show-Statements/SHOW-WORKLOAD-GROUPS.md) to check whether the current user can see the workload group. If the workload group is not visible, it may not exist or the current user may lack the privilege to use it, and the query will fail. To grant privileges on a workload group, refer to: [grant statement](../sql-manual/sql-reference/Account-Management-Statements/GRANT.md).
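
A hedged example of such a grant, assuming a user named user1 (see the linked grant statement for the exact privilege name and syntax):
```
-- Allow user1 to run queries under workload group g1.
GRANT USAGE_PRIV ON WORKLOAD GROUP 'g1' TO 'user1'@'%';
```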

6. Execute the query, which will be associated with the g1 workload group.

### Query Queue
39 changes: 32 additions & 7 deletions docs/zh-CN/docs/admin-manual/workload-group.md

The workload group can limit the use of compute and memory resources on a single BE node for tasks within the group. Currently, binding queries to workload groups is supported.

## Version Description
Workload Group has been supported since version 2.0. The main difference between versions 2.0 and 2.1 is that the 2.0 Workload Group does not rely on CGroup, while the 2.1 Workload Group depends on CGroup; therefore, a CGroup environment must be configured before using the 2.1 Workload Group.

#### Upgrade to version 2.0
If you are upgrading from version 1.2 to 2.0, it is recommended to enable the Workload Group feature only after the whole Doris cluster has been upgraded. If you upgrade only a single Follower FE and enable this feature while the Master FE is still running the old code, the cluster has no Workload Group metadata yet, and queries on the upgraded Follower nodes may fail. The recommended upgrade process is as follows:
* First, upgrade the whole Doris cluster to version 2.0.
* Then start using this feature according to the ***Workload group usage*** section below.

#### Upgrade to version 2.1
If the code version is upgraded from 2.0 to 2.1, there are two scenarios:

Scenario 1: If the Workload Group feature was already in use in version 2.0, you only need to follow the cgroup v1 configuration process described below to use the new version of Workload Group.

Scenario 2: If the Workload Group feature was not used in version 2.0, you also need to upgrade the whole Doris cluster to version 2.1 first, and then start using this feature according to the ***Workload group usage*** section below.

## Workload group properties

* cpu_share: Optional. The default value is 1024, and the value must be a positive integer. It sets how much CPU time the workload group can acquire and provides soft isolation of CPU resources. cpu_share is a relative value indicating the weight of CPU resources available to a running workload group. For example, if a user creates three workload groups g-a, g-b, and g-c with cpu_share of 10, 30, and 40 respectively, and at a certain moment g-a and g-b are running tasks while g-c is idle, then g-a can get 25% (10 / (10 + 30)) of the CPU resources while g-b can get 75%. If only one workload group is running, it gets all the CPU resources regardless of its cpu_share value (see the sketch below).
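
A minimal sketch of the weighting example above, assuming group names use underscores instead of dashes; the memory_limit values are placeholders and only cpu_share matters for the weighting:
```
create workload group if not exists g_a properties ("cpu_share"="10", "memory_limit"="10%");
create workload group if not exists g_b properties ("cpu_share"="30", "memory_limit"="10%");
create workload group if not exists g_c properties ("cpu_share"="40", "memory_limit"="10%");
-- While g_a and g_b are running tasks and g_c is idle:
-- g_a gets 10 / (10 + 30) = 25% of the CPU, g_b gets 30 / (10 + 30) = 75%.
```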

## Workload group usage

1. Manually create a workload group named normal. This workload group is the system's default workload group and cannot be deleted.
```
create workload group if not exists normal
properties (
'cpu_share'='1024',
'memory_limit'='30%',
'enable_memory_overcommit'='true'
);
```
The purpose of the normal group is that when you do not specify a Workload Group for a query, the query uses this group by default, which avoids query failures.

2. Enable the experimental_enable_workload_group configuration by setting the following in fe.conf:
```
experimental_enable_workload_group=true
```
After this configuration is enabled, the system automatically creates a default workload group named `normal`.
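
As a quick check (a hedged sketch using the SHOW statement linked below), you can list the workload groups and confirm that `normal` exists:
```
-- List the workload groups visible to the current user; normal should appear.
SHOW WORKLOAD GROUPS;
```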

3. If you want to use groups other than normal for testing, you can create a custom workload group:
```
create workload group if not exists g1
properties (
"cpu_share"="1024",
"memory_limit"="30%",
"enable_memory_overcommit"="true"
);
```

For details on creating a workload group, see [CREATE-WORKLOAD-GROUP](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-WORKLOAD-GROUP.md); to delete a workload group, refer to [DROP-WORKLOAD-GROUP](../sql-manual/sql-reference/Data-Definition-Statements/Drop/DROP-WORKLOAD-GROUP.md); to modify a workload group, refer to [ALTER-WORKLOAD-GROUP](../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-WORKLOAD-GROUP.md); to view workload groups, refer to [WORKLOAD_GROUPS()](../sql-manual/sql-functions/table-functions/workload-group.md) and [SHOW-WORKLOAD-GROUPS](../sql-manual/sql-reference/Show-Statements/SHOW-WORKLOAD-GROUPS.md).
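
As a hedged illustration of the modify and delete statements linked above (the exact quoting and option syntax may vary by version; see the linked pages for the authoritative forms):
```
-- Raise g1's memory limit.
alter workload group g1 properties ("memory_limit"="40%");
-- Drop a group that is no longer needed; the built-in normal group cannot be dropped.
drop workload group g1;
```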

4. Turn on the pipeline execution engine. Workload group CPU isolation is implemented on top of the pipeline execution engine, so you need to enable the session variable:
```
set experimental_enable_pipeline_engine = true;
```

5. Bind the workload group.
* Bind the user to a workload group by setting the user property; the default value is `normal`:
```
set property 'default_workload_group' = 'g1';
```

The session variable `workload_group` takes precedence over the user property `default_workload_group`.
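
The precedence note above refers to binding at the session level; a minimal sketch, assuming the standard session-variable syntax:
```
-- Bind only the current session to g1; this takes precedence over the user property.
set workload_group = 'g1';
```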

If you are a non-admin user, first execute [SHOW-WORKLOAD-GROUPS](../sql-manual/sql-reference/Show-Statements/SHOW-WORKLOAD-GROUPS.md) to check whether the current user can see the workload group. If the workload group is not visible, it may not exist or the current user may lack the privilege to use it, and the query will fail. To grant privileges on a workload group, refer to: [grant statement](../sql-manual/sql-reference/Account-Management-Statements/GRANT.md).
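
A hedged example of such a grant, assuming a user named user1 (see the linked grant statement for the exact privilege name and syntax):
```
-- Allow user1 to run queries under workload group g1.
GRANT USAGE_PRIV ON WORKLOAD GROUP 'g1' TO 'user1'@'%';
```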

6. Execute the query; the query will be associated with the specified workload group.

### Query Queue
// From the third changed file in this commit (Java), inside updateBeQueryStats(TReportWorkloadRuntimeStatusParams params):
} else {
    long currentTime = System.currentTimeMillis();
    for (Map.Entry<String, TQueryStatistics> entry : params.query_statistics_map.entrySet()) {
        // Leftover debug log removed by this commit:
        // LOG.info("log2109 queryid={}, shuffle={}", entry.getKey(), entry.getValue().shuffle_send_bytes);
        queryIdMap.put(entry.getKey(), entry.getValue());
        queryLastReportTime.put(entry.getKey(), currentTime);
    }
