diff --git a/docs/en/docs/admin-manual/cluster-management/upgrade.md b/docs/en/docs/admin-manual/cluster-management/upgrade.md index edf4a179878f88..00c909443b15b2 100644 --- a/docs/en/docs/admin-manual/cluster-management/upgrade.md +++ b/docs/en/docs/admin-manual/cluster-management/upgrade.md @@ -24,77 +24,285 @@ specific language governing permissions and limitations under the License. --> - # Cluster upgrade -Doris can upgrade smoothly by rolling upgrades. The following steps are recommended for security upgrade. +## Overview + +Use the steps recommended in this chapter to upgrade the cluster. A Doris cluster can be upgraded with the **rolling upgrade** method, which does not require shutting down all cluster nodes at once and so greatly reduces the impact on upper-layer applications. + +## Doris Release Notes + +:::tip + +When upgrading Doris, follow the principle **do not upgrade across two or more key node versions**. To upgrade across multiple key node versions, first upgrade to the nearest key node version, and then upgrade through the following ones in turn. Versions that are not key node versions can be skipped. + +Key node version: a version that an upgrade must pass through. It may be a single version or a version range; a range such as `1.1.3 - 1.1.5` means that you can continue upgrading after upgrading to any version in that range. 
+ +::: + +| Version number | Key node version | LTS version | +| ------------------------ | ------------ | -------- | +| 0.12.x | Yes | No | +| 0.13.x | Yes | No | +| 0.14.x | Yes | No | +| 0.15.x | Yes | No | +| 1.0.0 - 1.1.2 | No | No | +| 1.1.3 - 1.1.5 | Yes | 1.1-LTS | +| 1.2.0 - 1.2.5 | Yes | 1.2-LTS | +| 2.0.0-alpha - 2.0.0-beta | Yes | 2.0-LTS | + +Example: + +The current version is `0.12`; the upgrade route to version `2.0.0-beta` is: + +`0.12` -> `0.13` -> `0.14` -> `0.15` -> any version in `1.1.3 - 1.1.5` -> any version in `1.2.0 - 1.2.5` -> `2.0.0-beta` + +:::tip + +LTS version: Long-Term Support. An LTS version receives long-term support and is maintained for more than six months. Generally speaking, **the larger the third digit of the version number, the more stable the version**. + +Alpha version: an internal test version whose functionality is not yet finalized and which may contain major bugs. Use it only on test clusters; **it is not recommended for production clusters!** + +Beta version: a public test version whose functionality is largely finalized but which may contain non-major bugs. Use it only on test clusters; **it is not recommended for production clusters!** + +Release version: a public release in which the important bugs have been fixed and the fixes for functional defects verified. It is recommended for production clusters. + +::: + +## Upgrade steps + +### Upgrade Instructions + +1. Doris's RoutineLoad, Flink-Doris-Connector, and Spark-Doris-Connector implement a retry mechanism in their code, so in a cluster with multiple BE nodes a rolling upgrade will not cause their tasks to fail. +2. StreamLoad tasks require you to implement a retry mechanism in your own code; otherwise the tasks will fail. +3. 
The cluster replica repair and balance function must be turned off before a single upgrade task starts and turned back on after it finishes, regardless of whether all of your cluster nodes have been upgraded. + +### Overview of the upgrade process + +1. Metadata backup +2. Turn off the cluster replica repair and balance function +3. Compatibility testing +4. Upgrade BE +5. Upgrade FE +6. Turn on the cluster replica repair and balance function + +### Upgrade pre-work + +Perform the upgrade steps in the order given in the upgrade process overview. + +#### Metadata backup (important) + +**Make a full backup of the `doris-meta` directory of the FE-Master node!** + +#### Turn off the cluster replica repair and balance function + +Nodes restart during the upgrade, which may trigger unnecessary cluster balancing and replica repair logic. Turn it off first with the following commands: + +```sql +admin set frontend config("disable_balance" = "true"); +admin set frontend config("disable_colocate_balance" = "true"); +admin set frontend config("disable_tablet_scheduler" = "true"); +``` + +#### Compatibility testing + +:::tip + +**Metadata compatibility is very important. If an upgrade fails because of incompatible metadata, it may lead to data loss! It is recommended to perform a metadata compatibility test before each upgrade!** + +::: + +##### FE Compatibility Test + +:::tip + +**Important** + +1. It is recommended to run the FE compatibility test on your local development machine or on a BE node. + +2. Testing on Follower or Observer nodes is not recommended, to avoid connection exceptions. +3. If the test must run on a Follower or Observer node, the FE process already started on that node needs to be stopped. + +::: + +1. Use the new version alone to deploy a test FE process + + ```shell + sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon + ``` + +2. 
Modify the FE configuration file fe.conf for testing + + ```shell + vi ${DORIS_NEW_HOME}/conf/fe.conf + ``` + + Modify the following port settings so that **all ports** are **different from the online environment** + + ```shell + ... + http_port = 18030 + rpc_port = 19020 + query_port = 19030 + edit_log_port = 19010 + ... + ``` + + Save and exit + +3. Add the ClusterID configuration to fe.conf + + ```shell + echo "cluster_id=123456" >> ${DORIS_NEW_HOME}/conf/fe.conf + ``` + +4. Add the metadata failure recovery configuration to fe.conf + + ```shell + echo "metadata_failure_recovery=true" >> ${DORIS_NEW_HOME}/conf/fe.conf + ``` + +5. Copy the metadata directory doris-meta of the online environment Master FE to the test environment + + ```shell + cp -r ${DORIS_OLD_HOME}/fe/doris-meta/* ${DORIS_NEW_HOME}/fe/doris-meta + ``` + +6. Change the `clusterId` in the VERSION file copied to the test environment to 123456 (that is, the same value as in step 3) + + ```shell + vi ${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION + clusterId=123456 + ``` + +7. In the test environment, start the FE + + ```shell + sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon + ``` + +8. Check the FE log fe.log to see whether the startup succeeded + + ```shell + tail -f ${DORIS_NEW_HOME}/log/fe.log + ``` + +9. If the startup succeeds, the metadata is compatible. Stop the FE process of the test environment and prepare for the upgrade + + ```shell + sh ${DORIS_NEW_HOME}/bin/stop_fe.sh + ``` + +##### BE Compatibility Test + +You can use the grayscale upgrade scheme to upgrade a single BE first. 
If there is no exception or error, compatibility is considered normal, and the subsequent upgrade actions can be performed + +### Upgrade process + +:::tip + +Upgrade BE first, then FE + +Generally speaking, Doris only needs to upgrade `/bin` and `/lib` under the FE directory and `/bin` and `/lib` under the BE directory + +However, a major version upgrade may add new features or refactor old ones. These changes may require you to **replace/add** more directories during the upgrade to ensure that all new features are available. Pay careful attention to the Release Notes of the target version when performing a major version upgrade, to avoid upgrade failures + +::: + +#### Upgrade BE + +:::tip + +To keep your data safe, store it with 3 replicas to avoid data loss caused by a misoperation or a failed upgrade + +::: + +1. Provided the data has multiple replicas, select one BE node, stop it, and perform a grayscale upgrade + + ```shell + sh ${DORIS_OLD_HOME}/be/bin/stop_be.sh + ``` + +2. Rename the `/bin`, `/lib` directories under the BE directory + + ```shell + mv ${DORIS_OLD_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin_back + mv ${DORIS_OLD_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib_back + ``` + +3. Copy the new version's `/bin`, `/lib` directories to the original BE directory + + ```shell + cp -r ${DORIS_NEW_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin + cp -r ${DORIS_NEW_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib + ``` + +4. Start the BE node + + ```shell + sh ${DORIS_OLD_HOME}/be/bin/start_be.sh --daemon + ``` + +5. Connect to the cluster and view the node information + + ```mysql + show backends\G + ``` + + If the `alive` status of the BE node is `true`, and the value of `Version` is the new version, the node upgrade is successful -**The name of the BE binary that appears in this doc is `doris_be`, which was `palo_be` in previous versions.** +6. Complete the upgrade of other BE nodes in sequence -> **Note:** -> 1. 
Doris does not support upgrading across two-digit version numbers, for example: you cannot upgrade directly from 0.13 to 0.15, only through 0.13.x -> 0.14.x -> 0.15.x, and the three-digit version number can be upgraded across versions, such as from 0.13 .15 can be directly upgraded to 0.14.13.1, it is not necessary to upgrade 0.14.7 or 0.14.12.1 -> 2. The following approaches are based on highly available deployments. That is, data 3 replicas, FE high availability. +#### Upgrade FE -## Preparen +:::tip -1. Turn off the replica repair and balance operation. +Upgrade the non-Master nodes first, and then upgrade the Master nodes. - There will be node restarts during the upgrade process, so unnecessary cluster balancing and replica repair logic may be triggered. You can close it first with the following command: +::: - ``` - # Turn off the replica ealance logic. After it is closed, the balancing operation of the ordinary table replica will no longer be triggered. - $ mysql-client> admin set frontend config("disable_balance" = "true"); - - # Turn off the replica balance logic of the colocation table. After it is closed, the replica redistribution operation of the colocation table will no longer be triggered. - $ mysql-client> admin set frontend config("disable_colocate_balance" = "true"); - - # Turn off the replica scheduling logic. After shutting down, all generated replica repair and balancing tasks will no longer be scheduled. - $ mysql-client> admin set frontend config("disable_tablet_scheduler" = "true"); - ``` +1. In the case of multiple FE nodes, select a non-Master node to upgrade and stop running first - After the cluster is upgraded, just use the above command to set the corresponding configuration to the original value. + ```shell + sh ${DORIS_OLD_HOME}/fe/bin/stop_fe.sh + ``` -2. **important! ! Metadata needs to be backed up before upgrading(The entire directory needs to be backed up)! !** +2. 
Rename the `/bin`, `/lib` directories under the FE directory -## Test the correctness of BE upgrade + ```shell + mv ${DORIS_OLD_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin_back + mv ${DORIS_OLD_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib_back + ``` -1. Arbitrarily select a BE node and deploy the latest doris_be binary file. -2. Restart the BE node and check the BE log be.INFO to see if the boot was successful. -3. If the startup fails, you can check the reason first. If the error is not recoverable, you can delete the BE directly through DROP BACKEND, clean up the data, and restart the BE using the previous version of doris_be. Then re-ADD BACKEND. (**This method will result in the loss of a copy of the data, please make sure that three copies are complete, and perform this operation!!!**) -4. Install Java UDF function -Install Java UDF function: , because Java UDF function is supported from version 1.2, you need to download the JAR package of Java UDF function from the official website and put it in the lib directory of BE, otherwise it may will fail to start. +3. Copy the new version's `/bin`, `/lib` directories to the original FE directory -## Testing FE Metadata Compatibility + ```shell + cp -r ${DORIS_NEW_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin + cp -r ${DORIS_NEW_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib + ``` -0. **Important! Exceptional metadata compatibility is likely to cause data cannot be restored!!** -1. Deploy a test FE process (It is recommended to use your own local development machine, or BE node. If it is on the Follower or Observer node, you need to stop the started process, but it is not recommended to test on the Follower or Observer node) using the new version alone. -2. Modify the FE configuration file fe.conf for testing and set all ports to **different from online**. -3. Add configuration in fe.conf: cluster_id=123456 -4. Add configuration in fe.conf: metadata_failure_recovery=true -5. 
Copy the metadata directory doris-meta of the online environment master Fe to the test environment -6.The cluster_ID where copy to the doris-meta/image/VERSION file in the test environment is modified to 123456 (that is, the same as in Step 3) -7. In the test environment,running sh sh bin/start_fe.sh,start FE. -8. Observe whether the start-up is successful through FE log fe.log. -9. If the startup is successful, run sh bin/stop_fe.sh to stop the FE process of the test environment. -10. **The purpose of the above 2-6 steps is to prevent the FE of the test environment from being misconnected to the online environment after it starts.** +4. Start the FE node -**Note:** -1.1.x Before upgrading 1.2.x, you need to delete existing Native UDF ; otherwise, FE startup fails ; And since version 1.2 no longer supports Native UDF, please use [Java UDF](../../ecosystem/udf/java-user-defined-function.md). + ```shell + sh ${DORIS_OLD_HOME}/fe/bin/start_fe.sh --daemon + ``` -## Upgrade preparation +5. Connect to the cluster and view the node information -1. After data validation, the new version of BE and FE binary files are distributed to their respective directories. -2. In principle, the version upgrade needs to replace the lib directory and bin directory of FE and BE, and other directories except conf directory, data directory (doris-meta of FE, storage of BE), and log directory. + ```mysql + show frontends\G + ``` -## rolling upgrade + If the `alive` status of the FE node is `true`, and the value of `Version` is the new version, the node upgrade is successful -1. Confirm that the new version of the file is deployed. Restart FE and BE instances one by one. -2. It is suggested that BE be restarted one by one and FE be restarted one by one. Because Doris usually guarantees backward compatibility between FE and BE, that is, the old version of FE can access the new version of BE. However, the old version of BE may not be supported to access the new version of FE. -3. 
It is recommended to restart the next instance after confirming the previous instance started successfully. Refer to the Installation Deployment Document for the identification of successful instance startup. +6. Complete the upgrade of other FE nodes in turn, **finally complete the upgrade of the Master node** -## About version rollback -Because the database is a stateful service, Doris cannot support version rollback (version downgrade) in most cases. In some cases, the rollback of the 3-bit or 4-bit version can be supported, but the rollback of the 2-bit version will not be supported. +#### Turn on the cluster replica repair and balance function -Therefore, it is recommended to upgrade some nodes and observe the business operation (gray upgrade) to reduce the upgrade risk. +After the upgrade is complete and all BE nodes become `Alive`, enable the cluster copy repair and balance function: -**Illegal rollback operation may cause data loss and damage.** +```sql +admin set frontend config("disable_balance" = "false"); +admin set frontend config("disable_colocate_balance" = "false"); +admin set frontend config("disable_tablet_scheduler" = "false"); +``` diff --git a/docs/zh-CN/docs/admin-manual/cluster-management/upgrade.md b/docs/zh-CN/docs/admin-manual/cluster-management/upgrade.md index 2ae98093ed2cd1..a47a481fb2a9b9 100644 --- a/docs/zh-CN/docs/admin-manual/cluster-management/upgrade.md +++ b/docs/zh-CN/docs/admin-manual/cluster-management/upgrade.md @@ -26,79 +26,284 @@ under the License. # 集群升级 -Doris 可以通过滚动升级的方式,平滑进行升级。建议按照以下步骤进行安全升级。 +## 概述 -**文中的出现的BE二进制文件名称 `doris_be`,在之前的版本中为 `palo_be`。** +升级请使用本章节中推荐的步骤进行集群升级,Doris 集群升级可使用**滚动升级**的方式进行升级,无需集群节点全部停机升级,极大程度上降低对上层应用的影响。 -> **注:** -> -> 1. Doris不支持跨两位版本号进行升级,例如:不能从0.13直接升级到0.15,只能通过0.13.x -> 0.14.x -> 0.15.x,三位版本号可以跨版本升级,比如从0.13.15可以直接升级到0.14.13.1,不必一定要升级0.14.7 或者 0.14.12.1这种版本 -> 1. 以下方式均建立在高可用部署的情况下。即数据 3 副本,FE 高可用情况下。 +## Doris 版本说明 -## 前置工作 +:::tip -1. 
关闭集群副本修复和均衡功能 +Doris 升级请遵守**不要跨两个及以上关键节点版本升级**的原则,若要跨多个关键节点版本升级,先升级到最近的关键节点版本,随后再依次往后升级,若是非关键节点版本,则可忽略跳过。 - 升级过程中会有节点重启,所以可能会触发不必要的集群均衡和副本修复逻辑。可以先通过以下命令关闭: +关键节点版本:升级时必须要经历的版本,可能是单独一个版本,也可能是一个版本区间,如 `1.1.3 - 1.1.5`,则表示升级至该区间任意一版本即可继续后续升级。 +::: + +| 版本号 | 关键节点版本 | LTS 版本 | +| ------------------------ | ------------ | -------- | +| 0.12.x | 是 | 否 | +| 0.13.x | 是 | 否 | +| 0.14.x | 是 | 否 | +| 0.15.x | 是 | 否 | +| 1.0.0 - 1.1.2 | 否 | 否 | +| 1.1.3 - 1.1.5 | 是 | 1.1-LTS | +| 1.2.0 - 1.2.5 | 是 | 1.2-LTS | +| 2.0.0-alpha - 2.0.0-beta | 是 | 2.0-LTS | + +示例: + +当前版本为 `0.12`,升级到 `2.0.0-beta` 版本的升级路线 + +`0.12` -> `0.13` -> `0.14` -> `0.15` -> `1.1.3 - 1.1.5` 任意版本 -> `1.2.0 - 1.2.5` 任意版本 -> `2.0.0-beta` + +:::tip + +LTS 版本:Long-time Support,LTS 版本提供长期支持,会持续维护六个月以上,通常而言,**版本号第三位数越大的版本,稳定性越好**。 + +Alpha 版本:内部测试版本,功能还未完全确定,或许存在重大 BUG,只推荐上测试集群做测试,**不推荐上生产集群!** + +Beta 版本:公开测试版本,功能已基本确定,或许存在非重大 BUG,只推荐上测试集群做测试,**不推荐上生产集群!** + +Release 版本:公开发行版,已完成基本重要 BUG 的修复和功能性缺陷修复验证,推荐上生产集群。 + +::: + +## 升级步骤 + +### 升级说明 + +1. 在升级过程中,由于 Doris 的 RoutineLoad、Flink-Doris-Connector、Spark-Doris-Connector 都已在代码中实现了重试机制,所以在多 BE 节点的集群中,滚动升级不会导致任务失败。 +2. StreamLoad 任务需要您在自己的代码中实现重试机制,否则会导致任务失败。 +3. 集群副本修复和均衡功能在单次升级任务中务必要前置关闭和结束后打开,无论您集群节点是否全部升级完成。 + +### 升级流程概览 + +1. 元数据备份 +2. 关闭集群副本修复和均衡功能 +3. 兼容性测试 +4. 升级 BE +5. 升级 FE +6. 打开集群副本修复和均衡功能 + +### 升级前置工作 + +请按升级流程顺次执行升级 + +#### 元数据备份(重要) + +**将 FE-Master 节点的 `doris-meta` 目录进行完整备份!** + +#### 关闭集群副本修复和均衡功能 + +升级过程中会有节点重启,所以可能会触发不必要的集群均衡和副本修复逻辑,先通过以下命令关闭: + +```sql +admin set frontend config("disable_balance" = "true"); +admin set frontend config("disable_colocate_balance" = "true"); +admin set frontend config("disable_tablet_scheduler" = "true"); +``` + +#### 兼容性测试 + +:::tip + +**元数据兼容非常重要,如果因为元数据不兼容导致的升级失败,那可能会导致数据丢失!建议每次升级前都进行元数据兼容性测试!** + +::: + +##### FE 兼容性测试 + +:::tip + +**重要** + +1. 建议在自己本地的开发机,或者 BE 节点做 FE 兼容性测试。 + +2. 不建议在 Follower 或者 Observer 节点上测试,避免出现链接异常 +3. 如果一定在 Follower 或者 Observer 节点上,需要停止已启动的 FE 进程 + +::: + +1. 
单独使用新版本部署一个测试用的 FE 进程 + + ```shell + sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon ``` - # 关闭副本均衡逻辑。关闭后,不会再触发普通表副本的均衡操作。 - $ mysql-client > admin set frontend config("disable_balance" = "true"); - - # 关闭 colocation 表的副本均衡逻辑。关闭后,不会再触发 colocation 表的副本重分布操作。 - $ mysql-client > admin set frontend config("disable_colocate_balance" = "true"); - - # 关闭副本调度逻辑。关闭后,所有已产生的副本修复和均衡任务不会再被调度。 - $ mysql-client > admin set frontend config("disable_tablet_scheduler" = "true"); + +2. 修改测试用的 FE 的配置文件 fe.conf + + ```shell + vi ${DORIS_NEW_HOME}/conf/fe.conf ``` - 当集群升级完毕后,在通过以上命令将对应配置设为原值即可。 + 修改以下端口信息,将**所有端口**设置为**与线上不同** -2. **重要!!在升级之前需要备份元数据(整个目录都需要备份)!!** + ```shell + ... + http_port = 18030 + rpc_port = 19020 + query_port = 19030 + edit_log_port = 19010 + ... + ``` -## 测试 BE 升级正确性 + 保存并退出 -1. 任意选择一个 BE 节点,部署最新的 doris_be 二进制文件。 +3. 在 fe.conf 添加 ClusterID 配置 -2. 重启 BE 节点,通过 BE 日志 be.INFO,查看是否启动成功。 + ```shell + echo "cluster_id=123456" >> ${DORIS_NEW_HOME}/conf/fe.conf + ``` -3. 如果启动失败,可以先排查原因。如果错误不可恢复,可以直接通过 DROP BACKEND 删除该 BE、清理数据后,使用上一个版本的 doris_be 重新启动 BE。然后重新 ADD BACKEND。(**该方法会导致丢失一个数据副本,请务必确保3副本完整的情况下,执行这个操作!!!**) +4. 在 fe.conf 添加元数据故障恢复配置 -4. 安装 Java UDF 函数 + ```shell + echo "metadata_failure_recovery=true" >> ${DORIS_NEW_HOME}/conf/fe.conf + ``` + +5. 拷贝线上环境 Master FE 的元数据目录 doris-meta 到测试环境 - 安装 Java UDF 函数: , 因为从1.2 版本开始支持Java UDF 函数,需要从官网下载 Java UDF 函数的 JAR 包放到 BE 的 lib 目录下,否则可能会启动失败。 + ```shell + cp -r ${DORIS_OLD_HOME}/fe/doris-meta/* ${DORIS_NEW_HOME}/fe/doris-meta + ``` +6. 将拷贝到测试环境中的 VERSION 文件中的 `clusterId` 修改为 123456(即与第3步中相同) -## 测试 FE 元数据兼容性 + ```shell + vi ${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION + clusterId=123456 + ``` -0. **重要!!元数据兼容性异常很可能导致数据无法恢复!!** -1. 单独使用新版本部署一个测试用的 FE 进程(建议在自己本地的开发机,或者BE节点。如果在Follower或者Observer节点上,需要停止启动的进程,但是不建议在Follower或者Observer节点上测试)。 -2. 修改测试用的 FE 的配置文件 fe.conf,将所有端口设置为**与线上不同**。 -3. 在 fe.conf 添加配置:cluster_id=123456 -4. 在 fe.conf 添加配置:metadata_failure_recovery=true -5. 拷贝线上环境 Master FE 的元数据目录 doris-meta 到测试环境 -6. 
将拷贝到测试环境中的 doris-meta/image/VERSION 文件中的 cluster_id 修改为 123456(即与第3步中相同) -7. 在测试环境中,运行 sh bin/start_fe.sh 启动 FE -8. 通过 FE 日志 fe.log 观察是否启动成功。 -9. 如果启动成功,运行 sh bin/stop_fe.sh 停止测试环境的 FE 进程。 -10. **以上 2-6 步的目的是防止测试环境的FE启动后,错误连接到线上环境中。** +7. 在测试环境中,启动 FE + + ```shell + sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon + ``` + +8. 通过 FE 日志 fe.log 观察是否启动成功 + + ```shell + tail -f ${DORIS_NEW_HOME}/log/fe.log + ``` + +9. 如果启动成功,则代表兼容性没有问题,停止测试环境的 FE 进程,准备升级 -**注:** -1.1.x 版本升级 1.2.x 版本时,需要先删除已有的原生 UDF ;否则会导致FE启动失败;并且1.2版本开始不再对原生 UDF提供支持,请使用 [Java UDF](../../ecosystem/udf/java-user-defined-function.md)。 + ```shell + sh ${DORIS_NEW_HOME}/bin/stop_fe.sh + ``` + +##### BE 兼容性测试 + +可利用灰度升级方案,先升级单个 BE,无异常和报错情况下即视为兼容性正常,可执行后续升级动作 + +### 升级流程 + +:::tip + +先升级 BE,后升级 FE + +一般而言,Doris 只需要升级 FE 目录下的 `/bin` 和 `/lib` 以及 BE 目录下的 `/bin` 和 `/lib` + +但是在大版本升级时,可能会有新的特性增加或者老功能的重构,这些修改可能会需要升级时**替换/新增**更多的目录来保证所有新功能的可用性,请大版本升级时仔细关注该版本的 Release-Note,以免出现升级故障 + +::: + +#### 升级 BE + +:::tip + +为了保证您的数据安全,请使用 3 副本来存储您的数据,以避免升级误操作或失败导致的数据丢失问题 + +::: + +1. 在多副本的前提下,选择一台 BE 节点停止运行,进行灰度升级 + + ```shell + sh ${DORIS_OLD_HOME}/be/bin/stop_be.sh + ``` + +2. 重命名 BE 目录下的 `/bin`,`/lib` 目录 + + ```shell + mv ${DORIS_OLD_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin_back + mv ${DORIS_OLD_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib_back + ``` + +3. 复制新版本的 `/bin`,`/lib` 目录到原 BE 目录下 + + ```shell + cp -r ${DORIS_NEW_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin + cp -r ${DORIS_NEW_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib + ``` + +4. 启动该 BE 节点 + + ```shell + sh ${DORIS_OLD_HOME}/be/bin/start_be.sh --daemon + ``` + +5. 连接集群,查看该节点信息 + + ```mysql + show backends\G + ``` + + 若该 BE 节点 `alive` 状态为 `true`,且 `Version` 值为新版本,则该节点升级成功 + +6. 依次完成其他 BE 节点升级 + +#### 升级 FE + +:::tip + +先升级非 Master 节点,后升级 Master 节点。 + +::: + +1. 多个 FE 节点情况下,选择一个非 Master 节点进行升级,先停止运行 + + ```shell + sh ${DORIS_OLD_HOME}/fe/bin/stop_fe.sh + ``` + +2. 
重命名 FE 目录下的 `/bin`,`/lib` 目录 + + ```shell + mv ${DORIS_OLD_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin_back + mv ${DORIS_OLD_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib_back + ``` + +3. 复制新版本的 `/bin`,`/lib` 目录到原 FE 目录下 + + ```shell + cp -r ${DORIS_NEW_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin + cp -r ${DORIS_NEW_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib + ``` + +4. 启动该 FE 节点 + + ```shell + sh ${DORIS_OLD_HOME}/fe/bin/start_fe.sh --daemon + ``` + +5. 连接集群,查看该节点信息 + + ```mysql + show frontends\G + ``` -## 升级准备 + 若该 FE 节点 `alive` 状态为 `true`,且 `Version` 值为新版本,则该节点升级成功 -1. 在完成数据正确性验证后,将 BE 和 FE 新版本的二进制文件分发到各自目录下。 -2. 原则上版本升级需要替换 FE 和 BE 的 lib 目录以及 bin 目录,和除 conf 目录、数据目录(FE 的 doris-meta,BE 的 storage)、log 目录外的其他目录。 +6. 依次完成其他 FE 节点升级,**最后完成 Master 节点的升级** -## 滚动升级 +#### 打开集群副本修复和均衡功能 -1. 确认新版本的文件部署完成后。逐台重启 FE 和 BE 实例即可。 -2. 建议逐台重启 BE 后,再逐台重启 FE。因为通常 Doris 保证 FE 到 BE 的向后兼容性,即老版本的 FE 可以访问新版本的 BE。但可能不支持老版本的 BE 访问新版本的 FE。 -3. 建议确认前一个实例启动成功后,再重启下一个实例。实例启动成功的标识,请参阅安装部署文档。 +升级完成,并且所有 BE 节点状态变为 `Alive` 后,打开集群副本修复和均衡功能: -## 关于版本回滚 -因为数据库是一个有状态的服务,所以在大多数情况下,Doris 无法支持版本回滚(版本降级)。在某些情况下,可以支持 3 位或 4 位版本的回滚,但不会支持 2 位版本的回滚。 -所以建议通过先升级部分节点并观察业务运行情况的方式(灰度升级)来降低升级风险。 +```sql +admin set frontend config("disable_balance" = "false"); +admin set frontend config("disable_colocate_balance" = "false"); +admin set frontend config("disable_tablet_scheduler" = "false"); +``` -**非法的回滚操作可能导致数据丢失和损坏。**
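Reviewer sketch: the rename-then-copy steps that both the BE and FE upgrade sections in this diff describe could be wrapped in one small shell helper, which may make the per-node procedure harder to get wrong. This is only an illustration under stated assumptions: the function name `swap_dirs` is hypothetical and not part of Doris, the `_back` suffix follows the naming used in the docs, and the copy uses `cp -r` because `bin` and `lib` are directories. The node must already be stopped before calling it.

```shell
# Hypothetical helper (not shipped with Doris): back up and replace the
# bin/ and lib/ directories of one stopped node.
# Usage: swap_dirs OLD_ROLE_DIR NEW_ROLE_DIR
swap_dirs() {
    old_dir=$1
    new_dir=$2

    # Validate everything first so a failure leaves the node untouched.
    for d in bin lib; do
        if [ ! -d "$new_dir/$d" ]; then
            echo "missing new directory: $new_dir/$d" >&2
            return 1
        fi
        if [ ! -d "$old_dir/$d" ]; then
            echo "missing old directory: $old_dir/$d" >&2
            return 1
        fi
        if [ -e "$old_dir/${d}_back" ]; then
            echo "backup already exists: $old_dir/${d}_back" >&2
            return 1
        fi
    done

    for d in bin lib; do
        # Keep the old directory as <name>_back, as in the documented steps.
        mv "$old_dir/$d" "$old_dir/${d}_back" || return 1
        # -r is required because the source is a directory.
        cp -r "$new_dir/$d" "$old_dir/$d" || return 1
    done
}
```

With the variables used in the docs, a BE node would be handled by `swap_dirs "${DORIS_OLD_HOME}/be" "${DORIS_NEW_HOME}/be"` between `stop_be.sh` and `start_be.sh`, and an FE node by the same call with `fe` in place of `be`. The helper refuses to run if a `_back` directory already exists, so an earlier backup is never overwritten.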