[typo](docs) Refactor upgrade documentation (apache#21449)
Co-authored-by: Yijia Su <[email protected]>
FreeOnePlus and Yijia Su authored Jul 3, 2023
1 parent bb33ad0 commit 5e6242e
Showing 2 changed files with 515 additions and 102 deletions.
312 changes: 260 additions & 52 deletions docs/en/docs/admin-manual/cluster-management/upgrade.md
@@ -24,77 +24,285 @@ specific language governing permissions and limitations
under the License.
-->


# Cluster upgrade

Doris can be upgraded smoothly through rolling upgrades. The following steps are recommended for a safe upgrade.
## Overview

To upgrade a cluster, follow the steps recommended in this chapter. A Doris cluster can be upgraded with the **rolling upgrade** method, which does not require shutting down all cluster nodes and therefore greatly reduces the impact on upper-layer applications.

## Doris Release Notes

:::tip

When upgrading Doris, follow the principle: **do not upgrade across two or more key node versions**. To upgrade across multiple key node versions, first upgrade to the nearest key node version, and then continue through each subsequent key node version in turn. Versions that are not key node versions can be skipped.

Key node version: a version that must be passed through when upgrading. It may be a single version or a version range. A range such as `1.1.3 - 1.1.5` means that you can continue upgrading after reaching any version in that range.

:::

| Version number | Key node version | LTS version |
| ------------------------ | ------------ | -------- |
| 0.12.x | Yes | No |
| 0.13.x | Yes | No |
| 0.14.x | Yes | No |
| 0.15.x | Yes | No |
| 1.0.0 - 1.1.2 | No | No |
| 1.1.3 - 1.1.5 | Yes | 1.1-LTS |
| 1.2.0 - 1.2.5 | Yes | 1.2-LTS |
| 2.0.0-alpha - 2.0.0-beta | Yes | 2.0-LTS |

Example:

The current version is `0.12` and the target version is `2.0.0-beta`. The upgrade route is:

`0.12` -> `0.13` -> `0.14` -> `0.15` -> any version in `1.1.3 - 1.1.5` -> any version in `1.2.0 - 1.2.5` -> `2.0.0-beta`
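
The route above can also be derived mechanically from the table. The following is a minimal sketch, not part of Doris: `key_nodes` and `upgrade_route` are hypothetical helper names, and the list is simply copied from the key node rows of the table, so both arguments must match a table entry exactly.

```shell
# Key node versions in upgrade order, copied from the table above.
key_nodes() {
  cat <<'EOF'
0.12.x
0.13.x
0.14.x
0.15.x
1.1.3 - 1.1.5
1.2.0 - 1.2.5
2.0.0-alpha - 2.0.0-beta
EOF
}

# upgrade_route FROM TO (hypothetical helper): print every key node version
# from FROM to TO inclusive, or fail if TO does not come after FROM.
upgrade_route() {
  from="$1"
  to="$2"
  started=0
  route=""
  while IFS= read -r node; do
    if [ "$node" = "$from" ]; then
      started=1
    fi
    if [ "$started" -eq 1 ]; then
      # Append " -> " only when the route already has an entry.
      route="${route:+$route -> }$node"
      if [ "$node" = "$to" ]; then
        echo "$route"
        return 0
      fi
    fi
  done <<EOF
$(key_nodes)
EOF
  echo "no route from '$from' to '$to'" >&2
  return 1
}
```

For example, `upgrade_route "0.12.x" "2.0.0-alpha - 2.0.0-beta"` prints the full chain of key node versions, each of which has to be visited in order.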

:::tip

LTS version: Long-term Support. An LTS version receives long-term support and will be maintained for more than six months. **Generally speaking, a version with a larger third digit in its version number is more stable.**

Alpha version: an internal test version. Its features are not yet finalized and it may contain major bugs. It is recommended only for testing on a test cluster; **it is not recommended for production clusters!**

Beta version: a public test version. Its features are basically confirmed, but it may contain non-major bugs. It is recommended only for testing on a test cluster; **it is not recommended for production clusters!**

Release version: a public release in which the important bugs have been fixed and the fixes verified. It is recommended for production clusters.

:::

## Upgrade steps

### Upgrade Instructions

1. Doris's RoutineLoad, Flink-Doris-Connector, and Spark-Doris-Connector implement a retry mechanism in their code, so in a cluster with multiple BE nodes a rolling upgrade will not cause these tasks to fail.
2. StreamLoad tasks require you to implement a retry mechanism in your own code; otherwise the task will fail.
3. The cluster replica repair and balance function must be turned off before each upgrade task and turned back on after it completes, regardless of whether all of your cluster nodes have been upgraded.

### Overview of the upgrade process

1. Metadata backup
2. Turn off the cluster replica repair and balance function
3. Compatibility testing
4. Upgrade BE
5. Upgrade FE
6. Turn on the cluster replica repair and balance function

### Upgrade pre-work

Perform the steps in the order given in the upgrade process overview.

#### Metadata backup (important)

**Make a full backup of the `doris-meta` directory of the FE Master node!**
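
One way to take such a backup is to archive the whole directory under a timestamped name. This is only a sketch: `backup_meta` is a hypothetical helper, and the metadata path and backup location are whatever your deployment uses.

```shell
# backup_meta META_DIR BACKUP_DIR (hypothetical helper): archive the whole
# metadata directory into BACKUP_DIR and print the path of the archive.
backup_meta() {
  meta_dir="$1"
  backup_dir="$2"
  if [ ! -d "$meta_dir" ]; then
    echo "not a directory: $meta_dir" >&2
    return 1
  fi
  mkdir -p "$backup_dir" || return 1
  archive="$backup_dir/doris-meta-$(date +%Y%m%d%H%M%S).tar.gz"
  # -C makes the paths inside the archive relative to the parent directory.
  tar -czf "$archive" -C "$(dirname "$meta_dir")" "$(basename "$meta_dir")" || return 1
  echo "$archive"
}
```

A call such as `backup_meta "${DORIS_OLD_HOME}/fe/doris-meta" /path/to/backups` would then leave a single archive that can be listed with `tar -tzf` to confirm that the whole directory was captured.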

#### Turn off the cluster replica repair and balance function

Nodes are restarted during the upgrade, which may trigger unnecessary cluster balancing and replica repair logic. Turn it off first with the following commands:

```sql
admin set frontend config("disable_balance" = "true");
admin set frontend config("disable_colocate_balance" = "true");
admin set frontend config("disable_tablet_scheduler" = "true");
```
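
Because the same three switches are set to `true` before the upgrade and back to `false` afterwards, it can be convenient to generate the statements from one place. The sketch below only prints the statements; `balance_statements` is a hypothetical helper name.

```shell
# balance_statements VALUE (hypothetical helper): print the three statements
# from this section with every switch set to VALUE ("true" or "false").
balance_statements() {
  case "$1" in
    true|false) ;;
    *) echo "usage: balance_statements true|false" >&2; return 1 ;;
  esac
  for key in disable_balance disable_colocate_balance disable_tablet_scheduler; do
    printf 'admin set frontend config("%s" = "%s");\n' "$key" "$1"
  done
}
```

The output can be reviewed and then piped to a MySQL client connected to the FE, for example `balance_statements true | mysql -h <fe_host> -P <query_port> -u <user> -p`, where the connection options are placeholders for your own environment.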

#### Compatibility testing

:::tip

**Metadata compatibility is very important. If the upgrade fails because of incompatible metadata, it may lead to data loss! It is recommended to perform a metadata compatibility test before each upgrade!**

:::

##### FE Compatibility Test

:::tip

**Important**

1. It is recommended to run the FE compatibility test on your local development machine or on a BE node.

2. Testing on a Follower or Observer node is not recommended, to avoid connection exceptions.
3. If the test must run on a Follower or Observer node, stop the FE process that is already running there first.

:::

1. Deploy a separate test FE process using the new version

```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
```

2. Modify the FE configuration file fe.conf for testing

```shell
vi ${DORIS_NEW_HOME}/conf/fe.conf
```

Modify the following port settings so that **every port** is **different from the online environment**

```shell
...
http_port = 18030
rpc_port = 19020
query_port = 19030
edit_log_port = 19010
...
```

Save and exit.

3. Add ClusterID configuration in fe.conf

```shell
echo "cluster_id=123456" >> ${DORIS_NEW_HOME}/conf/fe.conf
```

4. Add metadata failover configuration in fe.conf

```shell
echo "metadata_failure_recovery=true" >> ${DORIS_NEW_HOME}/conf/fe.conf
```

5. Copy the metadata directory doris-meta of the online environment Master FE to the test environment

```shell
cp -r ${DORIS_OLD_HOME}/fe/doris-meta/* ${DORIS_NEW_HOME}/fe/doris-meta
```

6. Change `clusterId` in the VERSION file copied to the test environment to 123456 (that is, the same value as in step 3)

```shell
vi ${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION
clusterId=123456
```

7. In the test environment, start the FE

```shell
sh ${DORIS_NEW_HOME}/bin/start_fe.sh --daemon
```

8. Observe whether the startup is successful through the FE log fe.log

```shell
tail -f ${DORIS_NEW_HOME}/log/fe.log
```

9. If the startup succeeds, the metadata is compatible. Stop the FE process of the test environment and prepare for the upgrade

```shell
sh ${DORIS_NEW_HOME}/bin/stop_fe.sh
```
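
Two of the steps above are easy to get wrong by hand: making every port differ from the online FE, and rewriting `clusterId` in the copied VERSION file. The sketch below checks the first and automates the second. All three function names are hypothetical, and the port check only sees values that are explicitly set in the conf files, so a port left at its default in the online `fe.conf` is not detected.

```shell
# conf_value FILE KEY: print the value of an uncommented "KEY = value" line.
conf_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# ports_differ ONLINE_CONF TEST_CONF: succeed only if each of the four ports
# is set in TEST_CONF and differs from the value in ONLINE_CONF.
ports_differ() {
  for key in http_port rpc_port query_port edit_log_port; do
    online_value=$(conf_value "$1" "$key")
    test_value=$(conf_value "$2" "$key")
    if [ -z "$test_value" ] || [ "$online_value" = "$test_value" ]; then
      echo "$key is not set to a different value in $2" >&2
      return 1
    fi
  done
}

# set_cluster_id VERSION_FILE ID: rewrite the clusterId line, keeping the rest.
set_cluster_id() {
  sed "s/^clusterId=.*/clusterId=$2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Running `ports_differ` against the online and test `fe.conf` files before step 7, and `set_cluster_id "${DORIS_NEW_HOME}/fe/doris-meta/image/VERSION" 123456` in place of the manual edit in step 6, reduces the chance that the test FE connects to the online environment by mistake.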

##### BE Compatibility Test

You can use a grayscale upgrade to upgrade a single BE first. If no exceptions or errors occur, compatibility is considered normal and the subsequent upgrade steps can be performed.

### Upgrade process

:::tip

Upgrade BE first, then FE.

Generally speaking, only `/bin` and `/lib` under the FE directory and `/bin` and `/lib` under the BE directory need to be upgraded.

However, a major version upgrade may add new features or refactor old functions. These changes may require you to **replace or add** more directories during the upgrade to ensure that all new features are available. Read the release notes of the target version carefully to avoid upgrade failures.

:::

#### Upgrade BE

:::tip

To keep your data safe, store it with 3 replicas to avoid data loss caused by misoperation or a failed upgrade.

:::

1. Provided the data has multiple replicas, select one BE node, stop it, and perform a grayscale upgrade

```shell
sh ${DORIS_OLD_HOME}/be/bin/stop_be.sh
```

2. Rename the `/bin`, `/lib` directories under the BE directory

```shell
mv ${DORIS_OLD_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin_back
mv ${DORIS_OLD_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib_back
```

3. Copy the new version of `/bin`, `/lib` directory to the original BE directory

```shell
cp -r ${DORIS_NEW_HOME}/be/bin ${DORIS_OLD_HOME}/be/bin
cp -r ${DORIS_NEW_HOME}/be/lib ${DORIS_OLD_HOME}/be/lib
```

4. Start the BE node

```shell
sh ${DORIS_OLD_HOME}/be/bin/start_be.sh --daemon
```

5. Connect to the cluster and view the node information

```mysql
show backends\G
```

If the `alive` status of the BE node is `true` and the value of `Version` is the new version, the node has been upgraded successfully

**The name of the BE binary that appears in this doc is `doris_be`, which was `palo_be` in previous versions.**
6. Complete the upgrade of other BE nodes in sequence
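
The rename and copy in steps 2 and 3 can be wrapped in one function so that the same procedure is repeated identically on every node. This is a sketch under the directory layout used above. `replace_dirs` is a hypothetical helper, and it refuses to run if a `_back` directory already exists, so that an earlier backup is never overwritten.

```shell
# replace_dirs OLD_HOME NEW_HOME (hypothetical helper): rename bin and lib
# under OLD_HOME to bin_back and lib_back, then copy in the new directories.
replace_dirs() {
  old_home="$1"
  new_home="$2"
  # Check everything first so that a failure leaves the node untouched.
  for dir in bin lib; do
    if [ ! -d "$new_home/$dir" ]; then
      echo "missing $new_home/$dir" >&2
      return 1
    fi
    if [ -e "$old_home/${dir}_back" ]; then
      echo "$old_home/${dir}_back already exists" >&2
      return 1
    fi
  done
  for dir in bin lib; do
    mv "$old_home/$dir" "$old_home/${dir}_back" || return 1
    cp -r "$new_home/$dir" "$old_home/$dir" || return 1
  done
}
```

With the BE stopped, `replace_dirs "${DORIS_OLD_HOME}/be" "${DORIS_NEW_HOME}/be"` performs both steps. The same call with the `fe` directories would cover the corresponding FE steps.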

> **Note:**
> 1. Doris does not support upgrading across two-digit version numbers. For example, you cannot upgrade directly from 0.13 to 0.15; you must go through 0.13.x -> 0.14.x -> 0.15.x. Three-digit version numbers can be skipped: for example, 0.13.15 can be upgraded directly to 0.14.13.1 without first upgrading to 0.14.7 or 0.14.12.1.
> 2. The following approaches assume a highly available deployment, that is, 3 data replicas and FE high availability.
#### Upgrade FE

## Preparation
:::tip

1. Turn off the replica repair and balance operation.
Upgrade the non-Master nodes first, and then upgrade the Master nodes.

There will be node restarts during the upgrade process, so unnecessary cluster balancing and replica repair logic may be triggered. You can close it first with the following command:
:::

```
# Turn off the replica balance logic. After it is closed, the balancing operation of the ordinary table replica will no longer be triggered.
$ mysql-client> admin set frontend config("disable_balance" = "true");
# Turn off the replica balance logic of the colocation table. After it is closed, the replica redistribution operation of the colocation table will no longer be triggered.
$ mysql-client> admin set frontend config("disable_colocate_balance" = "true");
# Turn off the replica scheduling logic. After shutting down, all generated replica repair and balancing tasks will no longer be scheduled.
$ mysql-client> admin set frontend config("disable_tablet_scheduler" = "true");
```
1. If there are multiple FE nodes, select a non-Master node to upgrade first and stop it

After the cluster is upgraded, just use the above command to set the corresponding configuration to the original value.
```shell
sh ${DORIS_OLD_HOME}/fe/bin/stop_fe.sh
```

2. **Important! Back up the metadata before upgrading (the entire directory needs to be backed up)!**
2. Rename the `/bin`, `/lib` directories under the FE directory

## Test the correctness of BE upgrade
```shell
mv ${DORIS_OLD_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin_back
mv ${DORIS_OLD_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib_back
```

1. Arbitrarily select a BE node and deploy the latest doris_be binary file.
2. Restart the BE node and check the BE log be.INFO to see if the boot was successful.
3. If the startup fails, check the reason first. If the error is not recoverable, you can delete the BE directly through DROP BACKEND, clean up the data, restart the BE using the previous version of doris_be, and then run ADD BACKEND again. (**This method loses one replica of the data. Make sure all three replicas are complete before performing this operation!**)
4. Install Java UDF function
<version since="1.2.0">Install Java UDF function: </version> Because Java UDF functions are supported from version 1.2, you need to download the JAR package of Java UDF functions from the official website and put it in the lib directory of BE; otherwise BE may fail to start.
3. Copy the new version of `/bin`, `/lib` directory to the original FE directory

## Testing FE Metadata Compatibility
```shell
cp -r ${DORIS_NEW_HOME}/fe/bin ${DORIS_OLD_HOME}/fe/bin
cp -r ${DORIS_NEW_HOME}/fe/lib ${DORIS_OLD_HOME}/fe/lib
```

0. **Important! Incompatible metadata is likely to make the data unrecoverable!**
1. Deploy a test FE process (It is recommended to use your own local development machine, or BE node. If it is on the Follower or Observer node, you need to stop the started process, but it is not recommended to test on the Follower or Observer node) using the new version alone.
2. Modify the FE configuration file fe.conf for testing and set all ports to **different from online**.
3. Add configuration in fe.conf: cluster_id=123456
4. Add configuration in fe.conf: metadata_failure_recovery=true
5. Copy the metadata directory doris-meta of the online environment Master FE to the test environment
6. Change the cluster_ID in the doris-meta/image/VERSION file copied to the test environment to 123456 (that is, the same as in Step 3)
7. In the test environment, run sh bin/start_fe.sh to start the FE.
8. Observe whether the start-up is successful through FE log fe.log.
9. If the startup is successful, run sh bin/stop_fe.sh to stop the FE process of the test environment.
10. **The purpose of the above 2-6 steps is to prevent the FE of the test environment from being misconnected to the online environment after it starts.**
4. Start the FE node

**Note:**
Before upgrading from 1.1.x to 1.2.x, you need to delete any existing Native UDFs; otherwise, FE startup fails. Since version 1.2 no longer supports Native UDF, please use [Java UDF](../../ecosystem/udf/java-user-defined-function.md).
```shell
sh ${DORIS_OLD_HOME}/fe/bin/start_fe.sh --daemon
```

## Upgrade preparation
5. Connect to the cluster and view the node information

1. After data validation, the new version of BE and FE binary files are distributed to their respective directories.
2. In principle, the version upgrade needs to replace the lib directory and bin directory of FE and BE, and other directories except conf directory, data directory (doris-meta of FE, storage of BE), and log directory.
```mysql
show frontends\G
```

## rolling upgrade
If the FE node `alive` status is `true`, and the value of `Version` is the new version, the node is upgraded successfully

1. Confirm that the new version of the file is deployed. Restart FE and BE instances one by one.
2. It is suggested to restart the BE instances one by one and the FE instances one by one. Doris usually guarantees backward compatibility between FE and BE, that is, an old version of FE can access a new version of BE. However, an old version of BE may not be able to access a new version of FE.
3. It is recommended to restart the next instance after confirming the previous instance started successfully. Refer to the Installation Deployment Document for the identification of successful instance startup.
6. Complete the upgrade of other FE nodes in turn, **finally complete the upgrade of the Master node**
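
The check on `alive` and `Version` can be scripted by reading the vertical (`\G`) output of `show backends` or `show frontends`. The sketch below assumes that each row contains one line labelled `Alive:` and one labelled `Version:`, as described in the steps above, and treats a node as upgraded when its `Version` value contains the expected string. `all_upgraded` is a hypothetical helper name.

```shell
# all_upgraded EXPECTED_VERSION (hypothetical helper): read "\G"-style output
# on stdin and succeed only if every row is alive and its Version value
# contains EXPECTED_VERSION.
all_upgraded() {
  awk -v want="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)     # drop the padding before the field name
      sub(/[ \t\r]+$/, "", line)   # drop trailing whitespace
    }
    tolower(line) ~ /^alive:/ {
      rows++
      if (line !~ /: *true$/) bad++
    }
    line ~ /^Version:/ {
      sub(/^Version: */, "", line)
      if (index(line, want) == 0) bad++
    }
    END {
      if (rows > 0 && bad == 0) exit 0
      exit 1
    }
  '
}
```

A typical call would be `mysql -h <fe_host> -P <query_port> -u <user> -p -e 'show frontends\G' | all_upgraded <new_version>`, with the connection options and version string replaced by your own values.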

## About version rollback
Because the database is a stateful service, Doris cannot support version rollback (version downgrade) in most cases. In some cases, a rollback that changes only the third or fourth digit of the version number can be supported, but a rollback that changes the second digit will not be supported.
#### Turn on the cluster replica repair and balance function

Therefore, it is recommended to upgrade some nodes and observe the business operation (gray upgrade) to reduce the upgrade risk.
After the upgrade is complete and all BE nodes are `Alive`, turn on the cluster replica repair and balance function:

**Illegal rollback operation may cause data loss and damage.**
```sql
admin set frontend config("disable_balance" = "false");
admin set frontend config("disable_colocate_balance" = "false");
admin set frontend config("disable_tablet_scheduler" = "false");
```