diff --git a/docs/en/docs/data-table/data-model.md b/docs/en/docs/data-table/data-model.md
index f9225a4dac33aaf..76756ee57760f6f 100644
--- a/docs/en/docs/data-table/data-model.md
+++ b/docs/en/docs/data-table/data-model.md
@@ -411,14 +411,22 @@ Please note that `agg_state` comes with a certain performance overhead.
 
 ## Unique Model
 
-In some multidimensional analysis scenarios, users are highly concerned about how to ensure the uniqueness of the Key,
-that is, how to create uniqueness constraints for the Primary Key. Therefore, we introduce the Unique Model. Prior to Doris 1.2,
-the Unique Model was essentially a special case of the Aggregate Model and a simplified representation of table schema.
-The Aggregate Model is implemented by Merge on Read, so it might not deliver high performance in some aggregation queries
-(see the [Limitations of Aggregate Model](#limitations-of-aggregate-model) section). In Doris 1.2,
-we have introduced a new implementation for the Unique Model--Merge on Write, which can help achieve optimal query performance.
-For now, Merge on Read and Merge on Write will coexist in the Unique Model for a while, but in the future,
-we plan to make Merge on Write the default implementation of the Unique Model. The following will illustrate the two implementations with examples.
+When users need to update data, they can use the Unique data model. The Unique model guarantees key uniqueness: when a user updates a row, the newly written data overwrites the old data with the same key.
+
+**Two Implementation Methods**
+
+The Unique model provides two implementation methods:
+
+- Merge-on-read: No deduplication is performed when data is written; all deduplication happens during queries or compaction. As a result, merge-on-read offers better write performance but poorer query performance and higher memory consumption.
+- Merge-on-write: Introduced in version 1.2, merge-on-write performs all deduplication during the data writing phase and therefore provides excellent query performance.
+
+Since version 2.0, merge-on-write has been mature and stable, and because of its excellent query performance we recommend it for most users. Starting from version 2.1, merge-on-write is the default implementation of the Unique model.
+For detailed differences between the two implementation methods, refer to the subsequent sections of this chapter. For their performance differences, see the section [Limitations of Aggregate Model](#limitations-of-aggregate-model).
+
+**Semantics of Data Updates**
+
+- The default update semantics of the Unique model are **whole-row `UPSERT`**, i.e. UPDATE OR INSERT: if the key of a row already exists, the row is updated; otherwise, a new row is inserted. Under whole-row `UPSERT` semantics, even if a user's `INSERT INTO` statement writes only some of the columns, Doris fills the omitted columns with NULL or their default values in the Planner.
+- Partial column updates: To update only specific columns, users must use the merge-on-write implementation and enable partial column updates through specific parameters. See [Partial Column Updates](../data-operate/update-delete/partial-update.md) for usage recommendations.
 
 ### Merge on Read ( Same Implementation as Aggregate Model)
 
@@ -491,7 +499,7 @@ That is to say, the Merge on Read implementation of the Unique Model is equivale
 
 ### Merge on Write
 
-The Merge on Write implementation of the Unique Model is completely different from that of the Aggregate Model. It can deliver better performance in aggregation queries with primary key limitations.
+The merge-on-write implementation of the Unique model delivers query performance close to that of the Duplicate model. In scenarios that require primary key constraints, it has a significant query performance advantage over the Aggregate model, especially for aggregation queries and for queries that use indexes to filter large amounts of data.
 
 In Doris 1.2.0, as a new feature, Merge on Write is disabled by default(before version 2.1), and users can enable it by adding the following property:
 
@@ -501,9 +509,11 @@ In Doris 1.2.0, as a new feature, Merge on Write is disabled by default(before v
 
 In Doris 2.1, Merge on Write is enabled by default.
 
-> NOTE:
-> 1. It is recommended to use version 1.2.4 or above, as this version has fixed some bugs and stability issues.
-> 2. Add the configuration item "disable_storage_page_cache=false" to the be.conf file. Failure to add this configuration item may have a significant impact on data load performance.
+> Note:
+> 1. For users on version 1.2:
+>    1. Use version 1.2.4 or above, which fixes some bugs and stability issues.
+>    2. Add the configuration item `disable_storage_page_cache=false` to `be.conf`. Without it, data load performance may be significantly affected.
+> 2. New users are strongly recommended to use version 2.0 or above, which significantly improves the performance and stability of merge-on-write.
 
 Take the previous table as an example, the corresponding to CREATE TABLE statement should be:
 
@@ -545,9 +555,8 @@ On a Unique table with the Merge on Write option enabled, during the import stag
 
 [NOTE]
 
-1. The Merge on Write implementation is disabled by default can only be enabled by specifying a property when creating a new table. Before version 2.1, it's disabled by default. Since version 2.1, it's enabled by default.
+1. The implementation of a Unique table can only be specified when the table is created; it cannot be changed through a schema change.
 2. The old Merge on Read cannot be seamlessly upgraded to the Merge on Write implementation (since they have completely different data organization). 
 If you want to switch to the Merge on Write implementation, you need to manually execute `insert into unique-mow-table select * from source table` to load data to new table.
-3. The two unique features `delete sign` and `sequence col` of the Unique Model can be used as normal in the new implementation, and their usage remains unchanged.
 
 
diff --git a/docs/zh-CN/docs/data-table/data-model.md b/docs/zh-CN/docs/data-table/data-model.md
index b2b72bdc11fcc4e..41b8c6b9e7b58b6 100644
--- a/docs/zh-CN/docs/data-table/data-model.md
+++ b/docs/zh-CN/docs/data-table/data-model.md
@@ -414,11 +414,22 @@ mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate where k1 != 2;
 
 ## Unique 模型
 
-在某些多维分析场景下,用户更关注的是如何保证 Key 的唯一性,即如何获得 Primary Key 唯一性约束。
-因此,我们引入了 Unique 数据模型。在1.2版本之前,该模型本质上是聚合模型的一个特例,也是一种简化的表结构表示方式。
-由于聚合模型的实现方式是读时合并(merge on read),因此在一些聚合查询上性能不佳(参考后续章节[聚合模型的局限性](#聚合模型的局限性)的描述),
-在1.2版本我们引入了Unique模型新的实现方式,写时合并(merge on write),通过在写入时做一些额外的工作,实现了最优的查询性能。
-写时合并将在未来替换读时合并成为Unique模型的默认实现方式,两者将会短暂的共存一段时间。下面将对两种实现方式分别举例进行说明。
+当用户有数据更新需求时,可以选择使用Unique数据模型。Unique模型能够保证Key的唯一性,当用户更新一条数据时,新写入的数据会覆盖具有相同key的旧数据。
+
+**两种实现方式**
+
+Unique模型提供了两种实现方式:
+
+- 读时合并(merge-on-read)。在读时合并实现中,用户在进行数据写入时不会触发任何数据去重相关的操作,所有数据去重的操作都在查询或者compaction时进行。因此,读时合并的写入性能较好,查询性能较差,同时内存消耗也较高。
+- 写时合并(merge-on-write)。在1.2版本中,我们引入了写时合并实现,该实现会在数据写入阶段完成所有数据去重的工作,因此能够提供非常好的查询性能。
+
+自2.0版本起,写时合并已经非常成熟稳定,由于其优秀的查询性能,我们推荐大部分用户选择该实现。自2.1版本起,写时合并成为Unique模型的默认实现。
+关于两种实现方式的详细区别,用户可以参考本章节后续内容的介绍。关于两种实现方式的性能差异,参考后续章节[聚合模型的局限性](#聚合模型的局限性)的描述。
+
+**数据更新的语义**
+
+- Unique模型默认的更新语义为**整行`UPSERT`**,即UPDATE OR INSERT,该行数据的key如果存在,则进行更新,如果不存在,则进行新数据插入。在整行`UPSERT`语义下,即使用户使用insert into指定部分列进行写入,Doris也会在Planner中将未提供的列使用NULL值或者默认值进行填充。
+- 部分列更新。如果用户希望更新部分字段,需要使用写时合并实现,并通过特定的参数来开启部分列更新的支持。请查阅文档[部分列更新](../data-operate/update-delete/partial-update.md)获取相关使用建议。
 
 ### 读时合并(与聚合模型相同的实现方式)
 
@@ -494,7 +505,7 @@ PROPERTIES (
 
 ### 写时合并
 
-Unique模型的写时合并实现,与聚合模型就是完全不同的两种模型了,查询性能更接近于duplicate模型,在有主键约束需求的场景上相比聚合模型有较大的查询性能优势,尤其是在聚合查询以及需要用索引过滤大量数据的查询中。
+Unique模型的写时合并实现,查询性能更接近于duplicate模型,在有主键约束需求的场景上相比聚合模型有较大的查询性能优势,尤其是在聚合查询以及需要用索引过滤大量数据的查询中。
 
 在 1.2.0 版本中,作为一个新的feature,写时合并默认关闭(2.1 版本之前),用户可以通过添加下面的property来开启
 
@@ -505,8 +516,10 @@ Unique模型的写时合并实现,与聚合模型就是完全不同的两种
 从 2.1 版本开始,写时合并默认开启。
 
 > 注意:
-> 1. 建议使用1.2.4及以上版本,该版本修复了一些bug和稳定性问题
-> 2. 在be.conf中添加配置项:disable_storage_page_cache=false。不添加该配置项可能会对数据导入性能产生较大影响
+> 1. 对于1.2版本的用户:
+>    1. 建议使用1.2.4及以上版本,该版本修复了一些bug和稳定性问题。
+>    2. 在`be.conf`中添加配置项`disable_storage_page_cache=false`。不添加该配置项可能会对数据导入性能产生较大影响。
+> 2. 对于新用户,强烈推荐使用2.0及以上版本。2.0版本大幅提升和优化了写时合并的性能和稳定性。
 
 仍然以上面的表为例,建表语句为
 
@@ -547,9 +560,8 @@ PROPERTIES (
 所有被标记删除的数据都会在文件级别被过滤掉,读取出来的数据就都是最新的数据,消除掉了读时合并中的数据聚合过程,并且能够在很多情况下支持多种谓词的下推。因此在许多场景都能带来比较大的性能提升,尤其是在有聚合查询的情况下。
 
 【注意】
-1. 要使用Merge-on-write实现的unique表,只能在建表时通过指定property的方式打开。在2.1版本之前该属性默认关闭,从2.1版本开始,该属性默认打开。
+1. Unique表的实现方式只能在建表时确定,无法通过schema change进行修改。
 2. 旧的Merge-on-read的实现无法无缝升级到Merge-on-write的实现(数据组织方式完全不同),如果需要改为使用写时合并的实现版本,需要手动执行`insert into unique-mow-table select * from source table`.
-3. 在Unique模型上独有的delete sign 和 sequence col,在写时合并的新版实现中仍可以正常使用,用法没有变化。
 
 
 
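The whole-row `UPSERT` semantics and the two deduplication strategies documented in this patch can be illustrated with a toy in-memory model. The Python sketch below is only a conceptual illustration and is not Doris code. Several parts of it are hypothetical: the schema (a `user_id` key with `city` and `age` value columns), the default values, the class names `MergeOnRead` and `MergeOnWrite`, and the `write_partial` method. In real merge-on-write, superseded rows are marked as deleted rather than replaced in a dictionary. Partial column updates also have to be enabled explicitly, as described in the Partial Column Updates document.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema, for illustration only: one key column and two value
# columns. None stands in for NULL.
KEY_COLS = ("user_id",)
VALUE_DEFAULTS: Dict[str, Any] = {"city": None, "age": 0}

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def key_of(row: Row) -> Key:
    """Extract the key tuple that must stay unique."""
    return tuple(row[c] for c in KEY_COLS)


def fill_whole_row(written: Row) -> Row:
    """Whole-row UPSERT: columns absent from the write get their default/NULL."""
    row = dict(VALUE_DEFAULTS)
    row.update(written)
    return row


class MergeOnRead:
    """Writes only append; deduplication is deferred to read (or compaction)."""

    def __init__(self) -> None:
        self.versions: List[Row] = []  # every written row, in write order

    def write(self, written: Row) -> None:
        self.versions.append(fill_whole_row(written))

    def read(self) -> Dict[Key, Row]:
        latest: Dict[Key, Row] = {}
        for row in self.versions:  # later versions overwrite earlier ones
            latest[key_of(row)] = row
        return latest


class MergeOnWrite:
    """Deduplication happens during the write; reads need no merging."""

    def __init__(self) -> None:
        self.rows: Dict[Key, Row] = {}  # at most one live row per key

    def write(self, written: Row) -> None:
        row = fill_whole_row(written)
        self.rows[key_of(row)] = row  # supersedes any old row with this key

    def write_partial(self, written: Row) -> None:
        """Hypothetical partial column update: unwritten columns keep old values."""
        old = self.rows.get(key_of(written))
        row = dict(old) if old is not None else fill_whole_row({})
        row.update(written)
        self.rows[key_of(row)] = row

    def read(self) -> Dict[Key, Row]:
        return dict(self.rows)


if __name__ == "__main__":
    mor, mow = MergeOnRead(), MergeOnWrite()
    for store in (mor, mow):
        store.write({"user_id": 1, "city": "Beijing", "age": 30})
        store.write({"user_id": 1, "age": 31})  # city not written, so it becomes NULL
    print(mor.read() == mow.read())  # both expose one row per key
    print(len(mor.versions), len(mow.rows))  # MoR still stores both versions
```

Both stores return the same rows for the same sequence of writes. The difference is where the work happens: `MergeOnRead` keeps every version and pays the deduplication cost on each read, while `MergeOnWrite` pays it once per write. This mirrors the write-versus-query trade-off between the two implementations.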