[fix](Nereids) support variant column with index when create table #32948

Merged 1 commit on Mar 28, 2024

Conversation

morrySnow
Contributor

Proposed changes

Issue Number: close #xxx


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morrySnow
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38351 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 947216aef74762f1c91036d13ee72e74e7e1642b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17662	4182	4141	4141
q2	2124	170	167	167
q3	10567	1169	1195	1169
q4	10239	775	799	775
q5	7472	3030	2985	2985
q6	207	133	128	128
q7	1071	632	586	586
q8	9360	2053	1992	1992
q9	7235	6601	6608	6601
q10	8438	3551	3576	3551
q11	445	244	222	222
q12	382	214	200	200
q13	17797	2881	2863	2863
q14	241	205	205	205
q15	512	469	485	469
q16	502	374	377	374
q17	951	551	592	551
q18	7329	6529	6492	6492
q19	1533	1448	1495	1448
q20	558	264	250	250
q21	3641	2923	2876	2876
q22	359	306	316	306
Total cold run time: 108625 ms
Total hot run time: 38351 ms
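
The two totals above can be reproduced from the Round 1 rows. The sketch below assumes the column meaning inferred from the numbers (cold run, two hot runs, then the faster of the two hot runs); the bot output itself does not label the columns, and the names `ROUND_1`, `parse_rows`, and `totals` are illustrative only.

```python
# Round 1 rows copied from the bot comment.
# Assumed column order: query, cold, hot run 1, hot run 2, best hot (all in ms).
ROUND_1 = """
q1 17662 4182 4141 4141
q2 2124 170 167 167
q3 10567 1169 1195 1169
q4 10239 775 799 775
q5 7472 3030 2985 2985
q6 207 133 128 128
q7 1071 632 586 586
q8 9360 2053 1992 1992
q9 7235 6601 6608 6601
q10 8438 3551 3576 3551
q11 445 244 222 222
q12 382 214 200 200
q13 17797 2881 2863 2863
q14 241 205 205 205
q15 512 469 485 469
q16 502 374 377 374
q17 951 551 592 551
q18 7329 6529 6492 6492
q19 1533 1448 1495 1448
q20 558 264 250 250
q21 3641 2923 2876 2876
q22 359 306 316 306
"""


def parse_rows(text):
    """Return a list of (query, cold, hot1, hot2, best) tuples."""
    rows = []
    for line in text.split("\n"):
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank lines
        cold, hot1, hot2, best = (int(f) for f in fields[1:])
        rows.append((fields[0], cold, hot1, hot2, best))
    return rows


def totals(rows):
    """Sum the cold column and the best-hot column."""
    total_cold = sum(r[1] for r in rows)
    total_hot = sum(r[4] for r in rows)
    return total_cold, total_hot


rows = parse_rows(ROUND_1)
total_cold, total_hot = totals(rows)
print(f"Total cold run time: {total_cold} ms")
print(f"Total hot run time: {total_hot} ms")
```

Under this reading, the "hot" total is the sum of the per-query best hot run rather than the sum of either individual hot run.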

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4122	4102	4128	4102
q2	323	226	230	226
q3	2958	2865	2914	2865
q4	1846	1508	1540	1508
q5	5343	5355	5346	5346
q6	193	117	120	117
q7	2278	1902	1841	1841
q8	3155	3277	3263	3263
q9	8755	8686	8755	8686
q10	3788	3846	3798	3798
q11	549	447	443	443
q12	707	535	525	525
q13	16926	2893	2838	2838
q14	283	250	255	250
q15	496	477	467	467
q16	489	419	441	419
q17	1724	1495	1466	1466
q18	7394	7135	7151	7135
q19	1607	1519	1555	1519
q20	1933	1770	1735	1735
q21	4819	4704	4684	4684
q22	549	452	453	452
Total cold run time: 70237 ms
Total hot run time: 53685 ms

@doris-robot

TPC-DS: Total hot run time: 182166 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 947216aef74762f1c91036d13ee72e74e7e1642b, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	929	385	352	352
query2	6550	2094	2054	2054
query3	6706	214	224	214
query4	31841	21321	21318	21318
query5	4334	399	386	386
query6	263	180	194	180
query7	4638	294	308	294
query8	237	171	182	171
query9	9268	2365	2326	2326
query10	607	256	278	256
query11	15452	14235	14318	14235
query12	138	97	90	90
query13	1630	426	414	414
query14	10074	7957	7950	7950
query15	345	206	194	194
query16	8252	264	265	264
query17	2106	581	555	555
query18	2127	290	292	290
query19	371	153	160	153
query20	98	91	94	91
query21	204	130	125	125
query22	4930	4787	4819	4787
query23	33862	32602	32889	32602
query24	11730	2833	2849	2833
query25	648	392	389	389
query26	1792	161	162	161
query27	3043	358	362	358
query28	7766	1906	1897	1897
query29	1058	652	637	637
query30	311	156	150	150
query31	968	730	739	730
query32	107	58	59	58
query33	784	263	267	263
query34	1077	491	486	486
query35	837	612	614	612
query36	1040	870	878	870
query37	278	65	67	65
query38	3640	3433	3437	3433
query39	1472	1466	1439	1439
query40	292	115	117	115
query41	54	48	49	48
query42	110	99	96	96
query43	519	443	461	443
query44	1110	740	736	736
query45	310	273	270	270
query46	1116	707	702	702
query47	1908	1833	1831	1831
query48	458	362	363	362
query49	1258	352	337	337
query50	755	378	370	370
query51	6745	6612	6626	6612
query52	110	97	98	97
query53	350	280	277	277
query54	321	251	263	251
query55	90	79	79	79
query56	256	240	229	229
query57	1209	1146	1154	1146
query58	243	215	221	215
query59	2810	2744	2537	2537
query60	271	253	253	253
query61	115	115	117	115
query62	653	452	464	452
query63	316	283	277	277
query64	6858	4065	3892	3892
query65	3137	3069	3042	3042
query66	1421	375	355	355
query67	15618	14868	15087	14868
query68	9322	526	540	526
query69	676	394	397	394
query70	1350	1103	1160	1103
query71	521	266	262	262
query72	7000	2724	2533	2533
query73	1620	318	321	318
query74	8453	6381	6512	6381
query75	3935	2202	2261	2202
query76	5780	908	927	908
query77	639	266	260	260
query78	11158	10208	10166	10166
query79	11375	535	526	526
query80	1872	376	376	376
query81	510	221	209	209
query82	395	88	89	88
query83	230	143	144	143
query84	283	80	77	77
query85	1100	318	318	318
query86	353	287	293	287
query87	3782	3612	3474	3474
query88	4805	2300	2302	2300
query89	496	383	378	378
query90	2050	180	177	177
query91	201	143	140	140
query92	59	52	49	49
query93	6813	505	484	484
query94	1320	182	181	181
query95	440	335	337	335
query96	616	268	273	268
query97	2627	2528	2502	2502
query98	238	220	212	212
query99	1171	900	926	900
Total cold run time: 321116 ms
Total hot run time: 182166 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 947216aef74762f1c91036d13ee72e74e7e1642b with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s
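
The throughput figures above are consistent with treating "MB" as 1024 * 1024 bytes and truncating the result; that reading is inferred from the reported numbers, not stated by the bot. A minimal sketch under that assumption (the helper names are hypothetical):

```python
MIB = 1024 * 1024  # assumption: the report's "MB" means 2**20 bytes


def mib_per_second(total_bytes, seconds):
    """Throughput in whole MiB/s, truncated toward zero."""
    return int(total_bytes / seconds / MIB)


def kilo_ops_per_second(rows, seconds):
    """Row throughput in whole thousands of rows per second, truncated."""
    return int(rows / seconds / 1000)


print("json:   ", mib_per_second(2358488459, 18), "MB/s")
print("orc:    ", mib_per_second(1101869774, 59), "MB/s")
print("parquet:", mib_per_second(861443392, 32), "MB/s")
print("insert: ", kilo_ops_per_second(10000000, 13.7), "K ops/s")
```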

@morrySnow
Contributor Author

run feut

@github-actions github-actions bot added the approved label Mar 28, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 709158f into apache:master Mar 28, 2024
28 of 31 checks passed
@morrySnow morrySnow deleted the create_variant_column_with_index branch March 28, 2024 07:14
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <[email protected]>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This rewriting pattern is supported for multi-table joins. The supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for varint type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a hive table on cosn, doris returns an empty result even though the table has data.
iceberg on cosn works correctly.
The cause is misuse of cosn's file system: according to cosn's documentation, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the hive metastore's `getCatalog()` method.
    But this method only exists in hive 3+, so it fails when using hive 2.x.

    I temporarily removed this logic because it is only used for iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some P2 test cases were missing `order_qt`. And because the output format of the floating point
    type has changed, some results in the `out` files need to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol and connections are usually not actively disconnected; when a bearer token is evicted from the cache, its ConnectContext is unregistered.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
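
The fe.conf constraint stated in this commit message can be checked with a few lines. This is only a sketch of the rule as written (strictly less than half of `qe_max_connection`); the function name is hypothetical and the values in the usage lines are illustrative, not Doris defaults.

```python
def is_valid_token_cache_size(arrow_flight_token_cache_size, qe_max_connection):
    """Return True when the token cache size is strictly below qe_max_connection / 2.

    Assumption: "less than" in the commit message is a strict comparison.
    """
    return arrow_flight_token_cache_size < qe_max_connection / 2


# Illustrative values only.
print(is_valid_token_cache_size(100, 1024))
print(is_valid_token_cache_size(512, 1024))
```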

* [bugfix](cloud) few variable not initialized (#32868)

../../cloud/src/recycler/meta_checker.cpp
can cause uninitialised memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
When groovy uses a flight sql connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error: AGGREGATE clause must not contain analytic expressions. Executing the same query in Java with jdbc::arrow-flight-sql works without problems.
groovy does not support printing the arrow array type and throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two phase read.
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer maybe may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support getting the tables in a materialized view when collecting tables from a plan.

The table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we collect tables from the following plan, we get [table1, table3, table2]; mv1 is expanded into its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10
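
The expansion described above can be illustrated with a small sketch. It is not the Nereids implementation: the dictionary `MV_DEFINITIONS` and the function `collect_base_tables` are hypothetical stand-ins that only show how replacing mv1 with the tables of its defining query yields [table1, table3, table2].

```python
from typing import Dict, List

# Hypothetical catalog: each materialized view maps to the relations its defining query reads.
MV_DEFINITIONS: Dict[str, List[str]] = {"mv1": ["table1", "table3"]}


def collect_base_tables(relations: List[str], mv_definitions: Dict[str, List[str]]) -> List[str]:
    """Collect base tables in first-seen order, expanding materialized views recursively."""
    collected: List[str] = []
    for relation in relations:
        if relation in mv_definitions:
            expanded = collect_base_tables(mv_definitions[relation], mv_definitions)
        else:
            expanded = [relation]
        for table in expanded:
            if table not in collected:  # keep each table once
                collected.append(table)
    return collected


# Relations scanned by the query above: mv1, table2 (join), table2 (subquery).
plan_relations = ["mv1", "table2", "table2"]
result = collect_base_tables(plan_relations, MV_DEFINITIONS)
print(result)
```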

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In Fe:
Get: If Fe is cloud mode, get table version from meta service.
Update: Op drop/replace temp partition, commit transaction.

In meta service:
Add: create Index. init value is 1.
Remove: by recycler.
Update: commit/drop partition rpc, commit txn rpc. Atomic++.
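
The version lifecycle listed above (initial value 1, atomic increment on update, removal by the recycler) can be sketched with an in-memory stand-in. The class and method names are hypothetical; the real meta service stores this state elsewhere and its RPC interface is not shown in this commit message.

```python
import threading


class TableVersionStore:
    """In-memory stand-in for the meta service's table version bookkeeping (illustrative only)."""

    def __init__(self):
        self._versions = {}
        self._lock = threading.Lock()

    def add(self, table_id):
        """Called when an index is created; the initial version is 1."""
        with self._lock:
            self._versions[table_id] = 1

    def get(self, table_id):
        """What the FE would read in cloud mode."""
        with self._lock:
            return self._versions[table_id]

    def update(self, table_id):
        """Atomic increment, e.g. on commit txn or commit/drop partition."""
        with self._lock:
            self._versions[table_id] += 1
            return self._versions[table_id]

    def remove(self, table_id):
        """Called by the recycler."""
        with self._lock:
            self._versions.pop(table_id, None)


store = TableVersionStore()
store.add(1001)
store.update(1001)  # e.g. a committed transaction
store.update(1001)  # e.g. a dropped partition
print(store.get(1001))
```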

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == when comparing types.
2. The null bitmap is resized to the size of the ref column.

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy)fix row policy has been exist (#32880)

* [fix](pipeline) fix use error row desc when origin block clear (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support showing plsql procedures using select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` is thrown.

We need to write a separate Utils class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column name in different index. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
- Use `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
after change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <[email protected]>
Co-authored-by: Gavin Chou <[email protected]>
Co-authored-by: xy720 <[email protected]>
Co-authored-by: yongjinhou <[email protected]>
Co-authored-by: Dongyang Li <[email protected]>
Co-authored-by: stephen <[email protected]>
Co-authored-by: morrySnow <[email protected]>
Co-authored-by: seawinde <[email protected]>
Co-authored-by: lihangyu <[email protected]>
Co-authored-by: Yulei-Yang <[email protected]>
Co-authored-by: starocean999 <[email protected]>
Co-authored-by: wangbo <[email protected]>
Co-authored-by: Mingyu Chen <[email protected]>
Co-authored-by: Jerry Hu <[email protected]>
Co-authored-by: zhiqiang <[email protected]>
Co-authored-by: Xinyi Zou <[email protected]>
Co-authored-by: Vallish Pai <[email protected]>
Co-authored-by: amory <[email protected]>
Co-authored-by: HappenLee <[email protected]>
Co-authored-by: Jensen <[email protected]>
Co-authored-by: zhangdong <[email protected]>
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: jakevin <[email protected]>
Co-authored-by: Mryange <[email protected]>
Co-authored-by: zclllyybb <[email protected]>
Co-authored-by: Tiewei Fang <[email protected]>
Co-authored-by: Xin Liao <[email protected]>
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Apr 1, 2024
eldenmoon added a commit that referenced this pull request Apr 1, 2024
)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

---------

Co-authored-by: morrySnow <[email protected]>
Labels
approved, dev/2.1.1-merged, reviewed
5 participants