Merge branch 'master' into 0606-udf
hello-stephen authored Jul 22, 2024
2 parents b670863 + 5b6ae3e commit efd3011
Showing 5,621 changed files with 235,433 additions and 70,973 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 1 addition & 0 deletions .asf.yaml
@@ -55,6 +55,7 @@ github:
- P1 Regression (Doris Regression)
- External Regression (Doris External Regression)
- cloud_p1 (Doris Cloud Regression)
- cloud_p0 (Doris Cloud Regression)
- FE UT (Doris FE UT)
- BE UT (Doris BE UT)
- Build Broker
3 changes: 2 additions & 1 deletion .gitignore
@@ -60,6 +60,8 @@ docker/thirdparties/docker-compose/hive/scripts/paimon1
fe_plugins/output
fe_plugins/**/.factorypath

docker/thirdparties/docker-compose/hive/scripts/data/*/*/data

fs_brokers/apache_hdfs_broker/src/main/resources/
fs_brokers/apache_hdfs_broker/src/main/thrift/

@@ -100,7 +102,6 @@ be/tags
be/test/olap/test_data/tablet_meta_test.hdr
be/.devcontainer/
be/src/apache-orc/
zoneinfo/

# Cloud
cloud/build*/
8 changes: 2 additions & 6 deletions .licenserc.yaml
@@ -40,6 +40,7 @@ header:
- "**/*.sql"
- "**/*.lock"
- "**/*.out"
- "**/*.parquet"
- "docs/.markdownlintignore"
- "fe/fe-core/src/test/resources/data/net_snmp_normal"
- "fe/fe-core/src/main/antlr4/org/apache/doris/nereids/JavaLexer.g4"
@@ -77,12 +78,7 @@ header:
- "docs/package-lock.json"
- "regression-test/script/README"
- "regression-test/suites/load_p0/stream_load/data"
- "docker/thirdparties/docker-compose/hive/scripts/README"
- "docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_table.hql"
- "docker/thirdparties/docker-compose/hive/scripts/create_tpch1_orc.hql"
- "docker/thirdparties/docker-compose/hive/scripts/create_tpch1_parquet.hql"
- "docker/thirdparties/docker-compose/hive/scripts/preinstalled_data/"
- "docker/thirdparties/docker-compose/hive/scripts/data/**"
- "docker/thirdparties/docker-compose/hive/scripts/**"
- "docker/thirdparties/docker-compose/iceberg/spark-defaults.conf.tpl"
- "conf/mysql_ssl_default_certificate/*"
- "conf/mysql_ssl_default_certificate/client_certificate/ca.pem"
74 changes: 59 additions & 15 deletions README.md
@@ -18,31 +18,63 @@ under the License.
-->

<div align="center">
<img src="https://doris.apache.org/assets/images/home-banner-7f193353c932af31634eca0a028f03ed.png" align="right" height="240"/>
</div>

# Apache Doris

[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![GitHub release](https://img.shields.io/github/release/apache/doris.svg)](https://github.com/apache/doris/releases)
[![Jenkins Vec](https://img.shields.io/jenkins/tests?compact_message&jobUrl=https://ci-builds.apache.org/job/Doris/job/doris_daily_enable_vectorized&label=VectorizedEngine)](https://ci-builds.apache.org/job/Doris/job/doris_daily_enable_vectorized)
[![Total Lines](https://tokei.rs/b1/github/apache/doris?category=lines)](https://github.com/apache/doris)
[![Join the Doris Community on Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA)
[![OSSRank](https://shields.io/endpoint?url=https://ossrank.com/shield/516)](https://ossrank.com/p/516)
[![Commit activity](https://img.shields.io/github/commit-activity/m/apache/doris)](https://github.com/apache/doris/commits/master/)
[![EN doc](https://img.shields.io/badge/Docs-English-blue.svg)](https://doris.apache.org/docs/get-starting/quick-start)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)]([https://doris.apache.org/zh-CN/docs/dev/get-starting/what-is-apache-doris](https://doris.apache.org/zh-CN/docs/get-starting/what-is-apache-doris))
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](https://doris.apache.org/zh-CN/docs/get-starting/quick-start/)

<div>

[![Official Website](<https://img.shields.io/badge/-Visit%20the%20Official%20Website%20%E2%86%92-rgb(15,214,106)?style=for-the-badge>)](https://doris.apache.org/)
[![Quick Download](<https://img.shields.io/badge/-Quick%20%20Download%20%E2%86%92-rgb(66,56,255)?style=for-the-badge>)](https://doris.apache.org/download)


</div>


<div>
<a href="https://twitter.com/doris_apache"><img src="https://img.shields.io/badge/- @Doris_Apache -424549?style=social&logo=x" height=25></a>
&nbsp;
<a href="https://github.com/apache/doris/discussions"><img src="https://img.shields.io/badge/- Discussion -red?style=social&logo=discourse" height=25></a>
&nbsp;
<a href="https://apachedoriscommunity.slack.com/join/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA"><img src="https://img.shields.io/badge/-Slack-red?style=social&logo=slack" height=25></a>
&nbsp;
<a href="https://medium.com/@ApacheDoris"><img src="https://img.shields.io/badge/-Medium-red?style=social&logo=medium" height=25></a>

</div>

</div>

---

Apache Doris is an MPP-based real-time data warehouse known for its high query speed. For queries on large datasets, it returns results in under a second. It supports both high-concurrency point queries and high-throughput complex analysis. It can be used for report analysis, ad-hoc queries, unified data warehouse building, and data lake query acceleration. Based on Apache Doris, users can build applications for user behavior analysis, A/B testing platforms, log analysis, and e-commerce order analysis.

Please visit our [official download page](https://doris.apache.org/download/) to get the latest release version.

The current stable version is the 2.0.x series, and the latest version is the 2.1.x series. For production, we recommend the latest release of the 2.0.x series; for POC or testing, we recommend the latest release of the 2.1.x series.

Apache Doris is an easy-to-use, high-performance, real-time analytical database based on an MPP architecture, known for its extreme speed and ease of use. It returns query results over massive data with sub-second response times, and supports not only high-concurrency point query scenarios but also high-throughput complex analysis scenarios.

All this makes Apache Doris an ideal tool for scenarios including report analysis, ad-hoc queries, unified data warehousing, and data lake query acceleration. On Apache Doris, users can build various applications, such as user behavior analysis, A/B testing platforms, log retrieval analysis, user profile analysis, and order analysis.

🎉 Version 2.1.4 is now released. Check out the 🔗[Release Notes](https://doris.apache.org/docs/releasenotes/release-2.1.4) here. The 2.1 version delivers exceptional performance, with 100% higher out-of-the-box query performance proven by TPC-DS 1TB tests, enhanced data lake analytics that are 4-6 times faster than Trino and Spark, solid support for semi-structured data analysis with the new Variant type and a suite of analytical functions, asynchronous materialized views for query acceleration, optimized real-time writing at scale, and better workload management with stability and runtime SQL resource tracking.


🎉 Version 2.0.12 is now released! This fully evolved and stable release is ready for all users to upgrade to. Check out the 🔗[Release Notes](https://doris.apache.org/docs/2.0/releasenotes/release-2.0.12) here.

👀 Have a look at the 🔗[Official Website](https://doris.apache.org/) for a comprehensive list of Apache Doris's core features, blogs and user cases.

## 📈 Usage Scenarios

As shown in the figure below, after data integration and processing, data from various sources is usually stored in the real-time data warehouse Apache Doris and in an offline data lake or data warehouse (Apache Hive, Apache Iceberg, or Apache Hudi).

<img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sekvbs5ih5rb16wz6n9k.png">
<br />

<img src="https://cdn.selectdb.com/static/What_is_Apache_Doris_3_a61692c2ce.png" />

<br />

Apache Doris is widely used in the following scenarios:

@@ -70,7 +102,11 @@ The overall architecture of Apache Doris is shown in the following figure. The D

Both types of processes are horizontally scalable, and a single cluster can support up to hundreds of machines and tens of petabytes of storage capacity. Both types of processes also guarantee high availability of services and high reliability of data through consistency protocols. This highly integrated architecture design greatly reduces the operation and maintenance cost of a distributed system.

![The overall architecture of Apache Doris](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mnz20ae3s23vv3e9ltmi.png)
<br />

![The overall architecture of Apache Doris](https://cdn.selectdb.com/static/What_is_Apache_Doris_adb26397e2.png)

<br />

In terms of interfaces, Apache Doris adopts the MySQL protocol, supports standard SQL, and is highly compatible with the MySQL dialect. Users can access Doris through various client tools, and it connects seamlessly with BI tools.

@@ -100,11 +136,19 @@ Doris also supports strongly consistent materialized views. Materialized views a

Doris adopts the MPP model in its query engine to realize parallel execution between and within nodes. It also supports distributed shuffle join for multiple large tables so as to handle complex queries.
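
The shuffle join idea can be sketched in a single process. This is a minimal illustration under stated assumptions, not Doris code: join keys are assumed to be integers, partitioning is assumed to use `std::hash` modulo the bucket count, and `Row`, `partition_by_key`, and `shuffle_join` are hypothetical names. Both inputs are hash-partitioned on the join key with the same function, so equal keys always land in the same bucket and each bucket pair can be joined independently (in an MPP engine, on different nodes or instances).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row type: an integer join key plus one payload column.
struct Row {
    int64_t key;
    std::string value;
};

// Hash-partition rows into num_buckets buckets by join key. Applying the same
// function to both join inputs guarantees that equal keys meet in one bucket.
std::vector<std::vector<Row>> partition_by_key(const std::vector<Row>& rows,
                                               std::size_t num_buckets) {
    assert(num_buckets > 0);
    std::vector<std::vector<Row>> buckets(num_buckets);
    for (const Row& r : rows) {
        std::size_t b = std::hash<int64_t>{}(r.key) % num_buckets;
        buckets[b].push_back(r);
    }
    return buckets;
}

// Join each bucket pair on its own: build a hash table from the build-side
// bucket, then probe it with the matching probe-side bucket.
std::vector<std::pair<std::string, std::string>> shuffle_join(
        const std::vector<Row>& build, const std::vector<Row>& probe,
        std::size_t num_buckets) {
    auto build_parts = partition_by_key(build, num_buckets);
    auto probe_parts = partition_by_key(probe, num_buckets);
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t b = 0; b < num_buckets; ++b) {
        std::unordered_multimap<int64_t, const Row*> table;
        for (const Row& r : build_parts[b]) {
            table.emplace(r.key, &r);
        }
        for (const Row& p : probe_parts[b]) {
            auto [lo, hi] = table.equal_range(p.key);
            for (auto it = lo; it != hi; ++it) {
                out.emplace_back(it->second->value, p.value);
            }
        }
    }
    return out;
}
```

The number of result rows does not depend on the bucket count; only the distribution of work does, which is what lets the join scale out across nodes.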

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vjlmumwyx728uymsgcw0.png)
<br />

![Query Engine](https://cdn.selectdb.com/static/What_is_Apache_Doris_1_c6f5ba2af9.png)

<br />

The Doris query engine is vectorized, with all memory structures laid out in a columnar format. This greatly reduces virtual function calls, improves cache hit rates, and makes efficient use of SIMD instructions. In wide table aggregation scenarios, Doris delivers 5–10 times higher performance than non-vectorized engines.
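
To make the contrast concrete, here is a hedged sketch, not Doris code, of why a columnar layout helps: in the row-at-a-time version every value is fetched through a virtual call, while in the columnar version one column is a contiguous array and the aggregation is a tight loop that an optimizing compiler can typically auto-vectorize with SIMD. The types `IRowValue`, `Int64RowValue`, and `ColumnInt64` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Row-at-a-time style: every value is reached through a virtual call.
struct IRowValue {
    virtual ~IRowValue() = default;
    virtual int64_t get() const = 0;
};

struct Int64RowValue : IRowValue {
    explicit Int64RowValue(int64_t v) : v_(v) {}
    int64_t get() const override { return v_; }
    int64_t v_;
};

int64_t sum_row_wise(const std::vector<std::unique_ptr<IRowValue>>& rows) {
    int64_t total = 0;
    for (const auto& r : rows) {
        total += r->get();  // one indirect call per value
    }
    return total;
}

// Columnar style: one column is a contiguous array of a single type.
struct ColumnInt64 {
    std::vector<int64_t> data;
};

// A tight loop over contiguous memory: cache friendly, no per-value dispatch,
// and simple enough for the compiler to vectorize when optimizing.
int64_t sum_column(const ColumnInt64& col) {
    int64_t total = 0;
    const int64_t* p = col.data.data();
    const std::size_t n = col.data.size();
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}
```

Both functions compute the same sum; the difference is in how many indirect calls and cache misses it takes to get there.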

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ck2m3kbnodn28t28vphp.png)
<br />

![Doris query engine](https://cdn.selectdb.com/static/What_is_Apache_Doris_2_29cf58cc6b.png)

<br />

Apache Doris uses Adaptive Query Execution technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate a runtime filter from the build side of a join, push it to the probe side, and automatically push it down to the Scan node at the bottom, which drastically reduces the amount of data in the probe and improves join performance. Runtime filters in Doris support the In, Min/Max, and Bloom filter types.
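
The runtime filter flow can be sketched as follows, assuming integer join keys. This is an illustration only, not the Doris implementation: `RuntimeFilter`, `build_filter`, `scan_with_filter`, and the `max_in` threshold are hypothetical, and the Bloom filter variant is omitted for brevity. The build side records a Min/Max range plus an exact IN set while it stays small; the scan then drops probe rows that cannot possibly find a join partner.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_set>
#include <vector>

// Hypothetical runtime filter built from the build-side join keys. It keeps a
// Min/Max range and, while the key set is small, an exact IN set.
struct RuntimeFilter {
    int64_t min = std::numeric_limits<int64_t>::max();
    int64_t max = std::numeric_limits<int64_t>::min();
    std::unordered_set<int64_t> in_set;
    bool use_in = true;
    std::size_t max_in_size;

    explicit RuntimeFilter(std::size_t max_in) : max_in_size(max_in) {}

    void insert(int64_t key) {
        min = std::min(min, key);
        max = std::max(max, key);
        if (use_in) {
            in_set.insert(key);
            if (in_set.size() > max_in_size) {
                // Too many distinct keys: keep only the Min/Max range.
                use_in = false;
                in_set.clear();
            }
        }
    }

    // Never rejects a key present on the build side; may accept keys that are
    // absent (in range, once the IN set has been dropped).
    bool may_match(int64_t key) const {
        if (key < min || key > max) {
            return false;
        }
        return !use_in || in_set.count(key) > 0;
    }
};

// Build phase: feed every build-side key into the filter.
RuntimeFilter build_filter(const std::vector<int64_t>& build_keys, std::size_t max_in) {
    RuntimeFilter f(max_in);
    for (int64_t k : build_keys) {
        f.insert(k);
    }
    return f;
}

// Scan phase: rows without a possible join partner are dropped before they
// ever reach the join's probe side.
std::vector<int64_t> scan_with_filter(const std::vector<int64_t>& probe_keys,
                                      const RuntimeFilter& f) {
    std::vector<int64_t> kept;
    for (int64_t k : probe_keys) {
        if (f.may_match(k)) {
            kept.push_back(k);
        }
    }
    return kept;
}
```

Falling back from the exact IN set to the Min/Max range trades filtering precision for bounded memory; a Bloom filter sits between the two.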

@@ -133,7 +177,7 @@ In terms of optimizers, Doris uses a combination of CBO and RBO. RBO supports co

**Apache Doris graduated from the Apache Incubator and became a Top-Level Project in June 2022**.

Currently, the Apache Doris community has gathered more than 600 contributors from over 200 companies in different industries, and the number of monthly active contributors exceeds 100.
Currently, the Apache Doris community has gathered more than 400 contributors from nearly 200 companies in different industries, and the number of active contributors is close to 100 per month.


[![Monthly Active Contributors](https://contributor-overtime-api.apiseven.com/contributors-svg?chart=contributorMonthlyActivity&repo=apache/doris)](https://www.apiseven.com/en/contributor-graph?chart=contributorMonthlyActivity&repo=apache/doris)
@@ -212,7 +256,7 @@ Contact us through the following mailing list.

* Apache Doris Official Website - [Site](https://doris.apache.org)
* Developer Mailing list - <[email protected]>. Send a mail to <[email protected]> and follow the reply to subscribe to the mailing list.
* Slack channel - [Join the Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA)
* Slack channel - [Join the Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-28il1o2wk-DD6LsLOz3v4aD92Mu0S0aQ)
* Twitter - [Follow @doris_apache](https://twitter.com/doris_apache)


9 changes: 5 additions & 4 deletions be/CMakeLists.txt
@@ -118,6 +118,7 @@ set(BASE_DIR "${CMAKE_CURRENT_SOURCE_DIR}")
set(ENV{DORIS_HOME} "${BASE_DIR}/..")
set(BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}")
set(GENSRC_DIR "${BASE_DIR}/../gensrc/build/")
set(COMMON_SRC_DIR "${BASE_DIR}/../common")
set(SRC_DIR "${BASE_DIR}/src/")
set(TEST_DIR "${CMAKE_SOURCE_DIR}/test/")
set(OUTPUT_DIR "${BASE_DIR}/output")
@@ -357,10 +358,6 @@ if (USE_UNWIND)
endif()
endif()

if (ENABLE_STACKTRACE)
add_definitions(-DENABLE_STACKTRACE)
endif()

if (USE_DWARF)
add_compile_options(-gdwarf-5)
endif()
@@ -436,6 +433,7 @@ include_directories(

include_directories(
SYSTEM
${COMMON_SRC_DIR}
${GENSRC_DIR}/
${THIRDPARTY_DIR}/include
${GPERFTOOLS_HOME}/include
@@ -500,6 +498,7 @@ set(DORIS_LINK_LIBS
Pipeline
Cloud
${WL_END_GROUP}
CommonCPP
)

set(absl_DIR ${THIRDPARTY_DIR}/lib/cmake/absl)
@@ -765,6 +764,8 @@ if (MAKE_TEST)
add_subdirectory(${TEST_DIR})
endif ()

add_subdirectory(${COMMON_SRC_DIR}/cpp ${BUILD_DIR}/src/common_cpp)

# Install be
install(DIRECTORY DESTINATION ${OUTPUT_DIR})
install(DIRECTORY DESTINATION ${OUTPUT_DIR}/bin)
1 change: 0 additions & 1 deletion be/cmake/thirdparty.cmake
@@ -81,7 +81,6 @@ if (USE_JEMALLOC)
else()
add_thirdparty(tcmalloc WHOLELIBPATH ${GPERFTOOLS_HOME}/lib/libtcmalloc.a NOTADD)
endif()
add_thirdparty(jemalloc_arrow LIBNAME "lib/libjemalloc_arrow.a")

if (WITH_MYSQL)
add_thirdparty(mysql LIBNAME "lib/libmysqlclient.a")
6 changes: 4 additions & 2 deletions be/src/agent/be_exec_version_manager.h
@@ -79,13 +79,15 @@ class BeExecVersionManager {
* a. change the impl of percentile (need fix)
* b. clear old version of version 3->4
* c. change FunctionIsIPAddressInRange from AlwaysNotNullable to DependOnArguments
* d. change some agg function nullable property: PR #37215
*/
constexpr inline int BeExecVersionManager::max_be_exec_version = 5;
constexpr inline int BeExecVersionManager::min_be_exec_version = 0;

/// functional
constexpr inline int BITMAP_SERDE = 3;
constexpr inline int USE_NEW_SERDE = 4; // release on DORIS version 2.1
constexpr inline int OLD_WAL_SERDE = 3; // use to solve compatibility issues, see pr #32299
constexpr inline int USE_NEW_SERDE = 4; // release on DORIS version 2.1
constexpr inline int OLD_WAL_SERDE = 3; // use to solve compatibility issues, see pr #32299
constexpr inline int AGG_FUNCTION_NULLABLE = 5; // change some agg nullable property: PR #37215

} // namespace doris
7 changes: 4 additions & 3 deletions be/src/agent/cgroup_cpu_ctl.cpp
@@ -130,7 +130,7 @@ Status CgroupV1CpuCtl::init() {
return Status::InternalError<false>("invalid cgroup path, not find cpu quota file");
}

if (_tg_id == -1) {
if (_wg_id == -1) {
// means current cgroup cpu ctl is just used to clear dir,
// it does not contains workload group.
// todo(wb) rethinking whether need to refactor cgroup_cpu_ctl
@@ -140,7 +140,7 @@ Status CgroupV1CpuCtl::init() {
}

// workload group path
_cgroup_v1_cpu_tg_path = _cgroup_v1_cpu_query_path + "/" + std::to_string(_tg_id);
_cgroup_v1_cpu_tg_path = _cgroup_v1_cpu_query_path + "/" + std::to_string(_wg_id);
if (access(_cgroup_v1_cpu_tg_path.c_str(), F_OK) != 0) {
int ret = mkdir(_cgroup_v1_cpu_tg_path.c_str(), S_IRWXU);
if (ret != 0) {
@@ -186,7 +186,8 @@ Status CgroupV1CpuCtl::add_thread_to_cgroup() {
return Status::OK();
#else
int tid = static_cast<int>(syscall(SYS_gettid));
std::string msg = "add thread " + std::to_string(tid) + " to group";
std::string msg =
"add thread " + std::to_string(tid) + " to group" + " " + std::to_string(_wg_id);
std::lock_guard<std::shared_mutex> w_lock(_lock_mutex);
return CgroupCpuCtl::write_cg_sys_file(_cgroup_v1_cpu_tg_task_file, tid, msg, true);
#endif
4 changes: 2 additions & 2 deletions be/src/agent/cgroup_cpu_ctl.h
@@ -35,7 +35,7 @@ class CgroupCpuCtl {
public:
virtual ~CgroupCpuCtl() = default;
CgroupCpuCtl() = default;
CgroupCpuCtl(uint64_t tg_id) { _tg_id = tg_id; }
CgroupCpuCtl(uint64_t wg_id) { _wg_id = wg_id; }

virtual Status init();

@@ -63,7 +63,7 @@ class CgroupCpuCtl {
int _cpu_hard_limit = 0;
std::shared_mutex _lock_mutex;
bool _init_succ = false;
uint64_t _tg_id = -1; // workload group id
uint64_t _wg_id = -1; // workload group id
uint64_t _cpu_shares = 0;
};

44 changes: 20 additions & 24 deletions be/src/agent/task_worker_pool.cpp
@@ -40,6 +40,7 @@
#include <sstream>
#include <string>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

@@ -540,26 +541,20 @@ Status TaskWorkerPool::submit_task(const TAgentTaskRequest& task) {
}

PriorTaskWorkerPool::PriorTaskWorkerPool(
std::string_view name, int normal_worker_count, int high_prior_worker_count,
const std::string& name, int normal_worker_count, int high_prior_worker_count,
std::function<void(const TAgentTaskRequest& task)> callback)
: _callback(std::move(callback)) {
auto st = ThreadPoolBuilder(fmt::format("TaskWP_.{}", name))
.set_min_threads(normal_worker_count)
.set_max_threads(normal_worker_count)
.build(&_normal_pool);
CHECK(st.ok()) << name << ": " << st;

st = _normal_pool->submit_func([this] { normal_loop(); });
CHECK(st.ok()) << name << ": " << st;

st = ThreadPoolBuilder(fmt::format("HighPriorPool.{}", name))
.set_min_threads(high_prior_worker_count)
.set_max_threads(high_prior_worker_count)
.build(&_high_prior_pool);
CHECK(st.ok()) << name << ": " << st;
for (int i = 0; i < normal_worker_count; ++i) {
auto st = Thread::create(
"Normal", name, [this] { normal_loop(); }, &_workers.emplace_back());
CHECK(st.ok()) << name << ": " << st;
}

st = _high_prior_pool->submit_func([this] { high_prior_loop(); });
CHECK(st.ok()) << name << ": " << st;
for (int i = 0; i < high_prior_worker_count; ++i) {
auto st = Thread::create(
"HighPrior", name, [this] { high_prior_loop(); }, &_workers.emplace_back());
CHECK(st.ok()) << name << ": " << st;
}
}

PriorTaskWorkerPool::~PriorTaskWorkerPool() {
@@ -578,12 +573,10 @@ void PriorTaskWorkerPool::stop() {
_normal_condv.notify_all();
_high_prior_condv.notify_all();

if (_normal_pool) {
_normal_pool->shutdown();
}

if (_high_prior_pool) {
_high_prior_pool->shutdown();
for (auto&& w : _workers) {
if (w) {
w->join();
}
}
}

@@ -1392,9 +1385,12 @@ void update_s3_resource(const TStorageResource& param, io::RemoteFileSystemSPtr
auto client = static_cast<io::S3FileSystem*>(existed_fs.get())->client_holder();
auto new_s3_conf = S3Conf::get_s3_conf(param.s3_storage_param);
S3ClientConf conf {
.endpoint {},
.region {},
.ak = std::move(new_s3_conf.client_conf.ak),
.sk = std::move(new_s3_conf.client_conf.sk),
.token = std::move(new_s3_conf.client_conf.token),
.bucket {},
.provider = new_s3_conf.client_conf.provider,
};
st = client->reset(conf);
@@ -1797,7 +1793,7 @@ void PublishVersionWorkerPool::publish_version_callback(const TAgentTaskRequest&
if (tablet->exceed_version_limit(config::max_tablet_version_num * 2 / 3) &&
published_count % 20 == 0) {
auto st = _engine.submit_compaction_task(
tablet, CompactionType::CUMULATIVE_COMPACTION, true);
tablet, CompactionType::CUMULATIVE_COMPACTION, true, false);
if (!st.ok()) [[unlikely]] {
LOG(WARNING) << "trigger compaction failed, tablet_id=" << tablet_id
<< ", published=" << published_count << " : " << st;