Merge branch 'master' into 0606-udf
hello-stephen authored Jun 28, 2024
2 parents e4fec2f + 4012bce commit cb5f152
Showing 2,101 changed files with 91,478 additions and 26,154 deletions.
17 changes: 6 additions & 11 deletions .asf.yaml
Original file line number Diff line number Diff line change
@@ -69,6 +69,7 @@ github:

required_pull_request_reviews:
dismiss_stale_reviews: true
require_code_owner_reviews: true
required_approving_review_count: 1
branch-1.1-lts:
required_status_checks:
@@ -93,17 +94,11 @@ github:
- ShellCheck
- Build Third Party Libraries (Linux)
- Build Third Party Libraries (macOS)
checks:
- context: COMPILE (DORIS_COMPILE)
app_id: -1
- context: P0 Regression (Doris Regression)
app_id: -1
- context: External Regression (Doris External Regression)
app_id: -1
- context: FE UT (Doris FE UT)
app_id: -1
- context: BE UT (Doris BE UT)
app_id: -1
- COMPILE (DORIS_COMPILE)
- External Regression (Doris External Regression)
- FE UT (Doris FE UT)
- BE UT (Doris BE UT)
- P0 Regression (Doris Regression)

required_pull_request_reviews:
dismiss_stale_reviews: true
18 changes: 18 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,18 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
be/src/io/* @platoneko @gavinchou @dataroaring
fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java @dataroaring @CalvinKirs @morningman
4 changes: 3 additions & 1 deletion .github/workflows/be-ut-mac.yml
@@ -53,13 +53,14 @@ jobs:
uses: ./.github/actions/ccache-action
with:
key: BE-UT-macOS
max-size: "2G"
max-size: "5G"
restore-keys: BE-UT-macOS-

- name: Run UT ${{ github.ref }}
if: ${{ github.event_name == 'schedule' || steps.filter.outputs.be_changes == 'true' }}
run: |
cellars=(
'm4'
'automake'
'autoconf'
'libtool'
@@ -95,4 +96,5 @@ jobs:
tar -xvf doris-thirdparty-prebuilt-darwin-x86_64.tar.xz
popd
export JAVA_HOME="${JAVA_HOME_17_X64%\/}"
./run-be-ut.sh --run -j "$(nproc)" --clean
8 changes: 7 additions & 1 deletion .gitignore
@@ -131,7 +131,13 @@ lru_cache_test
/fe/fe-core/src/test/resources/real-help-resource.zip
/ui/dist

# docker
docker/thirdparties/docker-compose/*/data
docker/thirdparties/docker-compose/*/logs
docker/thirdparties/docker-compose/*/*.yaml
docker/runtime/be/resource/apache-doris/

# other
compile_commands.json
.github
docker/runtime/be/resource/apache-doris/

1 change: 1 addition & 0 deletions .licenserc.yaml
@@ -82,6 +82,7 @@ header:
- "docker/thirdparties/docker-compose/hive/scripts/create_tpch1_orc.hql"
- "docker/thirdparties/docker-compose/hive/scripts/create_tpch1_parquet.hql"
- "docker/thirdparties/docker-compose/hive/scripts/preinstalled_data/"
- "docker/thirdparties/docker-compose/hive/scripts/data/**"
- "docker/thirdparties/docker-compose/iceberg/spark-defaults.conf.tpl"
- "conf/mysql_ssl_default_certificate/*"
- "conf/mysql_ssl_default_certificate/client_certificate/ca.pem"
11 changes: 4 additions & 7 deletions README.md
@@ -26,14 +26,11 @@ under the License.
[![GitHub release](https://img.shields.io/github/release/apache/doris.svg)](https://github.com/apache/doris/releases)
[![Jenkins Vec](https://img.shields.io/jenkins/tests?compact_message&jobUrl=https://ci-builds.apache.org/job/Doris/job/doris_daily_enable_vectorized&label=VectorizedEngine)](https://ci-builds.apache.org/job/Doris/job/doris_daily_enable_vectorized)
[![Total Lines](https://tokei.rs/b1/github/apache/doris?category=lines)](https://github.com/apache/doris)
[![Join the Doris Community at Slack](https://img.shields.io/badge/chat-slack-brightgreen)](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-28il1o2wk-DD6LsLOz3v4aD92Mu0S0aQ)
[![Join the chat at https://gitter.im/apache-doris/Lobby](https://badges.gitter.im/apache-doris/Lobby.svg)](https://gitter.im/apache-doris/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Join the Doris Community at Slack](https://img.shields.io/badge/chat-slack-brightgreen)](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA)
[![EN doc](https://img.shields.io/badge/Docs-English-blue.svg)](https://doris.apache.org/docs/get-starting/quick-start)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](https://doris.apache.org/zh-CN/docs/get-starting/what-is-apache-doris)

Apache Doris is an easy-to-use, high-performance and real-time analytical database based on MPP architecture, known for its extreme speed and ease of use. It only requires a sub-second response time to return query results under massive data and can support not only high-concurrent point query scenarios but also high-throughput complex analysis scenarios.

All this makes Apache Doris an ideal tool for scenarios including report analysis, ad-hoc query, unified data warehouse, and data lake query acceleration. On Apache Doris, users can build various applications, such as user behavior analysis, AB test platform, log retrieval analysis, user portrait analysis, and order analysis.
Apache Doris is an MPP-based real-time data warehouse known for its high query speed. For queries on large datasets, it returns results in sub-seconds. It supports both high-concurrency point queries and high-throughput complex analysis. It can be used for report analysis, ad-hoc queries, unified data warehouse building, and data lake query acceleration. Based on Apache Doris, users can build applications for user behavior analysis, A/B testing platform, log analysis, and e-commerce order analysis.

Please visit our [official download page](https://doris.apache.org/download/) to get the latest release version.

@@ -136,7 +133,7 @@ In terms of optimizers, Doris uses a combination of CBO and RBO. RBO supports co

**Apache Doris has graduated from Apache incubator successfully and become a Top-Level Project in June 2022**.

Currently, the Apache Doris community has gathered more than 400 contributors from nearly 200 companies in different industries, and the number of active contributors is close to 100 per month.
Currently, the Apache Doris community has gathered more than 600 contributors from over 200 companies in different industries, and the number of monthly active contributors exceeds 100.


[![Monthly Active Contributors](https://contributor-overtime-api.apiseven.com/contributors-svg?chart=contributorMonthlyActivity&repo=apache/doris)](https://www.apiseven.com/en/contributor-graph?chart=contributorMonthlyActivity&repo=apache/doris)
@@ -215,7 +212,7 @@ Contact us through the following mailing list.

* Apache Doris Official Website - [Site](https://doris.apache.org)
* Developer Mailing list - <[email protected]>. Mail to <[email protected]>, follow the reply to subscribe the mail list.
* Slack channel - [Join the Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-28il1o2wk-DD6LsLOz3v4aD92Mu0S0aQ)
* Slack channel - [Join the Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA)
* Twitter - [Follow @doris_apache](https://twitter.com/doris_apache)


13 changes: 6 additions & 7 deletions be/CMakeLists.txt
@@ -75,7 +75,6 @@ option(USE_LIBCPP "Use libc++" OFF)
option(USE_MEM_TRACKER, "Use memory tracker" ON)
option(USE_UNWIND "Use libunwind" ON)
option(USE_JEMALLOC "Use jemalloc" ON)
option(USE_JEMALLOC_HOOK "Use jemalloc hook" ON)
if (OS_MACOSX)
set(GLIBC_COMPATIBILITY OFF)
set(USE_LIBCPP ON)
@@ -91,7 +90,6 @@ message(STATUS "GLIBC_COMPATIBILITY is ${GLIBC_COMPATIBILITY}")
message(STATUS "USE_LIBCPP is ${USE_LIBCPP}")
message(STATUS "USE_MEM_TRACKER is ${USE_MEM_TRACKER}")
message(STATUS "USE_JEMALLOC is ${USE_JEMALLOC}")
message(STATUS "USE_JEMALLOC_HOOK is ${USE_JEMALLOC_HOOK}")
message(STATUS "USE_UNWIND is ${USE_UNWIND}")
message(STATUS "ENABLE_PCH is ${ENABLE_PCH}")

@@ -350,9 +348,6 @@ endif()
if (USE_JEMALLOC)
add_definitions(-DUSE_JEMALLOC)
endif()
if (USE_JEMALLOC_HOOK)
add_definitions(-DUSE_JEMALLOC_HOOK)
endif()

# Compile with libunwind
if (USE_UNWIND)
@@ -382,8 +377,10 @@ endif()
# -O3: Enable all compiler optimizations
# -DNDEBUG: Turn off dchecks/asserts/debug only code.
set(CXX_FLAGS_RELEASE "${CXX_GCC_FLAGS} -O3 -DNDEBUG")
set(CXX_FLAGS_ASAN "${CXX_GCC_FLAGS} -O0 -fsanitize=address -DADDRESS_SANITIZER")
set(CXX_FLAGS_ASAN "${CXX_GCC_FLAGS} -O0 -fsanitize=address -fsanitize=undefined -fno-strict-aliasing -fno-sanitize=alignment,signed-integer-overflow,float-cast-overflow -DUNDEFINED_BEHAVIOR_SANITIZER -DADDRESS_SANITIZER")
set(CXX_FLAGS_LSAN "${CXX_GCC_FLAGS} -O0 -fsanitize=leak -DLEAK_SANITIZER")
## Use for BE-UT
set(CXX_FLAGS_ASAN_UT "${CXX_GCC_FLAGS} -O0 -fsanitize=address -DADDRESS_SANITIZER")

# Set the flags to the undefined behavior sanitizer, also known as "ubsan"
# Turn on sanitizer and debug symbols to get stack traces:
@@ -408,6 +405,8 @@ elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "UBSAN")
set(CMAKE_CXX_FLAGS "${CXX_FLAGS_UBSAN}")
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "TSAN")
set(CMAKE_CXX_FLAGS "${CXX_FLAGS_TSAN}")
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "ASAN_UT")
set(CMAKE_CXX_FLAGS "${CXX_FLAGS_ASAN_UT}")
else()
message(FATAL_ERROR "Unknown build type: ${CMAKE_BUILD_TYPE}")
endif()
@@ -629,7 +628,7 @@ endif ()
# Add sanitize static link flags
if ("${CMAKE_BUILD_TYPE}" STREQUAL "DEBUG" OR "${CMAKE_BUILD_TYPE}" STREQUAL "RELEASE")
set(DORIS_LINK_LIBS ${DORIS_LINK_LIBS} ${MALLOCLIB})
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "ASAN")
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "ASAN" OR "${CMAKE_BUILD_TYPE}" STREQUAL "ASAN_UT")
set(DORIS_LINK_LIBS ${DORIS_LINK_LIBS} ${ASAN_LIBS})
elseif ("${CMAKE_BUILD_TYPE}" STREQUAL "LSAN")
set(DORIS_LINK_LIBS ${DORIS_LINK_LIBS} ${LSAN_LIBS})
12 changes: 10 additions & 2 deletions be/cmake/thirdparty.cmake
@@ -58,7 +58,6 @@ add_thirdparty(re2)
add_thirdparty(hyperscan LIBNAME "lib64/libhs.a")
add_thirdparty(odbc)
add_thirdparty(pprof WHOLELIBPATH ${GPERFTOOLS_HOME}/lib/libprofiler.a)
add_thirdparty(tcmalloc WHOLELIBPATH ${GPERFTOOLS_HOME}/lib/libtcmalloc.a NOTADD)
add_thirdparty(protobuf)
add_thirdparty(gtest)
add_thirdparty(gtest_main)
@@ -77,7 +76,11 @@ add_thirdparty(libz LIBNAME "lib/libz.a")
add_thirdparty(crypto)
add_thirdparty(openssl LIBNAME "lib/libssl.a")
add_thirdparty(leveldb)
add_thirdparty(jemalloc LIBNAME "lib/libjemalloc_doris.a")
if (USE_JEMALLOC)
add_thirdparty(jemalloc LIBNAME "lib/libjemalloc_doris.a")
else()
add_thirdparty(tcmalloc WHOLELIBPATH ${GPERFTOOLS_HOME}/lib/libtcmalloc.a NOTADD)
endif()
add_thirdparty(jemalloc_arrow LIBNAME "lib/libjemalloc_arrow.a")

if (WITH_MYSQL)
@@ -139,6 +142,11 @@ if (NOT OS_MACOSX)
add_thirdparty(aws-s2n LIBNAME "lib/libs2n.a")
endif()

add_thirdparty(azure-core)
add_thirdparty(azure-identity)
add_thirdparty(azure-storage-blobs)
add_thirdparty(azure-storage-common)

add_thirdparty(minizip LIB64)
add_thirdparty(simdjson LIB64)
add_thirdparty(idn LIB64)
3 changes: 3 additions & 0 deletions be/src/agent/agent_server.cpp
@@ -180,6 +180,9 @@ void AgentServer::start_workers(StorageEngine& engine, ExecEnv* exec_env) {
_workers[TTaskType::CLEAN_TRASH] = std::make_unique<TaskWorkerPool>(
"CLEAN_TRASH", 1, [&engine](auto&& task) {return clean_trash_callback(engine, task); });

_workers[TTaskType::CLEAN_UDF_CACHE] = std::make_unique<TaskWorkerPool>(
"CLEAN_UDF_CACHE", 1, [](auto&& task) {return clean_udf_cache_callback(task); });

_workers[TTaskType::UPDATE_VISIBLE_VERSION] = std::make_unique<TaskWorkerPool>(
"UPDATE_VISIBLE_VERSION", 1, [&engine](auto&& task) { return visible_version_callback(engine, task); });

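The agent_server.cpp hunk above wires the new `CLEAN_UDF_CACHE` task type into the same dispatch pattern as its neighbors: a map from task type to a single pool that owns a name, a worker count, and a callback. A minimal sketch of that pattern, using simplified stand-ins (the `Task` and `TaskWorkerPool` types here are illustrative, not Doris's real classes):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Simplified stand-in for a CLEAN_UDF_CACHE request payload.
struct Task {
    std::string function_signature;
};

// Simplified stand-in for Doris's TaskWorkerPool: constructed with a name,
// a thread count, and the callback that handles each submitted task.
class TaskWorkerPool {
public:
    TaskWorkerPool(std::string name, int num_threads, std::function<void(const Task&)> cb)
            : _name(std::move(name)), _num_threads(num_threads), _cb(std::move(cb)) {}

    // A real pool would enqueue onto its worker threads; this sketch runs inline.
    void submit(const Task& task) { _cb(task); }

    const std::string& name() const { return _name; }

private:
    std::string _name;
    int _num_threads;
    std::function<void(const Task&)> _cb;
};
```

A registration then has the shape the hunk adds: construct the pool with the task-type name, one thread, and the cleanup callback, and store it under the task-type key.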
12 changes: 6 additions & 6 deletions be/src/agent/be_exec_version_manager.h
@@ -20,23 +20,23 @@
#include <fmt/format.h>
#include <glog/logging.h>

#include "common/status.h"

namespace doris {

class BeExecVersionManager {
public:
BeExecVersionManager() = delete;

static bool check_be_exec_version(int be_exec_version) {
static Status check_be_exec_version(int be_exec_version) {
if (be_exec_version > max_be_exec_version || be_exec_version < min_be_exec_version) {
LOG(WARNING) << fmt::format(
return Status::InternalError(
"Received be_exec_version is not supported, be_exec_version={}, "
"min_be_exec_version={}, max_be_exec_version={}, maybe due to FE version not "
"match "
"with BE.",
"match with BE.",
be_exec_version, min_be_exec_version, max_be_exec_version);
return false;
}
return true;
return Status::OK();
}

static int get_newest_version() { return max_be_exec_version; }
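The be_exec_version_manager.h hunk above changes `check_be_exec_version` from returning `bool` (with only a WARNING in the BE log) to returning `Status`, so the version-mismatch message propagates to the caller. A hedged sketch of the resulting calling convention, using a toy `Status` type rather than Doris's real `common/status.h`, with illustrative version bounds:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-in for doris::Status, just enough for this sketch.
class Status {
public:
    static Status OK() { return Status(true, ""); }
    static Status InternalError(std::string msg) { return Status(false, std::move(msg)); }
    bool ok() const { return _ok; }
    const std::string& msg() const { return _msg; }

private:
    Status(bool ok, std::string msg) : _ok(ok), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Illustrative bounds; the real values live in BeExecVersionManager.
constexpr int min_be_exec_version = 0;
constexpr int max_be_exec_version = 5;

// After the change: the caller receives the reason, not just a bool,
// and can return it up the call chain instead of relying on the log.
Status check_be_exec_version(int be_exec_version) {
    if (be_exec_version > max_be_exec_version || be_exec_version < min_be_exec_version) {
        return Status::InternalError(
                "Received be_exec_version is not supported, be_exec_version=" +
                std::to_string(be_exec_version) + ", maybe FE version does not match BE.");
    }
    return Status::OK();
}
```

Call sites that previously branched on the `bool` can now use the usual `RETURN_IF_ERROR`-style propagation instead of logging and continuing.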
8 changes: 4 additions & 4 deletions be/src/agent/cgroup_cpu_ctl.cpp
@@ -30,14 +30,14 @@ Status CgroupCpuCtl::init() {
_doris_cgroup_cpu_path = config::doris_cgroup_cpu_path;
if (_doris_cgroup_cpu_path.empty()) {
LOG(INFO) << "doris cgroup cpu path is not specify, path=" << _doris_cgroup_cpu_path;
return Status::InternalError<false>("doris cgroup cpu path {} is not specify.",
_doris_cgroup_cpu_path);
return Status::InvalidArgument<false>("doris cgroup cpu path {} is not specify.",
_doris_cgroup_cpu_path);
}

if (access(_doris_cgroup_cpu_path.c_str(), F_OK) != 0) {
LOG(INFO) << "doris cgroup cpu path not exists, path=" << _doris_cgroup_cpu_path;
return Status::InternalError<false>("doris cgroup cpu path {} not exists.",
_doris_cgroup_cpu_path);
return Status::InvalidArgument<false>("doris cgroup cpu path {} not exists.",
_doris_cgroup_cpu_path);
}

if (_doris_cgroup_cpu_path.back() != '/') {
7 changes: 6 additions & 1 deletion be/src/agent/heartbeat_server.cpp
@@ -93,7 +93,12 @@ void HeartbeatServer::heartbeat(THeartbeatResult& heartbeat_result,
}
watch.stop();
if (watch.elapsed_time() > 1000L * 1000L * 1000L) {
LOG(WARNING) << "heartbeat consume too much time. time=" << watch.elapsed_time();
LOG(WARNING) << "heartbeat consume too much time. time=" << watch.elapsed_time()
<< ", host:" << master_info.network_address.hostname
<< ", port:" << master_info.network_address.port
<< ", cluster id:" << master_info.cluster_id
<< ", frontend_info:" << PrintFrontendInfos(master_info.frontend_infos)
<< ", counter:" << google::COUNTER << ", BE start time: " << _be_epoch;
}
}

42 changes: 19 additions & 23 deletions be/src/agent/task_worker_pool.cpp
@@ -53,6 +53,7 @@
#include "io/fs/file_system.h"
#include "io/fs/hdfs_file_system.h"
#include "io/fs/local_file_system.h"
#include "io/fs/obj_storage_client.h"
#include "io/fs/path.h"
#include "io/fs/remote_file_system.h"
#include "io/fs/s3_file_system.h"
@@ -82,6 +83,7 @@
#include "service/backend_options.h"
#include "util/debug_points.h"
#include "util/doris_metrics.h"
#include "util/jni-util.h"
#include "util/mem_info.h"
#include "util/random.h"
#include "util/s3_util.h"
@@ -1117,7 +1119,7 @@ void report_tablet_callback(StorageEngine& engine, const TMasterInfo& master_inf
auto& resource = resource_list.emplace_back();
int64_t id = -1;
if (auto [_, ec] = std::from_chars(id_str.data(), id_str.data() + id_str.size(), id);
ec == std::errc {}) [[unlikely]] {
ec != std::errc {}) [[unlikely]] {
LOG(ERROR) << "invalid resource id format: " << id_str;
} else {
resource.__set_id(id);
@@ -1378,23 +1380,8 @@ void update_s3_resource(const TStorageResource& param, io::RemoteFileSystemSPtr

if (!existed_fs) {
// No such FS instance on BE
S3Conf s3_conf {
.bucket = param.s3_storage_param.bucket,
.prefix = param.s3_storage_param.root_path,
.client_conf = {
.endpoint = param.s3_storage_param.endpoint,
.region = param.s3_storage_param.region,
.ak = param.s3_storage_param.ak,
.sk = param.s3_storage_param.sk,
.token = param.s3_storage_param.token,
.max_connections = param.s3_storage_param.max_conn,
.request_timeout_ms = param.s3_storage_param.request_timeout_ms,
.connect_timeout_ms = param.s3_storage_param.conn_timeout_ms,
// When using cold heat separation in minio, user might use ip address directly,
// which needs enable use_virtual_addressing to true
.use_virtual_addressing = !param.s3_storage_param.use_path_style,
}};
auto res = io::S3FileSystem::create(std::move(s3_conf), std::to_string(param.id));
auto res = io::S3FileSystem::create(S3Conf::get_s3_conf(param.s3_storage_param),
std::to_string(param.id));
if (!res.has_value()) {
st = std::move(res).error();
} else {
@@ -1403,10 +1390,12 @@ } else {
} else {
DCHECK_EQ(existed_fs->type(), io::FileSystemType::S3) << param.id << ' ' << param.name;
auto client = static_cast<io::S3FileSystem*>(existed_fs.get())->client_holder();
auto new_s3_conf = S3Conf::get_s3_conf(param.s3_storage_param);
S3ClientConf conf {
.ak = param.s3_storage_param.ak,
.sk = param.s3_storage_param.sk,
.token = param.s3_storage_param.token,
.ak = std::move(new_s3_conf.client_conf.ak),
.sk = std::move(new_s3_conf.client_conf.sk),
.token = std::move(new_s3_conf.client_conf.token),
.provider = new_s3_conf.client_conf.provider,
};
st = client->reset(conf);
fs = std::move(existed_fs);
@@ -1418,7 +1407,7 @@ void update_s3_resource(const TStorageResource& param, io::RemoteFileSystemSPtr
LOG_INFO("successfully update hdfs resource")
.tag("resource_id", param.id)
.tag("resource_name", param.name);
put_storage_resource(param.id, {std::move(fs), 0}, param.version);
put_storage_resource(param.id, {std::move(fs)}, param.version);
}
}

@@ -1452,7 +1441,7 @@ void update_hdfs_resource(const TStorageResource& param, io::RemoteFileSystemSPt
.tag("resource_id", param.id)
.tag("resource_name", param.name)
.tag("root_path", fs->root_path().string());
put_storage_resource(param.id, {std::move(fs), 0}, param.version);
put_storage_resource(param.id, {std::move(fs)}, param.version);
}
}

@@ -2072,4 +2061,11 @@ void clean_trash_callback(StorageEngine& engine, const TAgentTaskRequest& req) {
LOG(INFO) << "clean trash finish";
}

void clean_udf_cache_callback(const TAgentTaskRequest& req) {
LOG(INFO) << "clean udf cache start: " << req.clean_udf_cache_req.function_signature;
static_cast<void>(
JniUtil::clean_udf_class_load_cache(req.clean_udf_cache_req.function_signature));
LOG(INFO) << "clean udf cache finish: " << req.clean_udf_cache_req.function_signature;
}

} // namespace doris
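One subtle fix in the task_worker_pool.cpp diff above is the `report_tablet_callback` condition: `std::from_chars` reports success with a value-initialized `std::errc`, so the original `ec == std::errc {}` took the error branch on every successful parse; the hunk flips it to `!=`. A small self-contained illustration of the corrected check (the function name and shape are mine, not Doris's exact code):

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// std::from_chars sets ec to std::errc{} (the zero value) on success,
// so the *error* test must be `ec != std::errc{}` — the bug the hunk fixes.
bool parse_resource_id(std::string_view id_str, int64_t& id) {
    auto [ptr, ec] = std::from_chars(id_str.data(), id_str.data() + id_str.size(), id);
    if (ec != std::errc {}) {  // with `==` here, every valid id would be rejected
        return false;          // invalid resource id format
    }
    return true;
}
```

With the original `==` comparison, `"123"` would have logged "invalid resource id format" while genuinely malformed ids passed through silently; the inverted test restores the intended behavior.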
2 changes: 2 additions & 0 deletions be/src/agent/task_worker_pool.h
@@ -176,6 +176,8 @@ void gc_binlog_callback(StorageEngine& engine, const TAgentTaskRequest& req);

void clean_trash_callback(StorageEngine& engine, const TAgentTaskRequest& req);

void clean_udf_cache_callback(const TAgentTaskRequest& req);

void visible_version_callback(StorageEngine& engine, const TAgentTaskRequest& req);

void report_task_callback(const TMasterInfo& master_info);