[fix](move-memtable) do not execute close if create rowset failed when loading MOW table (#40105) #41132

Merged: 1 commit into apache:branch-2.1, Sep 24, 2024

@sollhui sollhui commented Sep 23, 2024

pick (#40105)

A core dump occurred when loading into a MOW (merge-on-write) table:

Check failure stack trace: ***
@ 0x55fae437d246 google::LogMessage::SendToLog()
@ 0x55fae4379c90 google::LogMessage::Flush()
@ 0x55fae437da89 google::LogMessageFatal::~LogMessageFatal()
@ 0x55faacf26bbf doris::BaseTablet::check_delete_bitmap_correctness()
@ 0x55fab05049ef doris::RowsetBuilder::commit_txn()
@ 0x55fab09026e8 doris::LoadStreamWriter::close()
@ 0x55fab089eff7 std::_Function_handler<>::_M_invoke()
@ 0x55fab0d14d7c doris::WorkThreadPool<>::work_thread()
@ 0x55fae76ae6f0 execute_native_thread_routine
@ 0x7fa32ea45ac3 (unknown)
@ 0x7fa32ead7850 (unknown)
@ (nil) (unknown)
Query id: a21981d5c8ef4113-84df9a5a8680e004 ***
is nereids: 0 ***
tablet id: 0 ***
Aborted at 1724668499 (unix time) try "date -d @1724668499" if you are using GNU date ***
Current BE git commitID: 2f848737c1 ***
SIGABRT unknown detail explain (@0x20db) received by PID 8411 (TID 9837 OR 0x7f9e42cfe640) from PID 8411; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
1# 0x00007FA32E9F3520 in /lib/x86_64-linux-gnu/libc.so.6
2# pthread_kill at ./nptl/pthread_kill.c:89
3# raise at ../sysdeps/posix/raise.c:27
4# abort at ./stdlib/abort.c:81
5# 0x000055FAE4387B1D in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
6# 0x000055FAE437A15A in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
7# google::LogMessage::SendToLog() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
8# google::LogMessage::Flush() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
9# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::BaseTablet::check_delete_bitmap_correctness(std::shared_ptr, long, long, std::unordered_set, std::equal_to, std::allocator > const&, std::vector, std::allocator > >*) at /home/zcp/repo_center/doris_master/doris/be/src/olap/base_tablet.cpp:1152
11# doris::RowsetBuilder::commit_txn() at /home/zcp/repo_center/doris_master/doris/be/src/olap/rowset_builder.cpp:316
12# doris::LoadStreamWriter::close() at /home/zcp/repo_center/doris_master/doris/be/src/runtime/load_stream_writer.cpp:311
13# std::_Function_handler::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
14# doris::WorkThreadPool::work_thread(int) at /home/zcp/repo_center/doris_master/doris/be/src/util/work_thread_pool.hpp:159
15# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
16# start_thread at ./nptl/pthread_create.c:442
17# 0x00007FA32EAD7850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83

If creating the rowset fails, calc_delete_bitmap_task can still be executed:

add segment failed load_id=5649413b98976f0d-a105b42749f561b0, txn_id=2, tablet_id=10088, status=[INTERNAL_ERROR]create rowset failed
...
submit calc delete bitmap task to executor, tablet_id: 10088, txn_id: 2

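To fix this, the PR skips close when rowset creation has failed while loading a MOW table, so submit_calc_delete_bitmap_task is never reached and the fatal check in check_delete_bitmap_correctness (reached via commit_txn in the trace above) is not triggered.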
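
The guard described above can be illustrated with a minimal, self-contained sketch. It is not the actual patch and does not use Doris's real classes: `SketchStatus`, `SketchWriter`, `add_segment(bool)`, `bitmap_task_submitted()`, and `run_sketch()` are hypothetical names invented for illustration, and the "task submission" is only a boolean flag. The idea shown is that the writer remembers an earlier failure, and `close()` returns an error before any delete bitmap work is scheduled.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a status type; NOT Doris's real Status class.
struct SketchStatus {
    bool ok;
    std::string msg;
};

// Hypothetical writer that remembers whether an earlier step failed.
class SketchWriter {
public:
    explicit SketchWriter(bool is_mow) : _is_mow(is_mow) {}

    // `rowset_created` models whether rowset creation succeeded (assumption).
    SketchStatus add_segment(bool rowset_created) {
        if (!rowset_created) {
            _failed = true;  // record the failure so close() can see it
            return {false, "create rowset failed"};
        }
        return {true, ""};
    }

    // Returns early on a recorded failure, so the delete bitmap task
    // is never submitted for a rowset that was not created.
    SketchStatus close() {
        if (_failed) {
            return {false, "close skipped after earlier failure"};
        }
        if (_is_mow) {
            // Stands in for submitting the delete bitmap calculation.
            _bitmap_task_submitted = true;
        }
        return {true, ""};
    }

    bool bitmap_task_submitted() const { return _bitmap_task_submitted; }

private:
    bool _is_mow;
    bool _failed = false;
    bool _bitmap_task_submitted = false;
};

// Self-check covering the failing and the successful load path.
inline void run_sketch() {
    SketchWriter failed_load(true);
    assert(!failed_load.add_segment(false).ok);
    assert(!failed_load.close().ok);
    assert(!failed_load.bitmap_task_submitted());

    SketchWriter good_load(true);
    assert(good_load.add_segment(true).ok);
    assert(good_load.close().ok);
    assert(good_load.bitmap_task_submitted());
}
```

Calling `run_sketch()` exercises both paths. Recording the failure inside the writer, rather than relying on every caller to avoid `close()`, is one possible design choice; the real fix may place the check elsewhere.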


sollhui commented Sep 23, 2024

run buildall

@sollhui sollhui changed the title from "[fix](move-memtable) do not execute close if create rowset failed when loading MOW table" to "[fix](move-memtable) do not execute close if create rowset failed when loading MOW table (#40105)" Sep 23, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.13% (9333/25830)
Line Coverage: 27.67% (76619/276898)
Region Coverage: 26.46% (39365/148788)
Branch Coverage: 23.25% (20041/86192)
Coverage Report: http://coverage.selectdb-in.cc/coverage/af9b92fa827561996a6d184b593b49a74660f13f_af9b92fa827561996a6d184b593b49a74660f13f/report/index.html

@yiguolei yiguolei merged commit 4237341 into apache:branch-2.1 Sep 24, 2024
21 of 23 checks passed