Worker crash when merging factors #1413

Open
rixed opened this issue Feb 10, 2022 · 0 comments
rixed commented Feb 10, 2022

This happens when dumping in ORC format after a restart of that RaQL program:

infra/hosts/memory: : Start outputting to /ramen/workers/ringbufs/v14/infra/hosts/memory/b37b0769087e79b53000b769b8f20c9a/archive.orc
infra/hosts/memory: : Has now 1 outputers (had 0)
infra/hosts/memory: :(W) Stumbled upon preexisting index /ramen/workers/factors/v3/4.07.1/infra/hosts/memory/b37b0769087e79b53000b769b8f20c9a.factors/host/0x1.8814fd28p+30, merging...
confserver: User _worker_4fe5e40f4ed2/infra/hosts/memory disconnected
supervisor:(E) Worker infra/hosts/memory (pid 4650) killed by signal SEGV.
supervisor: Worker infra/hosts/memory is deadlooping. Deleting its state file, input ringbuffers, binary and out_ref config entry.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000556aa909a90f in orc_write_74de19577969cb437ead8caaee859bfe ()
[Current thread is 1 (Thread 0x7f833d4303c0 (LWP 5133))]
(gdb) bt
#0  0x0000556aa909a90f in orc_write_74de19577969cb437ead8caaee859bfe ()
#1  0x0000556aa88d32fc in camlM012d6079238f3cc63ee5e926c86bb51c_74de19577969cb437ead8caaee859bfe_v104_28_46_5_46_4_v47__anon_fn$5b$2framen$2fexecompserver$2fcache$2fv104_28$2e5$2e4_v47$2fm012d6079238f3cc63ee5e926c86bb51c_74de19577969cb437ead8caaee859bfe_v104_28_46_5_46_4_v47$2eml$3a751$2c22$2d$2d31$5d_2746683 ()
#2  0x0000556aa8d8f35d in camlStdlib__hashtbl__iter_1002 () at hashtbl.ml:266
#3  0x0000556aa88ef87e in camlRamenHelpersNoLog__finally_524 () at src/RamenHelpersNoLog.ml:9
#4  0x0000556aa8b5280c in camlCodeGenLib_Skeletons__aggregate_one_2322 () at src/CodeGenLib_Skeletons.ml:767
#5  0x0000556aa8b51f24 in camlCodeGenLib_Skeletons__anon_fn$5bsrc$2fCodeGenLib_Skeletons$2eml$3a581$2c6$2d$2d101$5d_2231 () at src/CodeGenLib_Skeletons.ml:583
#6  0x0000556aa8b53490 in camlCodeGenLib_Skeletons__on_tup_3289 () at src/CodeGenLib_Skeletons.ml:842
#7  0x0000556aa8ae26b8 in camlRingBufLib__loop_1336 () at src/RingBufLib.ml:288
#8  0x0000556aa8b4f114 in camlCodeGenLib_Skeletons__read_single_rb_8740 () at src/CodeGenLib_Skeletons.ml:834
#9  0x0000556aa88fe310 in camlRamenHelpers__loop_311 () at src/RamenHelpers.ml:42
#10 0x0000556aa88ef87e in camlRamenHelpersNoLog__finally_524 () at src/RamenHelpersNoLog.ml:9
#11 0x0000556aa8b51509 in camlCodeGenLib_Skeletons__aggregate_1832 () at src/CodeGenLib_Skeletons.ml:530
#12 0x0000556aa88d328a in camlM012d6079238f3cc63ee5e926c86bb51c_74de19577969cb437ead8caaee859bfe_v104_28_46_5_46_4_v47__worker_2746621 ()
#13 0x0000556aa88c0ca5 in camlM012d6079238f3cc63ee5e926c86bb51c_casing_v104_28_46_5_46_4_v47__entry ()
#14 0x0000556aa88b6a79 in caml_program ()
#15 0x0000556aa90bd2e0 in caml_start_program ()
#16 0x0000556aa90a02bd in caml_startup_common (argv=0x7ffe7087ef18, pooling=<optimized out>, pooling@entry=0) at startup.c:160
#17 0x0000556aa90a034b in caml_startup_exn (argv=<optimized out>) at startup.c:167
#18 caml_startup (argv=<optimized out>) at startup.c:172
#19 caml_main (argv=<optimized out>) at startup.c:179
#20 0x0000556aa88b51ec in main (argc=<optimized out>, argv=<optimized out>) at main.c:44

My guess is that the ORC file created on disk is left in an inconsistent state at exit, which the ORC writer then fails to detect.
Another good reason not to archive directly in ORC.
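
If the guess above is right, a cheap sanity check before reusing a preexisting archive might at least catch the obviously truncated cases. What follows is only a sketch under assumptions: `looks_like_complete_orc` is a hypothetical helper, not an existing Ramen function, and it relies only on two properties of the ORC layout (the file starts with the 3-byte magic `ORC`, and its last byte holds the postscript length). Passing this check does not prove the file is valid.

```ocaml
(* Hypothetical helper, not part of Ramen: cheap sanity check of an ORC
   file before reusing it. *)
let orc_magic = "ORC"

(* Returns true if [fname] starts with the ORC magic and its last byte
   (the postscript length, per the ORC file layout) is non-zero and fits
   within the file. This only catches files that were obviously truncated
   or never finalized; it is not a full validation. *)
let looks_like_complete_orc fname =
  match open_in_bin fname with
  | exception Sys_error _ -> false
  | ic ->
      Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
        let len = in_channel_length ic in
        let magic_len = String.length orc_magic in
        (* Need at least the header, one postscript byte and the length byte *)
        if len < magic_len + 2 then false
        else
          let header = really_input_string ic magic_len in
          seek_in ic (len - 1) ;
          let ps_len = input_byte ic in
          header = orc_magic && ps_len > 0 && ps_len + 1 + magic_len <= len)
```

A worker could call such a check on the existing `archive.orc` at startup and, when it returns false, move the file aside and start a fresh one rather than handing it to the ORC writer. Whether that is the right recovery policy here is an open question.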

@rixed rixed added the bug label Feb 10, 2022
@rixed rixed assigned rixed and unassigned rixed Feb 10, 2022