Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calls to JIT functions within JIT functions #3

Open
DavidPoliakoff opened this issue Mar 7, 2019 · 2 comments
Open

Calls to JIT functions within JIT functions #3

DavidPoliakoff opened this issue Mar 7, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

@DavidPoliakoff
Copy link

Hey Hal. What I'm actually trying to do is make RAJA use JIT, and I think I have a path to it. However, I ran into a bug (some might refer to it as sanity), I don't think you can call JIT-marked functions from one another. A reproducer (no special compilation other than -fjit needed)

template<std::size_t N>
[[clang::jit]] void ping(std::size_t step);

template<std::size_t N>
[[clang::jit]] void pong(std::size_t step){
  std::cout << "Pong "<<N<<"\n";
}

template<std::size_t N>
[[clang::jit]] void ping(std::size_t step){
  std::cout << "Ping "<<N<<"\n";
  pong<N-step>(step);
}


int main(int argc, char* argv[]){
  std::size_t size = atoi(argv[1]);
  std::cout << "Size is "<<size<<"\n";
  ping<size>(size);

}

Note that this isn't really mutual recursion, it's just a call to ping, which calls pong. I think if we solve this we'll help allow constructs like

for(int rep = 0; rep<repeats;rep++){
    std::cout << "Loop iteration "<<rep<<"\n";
    RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0,size),[&](int i){
      out_matrix[i] = 5.0f;
      std::cout << "Made it in the outer\n";
      RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0,size),[&](int j){
      std::cout << "Made it in the middle\n";
        RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0,size),[&](int k){
          std::cout << "Made it in the inner\n";
          out_matrix[MAT2D(i,j,size)] += input_matrix1[MAT2D(i,k,size)] * input_matrix2[MAT2D(k,j,size)];
        });
      });
    });
  }

Which under the hood are calls to your JIT system, every call to RAJA::forall here is actually JITting its loop bounds (thanks to the bug you fixed yesterday)

@hfinkel hfinkel added the enhancement New feature or request label Mar 7, 2019
@hfinkel
Copy link
Owner

hfinkel commented Mar 7, 2019

Right now there are just two modes: During the ahead-of-time compile all JIT templates are delayed. At runtime, the system aggressively instantiates (the logic being to limit needless round trips into the compiler), and does this by more-or-less ignoring the JIT attributes. To make this work, there would need to be a fallback to generating more JIT calls (this requires some additional infrastructure that can be added, I think, but isn't there now).

@DavidPoliakoff
Copy link
Author

@hfinkel Just to be clear, this one isn't needed on a short timescale. I file issues so we're aware of what's up. If we work with the -O2 (I'm testing this now) I should have RAJA autojitting its kernel (nested parallelism) loop bounds, this issue is for later

dutiona pushed a commit to dutiona/llvm-project-cxxjit that referenced this issue Dec 21, 2023
When `Target::GetEntryPointAddress()` calls `exe_module->GetObjectFile()->GetEntryPointAddress()`, and the returned
`entry_addr` is valid, it can immediately be returned.

However, just before that, an `llvm::Error` value has been setup, but in this case it is not consumed before returning, like is done further below in the function.

In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core:

```
* thread hfinkel#1, name = 'testcase', stop reason = breakpoint 1.1
    frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5
   1	int main(int argc, char *argv[])
   2	{
-> 3	    return 0;
   4	}
(lldb) p argc
Program aborted due to an unhandled Error:
Error value was Success. (Note: Success values must still be checked prior to being destroyed).

Thread 1 received signal SIGABRT, Aborted.
thr_kill () at thr_kill.S:3
3	thr_kill.S: No such file or directory.
(gdb) bt
#0  thr_kill () at thr_kill.S:3
hfinkel#1  0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
hfinkel#2  0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
hfinkel#3  0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112
hfinkel#4  0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267
hfinkel#5  0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67
hfinkel#6  0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114
hfinkel#7  0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97
hfinkel#8  0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604
hfinkel#9  0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347
hfinkel#10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383
hfinkel#11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301
hfinkel#12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331
hfinkel#13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190
hfinkel#14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372
hfinkel#15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414
hfinkel#16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646
hfinkel#17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003
hfinkel#18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762
hfinkel#19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760
hfinkel#20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548
hfinkel#21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903
hfinkel#22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946
hfinkel#23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169
hfinkel#24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675
hfinkel#25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890```

Fix the incorrect error catch by only instantiating an `Error` object if it is necessary.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D86355

(cherry picked from commit 1ce07cd)
dutiona pushed a commit to dutiona/llvm-project-cxxjit that referenced this issue Dec 21, 2023
This patch re-introduces the fix in the commit llvm/llvm-project@66b0cebf7f736 by @yrnkrn

> In DwarfEHPrepare, after all passes are run, RewindFunction may be a dangling
>
> pointer to a dead function. To make sure it's valid, doFinalization nullptrs
> RewindFunction just like the constructor and so it will be found on next run.
>
> llvm-svn: 217737

It seems that the fix was not migrated to `DwarfEHPrepareLegacyPass`.

This patch also updates `llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` to include `-run-twice` to exercise the cleanup. Without this patch `llvm-lit -v llvm/test/CodeGen/X86/dwarf-eh-prepare.ll` fails with

```
-- Testing: 1 tests, 1 workers --
FAIL: LLVM :: CodeGen/X86/dwarf-eh-prepare.ll (1 of 1)
******************** TEST 'LLVM :: CodeGen/X86/dwarf-eh-prepare.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice < /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll -S | /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll
--
Exit Code: 2

Command Output (stderr):
--
Referencing function in another module!
  call void @_Unwind_Resume(i8* %ehptr) hfinkel#1
; ModuleID = '<stdin>'
void (i8*)* @_Unwind_Resume
; ModuleID = '<stdin>'
in function simple_cleanup_catch
LLVM ERROR: Broken function found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/arakaki/build/llvm-project/main/bin/opt -mtriple=x86_64-linux-gnu -dwarfehprepare -simplifycfg-require-and-preserve-domtree=1 -run-twice -S
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Module Verifier' on function '@simple_cleanup_catch'
 #0 0x000056121b570a2c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:0
 hfinkel#1 0x000056121b56eb64 llvm::sys::RunSignalHandlers() /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Signals.cpp:97:0
 hfinkel#2 0x000056121b56f28e SignalHandler(int) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/Unix/Signals.inc:397:0
 hfinkel#3 0x00007fc7e9b22980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 hfinkel#4 0x00007fc7e87d3fb7 raise /build/glibc-S7xCS9/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 hfinkel#5 0x00007fc7e87d5921 abort /build/glibc-S7xCS9/glibc-2.27/stdlib/abort.c:81:0
 hfinkel#6 0x000056121b4e1386 llvm::raw_svector_ostream::raw_svector_ostream(llvm::SmallVectorImpl<char>&) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:674:0
 hfinkel#7 0x000056121b4e1386 llvm::report_fatal_error(llvm::Twine const&, bool) /home/arakaki/repos/watch/llvm-project/llvm/lib/Support/ErrorHandling.cpp:114:0
 hfinkel#8 0x000056121b4e1528 (/home/arakaki/build/llvm-project/main/bin/opt+0x29e3528)
 hfinkel#9 0x000056121adfd03f llvm::raw_ostream::operator<<(llvm::StringRef) /home/arakaki/repos/watch/llvm-project/llvm/include/llvm/Support/raw_ostream.h:218:0
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/arakaki/build/llvm-project/main/bin/FileCheck /home/arakaki/repos/watch/llvm-project/llvm/test/CodeGen/X86/dwarf-eh-prepare.ll

--

********************
********************
Failed Tests (1):
  LLVM :: CodeGen/X86/dwarf-eh-prepare.ll

Testing Time: 0.22s
  Failed: 1
```

Reviewed By: loladiro

Differential Revision: https://reviews.llvm.org/D110979

(cherry picked from commit e8806d7)
dutiona pushed a commit to dutiona/llvm-project-cxxjit that referenced this issue Dec 21, 2023
We experienced some deadlocks when we used multiple threads for logging
using `scan-builds` intercept-build tool when we used multiple threads by
e.g. logging `make -j16`

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
hfinkel#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
hfinkel#2  0x00007f2bb3d152e4 in ?? ()
hfinkel#3  0x00007ffcc5f0cc80 in ?? ()
hfinkel#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
hfinkel#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
hfinkel#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
hfinkel#7  0x00007f2bb3d144ee in ?? ()
hfinkel#8  0x746e692f706d742f in ?? ()
hfinkel#9  0x692d747065637265 in ?? ()
hfinkel#10 0x2f653631326b3034 in ?? ()
hfinkel#11 0x646d632e35353532 in ?? ()
hfinkel#12 0x0000000000000000 in ?? ()
```

I think the gcc's exit call caused the injected `libear.so` to be unloaded
by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`.
That tried to acquire an already locked mutex which was left locked in the
`bear_report_call()` call, that probably encountered some error and
returned early when it forgot to unlock the mutex.

All of these are speculation since from the backtrace I could not verify
if frames 2 and 3 are in fact corresponding to the `libear.so` module.
But I think it's a fairly safe bet.

So, hereby I'm releasing the held mutex on *all paths*, even if some failure
happens.

PS: I would use lock_guards, but it's C.

Reviewed-by: NoQ

Differential Revision: https://reviews.llvm.org/D118439

(cherry picked from commit d919d02)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants