-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move first RISC-V builder over to cross-compiler + execute tests under qemu-system setup #279
base: main
Are you sure you want to change the base?
Move first RISC-V builder over to cross-compiler + execute tests under qemu-system setup #279
Conversation
…the stage1 CC/CXX This allows us to follow best/usual CMake practice for configuring cross compiling.
The obvious alternative is binfmt_misc. We have used this at Linaro to run AArch64 compiler binaries on x86 to build benchmarks. Though we then copied those to real hardware of course, it just let us use spare machines for compiling. Once we had enough Arm hardware we did it all on AArch64. Splitting the test build vs. test run apart for lit tests is not a viable option however. Benchmark build and run were already discreet steps so it was easy. Even if you run the tests via qemu-user too, you could argue it's not a "real" system layout. You pass a sysroot but having a proper Linux install is going to be more accurate, even if only by a small amount. It's the "test what you ship" saying but "test where you're going to ship to". And it gives you a VM image you can send to someone if they want to dig into the problem. There's probably a lot of overhead in ~10,000 qemu-users being set up and torn down as well, even compared to the kernel boot time. Am I right in thinking that this qemu-system VM is per run? That clears out any leftover files, which is good, disadvantage is that you can't get at reproducers as easily. Then again, when people want a reproducer from a Linaro bot they have to ask us for it, so only difference here is you would start a VM locally and reproduce it again. I thought maybe you could cross compile in stage 1 right away, but any bug in the host compiler's risc-v codegen would break that build. So it makes sense to build latest clang as the compiler for stage 2. You have enabled ccache so this mitigates the overhead of stage 1. We had some issues like this where SVE codegen in our host clang compiler for stage 1 would hit bugs that had already been fixed. Luckily there was a version we could upgrade to but otherwise you're trying to host nightly builds and it's a mess. |
"-DCMAKE_CXX_FLAGS='-march=rva20u64'"] | ||
util.Interpolate(f"-DLLVM_NATIVE_TOOL_DIR=%(prop:builddir)s/stage1.install/bin"), | ||
"-DLLVM_BUILD_TESTS=True", | ||
"-DLLVM_EXTERNAL_LIT=/home/buildbot-worker/lit-on-qemu-system-rva20.py", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a temporary path but where can this script be placed?
I think given that it's in stage 2 you could have it in llvm-project somewhere, as long as you hardcode the ../<whatever buildbot calls the repo folder because sometimes it uses non-default names...>
. I'm pretty sure the workers don't get a copy of llvm-zorg, otherwise that would be the perfect place for it.
(this is a not an important detail though, as long as the script is available somewhere for debugging purposes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick review and for picking this out. I'd actually meant to call out this hardcoding in the PR as something I'd appreciate input on. Happy to put the script wherever is helpful, and I can document the setup I use for the VM - though in reality a bit of work will be needed to precisely replicate the setup (as is the case for most builders). I'd probably be better restructuring so there's a common lit-on-qemu-system.py helper that takes/selects any additional arguments via environment variables (in this case, just different args for -cpu
for qemu).
if stage2_toolchain_options is None: | ||
compiler_args = [ | ||
f"-DCMAKE_C_COMPILER={stage1_cc}", | ||
f"-DCMAKE_CXX_COMPILER={stage1_cxx}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought maybe we should use a toolchain file for all 2 stage builds but:
- It's not recommended practice if you aren't cross compiling.
- CMAKE_C_COMPILER and friends would not be visible in the log files for stage 2.
- Reproducing a build now has the extra step of writing the toolchain file, even if it was just copy paste.
So I agree with the decision to not change how the existing builds work.
If transparency were our only goal here, I'd say to pass all the toolchain file contents as -D...
but you said:
(notably, some CMake variables can't be set outside of a toolchain file)
Which answers that (and I have had similar experiences with CMake).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crazy idea though, could the toolchain file's contents be pulled from a file here in the llvm-zorg repo?
It could be like:
<the options you set in the builder already>
# Set your host compiler here.
# set(CMAKE_C_COMPILER ....)
# set(CMAKE_CXX_COMPILER ...)
Then when someone wants to reproduce, there is a starting point that's already in the right form to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In not actual Python terms:
stage2_toolchain_options=[open(file).splitlines()]
file.write(f"set(CMAKE_C_COMPILER {stage1_cc})\n") | ||
file.write(f"set(CMAKE_CXX_COMPILER {stage1_cxx})\n") | ||
for option in stage2_toolchain_options: | ||
file.write(f"{option}\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this runs on the build master or the worker. If its the former, it might need to be a step "write toolchain file" that echos the contents into a file on the worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I think it's likely this isn't writing to a worker-local file as I hope.
Also we cannot test lldb with qemu-user because it doesn't emulate ptrace calls. You can use qemu's own GDB stub as the debug server but lldb -> lldb-server tests don't work. To be clear, don't add lldb here. I am happy to see this though as interest in RISC-V lldb has been picking up lately and it will need checking eventually. |
binfmt_misc and qemu-user does work pretty well, and is slightly faster but requires masking out some tests that are incompatible with the approach. My feeling is that for something I hope to move to the non-staging buildmaster, using qemu-system in this way and thus minimising the chance of test failures being an artifact of the emulation process and its interaction with the build system is probably more viable. As you say below, LLDB tests can't run under qemu-user. I'd be a lot happier running libc tests under qemu-system as well rather than using it as a stress test for qemu-user syscall translation! So my reasoning is this approach can work well as a baseline that doesn't require alternate approaches for different sub-projects.
qemu-user for separate process typically scales better than running that same set of processes withing qemu-system (sorry I don't have numbers to hand right now). But for the reasons above, and due to concern about qemu-user needing more work in terms of masking specific tasks and the like, I prefer qemu-system.
Yes, effectively launched as a single-use appliance.
Plus a documented cross-compile flow means it's a bit easier to reproduce! I can look at making it easier to reproduce an equivalent VM image, though this will be easier once the next Debian release happens so there's a stable base rather than just using sid at the time the image was generated.
My thinking exactly. Especially with the different rva23 vectorisation codegen options, we're using building+testing clang as a bit of a stress test for codegen itself. Plus old enough clang/llvm didn't recognise all the options we're using. |
FYI I've taken a slight diversion to set up a good flow for testing builder config changes locally, in order to easier land this and other related changes: #289 |
This PR is structured as two patches (at the time of writing), though it might make sense to separate out the first one for separate review. I'm posting together to get initial feedback.
My goal is to move
clang-riscv-rva20-2stage
,clang-riscv-rva23-mrvv-vec-bits-2stage
, andclang-riscv-rva23-evl-vec-2stage
over so:ninja check-all
) we rely on a llvm-lit wrapper script that spins up a special-purpose VM with the source+build tree copied into it (see here) under qemu-system and runs tests there. This should be much faster than the current approach that does all building under qemu-system.This PR does this change just for
clang-riscv-rva20-2stage
for ease of review and so we can get any teething problems sorted out. Although it might be possible to properly configure the cross-build otherwise, I've adding a mechanism for creating and adding options to a toolchains file used by the second stage. This matches CMake recommended practice for cross-building, and after spending a lot of time iterating on various cross-build configs recently I've found it's the option that gives most control and fewer problems (notably, some CMake variables can't be set outside of a toolchain file).@gkistanova: The underlying worker is currently configured within a RISC-V qemu-system. Once this is ready to merge, I'll bring it over to running under the host meaning I shouldn't need to set up a new password and so on. But let me know if you'd prefer I do set up a new worker with a new name and retire this name.
We'll later want to add the ability to customise the enabled projects/runtimes vs stage1 vs stage2.
For reference, this approximates the logic that the configured builder should follow: