GHA: Promote macOS-arm64 cross-compilation job to full native job #4541

Merged
merged 1 commit into from
May 10, 2024

kinke
Member

@kinke kinke commented Dec 5, 2023

Using the new CI runners (awesome performance!).

@kinke kinke changed the title GHA: Test native macOS-arm64 job GHA: Promote macOS-arm64 cross-compilation job to full native job Feb 8, 2024
@kinke
Member Author

kinke commented Feb 8, 2024

2 remaining failures:

  • std.internal.math.gammafunction unittests, with enabled optimizations only
  • lit-test driver/config_diag.d

These work for Cirrus CI, on macOS 12 (not 14, and surely a different Xcode version too).

@JohanEngelen
Member

JohanEngelen commented Feb 8, 2024

> 2 remaining failures:
>
>   • std.internal.math.gammafunction unittests, with enabled optimizations only
>   • lit-test driver/config_diag.d
>
> These work for Cirrus CI, on macOS 12 (not 14, and surely a different Xcode version too).

Hmm, some strange miscompile somehow?

lit-test driver/config_diag.d works for me, macOS 14.2.1, LLVM 17, Apple clang 15.0.0.
And the Phobos failure also works locally for me:

❯ bin/ldc2 -O -main -unittest -run ../ldc/runtime/phobos/std/internal/math/gammafunction.d
1 modules passed unittests

@kinke kinke marked this pull request as ready for review February 10, 2024 18:28
@JohanEngelen
Member

JohanEngelen commented Feb 10, 2024

Before merging this PR, I think I should download the artifacts (that's possible right?) and compare the output of the gamma unittest with my local build, and see if I can figure out what the miscompile is. Otherwise, I fear we release with a somehow miscompiling compiler...

@JohanEngelen
Member

I downloaded the osx-universal artifact:

  • bin/ldc2 -O -main -unittest -run import/std/internal/math/gammafunction.d: passes fine. Should I be running something else?
  • bin/ldc2 -conf=/Users/johan/ldc/ldc/tests/driver/inputs/noswitches.conf: reproduces the failure (it works with my other LDC builds, but crashes with the artifact ldc2)

@JohanEngelen
Member

About the bin/ldc2 -conf=/Users/johan/ldc/ldc/tests/driver/inputs/noswitches.conf failure. I may have found some hints:

  • it fails while throwing an exception. We intend to throw (and catch) the exception, that is exactly what the test is testing (throw new Exception("Could not look up switches in " ~ cast(string) dCfPath);).
  • after some searching I think it is the only case where we throw an exception in the compiler. What I mean is: I think it is the only CI test where inside the compiler an exception is thrown.
  • when loading the ldc2 binary into lldb and running the test with -conf=, this is the output:
(lldb) run -conf=/Users/johan/ldc/ldc/tests/driver/inputs/noswitches.conf
Process 5078 launched: '/Users/johan/ldc/test_gha/ldc2-ce3f8516-osx-universal/bin/ldc2' (arm64)
Process 5078 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000195ebddb4 
         libunwind.dylib`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::decodeFDE(libunwind::LocalAddressSpace&,
         unsigned long, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info*, 
         libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info*, bool) + 48
libunwind.dylib`libunwind::CFI_Parser<libunwind::LocalAddressSpace>::decodeFDE:
  • when trying to check the backtrace (bt), lldb outputs a ton of these errors
(lldb) bt
error: unable to find CIE at 0x33bbc for cie_id = 0xfffd1888 for entry at 0x5440.
error: unable to find CIE at 0x3887c for cie_id = 0xfffcf588 for entry at 0x7e00.
error: unable to find CIE at 0x10f38 for cie_id = 0xffff843c for entry at 0x9370.
error: unable to find CIE at 0x174b4 for cie_id = 0xffff2930 for entry at 0x9de0.
error: unable to find CIE at 0x3c990 for cie_id = 0xfffcd9e4 for entry at 0xa370.

@kinke
Member Author

kinke commented Feb 11, 2024

The gammafunction module consistently fails on the new M1 GHA runners for the vanilla-LLVM jobs too, using vanilla LLVM 16 & 17. The config_diag.d lit-test works there though (different LLVM, no assertions, different host compiler, no LTO, no PGO...).

@kinke
Member Author

kinke commented Feb 24, 2024

@JohanEngelen: So wrt. gammafunction, I'd expect you to see it too, with the regular Phobos unittest runner. - Wrt. the thrown exception for the extra .conf, I'm wondering how the current CI artifacts behave (cross-compiled, but still with PGO + LTO + mimalloc IIRC). [And note that we don't compile the CI artifacts with -g, otherwise they'd be huge.]

@kinke
Member Author

kinke commented Apr 13, 2024

[Draft because of random Pure virtual function called! errors (at compiler runtime) in first experiments in #4604, very roughly 0-5 per CI run.] The previous issues are resolved by now.

@kinke kinke force-pushed the gha_native_arm64 branch 2 times, most recently from 3d92b47 to d055b79 Compare April 19, 2024 17:55
@kinke
Member Author

kinke commented May 4, 2024

The situation hasn't changed with latest LLVM v18.1.5 and the latest GHA macos-14 image. I've retried the CI job 2 times; the first 2 runs were green, the third now had one failure again:

/Users/runner/work/ldc/build/bin/ldmd2 -conf= -m64 -Irunnable -mcpu=native -g -link-defaultlib-debug  -od../../../build/dmd-testsuite-debug/runnable -of../../../build/dmd-testsuite-debug/runnable/mars1_0  runnable/mars1.d 
libc++abi: Pure virtual function called!
Error: Error executing /Users/runner/work/ldc/build/bin/ldc2: Abort trap: 6
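
Since the crash is sporadic, one way to put a number on it locally is to repeat the failing command and count how often the abort message shows up. The following is only a minimal sketch, assuming Python 3 is at hand; `count_marker_hits` is a hypothetical helper and not part of the LDC test suite, and the demo command is a harmless stand-in rather than the real compiler invocation.

```python
import subprocess
import sys


def count_marker_hits(cmd, runs, marker):
    """Run `cmd` `runs` times; return how many runs printed `marker`.

    Both stdout and stderr are searched, because abort messages such as
    the libc++abi one typically end up on stderr.
    """
    hits = 0
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if marker in proc.stdout or marker in proc.stderr:
            hits += 1
    return hits


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in the real
    # ldmd2/ldc2 command line to measure the actual failure rate.
    demo_cmd = [sys.executable, "-c", "print('ok')"]
    hits = count_marker_hits(demo_cmd, runs=3,
                             marker="Pure virtual function called!")
    print(hits)
```

To use it for the case above, the stand-in command would be replaced by the ldmd2 command line from the log, passed as a list of arguments.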

@kinke
Member Author

kinke commented May 10, 2024

I went the extra mile and built the LLVM package on macos-14 with the oldest available Xcode v14.3.1, then switched to that Xcode and LLVM package here. No improvement - still sporadic 'pure virtual function called' crashes. (swearing)

@kinke
Member Author

kinke commented May 10, 2024

Oh man, this keeps getting weirder and more ball-busting. So I've now tried switching back to the old LDC-LLVM v17.0.6 package, which was cross-compiled on macos-12 (edit: or more likely even v11) at the time; I don't recall which Xcode version, but at most v14. LDC itself is still built with Xcode v14.3.1 here.

Results for 2 CI pipelines/workflow runs, with 4 native macos arm64 jobs each, 2 without PGO and only D-LTO, and 2 with PGO plus full LTO (incl. C++ parts - the former 'unsupported stack probe' error vanished!):

  • The 4 jobs overall with PGO + full LTO haven't hit the 'pure virtual function called' errors so far; they only failed sporadically for core.thread.fiber with enabled optimizations (=> Fix crashing core.thread.fiber unittest for AArch64. #4648).
  • The other 4 jobs without PGO and only D-limited LTO encountered at least 1 'pure virtual function called' error every time.

So it looks as if PGO and/or full LTO might fix this abomination for that new combination of prebuilt LLVM and Xcode - the very options that led to more failures with LLVM 18 and the latest Xcode v15.3! I'm gonna re-run the workflow some more times to see if this really holds.

@kinke
Member Author

kinke commented May 10, 2024

Same results after 5 workflow runs, i.e., 10 jobs per configuration - at least 1 pure-virtual-func error in each of the 10 jobs without PGO and with only D-limited LTO, not a single one in the 10 jobs with PGO + full LTO.
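
As a rough sanity check on whether a 10-vs-0 split could plausibly be noise, the counts above can be put through a one-sided Fisher exact test. This is only a sketch under assumptions not stated in the thread: each job is treated as an independent trial, and a job counts as "failed" if it hit at least one pure-virtual-function error. `fisher_one_sided` is a hypothetical helper written for this illustration.

```python
from math import comb


def fisher_one_sided(fail_a, total_a, fail_b, total_b):
    """One-sided Fisher exact p-value.

    Probability, assuming both groups fail equally often, that group A
    receives at least `fail_a` of the observed failures.
    """
    total_fail = fail_a + fail_b
    denom = comb(total_a + total_b, total_fail)
    upper = min(total_a, total_fail)
    p = 0.0
    for k in range(fail_a, upper + 1):
        # k failures land in group A, the rest in group B.
        p += comb(total_a, k) * comb(total_b, total_fail - k) / denom
    return p


if __name__ == "__main__":
    # Group A: 10 jobs without PGO and with D-limited LTO, all failed.
    # Group B: 10 jobs with PGO + full LTO, none failed.
    p = fisher_one_sided(10, 10, 0, 10)
    print(f"{p:.1e}")  # 1 / C(20, 10), i.e. about 5.4e-06
```

Under those assumptions, a split this extreme would be very unlikely to arise by chance, which is consistent with the configuration difference mattering. It says nothing about whether PGO, full LTO, or both are responsible.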

@kinke kinke marked this pull request as ready for review May 10, 2024 19:48
@kinke kinke merged commit 3839811 into ldc-developers:master May 10, 2024
22 of 23 checks passed
@kinke kinke deleted the gha_native_arm64 branch May 10, 2024 20:19