Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" #112930

GEP offsets have sext_or_trunc semantics. We were already doing this for the outer-most GEP, but not for the inner ones. I believe one of the sanitizer buildbot failures was due to this, but I did not manage to reproduce the issue or come up with a test case. Usually the problematic case will already be folded away due to index type canonicalization.
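To make the sext_or_trunc semantics concrete, here is a standalone sketch. The `sextOrTrunc` helper is hypothetical and works on plain 64-bit integers rather than LLVM's `APInt`. An index narrower than the pointer index width is sign-extended, a wider one is truncated, and the result is read as a signed offset. The fix applies this rule at every GEP level, not only the outermost one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mimicking sext_or_trunc: `value` holds an integer in
// its low `fromBits` bits; the result is that integer sign-extended or
// truncated to `toBits` bits and returned as a signed 64-bit value.
int64_t sextOrTrunc(uint64_t value, unsigned fromBits, unsigned toBits) {
  assert(fromBits >= 1 && fromBits <= 64 && toBits >= 1 && toBits <= 64);
  // Sign-extend from fromBits to 64 bits (arithmetic right shift).
  unsigned shift = 64 - fromBits;
  int64_t wide = static_cast<int64_t>(value << shift) >> shift;
  // Keep only the low toBits bits, then sign-extend them again so the
  // result is the signed interpretation of a toBits-wide integer.
  unsigned tshift = 64 - toBits;
  return static_cast<int64_t>(static_cast<uint64_t>(wide) << tshift) >> tshift;
}
```

For example, an `i8` index of `0xFF` becomes `-1` at 64 bits, while an `i64` index of `0xFFFFFFFF` truncated to a 32-bit index width also becomes `-1`.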

bazelbuild: fix for llvm@2ce10f0. No functional changes intended.

Fix for llvm@d80b9cf. No functional changes intended.

…en issue Since llvm#109628 landed, this test has been failing on 32-bit Arm. This is due to a codegen problem (it is not known whether the change introduced it or merely uncovered it) where the trap instruction is placed after the frame pointer and link register are restored. llvm#113154

So the code was:

```
std::__1::vector<int>::operator[](unsigned int):
        sub     sp, sp, #8
        str     r0, [sp, #4]
        str     r1, [sp]
        add     sp, sp, #8
        .inst   0xe7ffdefe
        bx      lr
```

When lldb saw the trap, the PC was inside operator[] but the frame information actually pointed to g. This bug only happens for leaf functions, so adding a return type works around it:

```
std::__1::vector<int>::operator[](unsigned int):
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #8
        str     r0, [sp, #4]
        str     r1, [sp]
        mov     sp, r11
        pop     {r11, lr}
        .inst   0xe7ffdefe
        bx      lr
```

(and operator[] should return T& anyway). Now the PC location and frame information should match and the test passes.

…xt parameter to getBufferForFile (llvm#111723) This patch adds an IsText parameter to the following functions: getBufferForFile and getBufferForFileImpl. We introduce a new virtual function, openFileForReadBinary, which defaults to openFileForRead except in RealFileSystem, which uses the OF_None flag instead of OF_Text. The default is set to OF_Text instead of OF_None; this change in value does not affect any platforms other than z/OS. Setting this parameter correctly is required to open files on z/OS in the correct encoding. The IsText parameter is based on the context in which we open files; for example, in the ASTReader, HeaderMap requires that files always be opened in binary even though they might be tagged as text.

…) for reductions" This reverts commit 7f2e937 as it causes regressions in the tests it modifies, and undoes what was added in llvm#100653 (which itself was a fix for a previous regression).

The initial version of this feature would use the output file name if it could, but in switching to temp files I forgot to replicate that behaviour. What happens now is we always use a tempfile name and the output path is a template for that. I think the current behaviour still makes sense so I'm just correcting the documentation.

This code is heavily based on the SelectionDAG lowerINSERT_SUBVECTOR code.

Fixes 22e21bc.

The SPIR-V backend will need to use Reg2Mem, hence this pass needs to be wrapped to be used with the legacy pass manager. --------- Signed-off-by: Nathan Gauër <[email protected]>

Add sign extension on i32 return value.

…ace tablegen patterns This lowers the aarch64_neon_sqxtn intrinsics to the new TRUNCATE_SSAT_S ISD nodes, performing the same for sqxtun and uqxtn. This allows us to clean up the tablegen patterns a little and in a future commit add combines for sqxtn.

…llvm#112686) Currently, the `omp.simd` operation is ignored during MLIR to LLVM IR translation when it takes part in a composite construct. One consequence of this limitation is that any entry block arguments defined by that operation will trigger a compiler crash if they are used anywhere, as they are not bound to an LLVM IR value. A previous PR introducing support for the `reduction` clause resulted in the creation and use of entry block arguments attached to the `omp.simd` operation, causing compiler crashes on 'do simd reduction(...)' constructs. This patch disables Flang lowering of simd reductions in 'do simd' constructs to avoid triggering these errors while translation to LLVM IR is still incomplete.

llvm#113119) StringMap::find takes StringRef. We don't need to create an instance of std::string from StringRef only to convert it right back to StringRef.
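The same idea can be shown with the standard library, as an analogy rather than `llvm::StringMap` itself. A `std::map` using the transparent comparator `std::less<>` accepts a `std::string_view` in `find`, so no temporary `std::string` is built just to perform the lookup. The `Table` alias and `lookupOrDefault` helper are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>

// std::less<> is transparent, so find() accepts any key type comparable
// with std::string, including std::string_view (heterogeneous lookup).
using Table = std::map<std::string, int, std::less<>>;

int lookupOrDefault(const Table &table, std::string_view key, int fallback) {
  auto it = table.find(key); // no std::string temporary is constructed
  return it == table.end() ? fallback : it->second;
}
```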

This code intentionally discards the high bits, so set implicitTrunc=true. This is currently NFC but will enable an APInt assertion in the future.

…vm#111575) Adds a new mlir-opt test-only pass, -test-spirv-cpu-runner-pipeline, which runs the set of MLIR passes needed for the mlir-spirv-cpu-runner, and removes them from the runner. The tests are changed to invoke mlir-opt with this flag before running the runner. The eventual goal is to move all host/device code generation steps out of the runner, like with some of the other runners.

6bac414 added this opcode with the wrong number of operands. It didn't fail on check-llvm for me or on pre-commit CI, but once committed we got buildbot failures. This patch fixes the definition of the instruction and fixes the failing test.

With the truncssat nodes these are relatively simple tablegen patterns to add. The existing intrinsics are converted to shift+truncsat so they can lower using the new patterns. Fixes llvm#112925.

…eturn value is different in pointer / lvalue ref / rvalue ref (llvm#112853) Per https://cplusplus.github.io/CWG/issues/960.html.

This patch adds assembly/disassembly for the following instructions:

ldfadd{a,al,l,}, ldbfadd{a,al,l,}
ldfmax{a,al,l,}, ldbfmax{a,al,l,}
ldfmaxnm{a,al,l,}, ldbfmaxnm{a,al,l,}
ldfmin{a,al,l,}, ldbfmin{a,al,l,}
ldfminnm{a,al,l,}, ldbfminnm{a,al,l,}
stfadd{l,}, stbfadd{l,}
stfmax{l,}, stbfmax{l,}
stfmaxnm{l,}, stbfmaxnm{l,}
stfmin{l,}, stbfmin{l,}
stfminnm{l,}, stbfminnm{l,}

according to [1].

[1] https://developer.arm.com/documentation/ddi0602

Co-authored-by: Spencer Abson [email protected]
Co-authored-by: Caroline Concatto [email protected]

…llvm#113164) Fixes llvm#113123 Alive proof: https://alive2.llvm.org/ce/z/hnqeLC

…m#112935) Done in preparation of exploring rtsan on windows.

Add new register classes/operands and their encoder/decoder behaviour required for the new Armv9.6 instructions (see https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension). This work is the basis of the 2024 Armv9.6 architecture update effort for SME.

Co-authored-by: Caroline Concatto [email protected]
Co-authored-by: Marian Lukac [email protected]
Co-authored-by: Momchil Velikov [email protected]

…-opt" (llvm#113176) Reverts llvm#111575 This caused build failures: https://lab.llvm.org/buildbot/#/builders/138/builds/5244

Improve the codegen for uaddo node for i64 in 64-bit mode and i32 in 32-bit mode by custom lowering.
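For context, `uaddo` produces two results: the wrapped sum and an unsigned-overflow bit. The sketch below only models those semantics, not the X86 custom lowering, and the `UAddO`/`uaddo` names are hypothetical. It relies on the fact that an unsigned addition overflowed exactly when the wrapped sum is smaller than an operand, which is the condition the x86 carry flag reports after an `add`.

```cpp
#include <cassert>
#include <cstdint>

// The two results of an unsigned add-with-overflow: the wrapped sum and
// a 1-bit overflow (carry) flag.
struct UAddO {
  uint64_t sum;
  bool overflow;
};

UAddO uaddo(uint64_t a, uint64_t b) {
  uint64_t sum = a + b;  // unsigned arithmetic wraps modulo 2^64
  return {sum, sum < a}; // wrapped iff the result is below an operand
}
```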

Test added by commit 47a6da2 fails on the AIX bot. So XFAIL for now to investigate further.

Previously we were attempting to remove the memprof-related metadata when iterating through instructions in the LTO backend. However, we missed some as there are a number of cases where we skip instructions, or even entire functions. Simplify the cleanup and ensure all is removed by doing a full sweep over all instructions after completing cloning. This is largely NFC except with -memprof-report-hinted-sizes enabled, because we were propagating and simplifying the metadata after inlining in the LTO backend, which caused some stray messages as metadata was re-converted to attributes.

movzx r11d,BYTE PTR [rdx] is four bytes long. Follow-up to llvm#111638

Remove the unused functions and register classes from the change below llvm@4679583

The root gather/buildvector node should be ignored when the SLP vectorizer tries to find matching gather nodes that were vectorized earlier. This node is definitely the last one in the pipeline and it does not have users. It may cause a compiler crash. Fixes llvm#113143

This patch adds a LegalityResultWithReason class for describing the reason why legality decided not to vectorize the code.

…llvm#109850) Special case small int constant in the PPC custom lowering of scalar_to_vector.

This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on. Built by GCC 11. Fix warning In destructor ‘llvm::APInt::~APInt()’, inlined from ‘llvm::APInt::~APInt()’ at llvm-project/llvm/include/llvm/ADT/APInt.h:190:3, inlined from ‘llvm::APSInt::~APSInt()’ at llvm-project/llvm/include/llvm/ADT/APSInt.h:23:21, inlined from ‘bool checkOMPArraySectionConstantForReduction(clang::ASTContext&, const clang::ArraySectionExpr*, bool&, llvm::SmallVectorImpl<llvm::APSInt>&)’ at llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18357:45, inlined from ‘bool actOnOMPReductionKindClause(clang::Sema&, {anonymous}::DSAStackTy*, clang::OpenMPClauseKind, llvm::ArrayRef<clang::Expr*>, clang::SourceLocation, clang::SourceLocation, clang::SourceLocation, clang::SourceLocation, clang::CXXScopeSpec&, const clang::DeclarationNameInfo&, llvm::ArrayRef<clang::Expr*>, {anonymous}::ReductionData&)’ at llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18715:68: llvm-project/llvm/include/llvm/ADT/APInt.h:192:18: error: ‘void operator delete [](void*)’ called on a pointer to an unallocated object ‘1’ [-Werror=free-nonheap-object] 192 | delete[] U.pVal; | ^~~~

This patch fixes: llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/Legality.h:85:16: error: private field 'Reason' is not used [-Werror,-Wunused-private-field]

…2765) MSG_DEALLOC_VGPRS slows down very small waveslot-limited kernels. It has been identified that this message is only really needed for VGPR-limited kernels. A kernel becomes VGPR limited if the total number of VGPRs per SIMD divided by the number of used VGPRs is less than the number of wave slots.

…#112990) Renames LegalizeData to LegalizeDataValues since this pass fixes up SSA values. LegalizeData suggested that it fixed data mapping. This change also adds support for fixing up SSA values for data clause operations. Effectively, compute regions within a data region also use the SSA values from data operations. The SSA values within data regions but not within compute regions are not updated. This change is to support the requirement in the OpenACC spec which notes that a visible data clause is not just one on the current compute construct but on the lexically containing data construct or visible declare directive.

Based on this RFC: https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306

LLVM intrinsics do not support out params. To get around this limitation, implementers make intrinsics return structs to capture both a return value and an out param. This implementation detail should not impact scalarization, since these cases should be elementwise operations.

## Three changes are needed.
- The CallInst visitor needs to be updated to handle structs.
- A new visitor is needed for `ExtractValue` instructions.
- `finish` needs to be updated to handle structs so that insert elements are properly propagated.

## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`

fixes llvm#111437
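As a C++ analogy for why a struct-returning intrinsic such as `llvm.frexp` is still elementwise (this is not the Scalarizer code; `Vec` and `vectorFrexp` are hypothetical): a "vector" frexp returning a pair of arrays (mantissas and exponents) can be computed one lane at a time with `std::frexp`, writing each lane's two results back into the matching lane of the two output vectors, much as the scalarized struct fields are recombined.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

template <std::size_t N>
using Vec = std::array<double, N>;

// A vector frexp modeled as returning {mantissas, exponents}, analogous
// to an intrinsic that returns a struct of two vectors.
template <std::size_t N>
std::pair<Vec<N>, std::array<int, N>> vectorFrexp(const Vec<N> &x) {
  std::pair<Vec<N>, std::array<int, N>> result{};
  for (std::size_t i = 0; i < N; ++i) {
    // Scalarized: one scalar frexp per lane; both "struct fields" are
    // stored into lane i of their respective result vectors.
    result.first[i] = std::frexp(x[i], &result.second[i]);
  }
  return result;
}
```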

…vm#112997) The x86-fold-tables.td has been failing for me and [in CI](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4) if Git happens to decide to check out the baseline file with Windows line endings. The fix for this is to add the `--strip-trailing-cr` option to diff to normalize the line endings before comparing them.

…lvm#112995) llvm#98060 introduced a warning for unterminated string constants; however, it was only checking for `\n`, which means that it produced strange results on Windows (always blaming column 1), including having the [associated test fail](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4) if Git happened to use Windows newlines when creating the file. The fix for this is to detect both `\r` and `\n`, without double-warning for Windows newlines.
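A minimal sketch of the line-ending handling, using a hypothetical helper rather than the actual lexer code, and ignoring escape sequences for brevity. Scanning from just after the opening quote, the string is unterminated if a `\r` or `\n` appears before the closing quote. Because the scan stops at whichever comes first, a Windows `\r\n` pair yields a single position (the `\r`) and therefore a single warning.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: `rest` is the text just after an opening '"'.
// Returns the offset where the string is left unterminated (a line ending
// or end of input), or std::string_view::npos if a closing '"' comes first.
// Escape sequences such as \" are not handled in this sketch.
std::size_t findUnterminatedEnd(std::string_view rest) {
  for (std::size_t i = 0; i < rest.size(); ++i) {
    char c = rest[i];
    if (c == '"')
      return std::string_view::npos; // properly terminated
    if (c == '\r' || c == '\n')
      return i; // stops at '\r' for "\r\n", so only one report
  }
  return rest.size(); // end of input also leaves the string unterminated
}
```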

This file is covered under the Apple open source license rather than the LLVM license. Presumably this was an oversight, but it doesn't really matter as this file is unused. Remove it altogether.

…111927) Autogenerate `.ll` code from cpp code in some `-icf-safe-thunk` tests using `update_test_body.py` ``` PATH=build/bin:$PATH llvm/utils/update_test_body.py lld/test/MachO/icf-safe-thunks.ll lld/test/MachO/icf-safe-thunks-dwarf.ll ``` https://llvm.org/docs/TestingGuide.html#elaborated-tests I recently became aware of this tool and I wanted to practice using it. This also lets us remove the custom instructions for generating the `.ll` code.

…m#113171) Use `auto` when initializing a variable with `cast<>`. Remove some unneeded `const_cast` (since all Init pointers are now const).

Change the spill weight calculations for `optsize` functions to remove the block frequency multiplier. For those functions, we do not want to consider the runtime cost of spilling, only the code size cost.

I built a large app with the basic and greedy (default) register allocators enabled.

| Regalloc Type | Uncompressed Size Delta | Compressed Size Delta |
| - | - | - |
| Basic | -303.8 KiB (-0.23%) | -232.0 KiB (-0.39%) |
| Greedy | 159.1 KiB (0.12%) | 130.1 KiB (0.22%) |

Since I only saw a size win with the basic register allocator, I decided to change the behavior only for that allocator.
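A simplified model of the change. The `Access` struct and `spillWeight` function are hypothetical; LLVM's real spill-weight code is more involved. Normally each use or def contributes its block's relative frequency, so spilling in hot blocks looks expensive. When optimizing for size, the multiplier is dropped and every access counts equally, so the weight roughly tracks how many spill/reload instructions would be added.

```cpp
#include <cassert>
#include <vector>

// One use and/or def of a virtual register, with the relative execution
// frequency of its basic block (hypothetical model).
struct Access {
  bool isUse;
  bool isDef;
  double blockFrequency;
};

// By default each access is weighted by block frequency (runtime cost).
// With optForSize, the frequency multiplier is replaced by 1, so only the
// number of accesses (a proxy for code size) matters.
double spillWeight(const std::vector<Access> &accesses, bool optForSize) {
  double weight = 0.0;
  for (const Access &a : accesses) {
    double perAccess = (a.isUse ? 1.0 : 0.0) + (a.isDef ? 1.0 : 0.0);
    double multiplier = optForSize ? 1.0 : a.blockFrequency;
    weight += perAccess * multiplier;
  }
  return weight;
}
```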

We need to support 64-bit data types (the intrinsics do support them). We were also silently converting FP arguments to integers; this is fixed as well.

…ductions Enables initial non-power-of-2 support for reductions (the number of elements must still form whole registers). Enables extra vectorization for MultiSource/Benchmarks/7zip/7zip-benchmark, CINT2006/464.h264ref and CFP2017rate/526.blender_r (checked for SSE2) Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#112361

… on Windows (llvm#113209) With llvm#112981, the test uses awk, which gnuwin32 doesn't seem to have.

Similar to bd861d0, this adds some patterns for converting signed and unsigned variants of rshr+qxtn to qrshrn.

With bazel 8.x these are strongly encouraged, and this disambiguates which version of these rules we get for older versions. Specifically the native.py_test was using the wrong version of py_test.

…(again) (llvm#112962) When adding new SEH pseudo instructions in llvm#110024 I noticed that some of the tests were changing their output since these new instructions were counting towards thresholds for branching versus folding decisions. These instructions do not result in real machine instructions being emitted, so they should be marked as meta instructions. This is a re-do of llvm#110889 as we hit an issue where some of the SEH pseudo instructions in the prolog were being duplicated, which resulted in errors being raised as the CodeView generator was seeing prolog directives after an end-prolog directive: <llvm#110889 (comment)>. The fix for this is to mark the prolog-related SEH pseudo instructions as non-duplicatable.

If we split these features in the compiler (see relevant pull request llvm#109299), we would only be able to hand-write a 'memtag2' version using inline assembly since the compiler cannot generate the instructions that become available with FEAT_MTE2. However these instructions only work at Exception Level 1, so they would be unusable since FMV is a user space facility. I am therefore unifying them. Approved in ACLE as ARM-software/acle#351

…ties (llvm#112851) When an operation has no properties, no property struct is emitted. To avoid a compilation error, we should also skip emitting `setPropertiesFromParsedAttr`, `parseProperties` and `printProperties` in such cases. Compilation error: ``` error: ‘Properties’ has not been declared static ::llvm::LogicalResult setPropertiesFromParsedAttr(Properties &prop, ::mlir::Attribute attr, ::llvm::function_ref<::mlir::InFlightDiagnostic()> emitError); ```

This improves performance of doing a `bazel test @llvm-project//...` a lot because previously every lit test would have some symlink tree configured for it.

Delete unused class member in `SearchableTableEmitter` class.

Currently all the headings marked as `#` show up as a top-level entry in the `Developing LLDB` toctree. This patch marks these as `##` so only `Adding Programming Language Support` is displayed in the table of contents.

The test was failing on s390x with this error: JIT session error: Unsupported target machine architecture in ELF object <main>-jitted-objectbuffer

Implements std::from_chars for float and double. The implementation uses LLVM-libc to do the real parsing. Since this is the first time libc++ uses LLVM-libc there is a bit of additional infrastructure code. The patch is based on the [RFC] Project Hand In Hand (LLVM-libc/libc++ code sharing) https://discourse.llvm.org/t/rfc-project-hand-in-hand-llvm-libc-libc-code-sharing/77701
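For readers unfamiliar with the API, a small usage example of the standard C++17 `std::from_chars` floating-point overload. It shows only the standard interface, not libc++'s LLVM-libc-based internals, and needs a standard library that provides these overloads. The `parseDouble` wrapper is a hypothetical name.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse the whole of `text` as a double, returning std::nullopt if parsing
// fails or if trailing characters remain. std::from_chars is
// locale-independent and neither allocates nor throws.
std::optional<double> parseDouble(std::string_view text) {
  double value = 0.0;
  const char *end = text.data() + text.size();
  auto [ptr, ec] = std::from_chars(text.data(), end, value);
  if (ec != std::errc() || ptr != end)
    return std::nullopt;
  return value;
}
```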

…fails (llvm#112695)

Fix a number of dependency issues when building the mlir-linalg-ods-yaml-gen host binary, which made a cross-build using the Make generator fail. Namely:
- do not use the binary path for the custom target created when LLVM_USE_HOST_TOOLS is true;
- use the target name instead of the name of the variable holding the target name for add_custom_target and set_target_properties in setup_host_tool();
- remove the dependency on a target defined in a different directory in add_linalg_ods_yaml_gen(), since add_custom_target DEPENDS can only be used on "files and outputs of custom commands created with add_custom_command() command calls in the same directory";
- remove the unneeded dependency on ${MLIR_LINALG_ODS_YAML_GEN_EXE}; the target dependency will ensure the binary is built.

Note that we keep using ${MLIR_LINALG_ODS_YAML_GEN_EXE} in the COMMAND rather than ${MLIR_LINALG_ODS_YAML_GEN_TARGET} because when LLVM_NATIVE_TOOL_DIR is used the latter is an empty string.

Testing-wise, all three codepaths in get_host_tool_path() were tested with both GNU Make and Ninja generators:
- cross-compiling with LLVM_NATIVE_TOOL_DIR checks the if path;
- cross-compiling without LLVM_NATIVE_TOOL_DIR checks the elseif path;
- native build without LLVM_NATIVE_TOOL_DIR checks the else path.

) Reverts llvm#112224 Many bots are broken

…lvm#112979) There are many more tests to add, but I would like to get this reviewed and the details sorted out before it grows too big.

Add the defines for the `cl_ext_image_unorm_int_2_101010_EXT` extension.

llvm#113231) …eeds (llvm#112979) This reverts commit d91318b.

llvm#112645)

The new matcher is more flexible and can be used to build matchers for additional recipe types without unnecessary duplication.

With this change and appropriate linker changes (https://r.android.com/3236256) AOSP boots with memtag-global throughout the platform. Without this change, we would sometimes generate PC-relative references to tagged globals, which then do not have the proper tag.

For Hand-In-Hand we need float to string to support a wider set of architectures, specifically the ones that libc++ supports. This includes powerpc, which apparently uses "double double" as its long double type. Since Hand-In-Hand isn't currently using long double, this just opts them out.

@michalpaszkowski

Add the llvm-canon tool. Description from the [original PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu):

> Added a new llvm-canon tool which aims to transform LLVM Modules into a canonical form by reordering and renaming instructions while preserving the same semantics. This tool makes it easier to spot semantic differences while diffing two modules which have undergone different transformation passes.

The current version of this tool can:
- Reorder instructions within a function.
- Rename instructions based on the operands.
- Sort commutative operands.

This code was originally written by @michalpaszkowski and [submitted to mainline LLVM](llvm@14d3585). However, it was quickly [reverted](llvm@335de55) due to BuildBot errors. Michal presented his version of the tool in [LLVM-Canon: Shooting for Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg).

@AidanGoldfarb and I ported the code to the new pass manager, added more tests, and fixed some bugs related to PHI nodes that may have been the root cause of the BuildBot errors that caused the patch to be reverted. Additionally, we rewrote the implementation of instruction reordering to fix cases where the original algorithm would break use-def chains.

Note that this is the first time @AidanGoldfarb and I have submitted to LLVM. Please liberally critique the PR! CC @plotfi for initial review. --------- Co-authored-by: Aidan <[email protected]>

@Snowy1803

This patch is inspired by @Snowy1803's excellent work in Swift and the patch: https://github.com/swiftlang/swift/pull/73334/files

Add an instrumentation pass to LLVM to collect dropped debug information variable statistics for every Function-level and Module-level IR pass. This patch adds the class DroppedVariableStats, which iterates over every DbgRecord in a function or module before and after an optimization pass, counts the number of variables whose debug information has been dropped due to that pass, and prints that output to stdout in CSV format.

Running this patch on optdriver.cpp gives:

Pass Name, Dropped Variables
'InstCombinePass', 1
'SimplifyCFGPass', 6
'JumpThreadingPass', 25
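A simplified model of the counting, with hypothetical types rather than the DroppedVariableStats class. Record the set of variables that have debug records before a pass and after it, count those that disappeared, and format one CSV row per pass. This sketch identifies variables by name only and ignores details such as scopes and inlining.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical model: the variables with debug records, keyed by name.
using VarSet = std::set<std::string>;

// Count variables that had debug info before the pass but none after it.
std::size_t countDropped(const VarSet &before, const VarSet &after) {
  std::size_t dropped = 0;
  for (const std::string &var : before)
    if (after.count(var) == 0)
      ++dropped;
  return dropped;
}

// Format one row in the "Pass Name, Dropped Variables" layout shown above.
std::string csvRow(const std::string &passName, std::size_t dropped) {
  std::ostringstream os;
  os << "'" << passName << "', " << dropped;
  return os.str();
}
```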

…m#109571) This PR fixes multiple bugs in `DuplicateFunctionElimination`. - Prevents elimination of function declarations. - Updates all symbol uses to reference unique function representatives. Fixes llvm#93483.
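The two fixes can be illustrated on a toy module. The types and helpers below are hypothetical and are not the MLIR pass. Declarations, which have no body, are never merged. Every call site is rewritten to the representative of its function's equivalence class.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy function: a name plus an optional body (nullopt == declaration).
struct Func {
  std::string name;
  std::optional<std::string> body;
};

// Map each function name to its representative. Functions with identical
// bodies share the first such function; declarations map to themselves,
// since two bodiless declarations may refer to different external symbols.
std::map<std::string, std::string> pickRepresentatives(const std::vector<Func> &funcs) {
  std::map<std::string, std::string> firstWithBody; // body -> representative
  std::map<std::string, std::string> rep;           // name -> representative
  for (const Func &f : funcs) {
    if (!f.body) {
      rep[f.name] = f.name; // declaration: never eliminated
      continue;
    }
    auto it = firstWithBody.emplace(*f.body, f.name).first;
    rep[f.name] = it->second;
  }
  return rep;
}

// Rewrite every call target (not just some) to its representative.
void updateCalls(std::vector<std::string> &callees,
                 const std::map<std::string, std::string> &rep) {
  for (std::string &callee : callees)
    callee = rep.at(callee);
}
```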

Fixes llvm#113011.

Implement VPlan-based cost computation for VPWidenSelectRecipe.

@Mel-Chen

34cdd67 accidentally dropped the case for VPWidenIntrinsicSC. Add it back again. Thanks @Mel-Chen for spotting this.

This patch implements `VPWidenCastRecipe::computeCost()` and skips cast recipes in the in-loop reduction.

Simplify code as suggested post-commit in llvm#110576.

…nstant value. (llvm#113079) This patch adds support for constant folding for the `erf` and `erff` libc functions.

…ctions (llvm#112726) This patch adds the assembly/disassembly for the following instructions:

CBB<cc>, CBH<cc>, CB<cc>(immediate), CB<cc>(register)
CBBLE, CBBLO, CBBLS, CBBLT
CBHLE, CBHLO, CBHLS, CBHLT
CBGE, CBHS, CBLE, CBLS (immediate)
CBLE, CBLO, CBLS, CBLT (register)

according to [1].

[1] https://developer.arm.com/documentation/ddi0602

Co-authored-by: Momchil Velikov [email protected]
Co-authored-by: Spencer Abson [email protected]

… (PR llvm#111970)" (llvm#112877) **Change relanded after feedback on failures and improvements to the check of the addend. Original PR llvm#111970**

Changes from original patch:
- The value that is being checked has changed; it now correctly checks any Addend for the instruction, rather than the Value. From my investigation, the addend is kept within the Target data structure.
- Removed changes to the following tests, since the original behaviour was correct and my original patch caused unexpected errors:
  - llvm/test/MC/ARM/Windows/mov32t-range.s
  - llvm/test/MC/MachO/ARM/thumb2-movw-fixup.s

As per the ARM ABI, the MOVT and MOVW instructions should have addends that fall within a 16-bit signed range. LLVM does not check this, so it is possible to use addends that are beyond the accepted range. These addends are silently truncated. A new check is added to ensure the addend falls within the expected range, rejecting an addend that falls outside with an error. Information relating to the ABI requirements can be found here: https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#addends-and-pc-bias-compensation
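The new check amounts to a signed 16-bit range test on the addend. A minimal sketch, using a hypothetical function name; per the description, the actual check in the ARM MC layer reports an error rather than returning a bool.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a MOVW/MOVT addend must be representable as a
// signed 16-bit value, i.e. lie in [-32768, 32767]. Previously, values
// outside this range were silently truncated.
bool isValidMovwMovtAddend(int64_t addend) {
  return addend >= INT16_MIN && addend <= INT16_MAX;
}
```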

Add the include needed for DWARFContext to DwarfTransformer.h. Add the include needed for Windows types like HRESULT to DIAUtils.h.

…on windows' (llvm#109024) (llvm#112640) Fix missing extern templates for llvm::Registry use in other projects of llvm. Windows doesn't implicitly import and merge exported symbols across shared libraries like Linux does, so we need to explicitly export/import each instantiation of llvm::Registry. Updated LLVM_INSTANTIATE_REGISTRY to just be a full explicit template instantiation. This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and LLVM plugins on Windows.

…sValidAddrSpaceCast (llvm#112493) So far, isValidAddrSpaceCast only allows casts to the flat address space and between the constant(32) address spaces. It does not allow casting between the global and constant address spaces, even though they alias. That affects, e.g., the lowering of memmoves from the constant to the global address space in LowerMemIntrinsics, since that requires aliasing address spaces to be castable. This patch relaxes isValidAddrSpaceCast and allows such casts. It also includes a memmove test that would crash with the previous implementation because the memmove IR lowering would not be applicable for the move from constant AS to global AS.

…onstructible even if a union member has a default member initializer (llvm#95854) (llvm#96301) Resolves llvm#95854 -- As per https://eel.is/c++draft/dcl.init#general-8.3

… (NFC) (llvm#113172) The `@llvm.experimental.vector.histogram.add` returns void.

llvm#113202) ... with non-constant initializers.

…movsb` (llvm#113161) When using `-mprefer-vector-width=128` with `-march=sandybridge`, copying 3 cache lines in one go (192B) gets converted into `rep;movsb`, which translates into a 60% hit in performance. Consecutive calls to `__builtin_memcpy_inline` (the implementation behind `builtin::Memcpy::block_offset`) are not coalesced by the compiler, so calling it three times in a row generates the desired assembly. It only differs in the interleaving of the loads and stores and does not affect performance. This is needed to reland llvm#108939.

… lambdas (llvm#112896) Nested lambdas could refer to outer packs that would be expanded by a larger CXXFoldExpr, in which case that reference could happen to be a full expression containing intermediate types/expressions, e.g. SubstTemplateTypeParmPackType/FunctionParmPackExpr. They are designated as "UnexpandedPack" dependencies but don't introduce new packs anyway. This also handles a missed case for VarDecls, where the flag of ContainsUnexpandedPack was not propagated up to the surrounding lambda. Fixes llvm#112352

isNoWrap has exactly one caller, which handles Assume = true separately, but too conservatively. Instead, pass Assume to isNoWrap, so it is threaded into getPtrStride, which has the correct handling for the Assume flag. Also note that the Stride == 1 check in isNoWrap is incorrect: getPtrStride returns a Stride of 1 or -1, except when isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we can include the case of -1 Stride, and when isNoWrapAddRec is true. With this change, passing Assume = true to getPtrStride could return a non-unit stride, and we correctly handle that case as well.

With the introduction of the nusw flag in GEPNoWrapFlags, it should be safe to weaken the check in LoopAccessAnalysis to just check the nusw flag on the GEP, instead of inbounds.

Fixed: llvm#113044. The type of `ArrayTypeTraitExpr` can be changed, so using i32 directly is incorrect. --------- Co-authored-by: Eli Friedman <[email protected]>

…h instructions (llvm#112726)" This reverts commit dc84337. Reverting because the sanitizer fails with the following error: llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp:502:56: runtime error: left shift of negative value -256

Replace the unused analysis (VirtRegMap) dependency with the used one (SlotIndexes). Initializes `SlotIndexesWrapperPass`, which is used by SILowerSGPRSpills, to ensure that the legacy pass manager finds it. Removes the initialization for `VirtRegMapWrapperPass` since it is not requested in this pass.

…111357) Solves the double free error.

Removes unnecessary extends on the indices passed into histogram instructions. It also removes the instruction when the mask is zero.

This patch adjusts the requires clause/expression parser to imply a requires clause if it is preceded by a bitwise and operator `&`, and assume it is a reference qualifier. The justification is that bitwise operations should not be used for requires expressions. This is a band-aid fix. The real problems lie in the lookahead heuristic in the same method. It may be worth it to rewrite that whole heuristic to track more state in the future, instead of just blindly marching forward across multiple unrelated definitions, since right now, the definition following the one with the requires clause can influence whether the heuristic chooses clause or expression. Fixes llvm#110485

…AS for when `generic` is available (llvm#112442) Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` as the default address space. This is wrong for cases where the `generic` address space is available, and is corrected via this patch. In general, this AS map abuse is a bad hack and we should re-work it altogether, but at least after this patch we will stop being incorrect for e.g. OpenCL 2.0.

) Tweak VOP2eInst_Base so that it does not rely on !eq comparing an int value (-1) with a bits<5> value. This is to avoid a change in behaviour when llvm#112904 lands, which is a bug fix which has the side effect of implicitly casting template arguments to the declared template parameter type.

When the second operand of an incrementing while instruction is the maximum value, comparisons that include equality can never fail.
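The observation can be checked exhaustively on a small type. In the sketch below (hypothetical helpers on 8-bit values, not the predicate-generating instruction itself), an inclusive comparison `i <= n` with `n` equal to the type's maximum holds for every `i`, so no lane can ever compare false; one below the maximum, some `i` fails.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if `i <= n` holds for every uint8_t value i.
bool inclusiveAlwaysTrue(uint8_t n) {
  for (unsigned i = 0; i <= std::numeric_limits<uint8_t>::max(); ++i)
    if (!(static_cast<uint8_t>(i) <= n))
      return false;
  return true;
}

// Same check for the signed case, over every int8_t value i.
bool inclusiveAlwaysTrueSigned(int8_t n) {
  for (int i = std::numeric_limits<int8_t>::min();
       i <= std::numeric_limits<int8_t>::max(); ++i)
    if (!(static_cast<int8_t>(i) <= n))
      return false;
  return true;
}
```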

…annotated types. (llvm#113180) This issue is identified during the discussion of [this comment](llvm#112234 (comment)). There will be no release note for this fix as it is a follow-up to [llvm#106997](llvm#106997).

…110322)" (llvm#113124) This reverts commit 2026501. Failing bot: * https://lab.llvm.org/staging/#/builders/125/builds/389

Since we don't generate a full dependency graph of headers, we can greatly simplify the script that parses the result of --trace-includes. At the same time, we also unify the mechanism for detecting whether a header is a public/C compat/internal/etc header with the existing mechanism in header_information.py. As a drive-by this fixes the headers_in_modulemap.sh.py test which had been disabled by mistake because it used its own way of determining the list of libc++ headers. By consistently using header_information.py to get that information, problems like this shouldn't happen anymore. This should also unblock llvm#110303, which was blocked because of a brittle implementation of the transitive includes check which broke when the repository was cloned at a path like /path/__something/more.

This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on. Built by GCC 11. ``` Fix warning: llvm-project/clang/lib/Parse/ParseDeclCXX.cpp:3153:14: error: enumerated mismatch in conditional expression: ‘clang::diag::<unnamed enum>’ vs ‘clang::diag::<unnamed enum>’ [-Werror=enum-compare] 3152 | DS.isFriendSpecified() || NextToken().is(tok::kw_friend) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3153 | ? diag::err_friend_concept | ^~~~~~~~~~~~~~~~~~~~~~~~~~ 3154 | : diag:: | ~~~~~~~~ 3155 | err_concept_decls_may_only_appear_in_global_namespace_scope); ``` --------- Co-authored-by: Sirraide <[email protected]> Co-authored-by: cor3ntin <[email protected]>

…ess (llvm#112605) Treat calls to zero-param const methods as having stable return values (with a cache) to address issue llvm#58510. The cache is invalidated when non-const methods are called. This uses the infrastructure from PR llvm#111006. For now we cache methods returning: - ref to optional - optional by value - booleans We can extend that to pointers to optional in a next change.

llvm#112975) In preparation for future work on separating the output of the GNU/HLASM ASM dialects, we first separate the SystemZInstPrinter classes to two versions, one for each ASM dialect. The common code remains in a SystemZInstPrinterCommon class instead. --------- Co-authored-by: Tony Tao <[email protected]>

Turns out for double double LDBL_MANT_DIG == 106. This patch fixes the constant. Should fix the ppc buildbot. Previously: llvm#113235 llvm#113237 llvm#91651

) The IR lowering of memcpy/memmove intrinsics uses a target-specific type for its load/store operations. So far, the loaded and stored addresses are computed with GEPs based on this type. That is wrong if the allocation size of the type differs from its store size: The width of the accesses is determined by the store size, while the GEP stride is determined by the allocation size. If the allocation size is greater than the store size, some bytes are not copied/moved. This patch changes the GEPs to use i8 addressing, with offsets based on the type's store size. The correctness of the lowering therefore no longer depends on the type's allocation size. This is in support of PR llvm#112332, which allows adjusting the memcpy loop lowering type through a command line argument in the AMDGPU backend.
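The following byte-level simulation shows why the GEP stride matters. It assumes a loop trip count of `len / storeSize` and a hypothetical type with a 12-byte store size and a 16-byte allocation size. `lowerCopy` is an illustrative helper, not the LowerMemIntrinsics code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulates a copy loop that runs src.size() / storeSize iterations, where
// each iteration copies `storeSize` bytes starting at byte offset i * stride.
// Hypothetical model: the access width gives the trip count, while `stride`
// plays the role of the GEP step.
std::vector<std::uint8_t> lowerCopy(const std::vector<std::uint8_t> &src,
                                    std::size_t storeSize, std::size_t stride) {
  assert(storeSize > 0 && src.size() % storeSize == 0);
  std::vector<std::uint8_t> dst(src.size(), 0);
  for (std::size_t i = 0; i < src.size() / storeSize; ++i) {
    std::size_t offset = i * stride;
    // Bytes past the end are skipped so the simulation itself stays in bounds.
    for (std::size_t b = 0; b < storeSize && offset + b < src.size(); ++b)
      dst[offset + b] = src[offset + b];
  }
  return dst;
}
```

When the stride equals the store size, which is what i8-based addressing with store-size offsets amounts to, every byte is copied. With a 16-byte stride over a 24-byte buffer, bytes 12 to 15 are never written.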

…lvm#112907) Since llvm@ddf2d62, 0-d vectors are supported in VectorType. This patch removes the handling of 0-d vectors as scalars in the TransferOpReduceRank pattern. This pattern specifically introduces tensor.extract_slice during vectorization, which prevents vectorization from folding transfer_read/transfer_write slices properly. The changes in the vectorization test files reflect this. There are other places where lowering patterns still side-step proper handling of 0-d vectors by turning them into scalars, but this patch only focuses on the vector.transfer_x patterns.

… (NFC) (llvm#113328) The previous submission looked like it triggered a build failure (https://lab.llvm.org/buildbot/#/builders/17/builds/3116), but this appears to be a spurious failure due to a flaky test.

)

…lvm#113168) This patch adds assembly/disassembly support and tests for the SVE BFSCALE instruction, according to https://developer.arm.com/documentation/ddi0602.

…vm#111568) KnownBits values referenced from a DenseMap may be invalidated if an insertion triggers a (re)growth of the map. Fixes llvm#110930
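The same hazard can be shown with `std::vector`, which, like `DenseMap`, may move its elements when an insertion grows it. This is an analogy, not the patched code. The sketch copies the value out before inserting instead of keeping a reference across the insertion. `sumFirstAfterAppend` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Returns v[0] + extra after appending `extra` to v.
int sumFirstAfterAppend(std::vector<int> &v, int extra) {
  assert(!v.empty());
  // Unsafe variant: `const int &first = v[0];` would dangle if push_back
  // reallocates the buffer, and reading it afterwards is undefined behavior.
  int first = v[0]; // safe: a copy survives reallocation
  v.push_back(extra);
  return first + extra;
}
```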

…tensions (llvm#112341) Add support for the following Armv9.6-A memory systems extensions: FEAT_LSUI - Unprivileged Load Store FEAT_OCCMO - Outer Cacheable Cache Maintenance Operation FEAT_PCDPHINT - Producer-Consumer Data Placement Hints FEAT_SRMASK - Bitwise System Register Write Masks as documented here: https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension Co-authored-by: Jonathan Thackray <[email protected]> --------- Co-authored-by: Jonathan Thackray <[email protected]>

…13251) TestUseSourceCache attempts to write to a build artifact copied from the source tree, and asserts that the write succeeded. If the source tree is read-only, the copy will also be read-only, causing the write to fail. When producing the build artifact, ensure that it is writable.

In `clang-scan-deps`, we're creating lots of `Module` instances. Allocating them all in a bump-pointer allocator reduces the number of retired instructions by 1-1.5% on my workload.
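A bump-pointer allocator hands out memory by advancing a pointer through large slabs and releases everything at once, so each allocation costs only a few arithmetic operations. The sketch below is a minimal illustration, not `llvm::BumpPtrAllocator`. `BumpArena` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Minimal bump-pointer arena: allocation advances a pointer inside the current
// slab; all slabs are freed together when the arena is destroyed.
class BumpArena {
public:
  explicit BumpArena(std::size_t slabSize = 4096) : slabSize_(slabSize) {}

  void *allocate(std::size_t size, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0 && "power-of-two alignment");
    std::uintptr_t p = (cur_ + align - 1) & ~(align - 1);
    if (slabs_.empty() || p + size > end_) {
      newSlab(size + align); // enough room even after aligning
      p = (cur_ + align - 1) & ~(align - 1);
    }
    cur_ = p + size;
    return reinterpret_cast<void *>(p);
  }

  // Constructs a T in the arena. Destructors are never run by this sketch.
  template <typename T, typename... Args> T *make(Args &&...args) {
    return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
  }

  std::size_t numSlabs() const { return slabs_.size(); }

private:
  void newSlab(std::size_t minSize) {
    std::size_t n = minSize > slabSize_ ? minSize : slabSize_;
    slabs_.push_back(std::make_unique<std::byte[]>(n));
    cur_ = reinterpret_cast<std::uintptr_t>(slabs_.back().get());
    end_ = cur_ + n;
  }

  std::size_t slabSize_;
  std::vector<std::unique_ptr<std::byte[]>> slabs_;
  std::uintptr_t cur_ = 0, end_ = 0;
};
```

Objects with non-trivial destructors, such as a `Module` holding strings and vectors, need an arena that also runs their destructors (for example `llvm::SpecificBumpPtrAllocator`) or explicit destructor calls. This sketch does neither.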

This keeps common operations together, and should make it easier to write re-usable dylib managers in the future (e.g. a DylibManager that uses the EPC's remote-execution APIs to implement load and lookup).

…ed (llvm#111911) Log errors to the (always-on) system log if they would otherwise get dropped by LLDB_LOG_ERROR.

- Accidentally left out of db21bd4.

…llvm#110447) When compiling HIP source for AMDGCN flavoured SPIR-V that is expected to be consumed by the ROCm HIP RT, it's not desirable to set the OpenCL Kernel CC on `__global__` functions. On one hand, this is not an OpenCL RT, so it doesn't compose with e.g. OCL specific attributes. On the other it is a "noisy" CC that carries semantics, and breaks overload resolution when using [generic dispatchers such as those used by RAJA](https://github.com/LLNL/RAJAPerf/blob/186d4194a5719788ae96631c923f9ca337f56970/src/common/HipDataUtils.hpp#L39).

…lvm#113193) This PR refactors the std::vector's initializer_list constructors to reduce code duplication. The constructors now call `__init_with_size` directly, which also improves readability and maintainability.

…lvm#109011) This commit enables 'llvm-dwarfdump --verify' to verify the DWARF in foreign type units when using split DWARF for the .debug_names section.

…lvm#113335) This command-line option is now required when building HIP applications (mainly for the host side) after we enabled __fp16 args and return values with patches D133885 & D145345.

…llvm#113234) I (re)discovered that dsymutil was instantiating two BinaryHolders: one for parsing the debug map and one for linking. That really defeats the purpose of the BinaryHolder as it serves as a cache. Fix the issue and remove an old FIXME.

…lvm#113238) Provide an option (--no-object-timestamp) to ignore object file timestamp mismatches. We already have a similar option for Swift modules (--no-swiftmodule-timestamp). rdar://123975869

…a helper (NFC)" (llvm#113340) Reverts llvm#113328 This change breaks a number of builds (e.g https://lab.llvm.org/buildbot/#/builders/25/builds/3504), for some reason. Reverting to do some troubleshooting.

The tag-html.test has been failing for me and [in CI](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4) if Git happens to decide to check out the baseline file with Windows line endings. The fix is to call `tr` to strip Windows line endings when copying the baseline files to the test output directory, before embedding them.
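The same normalization is sketched here in C++ for illustration. It assumes the fix amounts to deleting every `\r` character, as `tr -d '\r'` would. `stripCarriageReturns` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Deletes every carriage return, turning CRLF line endings into LF.
std::string stripCarriageReturns(std::string text) {
  text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
  assert(text.find('\r') == std::string::npos);
  return text;
}
```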

Fixes a bug in APFloat handling of E8M0 type (zero mantissa). Related PRs: - llvm#107127 - llvm#111028

This patch fixes: bolt/lib/Core/DIEBuilder.cpp:285:40: error: too many arguments to function call, expected 2, have 3

Fixes: llvm#112836

This patch fixes the build failure seen on z/OS: ``` llvm/clang/include/clang/ASTMatchers/ASTMatchers.h:7212:1: error: unknown type name 'CLANG_ABI' ```

…m#113294) This fixes a layering violation introduced in 2fd01d7. The declaration is moved to the `SemaTemplateInstantiate` section of `Sema.h`, named after the file where it's implemented.

…TQ` (llvm#113295) This patch improves, but doesn't fully resolve, the layering violation, which stems from relying on Sema. There's one function that needs to convert an enumerator to a string (`buildQualifier` in `FixItHintUtils.cpp`), but `Qualifiers::TQ` doesn't offer such a function. Moreover, its set of enumerators is not complete compared to `DeclSpec::TQ`, so I'm afraid that this would be a functional change.

Enable all valid registers for intrinsics that read from and write to global named registers.

…llvm#111084) - [x] Add a simple canonicalization for `mlir::index::AddOp`.

A recent change applied an overly strict check that the old and src operands match. They should be compatible, but not necessarily exactly the same. Fixes: SWDEV-493072

This PR is meant to:
1. Simplify the test update in llvm#113200.
2. Make the tests more comprehensive; currently, the interesting cases look very basic:
```
; CHECK-LABEL: @ICmpSGTAllOnes
; CHECK: icmp slt
; CHECK-NOT: call void @__msan_warning
; CHECK: icmp sgt
; CHECK-NOT: call void @__msan_warning
; CHECK: ret i1
```

Fixes llvm#111212. This grows .text by 5.3% on CTMark (or 2.6% on a large internal binary), and performance regressed by 1.6%. We will try to improve this in follow-up patches. It is worth paying some performance regression for correctness, to avoid issues like llvm#111212.

For consistency with other dialects and other CUF passes and files, this patch renames passes CufOpConversion to CUFOpConversion, CufImplicitDeviceGlobal to CUFDeviceGlobal. It also renames the file.

…lvm#113216) Until now, debug info printed symbol names as-is, which resulted in invalid PTX when the symbols contained characters that are invalid in PTX, e.g. `__PRETTY_FUNCTION.something`. Debug info is somewhat disconnected from the symbols themselves, so the regular "NVPTXAssignValidGlobalNames" pass can't easily fix them. As a "plan B", this patch intercepts the printing of debug symbols and fixes them as needed. One gotcha is that the same code path is used to print the names of debug info sections. Those section names do start with '.debug'. The dot in those names is nominally illegal in PTX, but debug section names with a dot are accepted as a special case. The downside of this change is that if someone ever has a `.debug*` symbol that needs to be referred to from the debug info, that label will be passed through as-is and will still produce broken PTX output. If/when we run into a case where we need it to work, we could consider passing through only specific debug section names, or add a mechanism that allows us to tell section names apart from regular symbols. Fixes llvm#58491
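Here is a sketch of the fix-up rule as described. It assumes that '.' and '@' are the characters to replace and that "_$_" is the replacement string. Both are assumptions for illustration, not confirmed details of the patch. `sanitizeDebugSymbol` is a hypothetical name. The sketch reproduces the `.debug*` pass-through, including its downside.

```cpp
#include <cassert>
#include <string>

// Hypothetical fix-up for a debug-info symbol name before it is printed.
std::string sanitizeDebugSymbol(const std::string &name) {
  assert(!name.empty() && "symbols are expected to have a name");
  // Debug section names such as ".debug_info" are accepted as a special
  // case, so anything starting with ".debug" is passed through unchanged.
  if (name.rfind(".debug", 0) == 0)
    return name;
  std::string out;
  out.reserve(name.size());
  for (char c : name) {
    if (c == '.' || c == '@')
      out += "_$_"; // assumed replacement for characters invalid in PTX
    else
      out += c;
  }
  return out;
}
```

A genuine symbol named, say, `.debug_foo` is also passed through unchanged. This is the limitation called out above.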

…3376) Reverts llvm#113200 Breaks bots, see llvm#113200

…13378)

When printing a memory operand in MIR, this line https://github.com/llvm/llvm-project/blob/d37bc32a65651e647148236ffb9728ea2e77eac3/llvm/lib/CodeGen/MachineOperand.cpp#L1247 calls this https://github.com/llvm/llvm-project/blob/d37bc32a65651e647148236ffb9728ea2e77eac3/llvm/include/llvm/Support/Alignment.h#L238 which assumes `Rhs` (the size in this case) is positive. But Wasm reference types' size is set to 0: https://github.com/llvm/llvm-project/blob/d37bc32a65651e647148236ffb9728ea2e77eac3/llvm/include/llvm/CodeGen/ValueTypes.td#L326-L328 The `getSize() > 0` condition was added with the Wasm reference types support in llvm@46667a1, and it looks like it was removed in llvm#84751. This revives the condition so that Wasm reference types will not crash the MIR printer.

…llvm#113379) Reverts llvm#113376 Fixed with llvm#113378

) The convention is to use enum names that match the source spelling (up to upper/lower case), including names with underscores. Remove the special case from unparser, update tests.

This PR adds a verifier check for tosa.mul, requiring that the shift be 0 for float types. Fixes llvm#112716.

This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on. Built by GCC 11. Refactor the code to avoid the false warning llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp: In function ‘int LLVMFuzzerInitialize(int*, char***)’: llvm-project/llvm/tools/llvm-isel-fuzzer/llvm-isel-fuzzer.cpp:141:43: error: ISO C++ forbids zero-size array ‘argv’ [-Werror=pedantic] 141 | ExitOnError ExitOnErr(std::string(*argv[0]) + ": error:"); |

Clang uses timestamp files to track the last time an implicitly-built PCM file was verified to be up-to-date with regard to its inputs. With `-fbuild-session-{file,timestamp}=` and `-fmodules-validate-once-per-build-session` this reduces the number of times a PCM file is checked per "build session". The behavior I'm seeing with the current scheme is that when lots of Clang instances wait for the same PCM to be built, they race to validate it as soon as the file lock gets released, causing lots of concurrent IO. This patch makes it so that the timestamp is written by the same Clang instance responsible for building the PCM while still holding the lock. This makes it so that whenever a PCM file gets compiled, it's never re-validated in the same build session. I believe this is as sound as the current scheme. One thing to be aware of is that there might be a time interval between accessing input file N and writing the timestamp file, where changes to input files 0..<N would not result in a rebuild. Since this is the case with the current scheme too, I'm not too concerned about that. I've seen this speed up `clang-scan-deps` by ~27%.

llvm#112904 will add typechecking to submulticlass arguments, and these ones are currently mistyped.

We already have the .o, there is no reason to go .o -> YAML -> .o

…13350) This corrects a couple of off-by-ones related to the sampling of **instrumented** counters, and enables setting 100% rates for burst sampling (burst duration = period). Off-by-ones: Prior to this change, it was impossible to set a period of 65535 because this was converted to fast sampling, which rolls over at USHRT_MAX + 1 (65536). Similarly, burst sampling would collect burst duration + 1 counts because it used a ULE comparison. 100% sampling: Although this is not useful for a productionized use case, it does allow for more deterministic testing with the sampling checks in place. After all the off-by-ones are fixed, allowing 100% sampling is a matter of letting burst duration = period.
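The burst-sampling rule and the off-by-one can be modeled in a few lines. The model assumes that out of every `period` executions, the first `duration` are counted. `shouldCount` and `countedIn` are hypothetical helpers, and the real instrumentation may track this with a running counter instead of a modulo.

```cpp
#include <cassert>
#include <cstdint>

// Counts an execution if it falls in the first `duration` slots of its period.
// A strict less-than yields exactly `duration` counts per period; a <= (the
// ULE comparison) would yield duration + 1.
bool shouldCount(std::uint64_t execution, std::uint64_t period,
                 std::uint64_t duration) {
  assert(period > 0 && duration <= period);
  return execution % period < duration;
}

// Number of the first `n` executions that would be recorded.
std::uint64_t countedIn(std::uint64_t n, std::uint64_t period,
                        std::uint64_t duration) {
  std::uint64_t counted = 0;
  for (std::uint64_t i = 0; i < n; ++i)
    counted += shouldCount(i, period, duration) ? 1 : 0;
  return counted;
}
```

With `duration == period`, every execution is counted, which is the 100% sampling case.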

Reverts llvm#68176 Introduced BuildBot failure: llvm#68176 (comment)

With sampled instrumentation (llvm#69535), profile counts may appear corrupt and `fixFuncEntryCount` may assert. In particular a function can have a 0 block count for its entry, while later blocks are non-zero. This is only likely to happen for colder functions, so it is reasonable to take any action that does not crash. Here we simply bail from fixing the entry count.

…0569) Extend the logic added in 123c036 (llvm#76612) to support pointers to non-builtin types by using the mangled name of the canonical type. PR: llvm#110569

…cessible outside of Sema (llvm#113206) Moves `IsIntangibleType` from SemaHLSL to Type class and renames it to `isHLSLIntangibleType`. The existing `isHLSLIntangibleType` is renamed to `isHLSLBuiltinIntangibleType` and updated to return true only for the builtin `__hlsl_resource_t` type. This change makes `isHLSLIntangibleType` functionality accessible outside of Sema, for example from clang CodeGen.

Add support for ``llvm.nvvm.fshl.clamp`` and ``llvm.nvvm.fshr.clamp`` intrinsics. These intrinsics are similar to the generic llvm funnel shift, except that the shift value is clamped to the integer width. Currently only ``i32`` is supported and is implemented with the `shf.[rl].clamp.b32` PTX instruction.
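For reference, here is the difference between the generic funnel shifts and the clamped variants, written out for 32-bit values. The sketch follows the semantics described above: the generic shift amount is taken modulo 32, the clamped one is capped at 32. `hi` is the first (high) operand. The helper names are hypothetical and are not the intrinsics themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Concatenates hi:lo into one 64-bit value (hi in the upper half).
static std::uint64_t concat(std::uint32_t hi, std::uint32_t lo) {
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Generic funnel shifts: the shift amount is taken modulo the bit width.
std::uint32_t fshl(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  return static_cast<std::uint32_t>((concat(hi, lo) << (amt % 32)) >> 32);
}
std::uint32_t fshr(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  return static_cast<std::uint32_t>(concat(hi, lo) >> (amt % 32));
}

// Clamped variants: amounts >= 32 behave like a shift by exactly 32, so
// fshlClamp returns `lo` and fshrClamp returns `hi`.
std::uint32_t fshlClamp(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  return static_cast<std::uint32_t>(
      (concat(hi, lo) << std::min<std::uint32_t>(amt, 32)) >> 32);
}
std::uint32_t fshrClamp(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  return static_cast<std::uint32_t>(
      concat(hi, lo) >> std::min<std::uint32_t>(amt, 32));
}
```

For amounts below 32 the two families agree. They differ only once the amount reaches the bit width.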

…2802) Store Swift mangled names in DW_AT_linkage_name. The Swift compiler emits only the type mangled name in debug information, and LLDB uses those mangled names as keys to look up size, alignment, fields, etc. from either reflection metadata or Swift modules. Additionally, emit linkage names for types into the accelerator table if they exist and differ from the display name.

… invalid (llvm#104540) Fixes llvm#102945.

…e with flexible array init (llvm#113336) Fixes: llvm#113187 Avoid creating an init function, since clang does not support global variables with flexible array initialization; creating one would cause an assertion failure later.

This patch adds functionality for atomically reading `llvm.struct` types. Fixes: llvm#93441

llvm#113260) …tyle

Fixes llvm#113256.

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368

…ds (llvm#113264) Looks like having a constant in `Z` also caused infinite loops. This fixes llvm#113240.

…ructs (llvm#113045) According to OpenMPv5.2 1.2.6, "For Fortran, a scalar variable with intrinsic type, as defined by the base language, excluding character type.". Likewise, section 4.3.1.3 states that atomic operations are on "scalar variables of intrinsic type". This PR hence introduces a check to error out when CHARACTER type is used in atomic operations. Fixes llvm#112918

…lvm#113108) Restricts the verifier for tensor.pack and tensor.unpack Ops so that the following is no longer allowed:
```mlir
%c8 = arith.constant 8 : index
%0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [8, %c8] into %output : tensor<?x?xf32> -> tensor<?x?x8x8xf32>
```
Specifically, in line with other Tensor Ops, require: * a dynamic dimension for each (dynamic) SSA value, * a static dimension for each static size (attribute). In the example above, a static dimension (8) is mixed with a dynamic size (%c8). Note that this is mostly deleting existing code, because this change simplifies the logic in the verifier. For more context: * https://discourse.llvm.org/t/tensor-ops-with-dynamic-sizes-which-behaviour-is-more-correct

This fixes the inferred output shape of the TOSA slice op for start/size values that are out of bounds or -1. Added tests to check: - size = -1 - size is out of bounds - start is out of bounds Signed-off-by: Tai Ly <[email protected]>

Fix ordering of checks in atomic02.f90.

Reverts llvm#108306

…cl (llvm#113276) This is more similar to the diagnostic output of the current interpreter

The patch adds graceful handling of an incorrectly constructed MLIR operation with fewer operands than expected.

…#104764)

…h-abs feature This is to align with GAS. Additionally, there are some minor changes: the definition and expansion process of the TLS_DESC pseudo-instruction were modified in the same style. Reviewed By: heiher Pull Request: llvm#112858

The current implementation of `slice(N)` is buggy: it forwards to `slice(N, size() - N)`, which can never fail the assertion `assert(N+M <= size() && "Invalid specifier")` above, even when `N > size()`, because the unsigned subtraction `size() - N` wraps around.
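The wraparound is easy to reproduce in isolation. The helpers below are hypothetical and assume a `size_t`-based size. They show that the old check passes for an out-of-range `N`, while a direct `N <= size` precondition (one possible fix, not necessarily the one in the patch) rejects it.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the buggy forwarding: M = size - N, then check N + M <= size.
// When n > size, the subtraction wraps around, n + m wraps back to exactly
// `size`, and the check always passes.
bool buggySliceCheck(std::size_t size, std::size_t n) {
  std::size_t m = size - n; // wraps around when n > size
  return n + m <= size;
}

// A direct precondition for slice(N): the drop count must not exceed size.
bool fixedSliceCheck(std::size_t size, std::size_t n) { return n <= size; }
```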

To match the convention and DarwinAsmParser.

Don't use plain `if` for things that are compile-time constants. Instead, use `if constexpr`. This both ensures that these are properly wired up constant expressions as intended, and prevents warnings from the compiler about useless `if` checks that look in the source like they're meant to do something at runtime but will just be compiled away.
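Here is a generic illustration of the pattern (not code from the patch). The hypothetical `high32` below uses `if constexpr` so that the 32-bit shift is only compiled for types wider than 32 bits. With a plain `if`, instantiating it for `uint32_t` would still compile `value >> 32`, which can trigger a shift-count warning even though that branch never runs.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Returns the bits above bit 31 of an unsigned integer.
template <typename T> std::uint32_t high32(T value) {
  static_assert(std::is_unsigned_v<T>, "expects an unsigned integer type");
  if constexpr (sizeof(T) > sizeof(std::uint32_t)) {
    // Only instantiated for types wider than 32 bits, so the shift is valid.
    return static_cast<std::uint32_t>(value >> 32);
  } else {
    // Types of 32 bits or fewer have no bits above bit 31.
    return 0;
  }
}
```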

The `zx_cprng_draw` system call has no limit on how much you can draw. Co-authored-by: Marco Vanotti <[email protected]>

) This intrinsic was introduced by llvm#92289 and currently we just expand it for RISC-V. This patch adds custom lowering for this intrinsic and simply maps it to `vcompress` instruction. Fixes llvm#113242.
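The lane semantics being lowered can be written as a scalar reference. This assumes the documented behavior of `llvm.experimental.vector.compress`: selected elements are packed into the low lanes in order, and the remaining lanes take the corresponding `passthru` lanes. The `compress` template is a hypothetical model, not the RISC-V lowering.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference for compress: elements whose mask bit is set are packed,
// in order, into the low lanes; all other lanes keep the passthru value.
template <typename T, std::size_t N>
std::array<T, N> compress(const std::array<T, N> &vec,
                          const std::array<bool, N> &mask,
                          const std::array<T, N> &passthru) {
  std::array<T, N> result = passthru;
  std::size_t out = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[out++] = vec[i];
  assert(out <= N);
  return result;
}
```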

Follow-up to fec1b6f

…b].{b/h} instructions (llvm#113255) Two options for clang: -mlam-bh & -mno-lam-bh. Enable or disable amswap[__db].{b/h} and amadd[__db].{b/h} instructions. The default is -mno-lam-bh. Only works on LoongArch64.

Only compute the Latency component of a specialisation's Bonus when necessary, to avoid unnecessarily computing the Block Frequency Information for a Function.

Fixes llvm#113154 The encodings used for llvm.trap() on ARM were all marked as barriers and terminators. This led to stack frame destroy code being inserted before the trap if the trap was the last thing in the function and it had no return statement.
```
void fn() {
  volatile int i = 0;
  __builtin_trap();
}
```
Produced:
```
fn:
  push {r11, lr}     << stack frame create
  <...>
  mov sp, r11
  pop {r11, lr}      << stack frame destroy
  .inst 0xe7ffdefe   << trap
  bx lr
```
None of the other targets mark them this way; instead they mark them with isTrap. I've changed ARM to do this, which fixes the code generation:
```
fn:
  push {r11, lr}     << stack frame create
  <...>
  .inst 0xe7ffdefe   << trap
  mov sp, r11
  pop {r11, lr}      << stack frame destroy
  bx lr
```
I've updated the existing trap test to force the need for a stack frame, then check that the instruction immediately after the trap resets the stack pointer. debugtrap was already working, but I've added the same checks for it anyway.

Co-authored-by: Alex Richardson <[email protected]>

…literal in StackAddressEscape This patch simplifies the diagnostic message in the core.StackAddrEscape for stack memory associated with compound literals by removing the redundant "returned to caller" suffix. Example: https://godbolt.org/z/KxM67vr7c ```c // clang --analyze -Xanalyzer -analyzer-checker=core.StackAddressEscape void* compound_literal() { return &(unsigned short){((unsigned short)0x22EF)}; } ``` warning: Address of stack memory associated with a compound literal declared on line 2 **returned to caller returned to caller** [core.StackAddressEscape]

This PR updates the cast to bool from IntN to treat any non-zero value as TRUE. This makes the cast more resilient to non-generic (i.e. "non 1") TRUE values. Signed-off-by: Dmitriy Smirnov <[email protected]>

…3305) Extends `nowait` support for other device directives. This PR refactors the task generation utils used for the `target` directive so that they are general enough to be reused for other device directives as well.

… docs (llvm#112869)
* Note up front that the author may not have permissions to use the merge button and should ask a reviewer to do those steps.
* Make it clear that a single-commit PR can be landed with a single button click.
* There are in fact 3 ways to land a multi-commit PR.
* Order the ways in increasing amount of overhead for the PR author.
* Put them in bullet point sections so they are visually separate.
* Add a note that force pushes can be problematic when the PR has multiple authors, but don't go too much into how to solve that; Git's docs are better here anyway.

Until now, these options have been hardcoded as downstream patches in LLD. Add them to the driver so that the private patches can be removed. PS5 only. The implementation of these behaviours will remain in the proprietary linker on PS4. SIE tracker: TOOLCHAIN-16704

hlfir.assign currently has the `MemoryEffects<[MemWrite]>` trait, which makes it look like it can write to anything. This is good for some cases where the assign effect cannot be precisely described through the MLIR side effect API (e.g., when the LHS is a descriptor and it is not possible to get an OpOperand describing the data address, or when derived types are involved and finalization could be called, or user-defined assignment for some components). For the most common case of hlfir.assign on intrinsic types without a whole allocatable LHS, this is pessimistic. This patch implements a finer description of the side effects when possible, and also adds the proper read/allocate/free effects when relevant. The ultimate goal is to suppress the generation of a temporary for the LHS address when dealing with an assignment to a vector-subscripted LHS where the vector subscript is an array constructor that does not refer to the LHS (as in `x([a,b]) = y`). Two more patches will follow to enable this.

…oca (llvm#113321) See https://reviews.llvm.org/D157626 for the rationale of declare having side effects. The write effect is too scary for passes that look for read/write effects without caring about the resource affected. I know Slava asked for it, but I think the creation of the `DebuggingResource` was enough and that a write is too much. The alloca effect is sufficient to prevent DCE from removing it, which is all we care about currently. This is currently flagged as a reason for creating an LHS temporary in assignments to vector-subscripted entities with an array constructor. There is a lot of read/write side effect analysis in the "lower-hlfir-ordered-assignments" pass, and I feel like we will just keep adding weird "debug resource" bypasses here and there with these side effects.

…nt (llvm#113330) Last patch required to avoid creating a temporary for the LHS when dealing with `x([a,b]) = y`. The code dealing with "ordered assignments" (where, forall, user and vector subscripted assignments) saves the evaluated RHS/LHS and masks if they have write effects, because these write effects should not be evaluated when they affect entities that may be written to in other contexts after the evaluation and before the re-evaluation. But when dealing with writes to storage allocated in the region for the expression being evaluated, there is no problem re-evaluating the write: it has no effect outside of the expression evaluation that owns the allocation. In the case of `x([a,b]) = y`, the temporary is created for the vector subscript. Raising the HLFIR abstraction for simple array constructors may be a good idea, but local temps are created in other contexts, so this fix is more generic.

…ons (llvm#113292) This patch adds the zeroing predicate forms (Pg/z) of the following instructions: - FCVTXNT - FCVTNT - FCVTLT - BFCVTNT As specified in https://developer.arm.com/documentation/ddi0602. Co-authored-by: Spencer Abson [[email protected]](mailto:[email protected])

…ls (llvm#113283) On ARM64EC, external function calls emit a pair of weak-dependency aliases: `func` to `#func` and `#func` to the `func` guess exit thunk (instead of a single undefined `func` symbol, which would be emitted on other targets). Allow such aliases to be overridden by lazy archive symbols, just as we would for undefined symbols.

The Intel C++ Compiler (ICX) passes linker flags through the driver, unlike MSVC and clang-cl, and therefore needs them to be prefixed with `/Qoption,link` (the equivalent of `-Wl,` for gcc on *nix). Use the `LINKER:` prefix wherever supported by CMake; when that's not possible, fall back to `${CMAKE_CXX_LINKER_WRAPPER_FLAG}`. CMake replaces these with `/Qoption,link` for ICX and with the empty string for MSVC and clang-cl. For `target_link_libraries`, neither `LINKER:` (not supported prior to CMake 3.32) nor `${CMAKE_CXX_LINKER_WRAPPER_FLAG}` (it does not begin with `-`, so it would be taken as a library name) works, so use `-Qoption,link` directly within a generator expression that is conditional on linking with ICX. For MSVC and clang-cl no functional change is intended. Tested by compiling with ICX and setting `CMAKE_(EXE|SHARED|STATIC|MODULE)_LINKER_FLAGS_INIT` to `-Werror=unknown-argument`. RFC: https://discourse.llvm.org/t/rfc-cmake-linker-flags-need-wl-equivalent-for-intel-c-icx-on-windows/82446

…lazy archive symbol to the symbol table on ARM64EC (llvm#113284) On ARM64EC, a function symbol may appear in both mangled and demangled forms: - ARM64EC archives contain only the mangled name, while the demangled symbol is defined by the object file as an alias. - x86_64 archives contain only the demangled name (the mangled name is usually defined by an object referencing the symbol as an alias to a guess exit thunk). - ARM64EC import files contain both the mangled and demangled names for thunks. If more than one archive defines the same function, this could lead to different libraries being used for the same function depending on how they are referenced. Avoid this by checking if the paired symbol is already defined before adding a symbol to the table.

…m#112928) Member pointers refer to data or function members of a `CXXRecordDecl` and require a `MSInheritanceAttr` in order to be complete. Without that we cannot calculate their size in memory. The attempt has been causing a crash further down in the clang AST context. In order to implement the feature, DWARF will need a new attribute to convey the information. For the moment, this patch teaches LLDB to handle the situation and avoid the crash.

…lvm#111130) Before this patch, redundant COPY couldn't be removed for the following case:
```
$R0 = OP ...
... // Read of %R0
$R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register during backward propagation, so that we can remove the redundant COPY in the above case and optimize it to:
```
$R1 = OP ...
... // Replace all uses of %R0 with $R1
```

Reapply " [XRay] Add support for instrumentation of DSOs on x86_64 (#90959)" #112930

Commits on Oct 21, 2024

Commits on Oct 22, 2024

Commits on Oct 23, 2024

Commits on Oct 24, 2024