Add the 'initializes' attribute langref and support #84803

Merged (39 commits) on Jun 21, 2024

Contributor

@haopliu haopliu commented Mar 11, 2024

We propose adding a new LLVM parameter attribute, initializes((Lo1,Hi1),(Lo2,Hi2),...), which describes the memory ranges (intervals, in bytes, relative to the pointer argument) that the function initializes through that argument.

The attribute inference will be committed in follow-up PRs.

https://discourse.llvm.org/t/rfc-llvm-new-initialized-parameter-attribute-for-improved-interprocedural-dse/77337
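
To make the intent concrete, here is a small C++ sketch of the pattern the attribute is meant to describe. The names `Pair`, `initPair`, and `caller` are hypothetical and not from the PR. Assuming the layout below, `initPair` writes bytes [0, 8) through `p` before any read, so its parameter could plausibly carry initializes((0, 8)), which would make the caller's earlier store a candidate for dead store elimination.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example type: two 32-bit fields, 8 bytes, no padding.
struct Pair {
  std::int32_t a;
  std::int32_t b;
};

// Writes every byte of *p before any read through p. In IR terms the
// parameter could be annotated initializes((0, 8)); this is an assumption
// for illustration, not output from a real compiler.
void initPair(Pair *p) {
  p->a = 1;
  p->b = 2;
}

int caller() {
  Pair x;
  x.a = 42; // Overwritten by initPair before being read: a dead store.
  initPair(&x);
  return x.a + x.b;
}
```

The interprocedural part is that the caller cannot see the callee's body in general; the attribute would carry that fact across the call boundary.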

@haopliu haopliu marked this pull request as ready for review March 11, 2024 21:23
@llvmbot
Collaborator

llvmbot commented Mar 11, 2024

@llvm/pr-subscribers-llvm-adt
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Haopeng Liu (haopliu)

Changes

We propose adding a new LLVM parameter attribute, initialized((Lo1,Hi1),(Lo2,Hi2),...), which describes the memory ranges (intervals, in bytes, relative to the pointer argument) that the function initializes through that argument.

The attribute inference will be committed in follow-up PRs.

https://discourse.llvm.org/t/rfc-llvm-new-initialized-parameter-attribute-for-improved-interprocedural-dse/77337


Full diff: https://github.com/llvm/llvm-project/pull/84803.diff

1 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+22)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index b70220dec92615..39a555edfa80d6 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -1621,6 +1621,28 @@ Currently, only the following parameter attributes are defined:
     ``readonly`` or a ``memory`` attribute that does not contain
     ``argmem: write``.
 
+``initialized((Lo1,Hi1),...)``
+    This attribute is a list of const ranges in ascending order with no
+    overlapping or continuous. It indicates that the function initializes the
+    memory through the pointer argument, [%p+LoN, %p+HiN): there are no reads,
+    and no special accesses (such as volatile access or untrackable capture)
+    before the initialization in the function. LoN/HiN are 64-bit ints;
+    negative values are allowed in case a pointer to partway through the
+    allocation is passed to.
+
+    This attribute implies that the function initializes and does not read
+    before initialization through this pointer argument, even though it may
+    read the memory before initialization that the pointer points to, such
+    as through other arguments.
+
+    The ``writable`` or ``dereferenceable`` attribute does not imply
+    ``initialized`` attribute, and ``initialized`` does not imply ``writeonly``
+    since cases that read from the pointer after write, can be ``initialized``
+    but not ``writeonly``.
+
+    Note that this attribute does not apply to the unwind edge: the memory may
+    not actually be written to when unwinding happens.
+
 ``dead_on_unwind``
     At a high level, this attribute indicates that the pointer argument is dead
     if the call unwinds, in the sense that the caller will not depend on the

@@ -1621,6 +1621,28 @@ Currently, only the following parameter attributes are defined:
``readonly`` or a ``memory`` attribute that does not contain
``argmem: write``.

``initialized((Lo1,Hi1),...)``
This attribute is a list of const ranges in ascending order with no
overlapping or continuous. It indicates that the function initializes the
Contributor

nit: "with no overlapping or adjoining list elements"? or something like that (felt like it was missing a word at the end at least)

Contributor

We should start this off with a one sentence overview of the attribute. So starting with something like "This indicates that the function initializes the ranges of the pointer parameter's memory." Then describe what "initialize" means. Then the random details like non-overlapping/continuous ranges at the end.

Contributor Author

Done!

and no special accesses (such as volatile access or untrackable capture)
before the initialization in the function. LoN/HiN are 64-bit ints;
negative values are allowed in case a pointer to partway through the
allocation is passed to.
Contributor

nit: "in case the argument points partway into an allocation." ?

Contributor Author

Done!
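
As an illustration of why negative offsets are allowed, the following C++ sketch passes a pointer into the middle of a buffer. `initAround` and `sumAfterInit` are hypothetical names; with 4-byte elements the writes would correspond to something like initializes((-4, 4)).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical callee: receives a pointer to the middle of a buffer and
// writes one element before it and one element at it. With 4-byte elements
// this would correspond to the byte range [-4, 4) relative to mid.
void initAround(std::int32_t *mid) {
  mid[-1] = 7;
  mid[0] = 9;
}

int sumAfterInit() {
  std::int32_t buf[2];
  initAround(&buf[1]); // The argument points partway into the allocation.
  return buf[0] + buf[1];
}
```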

as through other arguments.

The ``writable`` or ``dereferenceable`` attribute does not imply
``initialized`` attribute, and ``initialized`` does not imply ``writeonly``
Contributor

nit: "does not imply [the] initialized attribute, and ... writeonly, since initialized allows reading from the pointer after writing." ?

Contributor Author

Done!
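
A minimal C++ sketch of the distinction discussed in this thread, using the hypothetical function `initThenRead`: the first access through `p` is a write, so the memory counts as initialized, but the later read through `p` means the parameter could not also be writeonly.

```cpp
#include <cassert>

// Hypothetical callee: writes through p first, then reads the value back.
int initThenRead(int *p) {
  *p = 5;        // Initializing write.
  return *p + 1; // Read after the write: allowed for an initializing callee.
}
```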

@aeubanks
Contributor

we should combine this PR with the PR that adds support for this in LLVM, or else it's weird if we're documenting something that LLVM doesn't support yet

since cases that read from the pointer after write, can be ``initialized``
but not ``writeonly``.

Note that this attribute does not apply to the unwind edge: the memory may
Contributor

as mentioned in the RFC, we should be more accurate here. the part of the attribute where the memory is read from before a write still must apply on the unwind edge. Also, I'd move this up since this is important to the semantics

Contributor

sorry, I meant that the memory is not read from before a write

Contributor Author

Done!

@aeubanks
Contributor

unintentional add of FunctionAttrs.cpp change?

we should combine this PR with the PR that adds support for this in LLVM, or else it's weird if we're documenting something that LLVM doesn't support yet

specifically just the attribute support (e.g. bitcode) not any of the inference or usage of it

llvm/docs/LangRef.rst (resolved)
special accesses (such as volatile access or untrackable capture) before
the initialization write in the function.

This attribute implies that the function initializes and does not read
Contributor

I wouldn't use implies.

"This attribute only holds for the memory accessed via this pointer parameter. Other arbitrary accesses to the same memory via other pointers are allowed."

Contributor Author

Done!

written to when unwinding happens.

The ``writable`` or ``dereferenceable`` attribute does not imply the
``initialized`` attribute, and ``initialized`` does not imply ``writeonly``
Contributor

I'd separate this into two sentences.

Contributor Author

Done!

``initialized`` attribute, and ``initialized`` does not imply ``writeonly``
since ``initialized`` allows reading from the pointer after writing.

This attribute is a list of const ranges in ascending order with no
Contributor

s/const/constant

Contributor Author

Ah, done!

since ``initialized`` allows reading from the pointer after writing.

This attribute is a list of const ranges in ascending order with no
overlapping or adjoining list elements. LoN/HiN are 64-bit ints, and
Contributor

I haven't heard "adjoining" used in this context, I think "consecutive" is more commonly used.

Contributor Author

Done!

since ``initialized`` allows reading from the pointer after writing.

This attribute is a list of const ranges in ascending order with no
overlapping or adjoining list elements. LoN/HiN are 64-bit ints, and
Contributor

put LoN/HiN in double backticks

Contributor Author

Done!

@nikic
Contributor

nikic commented Mar 13, 2024

As a heads up, the inference implementation for this (or at least the available prototype) does not look viable from a compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=69b09b43b0a2057918078edb401adab888d1014b&to=0bd68ae2d56783377acc0aa5d7958b47411b8342&stat=instructions:u

@haopliu
Contributor Author

haopliu commented Mar 14, 2024

> As a heads up, the inference implementation for this (or at least the available prototype) does not look viable from a compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=69b09b43b0a2057918078edb401adab888d1014b&to=0bd68ae2d56783377acc0aa5d7958b47411b8342&stat=instructions:u

Thanks for this heads up! This PR only adds the attribute support. Will further tune the inferring performance and post the updated compile time tracker in the inference PR.

@haopliu
Contributor Author

haopliu commented Mar 14, 2024

Thank you all! I updated the LangRef and added the attribute support. Please take another look :-D

@haopliu haopliu changed the title from "Add 'initialized' attribute langref" to "Add the 'initialized' attribute langref and support" Mar 14, 2024
@@ -38,6 +38,9 @@ class IntAttr<string S, list<AttrProperty> P> : Attr<S, P>;
/// Type attribute.
class TypeAttr<string S, list<AttrProperty> P> : Attr<S, P>;

/// Const range list attribute.
Contributor

nit: put closer to the ConstantRangeAttr?

@@ -318,6 +321,9 @@ def Writable : EnumAttr<"writable", [ParamAttr]>;
/// Function only writes to memory.
def WriteOnly : EnumAttr<"writeonly", [ParamAttr]>;

/// Pointer argument memory [%p+LoN, %p+HiN) is initialized.
Contributor

The list looks almost sorted, so may be better to put it near InAlloca or ?

@@ -307,6 +308,10 @@ namespace llvm {
bool AllowParens = false);
bool parseOptionalCodeModel(CodeModel::Model &model);
bool parseOptionalDerefAttrBytes(lltok::Kind AttrKind, uint64_t &Bytes);
bool parseConstRange(std::pair<int64_t, int64_t> &Range);
bool parseInitializedRanges(
lltok::Kind AttrKind,
Contributor

can the AttrKind be hardcoded within the function instead of taking as parameter? (unlike DerefAttrBytes, there is only one variation of the attribute, I believe?)

@@ -94,6 +94,8 @@ class Attribute {

static const unsigned NumIntAttrKinds = LastIntAttr - FirstIntAttr + 1;
static const unsigned NumTypeAttrKinds = LastTypeAttr - FirstTypeAttr + 1;
static const unsigned NumConstRangeListAttrKinds =
Contributor

Is this used by anything (if not, can remove)? I don't see the ConstantRangeAttrKind version of it, and hard to see "NumTypeAttrKinds" used

@@ -186,6 +193,9 @@ class Attribute {
/// Return true if the attribute is a type attribute.
bool isTypeAttribute() const;

/// Return true if the attribute is a const range list attribute.
bool isConstRangeListAttribute() const;
Contributor

nit: wonder if should s/Const/Constant/ to be consistent with isConstantRangeAttribute, etc.

Contributor

+1, no need to abbreviate

@@ -362,6 +362,21 @@ class FoldingSetNodeID {
}
}

void AddRanges(const SmallVector<std::pair<int64_t, int64_t>, 16> &Ranges) {
Contributor

Is it possible to use ArrayRef or something to not need specific SmallVector inline size of 16 here? "Prefer to use ArrayRef or SmallVectorImpl as a parameter type." under https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h

@@ -226,6 +236,10 @@ class Attribute {
/// attribute to be a ConstantRange attribute.
ConstantRange getValueAsConstantRange() const;

/// Return the attribute's value as a const range list. This requires the
/// attribute to be a const range list attribute.
SmallVector<std::pair<int64_t, int64_t>, 16> getValueAsRanges() const;
Contributor

Is it possible to return an ArrayRef instead of a copy? Will the underlying storage lifetime work out?

if (parseInt64(Range.first))
return true;

if (EatIfPresent(lltok::comma)) {
Contributor

should it be an error to be missing the comma?
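
One possible answer to this question is to reject a pair whose comma is missing. The following stand-alone C++ sketch shows that behaviour for a single "(Lo,Hi)" pair. `parseRangeStrict` is a hypothetical helper written for this note and does not use the LLVM lexer; it only borrows the convention of returning true on error.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical stand-alone parser for one "(Lo,Hi)" pair.
// Returns true on error, false on success.
bool parseRangeStrict(const std::string &text,
                      std::pair<std::int64_t, std::int64_t> &range) {
  const char *cur = text.c_str();
  char *end = nullptr;
  if (*cur++ != '(')
    return true;
  range.first = std::strtoll(cur, &end, 10);
  if (end == cur)
    return true; // No digits for Lo.
  cur = end;
  if (*cur++ != ',')
    return true; // A missing comma is diagnosed, not silently accepted.
  range.second = std::strtoll(cur, &end, 10);
  if (end == cur)
    return true; // No digits for Hi.
  cur = end;
  // Succeed only if the pair is closed and nothing follows it.
  return !(*cur == ')' && cur[1] == '\0');
}
```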

@@ -930,11 +932,24 @@ void ModuleBitcodeWriter::writeAttributeGroupTable() {
Record.push_back(getAttrKindEncoding(Attr.getKindAsEnum()));
if (Ty)
Record.push_back(VE.getTypeID(Attr.getValueAsType()));
- } else {
+ } else if (Attr.isConstantRangeAttribute()) {
assert(Attr.isConstantRangeAttribute());
Contributor

nit: can remove the assert?

Check(!Inits.empty(), "Attribute 'initialized' does not support empty list",
V);

for (size_t i = 1; i < Inits.size(); i++) {
Contributor

perhaps check low < high for the same "i"?
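
The constraints discussed in these verifier comments (non-empty list, Lo < Hi within each range, ascending order, no overlapping or touching neighbours) can be sketched as one pass over the list. This is a stand-alone C++ model with hypothetical names (`Range`, `isValidRangeList`), not the verifier code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::int64_t, std::int64_t>; // [first, second)

// Hypothetical helper: true when the list is non-empty, every range has
// Lo < Hi, and consecutive ranges are ascending with a gap between them
// (neither overlapping nor touching).
bool isValidRangeList(const std::vector<Range> &ranges) {
  if (ranges.empty())
    return false;
  for (std::size_t i = 0; i < ranges.size(); ++i) {
    if (ranges[i].first >= ranges[i].second)
      return false; // Lo < Hi must hold for the same index i.
    if (i > 0 && ranges[i - 1].second >= ranges[i].first)
      return false; // Overlaps or touches the previous range.
  }
  return true;
}
```

Starting the loop at index 0 rather than 1 is what lets the per-range Lo < Hi check cover the first element as well.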

@@ -226,6 +236,10 @@ class Attribute {
/// attribute to be a ConstantRange attribute.
ConstantRange getValueAsConstantRange() const;

/// Return the attribute's value as a const range list. This requires the
/// attribute to be a const range list attribute.
SmallVector<std::pair<int64_t, int64_t>, 16> getValueAsRanges() const;
Contributor

we should use ConstantRange instead of std::pair, especially since it already has intersection/union methods

Contributor

also, 16 elements for the small vector optimization seems very high, I feel the vast majority of cases will be at most 4. I'd choose 4 (or even 2)

@@ -616,6 +652,23 @@ std::string Attribute::getAsString(bool InAttrGrp) const {
return Result;
}

if (hasAttribute(Attribute::Initialized)) {
auto Ranges = getValueAsRanges();
if (Ranges.empty())
Contributor

is this allowed? I think we should forbid an empty list

Contributor

oh I see the verifier check below, so I don't think we need this check

@@ -1993,6 +1993,14 @@ void Verifier::verifyParameterAttrs(AttributeSet Attrs, Type *Ty,
Attrs.hasAttribute(Attribute::ReadOnly)),
"Attributes writable and readonly are incompatible!", V);

Check(!(Attrs.hasAttribute(Attribute::Initialized) &&
Contributor

I don't think this is true. A function that immediately unwinds can be marked as initializing all arguments according to our semantics, but is also readnone
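
A small C++ sketch of the case described in this comment, with hypothetical names `alwaysUnwinds` and `callAndCatch`: the callee unwinds before touching its argument, so it never reads the memory before writing it (it never accesses it at all), yet nothing has been written when the caller regains control.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical callee that unwinds immediately. It performs no memory
// access through p, so it is trivially free of reads before writes.
void alwaysUnwinds(int *p) {
  (void)p; // Never dereferenced.
  throw std::runtime_error("unwind before any access");
}

// Returns true when the call unwound; *p is left untouched in that case.
bool callAndCatch(int *p) {
  try {
    alwaysUnwinds(p);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```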

llvm/docs/LangRef.rst (resolved)
parameter. Other arbitrary accesses to the same memory via other pointers
are allowed.

Note that this attribute does not apply to the unwind edge: the
Contributor

this is now redundant with the unwind comment above

@nikic
Contributor

nikic commented Mar 15, 2024

> As a heads up, the inference implementation for this (or at least the available prototype) does not look viable from a compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=69b09b43b0a2057918078edb401adab888d1014b&to=0bd68ae2d56783377acc0aa5d7958b47411b8342&stat=instructions:u

> Thanks for this heads up! This PR only adds the attribute support. Will further tune the inferring performance and post the updated compile time tracker in the inference PR.

The main thing I would be concerned about at this point is whether the entire approach is viable or not -- I'd rather not add the attribute (plus a whole new attribute kind) if it later turns out that this is too expensive.


Anyway, a couple of high level comments:

  • I'm not sure this is the ideal attribute name. initialized to me sounds like this attribute promises that the memory is initialized on entry to the function, not that the function will initialize it. Maybe will_initialize is a better name? Or initializes?
  • How important is it to represent multiple ranges? We have just added support for ConstantRange attributes, so a single range can be represented out of the box. This also avoids the tricky question of how to deal with non-zero start offsets in DSE, which we don't really have AA support for right now. My intuition would be that you usually get whole object initializations -- but maybe this is not the case due to padding holes?
  • Failing that, I feel like we should still piggy-back more on the new ConstantRange attribute kind. We could convert that into a list of ConstantRange stored as TrailingObjects and make the current single-range case a special case of the general multiple-range case.
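
To illustrate the padding-hole question raised in the second bullet, here is a C++ sketch that computes the byte ranges a field-by-field initialization would cover. `Tagged`, `fieldRange`, and `fieldRanges` are hypothetical names. On common ABIs the struct has 3 padding bytes after `tag`, so the result is two separate ranges rather than one whole-object range; the exact layout is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical struct; on common ABIs there is padding after tag.
struct Tagged {
  char tag;
  std::int32_t value;
};

using Range = std::pair<std::int64_t, std::int64_t>; // [first, second)

static Range fieldRange(std::size_t offset, std::size_t size) {
  return Range(static_cast<std::int64_t>(offset),
               static_cast<std::int64_t>(offset + size));
}

// Byte ranges written by a field-by-field initialization of Tagged,
// merging fields that happen to be contiguous.
std::vector<Range> fieldRanges() {
  const Range fields[] = {
      fieldRange(offsetof(Tagged, tag), sizeof(char)),
      fieldRange(offsetof(Tagged, value), sizeof(std::int32_t))};
  std::vector<Range> ranges;
  for (const Range &f : fields) {
    if (!ranges.empty() && ranges.back().second == f.first)
      ranges.back().second = f.second; // Contiguous: extend previous range.
    else
      ranges.push_back(f);
  }
  return ranges;
}
```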

@aeubanks
Contributor

aeubanks commented Mar 25, 2024

> The main thing I would be concerned about at this point is whether the entire approach is viable or not -- I'd rather not add the attribute (plus a whole new attribute kind) if it later turns out that this is too expensive.

+1

> Anyway, a couple of high level comments:
>
>   • I'm not sure this is the ideal attribute name. initialized to me sounds like this attribute promises that the memory is initialized on entry to the function, not that the function will initialize it. Maybe will_initialize is a better name? Or initializes?

Makes sense, initializes sounds good.

>   • How important is it to represent multiple ranges? We have just added support for ConstantRange attributes, so a single range can be represented out of the box. This also avoids the tricky question of how to deal with non-zero start offsets in DSE, which we don't really have AA support for right now. My intuition would be that you usually get whole object initializations -- but maybe this is not the case due to padding holes?

Yes the concern was padding holes. @haopliu do you have any data on how often padding prevents whole object initialization?

@aeubanks
Contributor

Even if we represent padding more precisely with multiple ranges in initializes, clang generates a memset of the entire alloca and DSE can only remove the part of the memset that corresponds to the first range and the last range, since it doesn't split memsets. So if we had one padding hole and used multiple ranges to represent everything else, we'd still end up with a store to the padding hole.
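
The argument above can be modelled in a few lines of C++. This is a deliberately simplified sketch, not LLVM's DSE: it assumes a memset over [0, size) can only be shortened at its start and end and never split. `remainingMemset` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::int64_t, std::int64_t>; // [first, second)

// Simplified model (an assumption): `inits` is sorted and disjoint.
// Returns the bytes the memset still has to write; an empty range
// (first == second) means the whole memset is dead.
Range remainingMemset(std::int64_t size, const std::vector<Range> &inits) {
  std::int64_t lo = 0, hi = size;
  // Trim the front with any range that covers the current start.
  for (const Range &r : inits)
    if (r.first <= lo && r.second > lo)
      lo = r.second;
  // Trim the back with any range that covers the current end.
  for (auto it = inits.rbegin(); it != inits.rend(); ++it)
    if (it->second >= hi && it->first < hi)
      hi = it->first;
  if (lo >= hi)
    return Range(0, 0); // Fully overwritten by the callee.
  return Range(lo, hi);
}
```

With ranges (0,4) and (8,16) over a 16-byte object, the model leaves a store to [4, 8), which is exactly the padding hole described in the comment.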

I think !tbaa.struct was designed for this, but we don't emit it for -ftrivial-auto-var-init. Also it might not apply to memset right now, only memcpy?

@aeubanks
Contributor

godbolt seems to not show metadata (I'm not seeing it on a memcpy on godbolt but am seeing it locally), but my point still stands

@aeubanks
Contributor

actually I'm unsure if it would be legal to remove stores on padding bytes

@aeubanks
Contributor

@nikic any more concerns?

@aeubanks
Contributor

ping @nikic

llvm/include/llvm/IR/ConstantRangeList.h (resolved)
llvm/include/llvm/IR/ConstantRangeList.h (resolved)
llvm/lib/Bitcode/Reader/BitcodeReader.cpp (outdated, resolved)
llvm/lib/Bitcode/Reader/BitcodeReader.cpp (outdated, resolved)
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (outdated, resolved)
llvm/lib/IR/AttributeImpl.h (outdated, resolved)
assert(Size > 0);
unsigned BitWidth = Val.front().getLower().getBitWidth();
for (unsigned I = 0; I != Size; ++I) {
assert(BitWidth == Val[I].getLower().getBitWidth());
Contributor

Suggested change
- assert(BitWidth == Val[I].getLower().getBitWidth());
+ assert(BitWidth == Val[I].getBitWidth());

Contributor Author

Removed this code snippet.

unsigned BitWidth = Val.front().getLower().getBitWidth();
for (unsigned I = 0; I != Size; ++I) {
assert(BitWidth == Val[I].getLower().getBitWidth());
new (&TrailingCR[I]) ConstantRange(BitWidth, false);
Contributor

Can't you use uninitialized_copy instead of performing a dummy initialization first?

Contributor Author

Removed this code snippet.

Contributor Author

Changed to uninitialized_copy!
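
For readers unfamiliar with the suggestion, this is roughly what copying into raw storage with `std::uninitialized_copy` looks like in isolation. The sketch uses a hypothetical `Item` type instead of ConstantRange and plain `operator new` instead of LLVM's allocators.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Hypothetical element type with a non-trivial copy constructor.
struct Item {
  std::string name;
};

// Copies `src` into freshly allocated raw storage. Each element is
// copy-constructed in place, so no dummy object is built and then
// overwritten. The caller owns the result and must call destroyItems.
Item *copyToRawStorage(const std::vector<Item> &src) {
  void *raw = ::operator new(src.size() * sizeof(Item));
  Item *dst = static_cast<Item *>(raw);
  std::uninitialized_copy(src.begin(), src.end(), dst);
  return dst;
}

void destroyItems(Item *items, std::size_t count) {
  for (std::size_t i = 0; i != count; ++i)
    items[i].~Item(); // Explicit destructor call for each element.
  ::operator delete(items);
}
```

The sketch ignores exception safety: if a copy constructor throws, the raw storage is leaked.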

// If we didn't find any existing attributes of the same shape then create a
// new one and insert it.
PA = new (pImpl->ConstantRangeListAttributeAlloc.Allocate(
ConstantRangeListAttributeImpl::totalSizeToAlloc(Val)))
Contributor

Heh, this does something completely different from what you think it does. It allocates an array of sizeof(ConstantRangeListAttributeImpl) * totalSizeToAlloc(Val).

It's not possible to use SpecificBumpPtrAllocator with a dynamically sized class, it can only be used to allocate objects of fixed size.

Contributor

if we have a variable number of ConstantRanges in a ConstantRangeListAttributeImpl and they need a destructor, I don't see a nice way of using trailing objects and bump allocators. I think we should go back to ConstantRangeListAttributeImpl containing a ConstantRangeList and using SpecificBumpPtrAllocator<ConstantRangeListAttributeImpl> so that we allocate the right amount of memory and properly call the destructor. most of the time we'll be in the non-allocating version of the SmallVector<ConstantRange> anyway

Contributor Author

Changed the attr implementation back to ConstantRangeList.

@haopliu
Contributor Author

haopliu commented May 30, 2024

Thanks for the comments, Nikita and Arthur! @nikic please take another look.

Contributor

@aeubanks aeubanks left a comment

@nikic ping

llvm/include/llvm/IR/ConstantRangeList.h (resolved)
@@ -858,6 +859,14 @@ class BitcodeReader : public BitcodeReaderBase, public GVMaterializer {
}
}

Expected<ConstantRange>
readBitWidthAndConstantRange(ArrayRef<uint64_t> Record, unsigned &OpNum) {
if (Record.size() - OpNum < 3)
Contributor

I'd check 1 instead of 3 and let readConstantRange deal with its own reading error

Contributor Author

Done.

static void Profile(FoldingSetNodeID &ID, Attribute::AttrKind Kind,
const ConstantRangeList &CRL) {
ID.AddInteger(Kind);
ID.AddInteger(CRL.size());
Contributor

should also add bit width here now

Contributor Author

Ah, nice catch. Done!
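
A rough C++ model of why the bit width belongs in the profile. `WideRange`, `Profile`, and `profileRanges` are hypothetical stand-ins for the real ConstantRangeList and FoldingSetNodeID types; the profile here is just a flat list of integers compared for equality.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical range with an explicit bit width.
struct WideRange {
  unsigned bitWidth;
  std::int64_t lo, hi;
};
using Profile = std::vector<std::uint64_t>;

Profile profileRanges(unsigned kind, const std::vector<WideRange> &ranges) {
  Profile id;
  id.push_back(kind);
  id.push_back(ranges.size());
  // Without the bit width, a 32-bit list and a 64-bit list with equal
  // bounds would produce the same profile and be treated as one object.
  id.push_back(ranges.empty() ? 0 : ranges.front().bitWidth);
  for (const WideRange &r : ranges) {
    id.push_back(static_cast<std::uint64_t>(r.lo));
    id.push_back(static_cast<std::uint64_t>(r.hi));
  }
  return id;
}
```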

return error("Too few records for constant range list");
unsigned RangeSize = Record[i++];
unsigned BitWidth = Record[i++];
if (i + 2 * RangeSize > e)
Contributor

I think you can remove this hoisted i + 2 * RangeSize > e check, since it's not always 2 elements per RangeSize anymore and readConstantRange is doing the checks in the loop.

Contributor Author

Ah, done!

@aeubanks
Contributor

I think @nikic's concerns have been addressed and there are no major objections to this patch going in. We can address any post-commit feedback with followup PRs, but I'd like to get this in so we can start reviewing the DSE + inference PRs.

@nikic
Contributor

nikic commented Jun 17, 2024

Why did this change back to storing ConstantRangeList in the Attribute rather than the efficient TrailingObjects / ArrayRef representation? If this was just because of the memory leak issue, I think an easy way to solve that would have been to a) use the normal Alloc for allocation (instead of a separate SpecificBumpPtrAllocator) and b) add a vector to which all allocated pointers are added. Then in the LLVMContext dtor call the dtors of everything in the vector.

@haopliu
Contributor Author

haopliu commented Jun 18, 2024

Good point. Changed the attr impl to use normal Alloc w/ a vector and explicitly call each attr's dtor in LLVMContext dtor. Please take another look. Thanks!
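
The scheme agreed on here (allocate from a general allocator, remember every pointer, run the destructors when the context goes away) can be sketched in stand-alone C++. `Payload` and `Context` are hypothetical; plain `operator new` stands in for the bump allocator, which in a real arena would release its memory in bulk instead of per object.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical payload with a non-trivial destructor; it counts
// destructor runs so the cleanup can be observed.
struct Payload {
  static int destroyed;
  std::vector<int> data;
  explicit Payload(std::size_t n) : data(n) {}
  ~Payload() { ++destroyed; }
};
int Payload::destroyed = 0;

// Hypothetical context: creates objects in raw storage, records each
// pointer, and runs the destructors itself when it is torn down.
class Context {
  std::vector<Payload *> tracked;

public:
  Context() = default;
  Context(const Context &) = delete;
  Context &operator=(const Context &) = delete;

  Payload *create(std::size_t n) {
    void *raw = ::operator new(sizeof(Payload));
    Payload *p = new (raw) Payload(n); // Placement new into raw storage.
    tracked.push_back(p);
    return p;
  }

  ~Context() {
    for (Payload *p : tracked) {
      p->~Payload();        // Explicit destructor call.
      ::operator delete(p); // Release the raw storage.
    }
  }
};
```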

Contributor

@nikic nikic left a comment

A few more nits, but mostly this looks good to me.

llvm/lib/IR/AttributeImpl.h (outdated, resolved)
llvm/lib/IR/Attributes.cpp (outdated, resolved)
llvm/lib/IR/Attributes.cpp (outdated, resolved)
llvm/lib/IR/Attributes.cpp (outdated, resolved)
llvm/lib/IR/Attributes.cpp (outdated, resolved)
llvm/lib/IR/ConstantRangeList.cpp (outdated, resolved)
llvm/lib/IR/ConstantRangeList.cpp (outdated, resolved)
[](const ConstantRange &a, const ConstantRange &b) {
return a.getLower().slt(b.getLower());
});
if (LowerBound != Ranges.end() && *LowerBound == NewRange)
Contributor

Instead of checking strict equality here, could check whether LowerBound contains NewRange instead? That would handle more cases via fast path.

Contributor Author

Yes, changed to check whether LowerBound contains NewRange.

V);

Check(Inits[0].getLower().slt(Inits[0].getUpper()),
"Attribute 'initializes' requires interval lower less than upper", V);
Contributor

Reuse isOrderedRanges here?

Contributor Author

Done!

Contributor

@nikic nikic left a comment

LGTM

Comment on lines 58 to 59
LowerBound->getLower().eq(NewRange.getLower()) &&
LowerBound->getUpper().sge(NewRange.getUpper()))
Contributor

Suggested change
- LowerBound->getLower().eq(NewRange.getLower()) &&
- LowerBound->getUpper().sge(NewRange.getUpper()))
+ LowerBound->contains(NewRange))

Contributor Author

Ah, didn't notice ConstantRange::contains(). Nice!
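
The idea of a containment fast path on insert can be shown with a stand-alone C++ sketch. It is not the PR's implementation: `RangeList` is a hypothetical type over plain integer pairs, and the binary search here keys on the upper bound of stored ranges rather than on the lower bound as in the reviewed snippet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::int64_t, std::int64_t>; // [first, second)

// Hypothetical sorted, disjoint, non-adjacent range list.
struct RangeList {
  std::vector<Range> ranges;

  void insert(Range r) {
    assert(r.first < r.second && "range must be non-empty");
    // First stored range whose upper bound reaches r's lower bound.
    auto it = std::lower_bound(
        ranges.begin(), ranges.end(), r,
        [](const Range &a, const Range &b) { return a.second < b.first; });
    // Fast path: an existing range already contains r.
    if (it != ranges.end() && it->first <= r.first && r.second <= it->second)
      return;
    // Slow path: absorb every range that overlaps or touches r.
    auto last = it;
    while (last != ranges.end() && last->first <= r.second) {
      r.first = std::min(r.first, last->first);
      r.second = std::max(r.second, last->second);
      ++last;
    }
    it = ranges.erase(it, last);
    ranges.insert(it, r);
  }
};
```

Checking containment instead of strict equality lets the fast path also absorb inserts of sub-ranges, which is the point of the review suggestion.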

@haopliu merged commit 5ece35d into llvm:main on Jun 21, 2024
6 of 7 checks passed
@llvm-ci (Collaborator) commented Jun 21, 2024

LLVM Buildbot has detected a new failure on builder clang-cuda-l4 running on cuda-l4-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/101/builds/483

Here is the relevant piece of the build log for the reference:

Step 3 (annotate) failure: '/buildbot/cuda-build --jobs=' (failure)
...
+ step_summary ''
@@@STEP_SUMMARY_CLEAR@@@
+ echo @@@STEP_SUMMARY_TEXT@@@@
@@@STEP_SUMMARY_TEXT@@@@
+ run ninja check-cuda-simple
+ echo '>>> ' ninja check-cuda-simple
>>>  ninja check-cuda-simple
+ ninja check-cuda-simple
[0/40] cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA && /usr/local/bin/lit -vv -j 1 assert-cuda-11.8-c++11-libc++.test axpy-cuda-11.8-c++11-libc++.test algorithm-cuda-11.8-c++11-libc++.test cmath-cuda-11.8-c++11-libc++.test complex-cuda-11.8-c++11-libc++.test math_h-cuda-11.8-c++11-libc++.test new-cuda-11.8-c++11-libc++.test empty-cuda-11.8-c++11-libc++.test printf-cuda-11.8-c++11-libc++.test future-cuda-11.8-c++11-libc++.test builtin_var-cuda-11.8-c++11-libc++.test test_round-cuda-11.8-c++11-libc++.test
-- Testing: 12 tests, 1 workers --
FAIL: test-suite :: External/CUDA/algorithm-cuda-11.8-c++11-libc++.test (1 of 12)
******************** TEST 'test-suite :: External/CUDA/algorithm-cuda-11.8-c++11-libc++.test' FAILED ********************

/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/algorithm-cuda-11.8-c++11-libc++.test.out --redirect-input /dev/null --summary /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/algorithm-cuda-11.8-c++11-libc++.test.time /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/algorithm-cuda-11.8-c++11-libc++
cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA ; /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/algorithm-cuda-11.8-c++11-libc++.test.out algorithm.reference_output-cuda-11.8-c++11-libc++

+ cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA
+ /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/algorithm-cuda-11.8-c++11-libc++.test.out algorithm.reference_output-cuda-11.8-c++11-libc++
/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target: Comparison failed, textual difference between 'C' and 'S'

********************
FAIL: test-suite :: External/CUDA/assert-cuda-11.8-c++11-libc++.test (2 of 12)
******************** TEST 'test-suite :: External/CUDA/assert-cuda-11.8-c++11-libc++.test' FAILED ********************

/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/assert-cuda-11.8-c++11-libc++.test.out --redirect-input /dev/null --summary /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/assert-cuda-11.8-c++11-libc++.test.time /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/assert-cuda-11.8-c++11-libc++
cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA ; /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/assert-cuda-11.8-c++11-libc++.test.out assert.reference_output-cuda-11.8-c++11-libc++

+ cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA
+ /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/assert-cuda-11.8-c++11-libc++.test.out assert.reference_output-cuda-11.8-c++11-libc++
/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target: Comparison failed, textual difference between 'e' and 'a'

********************
FAIL: test-suite :: External/CUDA/axpy-cuda-11.8-c++11-libc++.test (3 of 12)
******************** TEST 'test-suite :: External/CUDA/axpy-cuda-11.8-c++11-libc++.test' FAILED ********************

/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/axpy-cuda-11.8-c++11-libc++.test.out --redirect-input /dev/null --summary /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/axpy-cuda-11.8-c++11-libc++.test.time /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/axpy-cuda-11.8-c++11-libc++
cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA ; /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/axpy-cuda-11.8-c++11-libc++.test.out axpy.reference_output-cuda-11.8-c++11-libc++

+ cd /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA
+ /buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target /buildbot/cuda-l4-0/work/clang-cuda-l4/build/External/CUDA/Output/axpy-cuda-11.8-c++11-libc++.test.out axpy.reference_output-cuda-11.8-c++11-libc++
/buildbot/cuda-l4-0/work/clang-cuda-l4/build/tools/fpcmp-target: Comparison failed, textual difference between '3' and '2'

********************
PASS: test-suite :: External/CUDA/builtin_var-cuda-11.8-c++11-libc++.test (4 of 12)
********** TEST 'test-suite :: External/CUDA/builtin_var-cuda-11.8-c++11-libc++.test' RESULTS **********
exec_time: 0.0011 
hash: "293d0eb9282156edc5422e7a8c9268e3" 
**********
FAIL: test-suite :: External/CUDA/cmath-cuda-11.8-c++11-libc++.test (5 of 12)

@llvm-ci (Collaborator) commented Jun 21, 2024

LLVM Buildbot has detected a new failure on builder clang-cuda-p4 running on cuda-p4-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/15/builds/486

Here is the relevant piece of the build log for the reference:

Step 3 (annotate) failure: '/buildbot/cuda-build --jobs=' (failure)
...
[455/1026] Linking CXX executable libc/test/src/stdbit/libc.test.src.stdbit.stdc_count_ones_uc_test.__hermetic__.__build__
[456/1026] Building CXX object libc/test/src/stdio/CMakeFiles/libc.test.src.stdio.fopen_test.__hermetic__.__build__.dir/fopen_test.cpp.o
[457/1026] Linking CXX executable libc/test/src/stdbit/libc.test.src.stdbit.stdc_count_ones_us_test.__hermetic__.__build__
[458/1026] Linking CXX executable libc/test/src/stdbit/libc.test.src.stdbit.stdc_count_ones_ui_test.__hermetic__.__build__
[459/1026] Linking CXX executable libc/test/include/libc.test.include.assert_test.__hermetic__.__build__
[460/1026] Linking CXX executable libc/test/include/libc.test.include.stdckdint_test.__hermetic__.__build__
[461/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.math_extras_test.__hermetic__.__build__
[462/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.blockstore_test.__hermetic__.__build__
[463/1026] Linking CXX executable libc/test/include/libc.test.include.sys_queue_test.__hermetic__.__build__
[464/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__
FAILED: libc/test/src/__support/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__ 
: && /buildbot/cuda-p4-0/work/clang-cuda-p4/clang/bin/clang++ -O3 -DNDEBUG --target=nvptx64-nvidia-cuda -Wno-multi-gpu -Wl,--suppress-stack-size-warning -march=native -nostdlib -static --cuda-path=/usr/local/cuda-12.2 libc/src/__support/CPP/CMakeFiles/libc.src.__support.CPP.new.__internal__.dir/new.cpp.o libc/src/__support/RPC/CMakeFiles/libc.src.__support.RPC.rpc_client.__internal__.dir/rpc_client.cpp.o libc/src/__support/OSUtil/gpu/CMakeFiles/libc.src.__support.OSUtil.gpu.gpu_util.__internal__.dir/exit.cpp.o libc/src/__support/OSUtil/gpu/CMakeFiles/libc.src.__support.OSUtil.gpu.gpu_util.__internal__.dir/io.cpp.o libc/src/__support/GPU/CMakeFiles/libc.src.__support.GPU.allocator.__internal__.dir/allocator.cpp.o libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.malloc.__internal__.dir/malloc.cpp.o libc/src/string/CMakeFiles/libc.src.string.memcmp.__internal__.dir/memcmp.cpp.o libc/src/string/CMakeFiles/libc.src.string.memcpy.__internal__.dir/memcpy.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib._Exit.__internal__.dir/_Exit.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib.exit_handler.__internal__.dir/exit_handler.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib.atexit.__internal__.dir/atexit.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib.exit.__internal__.dir/exit.cpp.o libc/startup/gpu/nvptx/CMakeFiles/libc.startup.gpu.nvptx.crt1.__internal__.dir/start.cpp.o libc/src/string/CMakeFiles/libc.src.string.bcmp.__internal__.dir/bcmp.cpp.o libc/src/string/CMakeFiles/libc.src.string.bzero.__internal__.dir/bzero.cpp.o libc/src/string/CMakeFiles/libc.src.string.memmove.__internal__.dir/memmove.cpp.o libc/src/string/CMakeFiles/libc.src.string.memset.__internal__.dir/memset.cpp.o libc/src/__support/StringUtil/CMakeFiles/libc.src.__support.StringUtil.error_to_string.__internal__.dir/error_to_string.cpp.o libc/src/time/gpu/CMakeFiles/libc.src.time.gpu.time_utils.__internal__.dir/time_utils.cpp.o 
libc/src/time/gpu/CMakeFiles/libc.src.time.gpu.clock.__internal__.dir/clock.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/CmakeFilePath.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/LibcTest.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/LibcTestMain.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/TestLogger.cpp.o libc/test/UnitTest/CMakeFiles/LibcHermeticTestSupport.hermetic.dir/HermeticTestUtils.cpp.o libc/test/src/__support/CMakeFiles/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__.dir/freelist_heap_test.cpp.o libc/test/src/__support/CMakeFiles/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__.dir/freelist_malloc_test.cpp.o -o libc/test/src/__support/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__  -llibc.startup.gpu.crt1.__internal__ && :
nvlink error   : Undefined reference to '_ZN22__llvm_libc_19_0_0_git13freelist_heapE' in '/tmp/freelist_heap_test.cpp-4ad8cf.cubin'
nvlink error   : Undefined reference to '_ZN22__llvm_libc_19_0_0_git4freeEPv' in '/tmp/freelist_malloc_test.cpp-132c6b.cubin'
nvlink error   : Undefined reference to '_ZN22__llvm_libc_19_0_0_git6callocEmm' in '/tmp/freelist_malloc_test.cpp-132c6b.cubin'
clang++: error: nvlink command failed with exit code 255 (use -v to see invocation)
clang version 19.0.0git (https://github.com/llvm/llvm-project.git 5ece35df8586d0cb8c104a9f44eaae771de025f5)
Target: nvptx64-nvidia-cuda
Thread model: posix
InstalledDir: /buildbot/cuda-p4-0/work/clang-cuda-p4/clang/bin
Build config: +assertions
clang++: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.
[465/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.freelist_test.__hermetic__.__build__
[466/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.block_test.__hermetic__.__build__
[467/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.endian_test.__hermetic__.__build__
[468/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.high_precision_decimal_test.__hermetic__.__build__
[469/1026] Building CXX object libc/test/src/stdio/CMakeFiles/libc.test.src.stdio.putc_test.__hermetic__.__build__.dir/putc_test.cpp.o
[470/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.str_to_float_test.__hermetic__.__build__
[471/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.str_to_integer_test.__hermetic__.__build__
[472/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.integer_to_string_test.__hermetic__.__build__
[473/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.integer_literals_test.__hermetic__.__build__
ninja: build stopped: subcommand failed.
++ err=1
++ echo PID 952578: subprocess exited with error 1
++ exit 1
PID 952578: subprocess exited with error 1
+ step_failure
+ echo @@@STEP_FAILURE@@@
@@@STEP_FAILURE@@@
Step 12 (Testing GPU libc) failure:  (failure)
[log excerpt identical to the Step 3 (annotate) excerpt above]
program finished with exit code 0
elapsedTime=721.972016

@llvm-ci (Collaborator) commented Jun 21, 2024

LLVM Buildbot has detected a new failure on builder clang-cuda-t4 running on cuda-t4-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/69/builds/467

Here is the relevant piece of the build log for the reference:

Step 3 (annotate) failure: '/buildbot/cuda-build --jobs=' (failure)
...
[453/1026] Linking CXX executable libc/test/include/libc.test.include.stdckdint_test.__hermetic__.__build__
[454/1026] Linking CXX executable libc/test/include/libc.test.include.assert_test.__hermetic__.__build__
[455/1026] Linking CXX executable libc/test/src/stdbit/libc.test.src.stdbit.stdc_count_ones_ui_test.__hermetic__.__build__
[456/1026] Linking CXX executable libc/test/include/libc.test.include.sys_queue_test.__hermetic__.__build__
[457/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.block_test.__hermetic__.__build__
[458/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.freelist_test.__hermetic__.__build__
[459/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.high_precision_decimal_test.__hermetic__.__build__
[460/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.str_to_float_test.__hermetic__.__build__
[461/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.fixedvector_test.__hermetic__.__build__
[462/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__
FAILED: libc/test/src/__support/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__ 
: && /buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin/clang++ -O3 -DNDEBUG --target=nvptx64-nvidia-cuda -Wno-multi-gpu -Wl,--suppress-stack-size-warning -march=native -nostdlib -static --cuda-path=/usr/local/cuda-12.2 libc/src/__support/CPP/CMakeFiles/libc.src.__support.CPP.new.__internal__.dir/new.cpp.o libc/src/__support/RPC/CMakeFiles/libc.src.__support.RPC.rpc_client.__internal__.dir/rpc_client.cpp.o libc/src/__support/OSUtil/gpu/CMakeFiles/libc.src.__support.OSUtil.gpu.gpu_util.__internal__.dir/exit.cpp.o libc/src/__support/OSUtil/gpu/CMakeFiles/libc.src.__support.OSUtil.gpu.gpu_util.__internal__.dir/io.cpp.o libc/src/__support/GPU/CMakeFiles/libc.src.__support.GPU.allocator.__internal__.dir/allocator.cpp.o libc/src/stdlib/gpu/CMakeFiles/libc.src.stdlib.gpu.malloc.__internal__.dir/malloc.cpp.o libc/src/string/CMakeFiles/libc.src.string.memcmp.__internal__.dir/memcmp.cpp.o libc/src/string/CMakeFiles/libc.src.string.memcpy.__internal__.dir/memcpy.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib._Exit.__internal__.dir/_Exit.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib.exit_handler.__internal__.dir/exit_handler.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib.atexit.__internal__.dir/atexit.cpp.o libc/src/stdlib/CMakeFiles/libc.src.stdlib.exit.__internal__.dir/exit.cpp.o libc/startup/gpu/nvptx/CMakeFiles/libc.startup.gpu.nvptx.crt1.__internal__.dir/start.cpp.o libc/src/string/CMakeFiles/libc.src.string.bcmp.__internal__.dir/bcmp.cpp.o libc/src/string/CMakeFiles/libc.src.string.bzero.__internal__.dir/bzero.cpp.o libc/src/string/CMakeFiles/libc.src.string.memmove.__internal__.dir/memmove.cpp.o libc/src/string/CMakeFiles/libc.src.string.memset.__internal__.dir/memset.cpp.o libc/src/__support/StringUtil/CMakeFiles/libc.src.__support.StringUtil.error_to_string.__internal__.dir/error_to_string.cpp.o libc/src/time/gpu/CMakeFiles/libc.src.time.gpu.time_utils.__internal__.dir/time_utils.cpp.o 
libc/src/time/gpu/CMakeFiles/libc.src.time.gpu.clock.__internal__.dir/clock.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/CmakeFilePath.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/LibcTest.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/LibcTestMain.cpp.o libc/test/UnitTest/CMakeFiles/LibcTest.hermetic.dir/TestLogger.cpp.o libc/test/UnitTest/CMakeFiles/LibcHermeticTestSupport.hermetic.dir/HermeticTestUtils.cpp.o libc/test/src/__support/CMakeFiles/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__.dir/freelist_heap_test.cpp.o libc/test/src/__support/CMakeFiles/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__.dir/freelist_malloc_test.cpp.o -o libc/test/src/__support/libc.test.src.__support.freelist_heap_test.__hermetic__.__build__  -llibc.startup.gpu.crt1.__internal__ && :
nvlink error   : Undefined reference to '_ZN22__llvm_libc_19_0_0_git13freelist_heapE' in '/tmp/freelist_heap_test.cpp-7e0196.cubin'
nvlink error   : Undefined reference to '_ZN22__llvm_libc_19_0_0_git4freeEPv' in '/tmp/freelist_malloc_test.cpp-64cbab.cubin'
nvlink error   : Undefined reference to '_ZN22__llvm_libc_19_0_0_git6callocEmm' in '/tmp/freelist_malloc_test.cpp-64cbab.cubin'
clang++: error: nvlink command failed with exit code 255 (use -v to see invocation)
clang version 19.0.0git (https://github.com/llvm/llvm-project.git 5ece35df8586d0cb8c104a9f44eaae771de025f5)
Target: nvptx64-nvidia-cuda
Thread model: posix
InstalledDir: /buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin
Build config: +assertions
clang++: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.
[463/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.endian_test.__hermetic__.__build__
[464/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.math_extras_test.__hermetic__.__build__
[465/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.blockstore_test.__hermetic__.__build__
[466/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.integer_to_string_test.__hermetic__.__build__
[467/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.str_to_integer_test.__hermetic__.__build__
[468/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.integer_literals_test.__hermetic__.__build__
[469/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.char_vector_test.__hermetic__.__build__
[470/1026] Linking CXX executable libc/test/src/__support/libc.test.src.__support.memory_size_test.__hermetic__.__build__
[471/1026] Linking CXX executable libc/test/src/__support/CPP/libc.test.src.__support.CPP.algorithm_test.__hermetic__.__build__
ninja: build stopped: subcommand failed.
++ err=1
++ echo PID 714372: subprocess exited with error 1
++ exit 1
PID 714372: subprocess exited with error 1
+ step_failure
+ echo @@@STEP_FAILURE@@@
@@@STEP_FAILURE@@@
Step 12 (Testing GPU libc) failure:  (failure)
[log excerpt identical to the Step 3 (annotate) excerpt above]
program finished with exit code 0
elapsedTime=742.820871

@llvm-ci (Collaborator) commented Jun 21, 2024

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-nfc running on bolt-worker while building llvm at step 8 "test-build-bolt-check-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/92/builds/447

Here is the relevant piece of the build log for the reference:

Step 8 (test-build-bolt-check-bolt) failure: test (failure)
******************** TEST 'BOLT :: perf2bolt/perf_test.test' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 5: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -fuse-ld=lld -Wl,--script=/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.lds -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
RUN: at line 6: perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
+ perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp
Lowering default frequency rate from 4000 to 2000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 (20 samples) ]
RUN: at line 7: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp2 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp3 -nl -ignore-build-id
RUN: at line 12: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/clang /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/Inputs/perf_test.c -no-pie -fuse-ld=lld -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
RUN: at line 13: perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
+ perf record -e cycles:u -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -- /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4
Lowering default frequency rate from 4000 to 2000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 (9 samples) ]
RUN: at line 14: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id 2>&1 | /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test --check-prefix=CHECK-NO-PIE
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/perf2bolt /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp4 -p=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp5 -o /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/tools/bolt/test/perf2bolt/Output/perf_test.test.tmp6 -nl -ignore-build-id
+ /home/worker/bolt-worker2/bolt-x86_64-ubuntu-nfc/build/bin/FileCheck /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test --check-prefix=CHECK-NO-PIE
/home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test:17:19: error: CHECK-NO-PIE-NOT: excluded string found in input
CHECK-NO-PIE-NOT: !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection.
                  ^
<stdin>:27:2: note: found here
 !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance.
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/worker/bolt-worker2/llvm-project/bolt/test/perf2bolt/perf_test.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        .
        .
        .
       22: BOLT-WARNING: Running parallel work of 0 estimated cost, will switch to trivial scheduling. 
       23: PERF2BOLT: processing basic events (without LBR)... 
       24: PERF2BOLT: read 9 samples 
       25: PERF2BOLT: out of range samples recorded in unknown regions: 9 (100.0%) 
       26:  
       27:  !! WARNING !! This high mismatch ratio indicates the input binary is probably not the same binary used during profiling collection. The generated data may be ineffective for improving performance. 
...

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
We propose adding a new LLVM parameter attribute,
`initializes((Lo1,Hi1),(Lo2,Hi2),...)`, which describes the ranges of
memory (i.e., intervals, in bytes) pointed to by the argument that the
function initializes.

Attribute inference will be committed in follow-up PRs.


https://discourse.llvm.org/t/rfc-llvm-new-initialized-parameter-attribute-for-improved-interprocedural-dse/77337
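
For illustration, a minimal sketch of how the attribute might appear in IR. It assumes each `(Lo,Hi)` pair is a half-open byte range `[Lo, Hi)` relative to the pointer argument, and that "initializes" means the callee writes those bytes before reading them. The function names `@init_pair` and `@caller` are hypothetical and do not come from this PR.

```llvm
; Hypothetical callee: it writes bytes [0, 8) of %p before returning and
; does not read them first, so the attribute lists the single range (0, 8).
define void @init_pair(ptr initializes((0, 8)) %p) {
  store i32 0, ptr %p, align 4
  %hi = getelementptr inbounds i8, ptr %p, i64 4
  store i32 1, ptr %hi, align 4
  ret void
}

; Hypothetical caller: the first store writes bytes that @init_pair
; overwrites without reading. This is the kind of store the attribute is
; meant to expose as a candidate for dead store elimination.
define void @caller(ptr %buf) {
  store i64 -1, ptr %buf, align 8
  call void @init_pair(ptr %buf)
  ret void
}
```

Whether a given pass actually removes the first store depends on further conditions (aliasing, unwinding, and whether the relevant optimization is enabled), so this is only a sketch of the intended pattern.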
haopliu added a commit that referenced this pull request Oct 24, 2024
Apply the initializes attribute to DSE and guard it with the flag
"enable-dse-initializes-attr-improvement".

Attribute support landed in:
#84803
Attribute inference will land after this PR:
#97373
8 participants