Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Freeze causes pessimization of simple code #109704

Open
gbaraldi opened this issue Sep 23, 2024 · 2 comments
Open

Freeze causes pessimization of simple code #109704

gbaraldi opened this issue Sep 23, 2024 · 2 comments

Comments

@gbaraldi
Copy link
Contributor

gbaraldi commented Sep 23, 2024

In julia we have a pass that generates code that looks something like

define void @julia_split128_13(ptr noalias nocapture noundef nonnull sret([2 x i64]) align 8 dereferenceable(16) %sret_return, i128 noundef zeroext %0) {
top:
  %1 = alloca [16 x i8], align 16
  %2 = alloca [16 x i8], align 16
  %3 = alloca [2 x i64], align 8
  store i128 %0, ptr %1, align 16
  %13 = freeze [16 x i8] undef
  store [16 x i8] %13, ptr %2, align 1
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %2, ptr align 1 %1, i64 16, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr %3, ptr %2, i64 16, i1 false)
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %sret_return, ptr align 8 %3, i64 16, i1 false)
  ret void
}

As it turns out that freeze we emit there causes a large pessimization.
The code without it optimizes to

define void @julia_split128_132(ptr noalias nocapture noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %sret_return, i128 noundef zeroext %0) local_unnamed_addr #0 {
  store i128 %0, ptr %sret_return, align 8
  ret void
}

While the code with it expands to

define void @julia_split128_13(ptr noalias nocapture noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %sret_return, i128 noundef zeroext %0) local_unnamed_addr #0 {
  %.sroa.046.0.extract.trunc = trunc i128 %0 to i8
  %.sroa.347.0.extract.shift = lshr i128 %0, 8
  %.sroa.347.0.extract.trunc = trunc i128 %.sroa.347.0.extract.shift to i8
  %.sroa.448.0.extract.shift = lshr i128 %0, 16
  %.sroa.448.0.extract.trunc = trunc i128 %.sroa.448.0.extract.shift to i8
  %.sroa.549.0.extract.shift = lshr i128 %0, 24
  %.sroa.549.0.extract.trunc = trunc i128 %.sroa.549.0.extract.shift to i8
  %.sroa.650.0.extract.shift = lshr i128 %0, 32
  %.sroa.650.0.extract.trunc = trunc i128 %.sroa.650.0.extract.shift to i8
  %.sroa.751.0.extract.shift = lshr i128 %0, 40
  %.sroa.751.0.extract.trunc = trunc i128 %.sroa.751.0.extract.shift to i8
  %.sroa.852.0.extract.shift = lshr i128 %0, 48
  %.sroa.852.0.extract.trunc = trunc i128 %.sroa.852.0.extract.shift to i8
  %.sroa.953.0.extract.shift = lshr i128 %0, 56
  %.sroa.953.0.extract.trunc = trunc i128 %.sroa.953.0.extract.shift to i8
  %.sroa.1054.0.extract.shift = lshr i128 %0, 64
  %.sroa.1054.0.extract.trunc = trunc i128 %.sroa.1054.0.extract.shift to i8
  %.sroa.1155.0.extract.shift = lshr i128 %0, 72
  %.sroa.1155.0.extract.trunc = trunc i128 %.sroa.1155.0.extract.shift to i8
  %.sroa.1256.0.extract.shift = lshr i128 %0, 80
  %.sroa.1256.0.extract.trunc = trunc i128 %.sroa.1256.0.extract.shift to i8
  %.sroa.1357.0.extract.shift = lshr i128 %0, 88
  %.sroa.1357.0.extract.trunc = trunc i128 %.sroa.1357.0.extract.shift to i8
  %.sroa.1458.0.extract.shift = lshr i128 %0, 96
  %.sroa.1458.0.extract.trunc = trunc i128 %.sroa.1458.0.extract.shift to i8
  %.sroa.1559.0.extract.shift = lshr i128 %0, 104
  %.sroa.1559.0.extract.trunc = trunc i128 %.sroa.1559.0.extract.shift to i8
  %.sroa.1660.0.extract.shift = lshr i128 %0, 112
  %.sroa.1660.0.extract.trunc = trunc i128 %.sroa.1660.0.extract.shift to i8
  %.sroa.1761.0.extract.shift = lshr i128 %0, 120
  %.sroa.1761.0.extract.trunc = trunc nuw i128 %.sroa.1761.0.extract.shift to i8
  store i8 %.sroa.046.0.extract.trunc, ptr %sret_return, align 8
  %.sroa.2.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 1
  store i8 %.sroa.347.0.extract.trunc, ptr %.sroa.2.0.sret_return.sroa_idx, align 1
  %.sroa.3.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 2
  store i8 %.sroa.448.0.extract.trunc, ptr %.sroa.3.0.sret_return.sroa_idx, align 2
  %.sroa.433.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 3
  store i8 %.sroa.549.0.extract.trunc, ptr %.sroa.433.0.sret_return.sroa_idx, align 1
  %.sroa.534.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 4
  store i8 %.sroa.650.0.extract.trunc, ptr %.sroa.534.0.sret_return.sroa_idx, align 4
  %.sroa.635.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 5
  store i8 %.sroa.751.0.extract.trunc, ptr %.sroa.635.0.sret_return.sroa_idx, align 1
  %.sroa.736.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 6
  store i8 %.sroa.852.0.extract.trunc, ptr %.sroa.736.0.sret_return.sroa_idx, align 2
  %.sroa.837.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 7
  store i8 %.sroa.953.0.extract.trunc, ptr %.sroa.837.0.sret_return.sroa_idx, align 1
  %.sroa.938.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 8
  store i8 %.sroa.1054.0.extract.trunc, ptr %.sroa.938.0.sret_return.sroa_idx, align 8
  %.sroa.1039.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 9
  store i8 %.sroa.1155.0.extract.trunc, ptr %.sroa.1039.0.sret_return.sroa_idx, align 1
  %.sroa.1140.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 10
  store i8 %.sroa.1256.0.extract.trunc, ptr %.sroa.1140.0.sret_return.sroa_idx, align 2
  %.sroa.1241.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 11
  store i8 %.sroa.1357.0.extract.trunc, ptr %.sroa.1241.0.sret_return.sroa_idx, align 1
  %.sroa.1342.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 12
  store i8 %.sroa.1458.0.extract.trunc, ptr %.sroa.1342.0.sret_return.sroa_idx, align 4
  %.sroa.1443.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 13
  store i8 %.sroa.1559.0.extract.trunc, ptr %.sroa.1443.0.sret_return.sroa_idx, align 1
  %.sroa.1544.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 14
  store i8 %.sroa.1660.0.extract.trunc, ptr %.sroa.1544.0.sret_return.sroa_idx, align 2
  %.sroa.1645.0.sret_return.sroa_idx = getelementptr inbounds i8, ptr %sret_return, i64 15
  store i8 %.sroa.1761.0.extract.trunc, ptr %.sroa.1645.0.sret_return.sroa_idx, align 1
  ret void
}

Is there anything we can do to make this a little better. i.e emit this code differently. Specially because the memcpy there guarantees that this store is always dead.

Replacing the [16 x i8] array with a [1 x i128] also fixes this but is not really tenable with how this code is emitted.

@efriedma-quic
Copy link
Collaborator

  %13 = freeze [16 x i8] undef
  store [16 x i8] %13, ptr %2, align 1

I assume the point of this is that you want "uninitialized" variables that aren't undef/poison? That's not something anyone has really looked at... in C/Rust it's UB to access uninitialized memory, and most other languages don't expose uninitialized local variables at all. And for security-sensitive C code, they usually go straight to zero-init.

I don't think using an array of bytes like this is a good approach; that's what's making SROA decompose the memcpy into individual byte stores. And it's like to be unfriendly to other passes.

Maybe we could add a dedicated intrinsic to freeze memory.

Replacing the [16 x i8] array with a [1 x i128] also fixes this but is not really tenable with how this code is emitted.

Not really following the reason this is problematic.

@gbaraldi
Copy link
Contributor Author

We emit this code as bytes. But the solution is just to memset this with 0 and not be weird :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants