ObjectFifo - Assertion Error in Nested Loop with Only One Iteration #1547

Closed
andrej opened this issue Jun 11, 2024 · 1 comment

andrej commented Jun 11, 2024

This issue documents an error (and a workaround, see below) that I'm seeing with ObjectFifo loop unrolling:

Summary

  • Three nested loops
  • The middle loop has only a single iteration
  • ObjectFifo accesses in both of the two inner loops
  • Together, these conditions trigger an assertion failure that crashes the compiler

Error

/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = xilinx::AIE::BufferOp*; _Alloc = std::allocator<xilinx::AIE::BufferOp*>; std::vector<_Tp, _Alloc>::reference = xilinx::AIE::BufferOp*&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: 
Assertion '__n < this->size()' failed.
Aborted (core dumped)

Compilation Command

aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=bug.xclbin \
                         --aie-generate-npu --npu-insts-name=bug.txt bug.mlir

Code

What is unusual about this code is that the middle loop (%arg1) has only a single iteration. If that loop has multiple iterations, the error does not occur. The error also does not occur with only two nested loops instead of three.

module {
  aie.device(npu1_4col) {

    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c4 = arith.constant 4 : index
    %c4294967295 = arith.constant 4294967295 : index

    %tile_0_1 = aie.tile(0, 1)
    %tile_0_2 = aie.tile(0, 2)

    aie.objectfifo @fifoA(%tile_0_2, {%tile_0_1}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>
    aie.objectfifo @fifoB(%tile_0_1, {%tile_0_2}, 2 : i32) : !aie.objectfifo<memref<64x64xbf16>>

    %core_0_2 = aie.core(%tile_0_2) {

      scf.for %arg0 = %c0 to %c4294967295 step %c1 {
        scf.for %arg1 = %c0 to %c1 step %c1 {
          %0 = aie.objectfifo.acquire @fifoA(Produce, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
          %1 = aie.objectfifo.subview.access %0[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
          scf.for %arg2 = %c0 to %c4 step %c1 {
            %2 = aie.objectfifo.acquire @fifoB(Consume, 1) : !aie.objectfifosubview<memref<64x64xbf16>>
            %3 = aie.objectfifo.subview.access %2[0] : !aie.objectfifosubview<memref<64x64xbf16>> -> memref<64x64xbf16>
            aie.objectfifo.release @fifoB(Consume, 1)
          }
          aie.objectfifo.release @fifoA(Produce, 1)
        }
      }
      
      aie.end

    }
  }
}

Alternative error

If we remove the two aie.objectfifo.subview.access statements, the error instead becomes:

/home/github/actions-runner/_work/mlir-aie/mlir-aie/mlir/src/python/MLIRPythonExtension.Core/IRModule.h:433:
mlir::python::PyMlirContext::ErrorCapture::~ErrorCapture(): Assertion `errors.empty() && "unhandled captured errors"' failed.
Aborted (core dumped)

Workaround

In the Python code that generates the MLIR, check if loops have a single iteration. If so, do not emit the loop.
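
A minimal sketch of this check follows. It assumes the loop bounds are known Python integers at generation time and that the body should still be emitted once when the loop is skipped. The helpers `trip_count`, `emit_for_text` and `emit_loop_or_inline` are hypothetical names made up for illustration and are not part of the mlir-aie Python bindings. The emitted strings are placeholders, not valid MLIR.

```python
from typing import Callable, List


def trip_count(lower: int, upper: int, step: int) -> int:
    """Number of iterations of a loop over [lower, upper) with a positive step."""
    if step <= 0:
        raise ValueError("step must be positive")
    if upper <= lower:
        return 0
    # Ceiling division of the range length by the step.
    return (upper - lower + step - 1) // step


def emit_for_text(lower: int, upper: int, step: int,
                  body: Callable[[str], List[str]]) -> List[str]:
    """Hypothetical stand-in for the real loop emitter; returns placeholder text."""
    out = [f"for %i in [{lower}, {upper}) step {step} {{"]
    out += ["  " + line for line in body("%i")]
    out.append("}")
    return out


def emit_loop_or_inline(lower: int, upper: int, step: int,
                        body: Callable[[str], List[str]]) -> List[str]:
    """Emit a loop only if it runs more than once; otherwise inline the body."""
    n = trip_count(lower, upper, step)
    if n == 0:
        return []  # loop never runs, so nothing is emitted
    if n == 1:
        # Single iteration: skip the loop and emit the body once,
        # with the induction variable replaced by the lower bound.
        return body(str(lower))
    return emit_for_text(lower, upper, step, body)


if __name__ == "__main__":
    def body(iv: str) -> List[str]:
        return [f"acquire / compute with index {iv} / release"]

    print("\n".join(emit_loop_or_inline(0, 1, 1, body)))  # body only, no loop
    print("\n".join(emit_loop_or_inline(0, 4, 1, body)))  # loop kept
```

In a real generator, `body` would be whatever function builds the loop body, and the multi-iteration branch would call the actual loop construct in use instead of `emit_for_text`.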

cc @AndraBisca


andrej commented Jun 11, 2024

Just realized I already reported something very similar in #1128. This is probably the same issue. I will add the minimal example I came up with here as a comment to the other issue and close this.
