IRON programming example: matrix multiply ref design results are wrong #1589

Open
hecmay opened this issue Jul 2, 2024 · 1 comment

hecmay commented Jul 2, 2024

Hi,

I am following the ASPLOS tutorial on a Minisforum UM790 Pro machine with an AMD Ryzen NPU. I was able to set up the Linux environment, the IPU driver, and all Vitis dependencies successfully.

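However, when I ran the matrix multiplication (MM) reference example, the results were wrong: for the single_core version, verification against the reference matmul reports 3439 mismatches with a maximum relative error of 21%. The full output from the single_core version follows. The whole_array version's output does not match the reference design either. Only the matrix-vector version worked.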

rm -rf _build
mkdir -p _build
cd _build &&  cmake -E env CXXFLAGS="-std=c++23 -ggdb" cmake /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/.. -D CMAKE_C_COMPILER=gcc-13 -D CMAKE_CXX_COMPILER=g++-13 -DTARGET_NAME=matrixMultiplication -Dsubdir=single_core
CMake Deprecation Warning at CMakeLists.txt:14 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is GNU 13.1.0
-- The CXX compiler identification is GNU 13.1.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc-13 - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++-13 - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found version "1.74.0")  
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build
cd _build &&  cmake --build . --config Release
gmake[1]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[2]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[3]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[3]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[3]: Entering directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
[ 33%] Building CXX object CMakeFiles/matrixMultiplication.dir/home/user/mlir-aie-test/runtime_lib/test_lib/test_utils.cpp.o
[ 66%] Building CXX object CMakeFiles/matrixMultiplication.dir/single_core/test.cpp.o
[100%] Linking CXX executable matrixMultiplication
gmake[3]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
[100%] Built target matrixMultiplication
gmake[2]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
gmake[1]: Leaving directory '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/_build'
cp _build/matrixMultiplication matrixMultiplication.exe 
mkdir -p build
python3 /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/aie2.py -m 64 -k 64 -n 64 -M 256 -K 256 -N 256 > build/aie_256x256x256_64x64x64.mlir
mkdir -p build
cd build && xchesscc_wrapper aie2 -I /tools/Xilinx/Vitis/2023.2/aietools/include  -DBIT_WIDTH=8 -DDIM_M=64 -DDIM_K=64 -DDIM_N=64 -c /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc -o mm_64x64x64.o
/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc:11:9: warning: '__AIENGINE__' macro redefined [-Wmacro-redefined]
#define __AIENGINE__ 2
        ^
<command line>:3:9: note: previous definition is here
#define __AIENGINE__ 1
        ^
In file included from /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc:23:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/aie_api/aie.hpp:10185:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/aie_api/aie_adf.hpp:75:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/aie_api/adf/stream.hpp:54:
In file included from /tools/Xilinx/Vitis/2023.2/aietools/include/adf.h:7:
/tools/Xilinx/Vitis/2023.2/aietools/include/adf/intrinsics.h:28:9: warning: 'REL_WRITE' macro redefined [-Wmacro-redefined]
#define REL_WRITE -1
        ^
/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../../../../aie_kernels/aie2/mm.cc:20:9: note: previous definition is here
#define REL_WRITE 0
        ^
2 warnings generated.
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <344>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <346>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <348>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <350>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <344>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <346>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <348>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): cannot move keep_with_operand operation `v16acc64 ups_w2c(v16int16, uint6_t, uint1_t, uint1_t, uint2_t, bool &)' <350>, would violate structural limitations [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): ... intended destination [-Wkeep-with-operand]
Warning in "": (imprecise line-number, the error occurred somewhere in this function): loop found to have 4 iterations, fewer than the explicitly annotated minimum 8 [-Wincorrect-annotation]
Warning: : (loop #8)
        Non leaf loop was prepared for pipelining. But the pipelined solutions have not been selected.
        Consider removing the chess_prepare_for_pipelining directive as it may improve results
mkdir -p build
cd build && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=final_256x256x256_64x64x64.xclbin \
			--aie-generate-npu --npu-insts-name=insts_256x256x256_64x64x64.txt ../build/aie_256x256x256_64x64x64.mlir
warning: overriding the module target triple with pdarch-unknown-unknown-elf [-Woverride-module]
1 warning generated.
Warning in "": (imprecise line-number, the error occurred somewhere in this function): loop with essential overflow in loop count computation (number of iterations exceeds internal maximum) [-Wloop-count-overflow]


****** Bootgen v2024.1
  **** Build date : Jun 18 2024-22:04:45
    ** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
    ** Copyright 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.


[INFO]   : Bootimage generated successfully

XRT Build Version: 2.18.0 (HEAD)
       Build Date: 2024-07-01 14:52:40
          Hash ID: 73fe5440974fc51ccaba6366094e4bfa8151f79a
Creating a default 'in-memory' xclbin image.

Section: 'MEM_TOPOLOGY'(6) was successfully added.
Size   : 88 bytes
Format : JSON
File   : '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/mem_topology.json'

Section: 'AIE_PARTITION'(32) was successfully added.
Size   : 12560 bytes
Format : JSON
File   : '/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_partition.json'
Info: Embedded Metadata section is missing project.platform.device.core element, adding it.
Successfully wrote (18589 bytes) to the output file: final_256x256x256_64x64x64.xclbin
Leaving xclbinutil.
 AIE Compilation: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:01 2/2 4 Workers
Generating: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_cdo_elfs.bin
Generating: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_cdo_init.bin
Generating: /home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/build/aie_256x256x256_64x64x64.mlir.prj/aie_cdo_enable.bin
export XRT_HACK_UNSECURE_LOADING_XCLBIN=1 && \
 ./matrixMultiplication.exe -x build/final_256x256x256_64x64x64.xclbin -i build/insts_256x256x256_64x64x64.txt -k MLIR_AIE -M 256 -K 256 -N 256 -v 2 --warmup 1 --iters 1
Matrix size 256x256x256
Sequence instr count: 278
Loading xclbin: build/final_256x256x256_64x64x64.xclbin
Kernel opcode: MLIR_AIE
Name: MLIR_AIE
Registering xclbin: build/final_256x256x256_64x64x64.xclbin
Getting hardware context.
Getting handle to kernel:MLIR_AIE
Writing data into buffer objects.
A = 
    3.53      3.33      0.56      1.59      0.50   ...     0.59      1.92      1.36      1.98      2.33  
    3.55      3.38      3.45      3.70      2.58   ...     0.88      0.92      1.83      1.84      0.10  
    3.81      1.03      2.38      0.70      2.00   ...     3.34      3.78      3.06      3.69      2.83  
    0.45      3.11      0.21      0.88      1.32   ...     2.92      0.66      0.50      0.61      0.12  
    1.48      3.42      0.21      3.56      2.23   ...     2.12      2.22      1.54      1.88      3.33  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
    3.72      2.95      0.41      0.58      1.48   ...     2.20      2.34      3.14      0.74      2.08  
    3.75      1.64      2.86      0.95      3.70   ...     3.39      2.27      1.22      1.20      1.34  
    3.95      0.57      1.95      0.05      0.11   ...     0.74      0.05      2.12      1.55      3.98  
    2.80      2.81      1.49      1.01      1.28   ...     0.57      0.34      3.39      2.62      2.75  
    2.50      1.41      0.57      3.41      1.20   ...     3.52      3.27      0.61      3.11      3.42  
B = 
    3.75      1.54      1.69      1.72      3.64   ...     3.06      1.29      1.19      3.28      1.34  
    1.14      1.06      0.57      0.12      2.27   ...     3.28      3.94      3.89      3.25      1.56  
    0.07      3.58      2.45      3.48      3.94   ...     1.16      1.06      2.17      2.17      3.30  
    2.31      0.21      3.45      2.09      2.47   ...     0.78      3.30      2.53      0.82      3.20  
    3.56      1.68      2.25      2.16      2.59   ...     3.12      3.94      2.97      3.23      2.06  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
    3.38      3.83      1.18      2.48      3.50   ...     1.02      3.92      1.11      3.06      1.11  
    1.73      1.41      1.59      3.47      1.08   ...     2.22      1.18      3.33      1.08      3.08  
    0.62      2.20      0.60      0.57      2.62   ...     2.36      3.34      1.81      0.15      1.60  
    0.96      0.79      3.92      0.95      1.16   ...     0.06      3.02      1.02      1.78      2.42  
    2.09      2.58      3.58      1.59      1.55   ...     1.66      3.44      3.34      1.09      3.56  
Running Kernel (iteration 0).
Running Kernel (iteration 1).
Verifying against reference matmul ...
[   64,     6] 1024.00 =!= 916.00
[   64,     7] 1032.00 =!= 912.00
[   64,    12] 1040.00 =!= 904.00
[   64,    19] 1088.00 =!= 956.00
[   64,    20] 1032.00 =!= 928.00
[   64,    23] 1128.00 =!= 1000.00
[   64,    24] 1128.00 =!= 996.00
[   64,    28] 1080.00 =!= 972.00
[   64,    30] 1112.00 =!= 984.00
[   64,    31] 1128.00 =!= 1008.00
[   64,    33] 1024.00 =!= 912.00
[   64,    35] 1056.00 =!= 940.00
[   64,    46] 1112.00 =!= 996.00
[   64,    47] 988.00 =!= 892.00
[   64,    49] 1088.00 =!= 980.00
[   64,    50] 1128.00 =!= 992.00
[   64,    54] 1136.00 =!= 1012.00
[   64,    56] 1048.00 =!= 940.00
[   64,    58] 1112.00 =!= 996.00
[   64,    64] 1048.00 =!= 936.00
[   64,    68] 1136.00 =!= 1016.00
[   64,    74] 1080.00 =!= 968.00
[   64,    79] 1024.00 =!= 900.00
[   64,    90] 1072.00 =!= 956.00
[   64,    92] 1048.00 =!= 948.00
[   64,    95] 1096.00 =!= 988.00
[   64,    97] 1048.00 =!= 944.00
[   64,    99] 1040.00 =!= 928.00
[   64,   133] 952.00 =!= 852.00
[   64,   156] 1120.00 =!= 1012.00
[   64,   163] 1072.00 =!= 944.00
[   64,   174] 1080.00 =!= 876.00
...and 3407 further errors.
Maximum relative error:  21%

Reference:
 1008.00   1016.00    984.00    980.00    956.00   ...   936.00   1080.00   1096.00   1064.00   1040.00  
 1024.00    948.00    944.00    948.00    904.00   ...   928.00   1048.00   1032.00   1072.00   1004.00  
  996.00    944.00    940.00    956.00    928.00   ...   872.00   1072.00   1024.00   1032.00   1000.00  
  940.00    912.00    880.00    912.00    900.00   ...   912.00   1032.00   1032.00   1020.00    956.00  
 1012.00    984.00    988.00    956.00    952.00   ...   936.00   1088.00   1064.00   1040.00   1056.00  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
  996.00    940.00    924.00    920.00    896.00   ...   924.00   1020.00    996.00   1032.00   1004.00  
 1056.00   1008.00    988.00   1016.00    996.00   ...   960.00   1104.00   1160.00   1080.00   1112.00  
  948.00    980.00    976.00    924.00    924.00   ...   896.00   1040.00   1016.00   1040.00   1056.00  
 1088.00   1016.00    996.00   1020.00    976.00   ...   960.00   1088.00   1104.00   1120.00   1080.00  
 1048.00   1020.00   1020.00   1048.00   1024.00   ...   952.00   1152.00   1112.00   1112.00   1112.00  

Output:
 1004.00   1008.00    980.00    972.00    948.00   ...   932.00   1072.00   1080.00   1056.00   1040.00  
 1016.00    944.00    936.00    944.00    896.00   ...   924.00   1040.00   1020.00   1064.00   1000.00  
  988.00    940.00    928.00    952.00    920.00   ...   864.00   1056.00   1020.00   1024.00    996.00  
  936.00    908.00    872.00    904.00    892.00   ...   904.00   1024.00   1024.00   1012.00    952.00  
 1004.00    976.00    980.00    952.00    944.00   ...   928.00   1080.00   1056.00   1032.00   1048.00  
    ...       ...       ...       ...       ...    ...     ...       ...       ...       ...       ...   
 1000.00    952.00    916.00    976.00    984.00   ...   900.00   1056.00   1032.00   1040.00   1056.00  
  928.00    908.00    936.00    916.00    920.00   ...   872.00   1032.00    988.00    996.00    996.00  
  988.00    968.00   1020.00    996.00    968.00   ...   924.00   1072.00   1048.00   1072.00   1024.00  
 1056.00   1032.00   1032.00    988.00   1000.00   ...   996.00   1096.00   1104.00   1112.00   1088.00  
 1056.00    996.00    940.00    988.00    980.00   ...   948.00   1072.00   1096.00   1128.00   1056.00  
Verify time: 0.00 s.

Avg NPU matmul time: 834.00us.
Avg NPU gflops: 40.23

Min NPU matmul time: 834.00us.
Max NPU gflops: 40.23

Max NPU matmul time: 834.00us.
Min NPU gflops: 40.23

Error count: 3439


Failed.

make: *** [/home/user/mlir-aie-test/programming_examples/basic/matrix_multiplication/single_core/../makefile-common:87: run] Error 1
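
To help narrow down where the output diverges, the mismatch lines in the log above can be grouped by output tile. This is only a minimal sketch, assuming the `[row, col] value =!= value` line format printed above and the 64x64 output tile implied by `-m 64 -n 64`. The function name `summarize_mismatches` and the symmetric relative-difference formula are illustrative choices, not taken from the test harness, and the sketch does not assume which side of `=!=` is the reference.

```python
import re
from collections import Counter

# Matches log lines such as "[   64,     6] 1024.00 =!= 916.00".
MISMATCH = re.compile(r"\[\s*(\d+),\s*(\d+)\]\s+([-\d.]+)\s+=!=\s+([-\d.]+)")


def summarize_mismatches(log_text, tile_m=64, tile_n=64):
    """Count mismatches per (row_tile, col_tile) and track the largest relative gap."""
    per_tile = Counter()
    worst = 0.0
    for row, col, left, right in MISMATCH.findall(log_text):
        row, col = int(row), int(col)
        left, right = float(left), float(right)
        # Integer division maps an element index to the tile that contains it.
        per_tile[(row // tile_m, col // tile_n)] += 1
        # Symmetric relative difference (an assumption, not the harness formula).
        denom = max(abs(left), abs(right))
        if denom > 0:
            worst = max(worst, abs(left - right) / denom)
    return per_tile, worst


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = (
        "[   64,     6] 1024.00 =!= 916.00\n"
        "[   64,    64] 1048.00 =!= 936.00\n"
        "[   64,   174] 1080.00 =!= 876.00\n"
    )
    per_tile, worst = summarize_mismatches(sample)
    for tile, count in sorted(per_tile.items()):
        print(tile, count)
    print(f"largest relative gap: {worst:.3f}")
```

Every mismatch printed before the log truncates is in row 64, the first row of the second 64-row tile, so running this over the complete error list would show whether the remaining errors are also confined to particular tiles.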

makslevental (Contributor) commented

Related: #1554
