Accuracy Discrepancies between the built Accelerator (68%) on ZCU102 and Brevitas Model (86%) #996

Open
shakeelakram00 opened this issue Mar 4, 2024 · 3 comments
Labels
bug Something isn't working


@shakeelakram00

ZCU102: PYNQ Linux, based on Ubuntu 18.04 (GNU/Linux 4.19.0-xilinx-v2019.1 a)
FINN: v0.9
Xilinx tools: 2022.2
Ubuntu: 22.04.1 LTS
Start the docker container with the command: ./run-docker.sh notebook

commit b3bdff1 (HEAD -> main, origin/main, origin/HEAD)
Merge: cdc5ec4 9847528
Author: auphelia <[email protected]>
Date: Mon Feb 13 11:55:42 2023 +0000

Merge pull request #762 from Xilinx/fix/nb_tests

Fix known issues for release

commit 9847528
Author: auphelia <[email protected]>
Date: Mon Feb 13 11:52:15 2023 +0000

[Notebooks/Tests] Fix typo in nb and fix build_dataflow test

commit cdc5ec4 (tag: v0.9)
Merge: 41740ed 17af0c3
Author: auphelia <[email protected]>
Date: Fri Feb 10 12:00:49 2023 +0000

Merge pull request #760 from Xilinx/dev

Summary
I have been working with the cnv_end2end_example and successfully modified it to build the accelerator for a different dataset. The Brevitas model was trained on a dataset with shape 1x1x14x14, dtype torch.float32, and values ranging between 0 and 1.

Following the cnv_end2end_example, the first layer performs the quantization, and the ONNX conversion includes pre-processing (ToTensor(), i.e., division by 255 to normalize UINT8 inputs to FLOAT32 values in [0, 1]) and post-processing (TopK, k=1). After create_dataflow_partition, all blocks of the ONNX model are converted into HLS layers, except the initial Transpose.

Given that the first Transpose was not converted to an HLS layer, and the accelerator therefore works on a dataset of shape 1x14x14x1 and dtype UINT8, I scaled the original float32 dataset to uint8 ((dataset * 255).astype(np.uint8)) for inference on the ZCU102. Although the generated validate file includes reshaping the data to the desired shape, I tried feeding the data both with and without reshaping; the result was the same in both cases, i.e., 68%.
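For clarity, the conversion looks roughly like this (dataset_f32 is a placeholder for my float32 data); note that with a single channel, reshape and transpose from NCHW to NHWC happen to produce identical arrays, which presumably explains why both variants gave the same result:

import numpy as np

# float32 data in [0, 1], shape (N, 1, 14, 14) -> uint8 in [0, 255]
dataset_u8 = (dataset_f32 * 255).astype(np.uint8)
# the accelerator expects NHWC, i.e. (N, 14, 14, 1); with C == 1,
# transpose and reshape give the same element order
dataset_nhwc = dataset_u8.transpose(0, 2, 3, 1)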

During the build process I included the verification steps, which all pass for the sample input and expected output, and the built accelerator also produces the correct output for that sample input. For the overall dataset, however, the accuracy drops to 68% rather than 86%. In contrast, after performing the initial tidy-up transformations below, the Brevitas model exported to ONNX still gives 86% accuracy on the whole dataset.

Initial tidy-up transformations:
import brevitas.onnx as bo
import finn.core.onnx_exec as oxe
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.fold_constants import FoldConstants

bo.export_finn_onnx(brevitas_model, (1, 1, 14, 14), "export.onnx")
model = ModelWrapper("export.onnx")
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
...
output_dict = oxe.execute_onnx(model, input_dict)
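The 86% I quote comes from a loop of roughly this form over the whole dataset (dataset_f32 and labels are placeholders for my data):

import numpy as np
import finn.core.onnx_exec as oxe

# illustrative accuracy loop over the float32 dataset
iname = model.graph.input[0].name
oname = model.graph.output[0].name
correct = 0
for x, y in zip(dataset_f32, labels):
    input_dict = {iname: x.reshape(1, 1, 14, 14).astype(np.float32)}
    output_dict = oxe.execute_onnx(model, input_dict)
    # with TopK (k=1) the output is already the class index
    pred = output_dict[oname].flatten()[0]
    correct += int(pred == y)
print("accuracy:", correct / len(labels))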

Accelerator build steps after the Brevitas model is exported to ONNX:
import brevitas.onnx as bo
bo.export_finn_onnx(model, (1, 1, 14, 14), "export.onnx");

from finn.util.pytorch import ToTensor
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.datatype import DataType
from qonnx.transformation.insert_topk import InsertTopK
import finn.builder.build_dataflow as build
def custom_step_add_post_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    model = model.transform(InsertTopK(k=1))
    return model

def custom_step_add_pre_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    ishape = model.get_tensor_shape(model.graph.input[0].name)
    # preprocessing: torchvision's ToTensor divides uint8 inputs by 255
    preproc = ToTensor()
    bo.export_finn_onnx(preproc, ishape, "preproc.onnx", opset_version=11)
    preproc_model = ModelWrapper("preproc.onnx")
    # set input finn datatype to UINT8
    preproc_model.set_tensor_datatype(preproc_model.graph.input[0].name, DataType["UINT8"])
    # merge pre-processing onnx model with cnv model (passed as input argument)
    model = model.transform(MergeONNXModels(preproc_model))
    return model

import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil

model_file = "export.onnx"

rtlsim_output_dir = "output"

# Delete previous run results if they exist
if os.path.exists(rtlsim_output_dir):
    shutil.rmtree(rtlsim_output_dir)
    print("Previous run results deleted!")

cfg_stitched_ip = build.DataflowBuildConfig(
    output_dir          = rtlsim_output_dir,
    mvau_wwidth_max     = 160,
    synth_clk_period_ns = 20.0,
    target_fps          = 2000000,
    board               = "ZCU102",
    fpga_part           = "xczu9eg-ffvb1156-2-e",
    shell_flow_type     = build_cfg.ShellFlowType.VIVADO_ZYNQ,

    folding_two_pass_relaxation = True,
    folding_config_file = "auto_folding_config.json",
    steps=[custom_step_add_pre_proc,
           custom_step_add_post_proc,
           "step_qonnx_to_finn",
           "step_tidy_up",
           "step_streamline",
           "step_convert_to_hls",
           "step_create_dataflow_partition",
           "step_target_fps_parallelization",
           "step_apply_folding_config",
           "step_generate_estimate_reports",
           "step_hls_codegen",
           "step_hls_ipgen",
           "step_set_fifo_depths",
           "step_create_stitched_ip",
           "step_measure_rtlsim_performance",
           "step_out_of_context_synthesis",
           "step_synthesize_bitfile",
           "step_make_pynq_driver",
           "step_deployment_package",
          ],
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
        build_cfg.DataflowOutputType.OOC_SYNTH,
        build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ],
    verify_steps=[
        build_cfg.VerificationStepType.QONNX_TO_FINN_PYTHON,
        build_cfg.VerificationStepType.TIDY_UP_PYTHON,
        build_cfg.VerificationStepType.STREAMLINED_PYTHON,
        build_cfg.VerificationStepType.FOLDED_HLS_CPPSIM,
        build_cfg.VerificationStepType.STITCHED_IP_RTLSIM,
    ]
)

build.build_dataflow_cfg(model_file, cfg_stitched_ip)

Moreover, runtime_writeable_weights is enabled (set to 1) in the folding .json file for the MVAUs of the conv and linear layers, following the guidelines in 4_advanced_builder_settings and cnv-w1a1_folding_config.
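For reference, the relevant entries in my folding config look roughly like this (node name and folding factors are illustrative; as far as I understand, runtime-writeable weights go together with the decoupled mem_mode):

import json

# sketch of one illustrative folding-config entry, just to show the structure
folding_cfg = {
    "MatrixVectorActivation_0": {
        "PE": 16,
        "SIMD": 3,
        "mem_mode": "decoupled",
        "runtime_writeable_weights": 1,
    }
}
with open("auto_folding_config.json", "w") as f:
    json.dump(folding_cfg, f, indent=2)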

I would appreciate any assistance in debugging this issue.

@fpjentzsch, you mentioned in your reply in #995 that reshaping alone might not be sufficient. Could you please provide further guidance, considering my specific setup, to achieve the desired accuracy on the accelerator?

Thank you in advance for your help.

@shakeelakram00 (Author) commented Mar 13, 2024

Hi there,
I've been carefully verifying each stage of the FINN flow for the query above, and I've run into a perplexing issue that I could use some guidance on.

Initially, during ONNX execution, I achieved 86% accuracy after applying the tidy-up, pre-processing, and post-processing transformations.
However, after applying the streamline transformations, I saw a significant drop in accuracy to 68%. This drop persisted when deploying the model onto the FPGA.

To give you a clearer picture, here are the streamline transformations I've applied:
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline import Streamline
from finn.transformation.streamline.reorder import MakeMaxPoolNHWC, MoveScalarLinearPastInvariants
from finn.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
from qonnx.transformation.lower_convs_to_matmul import LowerConvsToMatMul
from qonnx.transformation.infer_data_layouts import InferDataLayouts
from qonnx.transformation.general import RemoveUnusedTensors

model = model.transform(MoveScalarLinearPastInvariants())
model = model.transform(Streamline())
model = model.transform(LowerConvsToMatMul())
model = model.transform(MakeMaxPoolNHWC())
model = model.transform(Streamline())
model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(Streamline())
model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())
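To localize where the outputs start to diverge, I run the same input through the model saved before and after streamlining, along these lines (file names are placeholders):

import numpy as np
import finn.core.onnx_exec as oxe
from qonnx.core.modelwrapper import ModelWrapper

# adjust the input dtype/range to match the model input
# (e.g. integer values in [0, 255] if pre-processing is already merged)
model_pre = ModelWrapper("pre_streamline.onnx")
model_post = ModelWrapper("post_streamline.onnx")
x = np.random.randint(0, 256, size=(1, 1, 14, 14)).astype(np.float32)
out_pre = oxe.execute_onnx(
    model_pre, {model_pre.graph.input[0].name: x}
)[model_pre.graph.output[0].name]
out_post = oxe.execute_onnx(
    model_post, {model_post.graph.input[0].name: x}
)[model_post.graph.output[0].name]
print("outputs match:", np.isclose(out_pre, out_post).all())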

I also tried finn.builder.build_dataflow; it showed the same issue, i.e., when the streamline transformations are applied there is a drop in accuracy.

Only when I remove the "model = model.transform(LowerConvsToMatMul())" transformation do I get the 86% accuracy back. I know that to convert the model to HLS-compatible nodes we have to lower the convs to MatMuls, so this transformation is needed. The only other difference I see with and without the transformation is that the finn_datatypes of MultiThreshold_1 and MultiThreshold_2 are BINARY with LowerConvsToMatMul (giving 68% accuracy) and BIPOLAR without it (giving 86% accuracy).

I'm at a loss as to why this transformation causes such a significant accuracy drop. Is it due to the MultiThreshold finn_datatypes, or perhaps the 6x6 kernel size I am using in QuantConv2d? Any insights or suggestions you could offer would be greatly appreciated.
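For reference, this is how I inspect the MultiThreshold datatypes (file name is a placeholder):

from qonnx.core.modelwrapper import ModelWrapper

# print the FINN output datatype of every MultiThreshold node
model = ModelWrapper("streamlined.onnx")
for node in model.graph.node:
    if node.op_type == "MultiThreshold":
        print(node.name, "->", model.get_tensor_datatype(node.output[0]).name)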

Thank you for your time and assistance.

@auphelia (Collaborator) commented Apr 3, 2024

Hi @shakeelakram00, could you try the latest release with your flow? Note that you will need to adapt your flow to the new structure; this blog post might be helpful: #1020
Great that you were able to narrow down the problem further. If the error persists, could you put together a minimal example to reproduce it?

@shakeelakram00 (Author) commented Apr 3, 2024

Hi @auphelia,
I really appreciate your response, thanks a lot.
I managed to sort out the error by changing the bit width of the QuantIdentity layers associated with the convolution layers in cnv_end2end_example to 2, while keeping the bit width of those associated with the linear layers at 1. The transformations now produce the same 86% accuracy. I suppose the error was due to the zero padding: when the transformations were applied, they changed the datatypes from BIPOLAR to BINARY, hence the accuracy drop.

But moving forward, when I apply the partitioning, conversion-to-HW-layers and folding transformations below, I get the error "AssertionError: MultiThreshold_3: Signed output requires actval less than 0", which I suppose comes from the MultiThreshold_3 generated for the QuantIdentity layer associated with the last convolution layer before the linear layers. So I tried updating the attributes of that node by manually setting out_bias = -1.0 in the ONNX file generated after the streamline transformations. That got rid of the error but dropped the accuracy all the way to 51%, which I suppose is due to the forced change of out_bias.
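For completeness, the manual change can also be made programmatically rather than by hand-editing the ONNX file (node and file names are placeholders), though as noted it did not restore the accuracy:

from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp

# force out_bias = -1.0 on the offending MultiThreshold node; I suspect
# out_scale/out_dtype may also need consistent updates, which could be
# why forcing the bias alone skews the results
model = ModelWrapper("streamlined.onnx")
node = model.get_node_from_name("MultiThreshold_3")
getCustomOp(node).set_nodeattr("out_bias", -1.0)
model.save("streamlined_out_bias_fixed.onnx")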

So, what do you suggest? Given that I have the same convolution layers with the same QuantIdentity layer associated with each of them, except that the second conv is followed by an additional MaxPool layer, why doesn't MultiThreshold_3 automatically take out_bias = -1.0 the way MultiThreshold_1 and MultiThreshold_2 do, when all three come from the same QuantIdentity definition?
""""""
self.conv_features.append(QuantIdentity( act_quant=CommonActQuant, bit_width=8, min_val=- 1.0, max_val=1.0 - 2.0 ** (-7),
narrow_range=True, restrict_scaling_type=RestrictValueType.POWER_OF_TWO))
for out_ch, is_pool_enabled in CNV_OUT_CH_POOL:
self.conv_features.append(QuantConv2d(kernel_size=KERNEL_SIZE, in_channels=in_ch, out_channels=out_ch,
bias=True, padding=4, weight_quant=CommonWeightQuant, weight_bit_width=weight_bit_width))
in_ch = out_ch
self.conv_features.append(BatchNorm2d(in_ch, eps=1e-4))
self.conv_features.append(QuantIdentity(act_quant=CommonActQuant,bit_width=2))#MultiThreshold123
if is_pool_enabled:
self.conv_features.append(MaxPool2d(kernel_size=2))
""""""
Secondly, do I have to keep all the QuantIdentity layers at the same bit width (except the first one, which is 8), or should I set bit_width = 1 for the QuantIdentity layer associated with convolution layer 3?

import finn.transformation.fpgadataflow.convert_to_hw_layers as to_hw
import finn.transformation.streamline.absorb as absorb
from finn.transformation.fpgadataflow.create_dataflow_partition import (
    CreateDataflowPartition,
)
from finn.transformation.move_reshape import RemoveCNVtoFCFlatten
from finn.transformation.fpgadataflow.specialize_layers import SpecializeLayers
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp
from qonnx.transformation.infer_data_layouts import InferDataLayouts

model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
model = model.transform(to_hw.InferBinaryMatrixVectorActivation())
model = model.transform(to_hw.InferQuantizedMatrixVectorActivation())
model = model.transform(to_hw.InferLabelSelectLayer())
model = model.transform(to_hw.InferThresholdingLayer())
model = model.transform(to_hw.InferConvInpGen())
model = model.transform(to_hw.InferStreamingMaxPool())
model = model.transform(RemoveCNVtoFCFlatten())
model = model.transform(absorb.AbsorbConsecutiveTransposes())
model = model.transform(InferDataLayouts())
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir + "/end2end_cnv_w1a1_dataflow_parent.onnx")
sdp_node = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")[0]
sdp_node = getCustomOp(sdp_node)
dataflow_model_filename = sdp_node.get_nodeattr("model")
dataflow_model = ModelWrapper(dataflow_model_filename)
dataflow_model.save(build_dir + "/end2end_cnv_w1a1_dataflow_model.onnx")
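As a quick sanity check afterwards, I list the op types in the extracted dataflow model to confirm that everything was converted to HW layers:

# every node in the dataflow partition should now be a HW layer op type
print([node.op_type for node in dataflow_model.graph.node])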
