Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to change clock_period in sim_input.yml #650

Closed
Waxpple opened this issue May 6, 2022 · 32 comments
Closed

How to change clock_period in sim_input.yml #650

Waxpple opened this issue May 6, 2022 · 32 comments

Comments

@Waxpple
Copy link
Contributor

Waxpple commented May 6, 2022

Hi,

After I change the config in example-asap7.yml

...
vlsi.inputs.clocks: [
 {name: "clock_clock", period: "1.25ns", uncertainty: "0.1ns"}
]

The design sdc and sim-syn-input.json are using 1.25ns setting without problem but the sim-input.yml stays at the 1ns setting.
Where can I change the sim-input.yml to follow the vlsi.inputs.clocks?

Thanks!

@harrisonliew
Copy link
Contributor

This should be a Chipyard issue, but here's the answer.

vlsi.inputs.clocks is the clock period used for synthesis, and gets translated into a timing constraint. In contrast, the clock period in simulation is only dependent on the testbench, which in the case of Chipyard is via the CLOCK_PERIOD variable, defined here: https://github.com/ucb-bar/chipyard/blob/main/vcs.mk#L21.

Of course, matching the two clock periods only matters if you are doing timing-annotated gate-level simulations.

@Waxpple
Copy link
Contributor Author

Waxpple commented May 6, 2022

Got it. Thanks!

@Waxpple Waxpple closed this as completed May 6, 2022
@Waxpple
Copy link
Contributor Author

Waxpple commented May 6, 2022

Hi,

  1. Is there anything else that should be modified if I want to set the clock frequency at 800MHz?
    I have changed the vcs.mk and example-asap7.yml.
    The synthesized design timing is clean (TNS >0) and functional simulation passed.
    However, in timing-annotated gate-level simulations, the design failed with timing violation at frontend.icache.
    And the critical path is not seen in the genus.
    image
    Even if I change the simulation clock period to 2.0ns and the design constraint stays at 1.25ns, the timing won't pass.
    Edit: I think it is hold time violation:
    image
    The problem is genus cannot see the hold time violation, I cannot fix hold in the synthesis stage.

  2. I read the issue on (asap-7 dual port SRAM #644). Soon we can perform post-syn-power analysis? That is great! We can read the saif file and get the benchmark power analysis result. I hope I can be part of the PR if it is possible?
    Edit: I realized I don't have joules... lol

Edit:
3. Sometimes, vcs fgp does not finish across sub-threads when the main thread is finished. Have you encountered the same situation?
Thanks.

@Waxpple Waxpple reopened this May 6, 2022
@harrisonliew
Copy link
Contributor

  1. Ah, due to the fact that the ASAP7 SRAM models are completely fake, they have no timing annotation for gate-level simulation, and the LIBs have fake timing too. So, there cannot be an expectation for the timing-annotated simulations to pass if there are any dummy SRAMs in the design. In this case, you see a hold violation because in synthesis, the SRAMs have a positive clk-q delay while in simulation, this delay is 0.
  2. Yes, the post-syn-power PRs are coming very soon. We'll make sure to tag you for review, even if you don't have Joules in case you want to comment.
  3. I have not encountered FGP getting out of sync. It looks like there are some options for FGP such as sync:busywait, sync:mutex, and sync:serial. It looks like it's in decreasing order of performance and increasing order of safety. Can you try these and see what changes? You might also need to combine it with the diag:ruse option.

@Waxpple
Copy link
Contributor Author

Waxpple commented May 7, 2022

  1. Ah, due to the fact that the ASAP7 SRAM models are completely fake, they have no timing annotation for gate-level simulation, and the LIBs have fake timing too. So, there cannot be an expectation for the timing-annotated simulations to pass if there are any dummy SRAMs in the design. In this case, you see a hold violation because in synthesis, the SRAMs have a positive clk-q delay while in simulation, this delay is 0.

I don't understand why would the delay be 0 in the simulation? Don't we have SRAM lib which delay can be calculated and annotated in the simulation?
Edit: Is it because we don't specify the delay in sram.v model?

  1. Yes, the post-syn-power PRs are coming very soon. We'll make sure to tag you for review, even if you don't have Joules in case you want to comment.

That would be great.

  1. I have not encountered FGP getting out of sync. It looks like there are some options for FGP such as sync:busywait, sync:mutex, and sync:serial. It looks like it's in decreasing order of performance and increasing order of safety. Can you try these and see what changes? You might also need to combine it with the diag:ruse option.

Ok. I will test it out.

@harrisonliew
Copy link
Contributor

harrisonliew commented May 7, 2022

Oh, I see. Yes, indeed the SDF file encodes the SRAM timing, sorry for not checking this.

You are also correct that Genus can't check hold time, hence you see the violation in Tempus (we are planning to add a Tempus plugin for STA soon). However, Tempus understands the uncertainty (100ps) but VCS does not, so that path in Tempus should not cause a hold violation in VCS...

Another thing to check: what is the effective timescale resolution? I'm wondering if the resolution is somehow 100ps instead of the default 10ps (from the Makefiles), hence the setup/hold limits of 100 in the $setuphold task? The SDF file indicates the DFFHQx4 cell should hae a hold time of no more than 14ps... maybe the resolution should be decreased to 1ps, to match the SDF?

@Waxpple
Copy link
Contributor Author

Waxpple commented May 7, 2022

Oh, I see. Yes, indeed the SDF file encodes the SRAM timing, sorry for not checking this.

You are also correct that Genus can't check hold time, hence you see the violation in Tempus (we are planning to add a Tempus plugin for STA soon). However, Tempus understands the uncertainty (100ps) but VCS does not, so that path in Tempus should not cause a hold violation in VCS...

Another thing to check: what is the effective timescale resolution? I'm wondering if the resolution is somehow 100ps instead of the default 10ps (from the Makefiles), hence the setup/hold limits of 100 in the $setuphold task? The SDF file indicates the DFFHQx4 cell should hae a hold time of no more than 14ps... maybe the resolution should be decreased to 1ps, to match the SDF?

I think both issues exist!
I override the timescale on vcs compile time and get this:
image
However, the sdf annotation failed on SRAM module for some reason.
image
I am still figuring out why the "timing check annotation is not enable"

@Waxpple
Copy link
Contributor Author

Waxpple commented May 13, 2022

Edit:

Is it because sram.v did not use specify and the sdf file cannot be back annotated?

Edit:
Because I add some random specify code in SRAM1RW64x22 for test like:
image
And It won't complain about O[*] sdf IOPATH not being found.
Which means the SRAM behavioral models should contain the specify block for sdf annotation?

@Waxpple
Copy link
Contributor Author

Waxpple commented May 18, 2022

  1. I added the auto-generated specify block in the SRAM compiler.
    The sdf back-annotation is functioning properly now.
  2. I added the auto-generated tempus script for fixing hold time in max delay view and it will write out netlist and sdf.
  3. TODO: run tempus automatically after genus finishes the job and maybe also fix setup time. Also test the script robustness in terms of fixing timing.

@Waxpple
Copy link
Contributor Author

Waxpple commented May 18, 2022

image
In the SRAM compiler, we didn't get bytemask in our sram Verilog model. And it exists in the LEF file which causes post-syn sim failed

@Waxpple
Copy link
Contributor Author

Waxpple commented May 18, 2022

I add the bytemask for 256x64 and 256x128.
In netlist, bytemask signal is connected to tielo cell and RTL doesn't even declare the port, so I only add the dummy port for post-syn simulation.

@harrisonliew
Copy link
Contributor

Thanks for catching the extra BYTEMASK in those 2 memories. I'm not sure how they ended up there. I will try to see if I can remove it since it's confusing if it's there, but I will need to see if I still have the ability to generate the collateral.

@harrisonliew
Copy link
Contributor

harrisonliew commented May 19, 2022

I was able to remove BYTEMASK from the LEF/GDS/LIB views for those 2 SRAMs (see the PR). Please try it out and see if it works. You will need to re-synthesize.

@harrisonliew
Copy link
Contributor

I added the auto-generated specify block in the SRAM compiler.
The sdf back-annotation is functioning properly now.

Also, can you make a PR with this?

@Waxpple
Copy link
Contributor Author

Waxpple commented May 26, 2022

I added the auto-generated specify block in the SRAM compiler.
The sdf back-annotation is functioning properly now.

Also, can you make a PR with this?

Yes, I have made a PR with the patch.
Sorry for the late response, this week or two is our final week.

@harrisonliew
Copy link
Contributor

Merged, thanks! Good luck with your finals, and hope you can test out #653 afterwards.

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 16, 2022

When I run par with floorplan.tcl, the instance name of the SRAMs is not aligned with the design. Where can I change the inst name prefix? I guess the path is fetched from par-input.json.

@harrisonliew
Copy link
Contributor

The inst name of SRAMs is generated from the MacroCompiler step, and cannot be changed unless you change it in the generated Verilog.

As for the floorplan, you must provide a position for each SRAM instance (see vlsi.inputs.placement_constraints in the example_asap7.yml file) that you have. You can also edit this key in par-input.json directly for debugging first, but this file is generated by Hammer during the flow. Otherwise they will be all be placed outside the design and Innovus will fail.

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 16, 2022

In this case, if I change a design from smallboomconifg to mediumboomconfig. I will have to do the following steps?

  1. Check the design SRAMs like which SRAM macro I used and its hierarchical name/path. The used SRAMs can only be traced in the RTL code. (Or maybe there is any JSON I can read directly?)
  2. Change the placement_constraints in the example-asap7.yml.
  3. P&R

Edit: I think the same question is asked here: https://groups.google.com/d/msgid/chipyard/e8f6e896-2f58-495d-a7b0-c29acedeb82cn%40googlegroups.com.

Edit: Is the "coreSite" the same as the asap7 site? Because the "coreSite" is not defined yet.

@harrisonliew
Copy link
Contributor

harrisonliew commented Jun 16, 2022

Correct. It is definitely a bit tedious. As for placement guidelines, you should cluster ICache and DCache, and within those clusters, the tag and data arrays should be placed together. Placement/orientation needs to leave room for routing and cannot block pin access. We are working on a research project to automatically generate placement constraints from Chisel.

Ack - good catch on the coreSite. With the ASAP7 r1.7 update, they changed the site to asap7sc7p5t. I'll update the #653 PR with this fix. Edit: done, please test!

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 16, 2022

I just use genus to dump the SRAM list from ``get_db insts -if {.is_macro==true}] > sram.log.
There are 316 instances in SmallBoomConfig, so should I manually type every macro placement constraint?
It is not easy to know whether the SRAM placement is following DRC.
I know there is an open-source framework (https://github.com/google-research/circuit_training), which uses the RL model to place macro or std cells. I haven't dived in yet. Maybe It can be used in SRAM placement?

@harrisonliew
Copy link
Contributor

harrisonliew commented Jun 16, 2022

You could also just have Innovus do auto-floorplanning with the key par.innovus.floorplan_mode: auto (implementation here - it ignores all your floorplan constraints). These will not be DRC clean, however, since the plan_design doesn't place it on the valid grid.

Yes, we've also been interested in Google's placer, but we haven't had the resources to play with it.

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 16, 2022

Ack - good catch on the coreSite. With the ASAP7 r1.7 update, they changed the site to asap7sc7p5t. I'll update the #653 PR with this fix. Edit: done, please test!

I will test this PR today and let you know the result.

You could also just have Innovus do auto-floorplanning with the key par.innovus.floorplan_mode: auto (implementation here - it ignores all your floorplan constraints). These will not be DRC clean, however, since the plan_design doesn't place it on the valid grid.

Nice, I will also test this out.

Yes, we've also been interested in Google's placer, but we haven't had the resources to play with it.

I think it will be a very nice topic/feature! About resources, do you mean not enough researchers to play with it?

@harrisonliew
Copy link
Contributor

I will test this PR today and let you know the result.

Thanks!

About resources, do you mean not enough researchers to play with it?

Correct. But also, there are some claims that this doesn't actually improve upon the auto-placers in commercial tools, so that's an entire research project in itself to analyze its performance in generalized designs.

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 17, 2022

I am trying to run Genus physical flow. Have you tried this flow before?
I am still trying to figure this flow out.
If I directly add only predict_floorplan before syn_generic -physical, It will crash.
So what I am trying now is setting top and bottom routing layer and read def from innovus plan design in hammer par.tcl before syn_generic -phy. Edit: Still not working.

If we can get some physical information at the synthesis stage before really run PAR, it would be useful.

@harrisonliew
Copy link
Contributor

No, we have not yet, but is something we do want to support.

I'm not familiar with how it works, but can vlsi.inputs.placement_constraints be used to constrain the initial floorplan?

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 17, 2022

Yes, you can manually place the hard macro and let genus optimize the data path.
I got the congestion score and its hot-spot score.
After I figure this out, I will create a PR.
image

*I guess crashing when using the entire physical flow in genus is the tool bugs at version 20.10.001 or earlier versions.
But there is some workaround.

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 22, 2022

  1. I am verifying the design, but I still encounter the VCS non-stop issue in sim-syn-timing-debug.
    If I use keyboard interrupt (ctrl + c), the simulation will pass.
    If I use max-cycle to time out the simulation, the simulation will fail.
    What could cause this weird behavior?
    Edit: The behavior only occurred in timing annotation gate-sim.
  2. I confirmed that GENUS 20.10.001 is buggy with the physical flow, GENUS 21.12 runs the same flow without any problem.

@harrisonliew
Copy link
Contributor

  1. I don't know why this is happening. I suspect it could have something to do with VCS FGP? See some discussion in ASAP7 tutorial: make buildfile CONFIG=TinyRocketConfig #643 about that. Maybe try turning FGP off entirely first.
  2. Great - would you be willing to PR that into the Genus plugin? The functionality/step can be dependent on Genus >21.12.

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 23, 2022

  1. Originally, I use FGP in compilation time and did not use it in run-time. After I remove the FGP in compilation time, the simulation can stop itself with this failed message.
    Not sure what does (tohost = 5) mean.
    image
    *The timing violation at the reset stage can be ignored since we reset our design for a long period. This reset stage timing violation will cause the python subprocess thinks it is finished like this:
    image

If want to avoid the first reset cycle using only half clock period causing timing violation, modify TestDriver.v like this:
image

The waveform will be:
image

The original testbench will look like this:
image
The waveform will be
image
*PR chipsalliance/rocket-chip#3000
2. Yes, after I verify the flow and I will make a PR!

@Waxpple
Copy link
Contributor Author

Waxpple commented Jun 23, 2022

I saw a lot of people encountered this tohost=????. ucb-bar/chipyard#753
But I don't know the solution yet.
I adjust the precision of the simulation timescale from 1ns/1ps to 1ns/100ps. The tohost problem is gone.
I will test the design with the different timescales today.

VCS 2021.09

Tables timescale rv64ui-p-add dhrystone
Scenario 1 1ns/100fs N N
Scenario 2 1ns/1ps N N
Scenario 3 1ns/10ps N N
Scenario 4 1ns/100ps Y N

Edit: Maybe I should upgrade my vcs from 2021.09 to 2022.06, I will try this too.

VCS 2022.06

Tables timescale rv64ui-p-add dhrystone
Scenario 1 1ns/100fs - -
Scenario 2 1ns/1ps - -
Scenario 3 1ns/10ps - -
Scenario 4 1ns/100ps Y N

Edit: If I turn off the override timescale, the simulation will fail. I am currently turning off my physical flow to verify maybe whether is my flow causing this failure.
Edit: If I turn off my genus ispatial flow, It can pass the simulation. My theory now is that Tempus might also need to read physical data to fix timing.
Edit: Maybe the root cause of this issue is genus physical flow will use clock skew to fix timing, after I turn off the skew, It can run without issue. I am still verifying this solution.

@Waxpple
Copy link
Contributor Author

Waxpple commented Jul 10, 2022

I have a question, I am still solving the simulation failed with SDF annotation.
I try to define input/output delay (25% clock period) constraints for synthesis. It doesn't work.
I cannot see any setup time violation during synthesis but the simulation failed with timing violations.
Any suggestion?
It will fail even if I use 1ns for synthesis constraints and run the simulation with a 2ns clock period.

Edit: 1ns/10ps

--
If I use 1ns/100ps, it will pass the simulation in rv64-ui-p-add.

*I am still running other tests

Edit: Update some waveform file on chipyard mail group: https://groups.google.com/g/chipyard/c/s41ND6XJtik

@ucb-bar ucb-bar locked and limited conversation to collaborators Mar 8, 2023
@harrisonliew harrisonliew converted this issue into discussion #719 Mar 8, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants