Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fw ram 4k poc #251

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Fw ram 4k poc #251

wants to merge 2 commits into from

Conversation

secworks
Copy link
Contributor

Description

This is a Proof of Concept PR that tries to double the size of the FW_RAM by adding two more banks of instantiated block RAM.

The change allocates 70 LCs for more complex decode logi. And four extra EBRs, as expected.
The big issue is the effect on max clock frequency. From 22.59 MHz in main, to 20.49 MHz for the PoC.

The problem quite probably is due to routing. The EBRs are spread out evenly over the FPGA fabric in the die. We already allocate 70% of all EBRs, and with the PoC, we push this to 80% (24 out of 30). This means that the address bus and the data buses have to stretch long distances (and through several switches). One could try and mitigate this by adding registers at each end of the buses. This would allow for one cycle for transport. The consequence is allocation of more LCs, and that access latency to the FW_RAM would at least double. But we don't execute code from the FW_RAM, so this may be ok.

The PoC doesn't include such an attempt. The description here is just to document the result of the PoC and possible fix of the clock frequency issues.

Fixes # (issues)
A cramped FW_RAM

Type of change

Please tick any that are relevant to this PR and remove any that aren't.

  • Bugfix (non breaking change which resolve an issue)
  • Feature (non breaking change which adds functionality)
  • Breaking Change (a change which would cause existing functionality to not work as expected)
  • Documentation (a change to documentation)

Submission checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my changes
  • I have tested and verified my changes on target
  • My changes are well written and CI is passing
  • I have squashed my work to relevant commits and rebased on main for linear history
  • I have added a "Co-authored-by: x" if several people contributed, either pair programming or by squashing commits from different authors.
  • I have updated the documentation where relevant (readme, dev.tillitis.se etc.)
  • QEMU is updated to reflect changes

Signed-off-by: Joachim Strömbergson <[email protected]>
Signed-off-by: Joachim Strömbergson <[email protected]>
@mchack-work
Copy link
Member

Does this implementation come at the expense of the possibility of more ROM for firmware? I think that might be needed with the expanded functionality expected in firmware for flash access.

Current firmware is at ~4 kByte and available ROM is ~6 kByte. Feels tight.

@secworks
Copy link
Contributor Author

secworks commented Aug 5, 2024

Does this implementation come at the expense of the possibility of more ROM for firmware? I think that might be needed with the expanded functionality expected in firmware for flash access.
Current firmware is at ~4 kByte and available ROM is ~6 kByte. Feels tight.

Yes, as we can see from the PoC, the clock frequency drops quite substantially with increased FW_RAM. It is probably the case that allocating more space for FW_ROM will have a similar effect. And that is without also increasing FW_RAM.

But in general it seems that when it comes to EBRs, we are at the limit of what can reasonable be allocated. And it probably caused by routing. The EBRs are spread out over the die. And the place & route-tool (NextPNR) can't allocate EBRs close to the logic that uses them, and need to allocate EBRs far away (on the die). And we are probably using a lot of interconnect, routing resources.

My guess is that allocating more space for FW_ROM will have less severe impact, since it doesn't need a data bus for writing. We could do a PoC to find out.

@secworks
Copy link
Contributor Author

secworks commented Aug 5, 2024

A fast speedrun of different FW_ROM sizes:
6 kByte: 21 EBRs. 22.59 MHz (what we have today)
7 kByte: 23 EBRs. 22.58 MHz
8 kByte: 25 EBRs. 20.09 MHz

@mchack-work
Copy link
Member

Wow. 8 kByte ROM means we can't use the new 21 MHz clock we want. Dammit.

Is this speedrun on extended FW_ROM done on this branch with extended FW_RAM? What is we keep FW_RAM at 2 kByte and expand ROM instead? Can we do 21 MHz then?

After that we can possibly decide which is more important for new persistent storage handling.

@secworks
Copy link
Contributor Author

The RAM expansion is done without expanded fw_ram - not expanded fw_ram and expanded ROM.
We can increase ROM slightly, but basically not fw_ram.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants