Hey @biosbob!
I think the bus architecture is the relevant factor here. I have no idea about your ibex setup, but maybe it uses a Harvard architecture with separate instruction and data memories and buses. In that case there are no wait states when the instruction fetch and data ports access memory at the same time. NEORV32 uses a von-Neumann approach: instruction fetch and data access share a single bus, so there is congestion when both ports access memory at the same time. The caches can help to reduce this congestion. Another interesting factor would be the FPGA utilization of the two setups. Furthermore, the maximum clock frequency of both cores would be interesting. Of course, these can only be compared when both setups are synthesized for the same technology/platform.
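One way to make this shared-bus congestion visible in software is to time an ALU-only loop against a load-heavy loop: on a von-Neumann core the second loop should cost noticeably more cycles per iteration, since fetch and data traffic compete for the same bus. A rough sketch, assuming a bare-metal machine-mode environment where `mcycle` is readable:

```c
#include <stdint.h>

/* Read the low 32 bits of the machine cycle counter (enough for short runs). */
static inline uint32_t rdcycle(void) {
    uint32_t c;
    __asm__ volatile ("csrr %0, mcycle" : "=r"(c));
    return c;
}

#define N 1024
static uint32_t buf[N];

/* ALU-only loop: instruction fetch is the only bus traffic. */
uint32_t alu_cycles(void) {
    uint32_t acc = 1, t0 = rdcycle();
    for (int i = 0; i < N; i++) {
        acc = acc * 3 + 1;
        __asm__ volatile ("" : "+r"(acc)); /* keep the loop from being folded away */
    }
    return rdcycle() - t0;
}

/* Load-heavy loop: instruction fetch and the data port now compete for the bus. */
uint32_t load_cycles(void) {
    uint32_t acc = 0, t0 = rdcycle();
    for (int i = 0; i < N; i++) {
        acc += buf[i];
        __asm__ volatile ("" : "+r"(acc));
    }
    return rdcycle() - t0;
}
```

Comparing the two per-iteration cycle counts (with and without the instruction cache enabled) should show how much of the difference comes from bus contention rather than from the cores themselves.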
I feel you 😉 The Lattice iCE FPGAs are among the tiniest FPGAs available out there. I like them and they are capable enough for small RISC-V SoCs. But if you want to implement more complex SoCs you might need to switch to a different FPGA / vendor.
-
i have a (portable) CFFT that operates on Q15 real/imag values, which i've used to benchmark cycles-per-instruction on neorv32 versus ibex (zero_riscy).... for the latter, i'm using the x-heep project....
the CFFT itself is written in (portable) Em and has been used on a variety of CPUs (RISC-V, ARM M0+, etc).... the code itself is quite compact (<300 bytes) and executes a 128-point transform in ~1-3 ms depending upon the CPU architecture and its clock-speed....
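For anyone unfamiliar with the Q15 format: a minimal C sketch of the fixed-point multiply at the heart of such a transform (illustrative only; the actual benchmark code is written in Em, and saturation handling is omitted):

```c
#include <stdint.h>

typedef int16_t q15_t; /* 1 sign bit + 15 fractional bits, range [-1.0, 1.0) */

/* Q15 multiply with rounding: the 32-bit product is Q30, so shift back by 15.
 * (The -1.0 * -1.0 overflow case would need saturation; omitted here.) */
static q15_t q15_mul(q15_t a, q15_t b) {
    int32_t p = (int32_t)a * (int32_t)b + (1 << 14);
    return (q15_t)(p >> 15);
}

/* Complex multiply on Q15 real/imag pairs, as used in an FFT butterfly:
 * (ar + j*ai) * (br + j*bi) */
static void cmul_q15(q15_t ar, q15_t ai, q15_t br, q15_t bi,
                     q15_t *cr, q15_t *ci) {
    *cr = (q15_t)(q15_mul(ar, br) - q15_mul(ai, bi));
    *ci = (q15_t)(q15_mul(ar, bi) + q15_mul(ai, br));
}
```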
in my neorv32 vs ibex benchmark, the total number of instructions (captured via `minstret`) in both executable images is 22564.... the architecture of both CPUs is `rv32imc`; and in the case of neorv32, i've configured `FAST_MULT_EN` and `FAST_SHIFT_EN` as true.... both images are executing out of SRAM (IMEM+DMEM); needless to say, the images are quite small....

here are the measured cycle counts (captured via `mcycle`):

one factor that might explain the ~2x difference is the FPGA itself: ibex is running on a PYNQ-Z2 board (which i believe has multiple SRAM banks); neorv32 is running on an iCEBreaker board (and using a single SRAM bank as IMEM and DMEM)....
question: does the neorv32 CPU (like ibex) have a "harvard architecture" in which IMEM and DMEM could be accessed simultaneously??? and if so, what suggestions would you have for an FPGA board (other than PYNQ!!!!) i could use for this and other benchmarks....
FWIW -- i'm using >90% of the LUTs on my iCEBreaker board, and have LOTS of other stuff i still want to add into my SoC....