See you there! 🍻 😉
I really like the idea of having a tiny microcontroller-like setup with a compact address space. In fact, this was the initial idea behind the entire NEORV32 project, so I am highly interested in your concept! Implementing a real/modified Harvard architecture would definitely boost execution speed, as instruction fetch and load/store accesses could run in parallel, each accessing its own memory instance. Having a real crossbar instead of the current bus mux is still on my TODO list 😉 However, I am not really sure how to implement this in an efficient way... Btw, I think you'll need a modified Harvard architecture here. The CPU's load/store interface needs to access the IMEM as well: for writing the executable during boot and also for reading constant data (like strings) during execution.
I moved all the IO devices to the very end of the address space so they do not "pollute" the rest of the address space. Adding memories (with custom sizes and custom base addresses) is much more straightforward this way, as you do not have to worry about overlapping address regions. When implementing the on-chip debugger I finally recognized the "beauty" of this concept, since the OCD firmware also makes extensive use of "zero + immediate" addressing 😉
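For readers who want to see the arithmetic behind "zero + immediate" addressing, here is a minimal Python model. It is a sketch, not NEORV32 code: the function names are made up for illustration, and it assumes a 32-bit address space and the standard RISC-V 12-bit signed load/store immediate. That immediate spans -2048..+2047, so a single instruction reaches the first 2 KiB and the last 2 KiB of the address space from the zero register.

```python
XLEN = 32                      # assumed: RV32, i.e. 32-bit addresses
ADDR_MASK = (1 << XLEN) - 1

def sign_extend_imm12(imm12):
    """Sign-extend the 12-bit immediate of an I-/S-type instruction."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12

def zero_plus_imm(imm12):
    """Effective address of e.g. `lw rd, imm(zero)`; the base register reads as 0."""
    return (0 + sign_extend_imm12(imm12)) & ADDR_MASK

def offset_from_zero(addr):
    """Signed offset that reaches `addr` from the zero register, or None if out of reach."""
    addr &= ADDR_MASK
    signed = addr - (1 << XLEN) if addr >> (XLEN - 1) else addr
    return signed if -2048 <= signed <= 2047 else None

print(hex(zero_plus_imm(0x800)))     # offset -2048: lowest address reachable with a negative offset
print(hex(zero_plus_imm(0xFFF)))     # offset -1: the very last byte of the address space
print(offset_from_zero(0xFFFFF000))  # 4 KiB below the top: not reachable in one instruction
```

Any register placed at 0xFFFFF800 or above therefore needs no `lui`/`auipc` to form its address; anything lower in the upper half needs a base register to be set up first.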
-
i'll be presenting a poster at the upcoming risc-v europe summit which introduces a programming language (Em) designed explicitly for resource-constrained embedded processors.... in the case of risc-v, i'm especially interested in working with projects that address the "low-end" of the eco-system: simple CPUs; limited memory; small silicon footprint; etc....
here are links to my presentation abstract and poster for reference....
assuming (as is often the case with Em) that 32K of memory is sufficient for all embedded code+data, how might we approach the design of a risc-v mcu that is equally compact????
for one, we do NOT require a cache (which adds plenty of logic elements); and the flash memory is really present as "bulk storage" for code+data that will ultimately be loaded into (a pair of) "tightly-coupled SRAMs".... boot-ROM functions that copy between flash storage and SRAM would not only be used at reset, but would be available to the application itself to dynamically load additional code+data.... boot-ROM functions can also support erasing/writing the bulk flash, enabling the application to log data at runtime as well as updating firmware images....
another implication of Em is that we can use a compact address space which enables optimizations in SW and HW.... since code+data space is "small", the instructions required to access this code+data are likewise more compact; in practice, all of memory can be reached by immediate (12-bit) offsets in the instruction itself.... the impact of a compact address space is further magnified by enabling the risc-v compressed instruction extension....
at the HW level, early experiments suggest that many (most??) of the 32-bit address vectors in the current RTL can be narrowed to 16 bits or less.... coupled with the elimination of the I/D caches, i'd also like to consider having my tightly-coupled IMEM/DMEM blocks actually support the CPU's harvard architecture -- which would certainly increase execution efficiency....
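As a rough sanity check on the narrowing idea, the sketch below computes how many address bits a compact memory map needs and what happens when a 32-bit address is truncated to 16 bits. The memory map is an assumption for illustration only (32 KiB of code+data starting at address 0, peripherals in the last 2 KiB of the 32-bit space); it is not the actual NEORV32 layout, and the helper names are hypothetical.

```python
def addr_bits(size_bytes):
    """Minimum number of address bits needed to index `size_bytes` bytes."""
    return max(1, (size_bytes - 1).bit_length())

def narrow(addr32, width=16):
    """Keep only the low `width` bits of a 32-bit address."""
    return addr32 & ((1 << width) - 1)

MEM_SIZE = 32 * 1024   # assumed: all code+data fits in 32 KiB at address 0
IO_BASE = 0xFFFFF800   # assumed: peripheral window in the last 2 KiB

print(addr_bits(MEM_SIZE))           # address bits needed for the 32 KiB memory
print(hex(narrow(MEM_SIZE - 1)))     # highest memory address after narrowing
print(hex(narrow(IO_BASE)))          # start of the peripheral window after narrowing
```

Under these assumptions the memory occupies 0x0000..0x7FFF and the peripheral window lands at 0xF800..0xFFFF after truncation, so the two regions stay distinguishable with a 16-bit address.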
i've also discovered a rather efficient pattern for optimizing the peripheral devices.... while i was originally thinking of moving the peripheral address space down to lower memory, their current location near the top of the address space has some real benefits.... often, peripherals are placed "somewhere" in the upper-half of the address space; but by placing ALL of the peripheral registers within the last 2K of the address space, individual registers can be accessed with a single instruction using a negative 12-bit offset from the zero register!!!!! @stnolting -- i don't know if this was intentional, but it's surely brilliant 😉
anyway, the general plan is to create my own neorv32_em sandbox and then start hacking away at some top-level modules.... my ultimate benchmark will be the number of LUTs needed to synthesize neorv32_em.... at this time, i have no idea how much of a silicon footprint reduction is even possible with this design....
finally, i would welcome any insights/suggestions/questions about the idea of a compact address space....