compact address space #619

biosbob · 2023-05-22T18:29:27Z

biosbob
May 22, 2023
Collaborator

i'll be presenting a poster at the upcoming risc-v europe summit which introduces a programming language (Em) designed explicitly for resource-constrained embedded processors.... in the case of risc-v, i'm especially interested in working with projects that address the "low-end" of the eco-system: simple CPUs; limited memory; small silicon footprint; etc....

here are links to my presentation abstract and poster for reference....

assuming (as is often the case with Em) that 32K of memory is sufficient for all embedded code+data, how might we approach the design of a risc-v mcu that is equally compact????

for one, we do NOT require a cache (which adds plenty of logic elements); and the flash memory is really present as "bulk storage" for code+data that will ultimately be loaded into (a pair of) "tightly-coupled SRAMs".... boot-ROM functions that copy between flash storage and SRAM would not only be used at reset, but would be available to the application itself to dynamically load additional code+data.... boot-ROM functions can also support erasing/writing the bulk flash, enabling the application to log data at runtime as well as updating firmware images....

another implication of Em is that we can use a compact address space which enables optimizations in SW and HW.... since code+data space is "small", the instructions required to access this code+data are likewise more compact; in practice, all of memory can be reached by immediate (12-bit) offsets in the instruction itself.... the impact of a compact address space is further magnified by enabling the risc-v compressed instruction extension....

at the HW level, early experiments suggest that many (most??) of the 32-bit address vectors in the current RTL can be narrowed to 16 bits or fewer.... coupled with the elimination of the I/D caches, i'd also like to consider having my tightly-coupled IMEM/DMEM blocks actually support the CPU's harvard architecture -- which would certainly increase execution efficiency....

i've also discovered a rather efficient pattern for optimizing the peripheral devices.... while i was originally thinking of moving the peripheral address space down to lower memory, their current location near the top of the address space has some real benefits.... often, peripherals are placed "somewhere" in the upper-half of the address space; but by placing ALL of the peripheral registers within the last 4K of the address space, individual registers can be accessed with a single instruction using a negative 12-bit offset from the zero register!!!!! @stnolting -- i don't know if this was intentional, but it's surely brilliant 😉
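
To make the "zero + immediate" idea concrete, here is a minimal sketch in Python of how an RV32 load/store forms its effective address when the base register is `x0`. The helper names are hypothetical and chosen only for illustration. One detail worth keeping in mind: the I-type/S-type immediate is a signed 12-bit value (-2048..+2047), so the window reachable in a single instruction from the zero register is the top 2 KiB of the address space (plus the bottom 2 KiB); registers placed lower in the last 4K would still need a base register.

```python
MASK32 = 0xFFFF_FFFF

def sign_extend_12(imm12):
    """Sign-extend a raw 12-bit immediate field to a Python int."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12

def zero_relative_address(offset):
    """Effective 32-bit address of e.g. `lw rd, offset(x0)`."""
    if not -2048 <= offset <= 2047:
        raise ValueError("offset does not fit a signed 12-bit immediate")
    return offset & MASK32  # x0 is always 0, so the address is just the offset

def reachable_from_zero(addr):
    """True if addr can be formed as x0 + simm12 in one instruction."""
    addr &= MASK32
    return addr <= 0x7FF or addr >= 0xFFFF_F800
```

For example, `zero_relative_address(-256)` yields `0xFFFFFF00`, while `reachable_from_zero(0xFFFFF000)` is false because that address sits 4 KiB below the top.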

anyway, the general plan is to create my own neorv32_em sandbox and then start hacking away at some top-level modules.... my ultimate benchmark will be the number of LUTs needed to synthesize neorv32_em.... at this time, i have no idea how much of a silicon footprint reduction is even possible with this design....

finally, i would welcome any insights/suggestions/questions about the idea of a compact address space....

stnolting · 2023-05-31T19:18:59Z

stnolting
May 31, 2023
Maintainer

> i'll be presenting a poster at the upcoming risc-v europe summit

See you there! 🍻 😉

> assuming (as is often the case with Em) that 32K of memory is sufficient for all embedded code+data, how might we approach the design of a risc-v mcu that is equally compact????

I really like the idea of having a tiny microcontroller-like setup with a compact address space. In a way, this was the initial idea of the entire NEORV32 project. So I am highly interested in your concept!

Implementing a real/modified Harvard architecture would definitely boost execution speed, as instruction fetch and load/store accesses could run in parallel, each accessing its own memory instance. Having a real crossbar instead of the current bus mux is still on my TODO list 😉 However, I am not really sure how to implement this in an efficient way...

Btw, I think you'll need a modified Harvard architecture here. The CPU's load/store interface needs to access the IMEM as well - for writing the executable during boot and also for reading constant data (like strings) during execution.

CPU.I----->MUX----->IMEM
            ^
            |
CPU.D-------+------>DMEM
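
The routing in the diagram can be modeled in a few lines of Python. This is only a behavioral sketch under stated assumptions, not the NEORV32 implementation: the base addresses and sizes are assumed, the function names are hypothetical, and the arbitration policy (the data port wins the IMEM and the fetch stalls for that cycle) is one possible choice that the thread does not specify.

```python
IMEM_BASE, IMEM_SIZE = 0x0000_0000, 32 * 1024  # assumed base/size
DMEM_BASE, DMEM_SIZE = 0x8000_0000, 32 * 1024  # assumed base/size

def target(addr):
    """Which memory a load/store address falls into, or None."""
    if IMEM_BASE <= addr < IMEM_BASE + IMEM_SIZE:
        return "IMEM"
    if DMEM_BASE <= addr < DMEM_BASE + DMEM_SIZE:
        return "DMEM"
    return None

def route_cycle(fetch_addr, data_addr=None):
    """Return (imem_owner, dmem_owner, fetch_stalled) for one cycle.

    fetch_addr is assumed to lie in IMEM; data_addr is None when the
    load/store interface is idle in this cycle.
    """
    data_target = target(data_addr) if data_addr is not None else None
    if data_target == "IMEM":
        # Assumed policy: CPU.D takes the mux, the fetch waits a cycle.
        return ("CPU.D", None, True)
    dmem_owner = "CPU.D" if data_target == "DMEM" else None
    return ("CPU.I", dmem_owner, False)
```

In the common case (`route_cycle(0x100, 0x8000_0010)`) both ports proceed in parallel; only a load/store that targets the IMEM, such as reading a constant string, costs the fetch a cycle in this model.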

> @stnolting -- i don't know if this was intentional, but it's surely brilliant 😉

I moved all the IO devices to the very end of the address space so they do not "pollute" the rest of the address space. Adding memories (with custom sizes and custom base addresses) seems to be much more straightforward (and easier as well), as you do not have to worry about overlapping address regions.

When implementing the on-chip debugger I finally recognized the "beauty" of this concept since the OCD firmware also makes extensive use of "zero + immediate" addressing 😉

2 replies

biosbob Jun 1, 2023
Collaborator Author

one major architectural decision revolves around "addr-width" on the various busses.... right now, of course, all addrs are 32 bits in width....

my current thinking is that all addressable entities (IMEM, DMEM, BOOT, PERI) could live within a (logical) 17-bit address space organized as follows:

    0x00000 - 0x07FFF    IMEM  (rw)
    0x08000 - 0x0BFFF    XIP window  (ro)
    0x0C000 - 0x0FFFF    boot ROM  (ro)
    0x10000 - 0x17FFF    DMEM  (rw)
    0x18000 - 0x1FFFF    peripherals  (rw)
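
As a quick sanity check of the map above, the table can be written down in Python and used as a decoder. The region boundaries and access flags come straight from the listing; the `REGIONS` and `decode` names are hypothetical. Note that the five regions tile the full 17-bit space exactly (32K + 16K + 16K + 32K + 32K = 128K), and that bit 16 alone separates the instruction-side regions from DMEM and the peripherals.

```python
# (first, last, name, access) taken from the proposed 17-bit map
REGIONS = [
    (0x00000, 0x07FFF, "IMEM", "rw"),
    (0x08000, 0x0BFFF, "XIP window", "ro"),
    (0x0C000, 0x0FFFF, "boot ROM", "ro"),
    (0x10000, 0x17FFF, "DMEM", "rw"),
    (0x18000, 0x1FFFF, "peripherals", "rw"),
]

def decode(addr17):
    """Map a 17-bit address to (region name, access, offset in region)."""
    if not 0 <= addr17 < (1 << 17):
        raise ValueError("address does not fit in 17 bits")
    for first, last, name, access in REGIONS:
        if first <= addr17 <= last:
            return name, access, addr17 - first
    raise AssertionError("unreachable: the regions tile the whole space")

def total_mapped_bytes():
    """Sum of all region sizes; equals 2**17 when there are no gaps."""
    return sum(last - first + 1 for first, last, _, _ in REGIONS)
```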

from a software perspective (including the compiler), the program lives in a full (although VERY sparsely populated) 32-bit address space in which "instruction space" starts at 0x00000000, "data space" starts at 0x80000000, and "i/o space" starts at 0xFFFFFF00.... for reasons noted above, this allocation enables efficient access to critical i/o registers through negative immediate offsets....

assuming that CPU.I and CPU.D have 32-bit addresses, one could imagine some sort of "address compressor" that effectively maps a 32-bit addr vector into a 17-bit equivalent.... using today's architecture, one could essentially reduce the soc_bus to just 17 address lines; everything connected to the bus today (IMEM, DMEM, BOOT, PERI) could now decode read/write requests in the same manner....
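
One possible shape for such a compressor, sketched in Python purely as an illustration: keep the low 16 address bits and reuse bit 31 as the new bit 16. With the software view described above this sends 0x00000000 to 0x00000, 0x80000000 to 0x10000 and 0xFFFFFF00 to 0x1FF00. Everything here is an assumption rather than a settled design: the function names are hypothetical, the placement of the i/o registers at the top of the 17-bit peripheral region simply falls out of this particular mapping, and the XIP window and boot ROM are assumed to sit at 0x00008000 and 0x0000C000 in the 32-bit view.

```python
MASK32 = 0xFFFF_FFFF

def compress(addr32):
    """Fold a 32-bit address into 17 bits: {bit 31, bits 15..0}."""
    addr32 &= MASK32
    return ((addr32 >> 31) << 16) | (addr32 & 0xFFFF)

def is_canonical(addr32):
    """True if addr32 is one of the addresses the map intends to exist.

    compress() ignores bits 30..16, so many 32-bit addresses alias the
    same 17-bit one; this check identifies the intended representative.
    """
    addr32 &= MASK32
    bit31 = addr32 >> 31
    mid = (addr32 >> 16) & 0x7FFF   # bits 30..16
    bit15 = (addr32 >> 15) & 1
    if bit31 == 0:                  # instruction side: IMEM, XIP, boot ROM
        return mid == 0
    # data side: DMEM at 0x8000_0000, peripherals at the very top
    return (mid == 0 and bit15 == 0) or (mid == 0x7FFF and bit15 == 1)
```

In hardware the compression itself would cost no logic at all (it is just wire selection); whether non-canonical addresses should alias silently or raise a bus fault is a separate design choice.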

presumably there are some savings here -- fewer wires connecting components external to the CPU????

taking this one step further, maybe the "address compressor" also could serve as a router that is directly connected to IMEM/DMEM/BOOT as well as an even smaller PERI bus.... even the connections to IMEM/DMEM/BOOT could get by with fewer than 17 address lines.... and as @stnolting pointed out above, this could be where the MUX is implemented....

standing back, would this approach of an external address compressor yield some significant savings in terms of size????

looking further down the road, one could imagine some internal changes to the CPU complex -- knowing, for instance, that the PC will always be between 0x0000 - 0xFFFF.... obviously the CPU registers will remain 32-bits in width; but any storage of the current/next PC could be reduced.... said another way, once the CPU complex itself becomes aware of the logical 17-bit address space, it then becomes possible to drive optimizations deeper in the design....

thoughts????

biosbob Jun 10, 2023
Collaborator Author

i had a chance to meet @stnolting face-to-face this past week, where we talked in greater detail about some of the ideas suggested above.... bottom line -- we're moving forward 😉

i learned about an upcoming change that will enable customization of peripheral base addresses.... can't wait for this to be released, as it will enable me to further compress the PERI region of my compact 17-bit address space.... once this feature is released, i'll share some results on its general impact on instruction selection by the compiler....

@stnolting and i also talked about further compression of the PERI region by effectively removing the lower b"00" bits from the address of each memory-mapped register.... said another way, each register address would be byte-aligned even though the read/write data will still be 32-bits wide.... more compression, fewer wires needed to connect peripherals, simpler address checking, etc....
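
A small Python sketch of what dropping the two low address bits means in practice; the function names are hypothetical. Each 32-bit register gets a word index instead of a byte offset, so a peripheral block needs two fewer address lines and the "is this access word-aligned?" check disappears on the peripheral side.

```python
def to_word_index(byte_offset):
    """Byte offset of a 32-bit register -> word index (low b"00" removed)."""
    if byte_offset & 0b11:
        raise ValueError("32-bit registers must sit on a 4-byte boundary")
    return byte_offset >> 2

def to_byte_offset(word_index):
    """Inverse mapping: word index -> byte offset as software sees it."""
    return word_index << 2

def address_lines(region_bytes, drop_low_bits=True):
    """Address lines needed to select any byte (or word) in a region."""
    bits = (region_bytes - 1).bit_length()
    return bits - 2 if drop_low_bits else bits
```

For a 256-byte register block, `address_lines(256, drop_low_bits=False)` gives 8 while `address_lines(256)` gives 6.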

@stnolting and i finally talked about the impact of "narrowing" 32-bit addresses to just 17 bits inside the CPU complex itself.... at this time, i'm studying the VHDL in the relevant modules and making some simple experiments.... having never done anything like this before, i hope to be able to tap the expertise of @stnolting here 😉

from what i learned, the CPU complex will remain relatively stable for the rest of this year -- enabling exploration of these ideas on a branch that should be relatively easy to keep in sync with the rest of neorv32.... if we eventually want to add CAS (compact address support) to neorv32, we should try to make this capability yet-another parameter selected when instantiating the CPU....

i'll open up a separate "feature request" with a (hopefully) concise description of a proposed CAS_EN generic added to neorv32_cpu; implementation of this capability will undoubtedly require expertise from @stnolting .... as a working process, i'll continue to use this thread for "high-level" thoughts while creating specific feature requests as they materialize....

stnolting · 2023-06-30T16:52:02Z

stnolting
Jun 30, 2023
Maintainer

-> #629
