with #619 providing some background, here's a proposal for adding a new CAS_EN generic to neorv32_cpu (with a default value of false for backward-compatibility).... if CAS_EN => true, then all of the neorv32_cpu_* modules can assume we're operating within the logical 17-bit address space described in #619.... for review, bits (16 downto 15) of a compact address select among four distinct sub-regions:
b"00" is the IMEM space
b"01" is the BOOT space
b"10" is the DMEM space
b"11" is the PERI space
bits (14 downto 0) are then used within respective implementations of these sub-spaces.... note that all of the 15 lower-order bits are not necessarily needed when reading/writing an addressed location; the PERI space, for instance, can surely use fewer bits to reduce wires as well as to simplify address decode....
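As a rough illustration of the split described above, here is a small behavioral model in Python (not VHDL, and not taken from the NEORV32 sources). The function name `decode_compact` and the `REGIONS` table are made up for this sketch; only the bit positions and region names come from the proposal.

```python
# Hypothetical behavioral model of the proposed 17-bit compact address split.
# Region encoding follows the list above; everything else is an assumption.
REGIONS = {0b00: "IMEM", 0b01: "BOOT", 0b10: "DMEM", 0b11: "PERI"}

def decode_compact(addr):
    """Split a 17-bit compact address into (region name, 15-bit offset)."""
    if not 0 <= addr < (1 << 17):
        raise ValueError("compact address must fit in 17 bits")
    region = REGIONS[(addr >> 15) & 0b11]  # bits (16 downto 15)
    offset = addr & 0x7FFF                 # bits (14 downto 0)
    return region, offset
```

A region that needs fewer than 15 offset bits (PERI, for instance) would simply ignore the upper part of `offset`.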
when CAS_EN => true, the (32-bit) addresses contained within the ibus_req_o and dbus_req_o ports of neorv32_cpu are effectively 17-bit compressed addresses with bits (31 downto 17) <= '0'.... rather than adding a parallel cas_req_t with only 17 bits of address, i believe we can simply reuse the current bus_req_t type and re-define the semantics of its 32-bit address.... outside of neorv32_cpu, only bits (16 downto 0) of these ports will actually be used; hopefully the synthesis tools can figure out that bits (31 downto 17) are always 0 and never used.... as noted above, we might route even fewer of the remaining address lines to other modules....
inside the CPU complex, we can hopefully reduce synthesized logic when CAS_EN => true.... for example, we "know" that any instruction address (eg, current/next PC) only requires a 16-bit register; and any data address being fetched can be staged in a 17-bit register.... note that this optimization does NOT reduce the size of the data (32 bits) being read/written to these compact addresses....
an alternative (easier???) implementation might simply assume that ALL instruction fetches are within 0x0000 up to 0xffff.... this would presumably still eliminate some logic elements within the CPU complex.... i could then perform the final reduction from 32 bits down to 17 bits outside of the CPU complex -- in a re-designed neorv32_busswitch module for instance....
perhaps this is the smallest step we can take that would still yield some reduction in LUTs.... said another way, CAS_EN => true simply asserts that ALL instruction fetches will address the lower 64K of the address space....
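To make the "lower 64K" assertion concrete, the following Python sketch models a PC kept in a 16-bit register and zero-extended onto the 32-bit bus. The names `PC_BITS`, `next_pc` and `fetch_address` are hypothetical, and the wrap-around at 64K is an assumption about how a narrow register would behave, not something stated in the proposal.

```python
PC_BITS = 16                      # assumption: all instruction fetches hit the lower 64K
PC_MASK = (1 << PC_BITS) - 1

def next_pc(pc, step=4):
    """Advance a PC stored in a 16-bit register (wraps at 64K in this model)."""
    return (pc + step) & PC_MASK

def fetch_address(pc):
    """Zero-extend the 16-bit PC to a 32-bit bus address; upper bits stay 0."""
    return pc & PC_MASK
```

In this model only 16 flip-flops and a 16-bit adder are needed for the PC; the upper 16 address bits on the instruction bus are constant zero.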
comments????
I really like the concept of the "compressed" address space. But I am not sure how to integrate that into the current setup of the core / project.
I have been thinking about a rework of the internal bus system. @agamez made some great suggestions in #576 (which is still pending). A centralized interconnect that takes care of the address decoding might be a good thing to start with. This would make customization of the address map (as for your proposal) much simpler as there would be only one instance that needs to be customized.
I have been testing some VHDL constructs to set up the processor's address map as a single array of records, but I am still not sure how to handle some language-specific aspects.
Anyway, a central address decoding should be the first thing to do for supporting your approach (and also for implementing #576).
I have been thinking about simplifying the address space and decoding... Right now the address space of the IO modules / peripheral is densely packed making it hard to add further addresses or to relocate entire modules.
So how about this: 256 bytes of address space for every module.
right now only the debug memory and the CFS really use an address space of 256 bytes
if a module just implements 2 32-bit registers the remaining addresses will just "mirror" these two registers
we can use bits [12:8] of the address word for easy selection of the accessed IO module (allowing 2^5 = 32 modules)
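A minimal Python model of this scheme, assuming bits [12:8] select the module and bits [7:0] are the byte offset inside its 256-byte window. The function name `io_decode` and the `num_regs` parameter are hypothetical; the modulo stands in for partial address decoding and assumes `num_regs` is a power of two.

```python
def io_decode(addr, num_regs=2):
    """Return (module index, register index) for an IO access.

    Only the low bits of the word offset are decoded, so a module with
    fewer registers than its 256-byte window sees them mirrored.
    """
    module = (addr >> 8) & 0x1F      # bits [12:8]: 5 bits, up to 32 modules
    offset = addr & 0xFF             # bits [7:0]: byte offset in the window
    reg = (offset >> 2) % num_regs   # 32-bit word index, mirrored
    return module, reg
```

For a two-register module at index 1, byte offsets 0x0 and 0x8 both land on register 0, which is the "mirroring" mentioned above.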