x_mem_result_ready needs to be added for subsystem sharing #56

Open
andermattmarco opened this issue May 6, 2022 · 7 comments
Labels: enhancement (New feature or request), memory-if (Memory Interface), post-v1.0 (To be fixed after v1.0.0 release)

@andermattmarco

In order to support the sharing of a subsystem between multiple cores using the X-interface, an x_mem_result_ready would have to be added. The current spec states that there is no ready because the subsystem must be ready to accept the memory result at any time. With multiple cores this cannot be guaranteed: several cores could raise x_mem_result_valid in the same cycle, and the subsystem would then only be able to react to one of those data packets, violating the property that it must be ready to accept the memory result of every core raising valid at any time.
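
To make the two protocol variants concrete, here is a minimal sketch of the memory result channel as described in this thread, with the proposed ready signal added. The payload fields and widths are illustrative abbreviations, not the spec's actual x_mem_result_t definition.

```systemverilog
// Illustrative payload only; see the spec for the real x_mem_result_t.
typedef struct packed {
  logic [3:0]  id;     // transaction ID (width illustrative)
  logic [31:0] rdata;  // load result data
  logic        err;    // bus error flag
} x_mem_result_t;

// Current spec: valid-only, so the coprocessor must accept in the same cycle.
//   core -> coprocessor : x_mem_result_valid, x_mem_result
// Proposed addition: a ready for back-pressure from a shared subsystem.
//   coprocessor -> core : x_mem_result_ready
//   a transfer occurs when x_mem_result_valid && x_mem_result_ready
```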

@davideschiavone (Contributor) commented May 6, 2022

I would add that, if you want, you can instantiate multiple FIFOs or similar. However, this does not solve the problem even with a single CPU, because it forces the accelerator to pay the area overhead of the FIFOs, especially for FPUs that are slow and take multiple cycles to be ready. So I suggest adding the READY signal anyway; accelerators that want to go faster can keep READY equal to 1 all the time.
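
A minimal sketch of that suggestion, using the proposed signal name from this thread: a fast accelerator opts out of back-pressure by tying ready high, while a slower one derives ready from its internal buffer state (result_fifo_full is a hypothetical internal status signal).

```systemverilog
// Option 1: always-ready accelerator, behaviorally identical to the current
// valid-only spec; the ready costs no logic.
assign x_mem_result_ready = 1'b1;

// Option 2: slow accelerator (e.g. a multi-cycle FPU) accepts a result only
// while its result FIFO has space.
assign x_mem_result_ready = !result_fifo_full;
```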

@davideschiavone (Contributor)

@moimfeld, @Silabs-ArjanB, @michael-platzer, what do you think?

davideschiavone added the enhancement label May 6, 2022
@Silabs-ArjanB (Contributor)

Hi @andermattmarco. Adding a ready on this interface would be really difficult to support on a core like the CV32E40P or CV32E40X, as there is also no ready signal related to its data_rvalid_i (which gets forwarded onto x_mem_result_valid). If we added a ready signal on the memory result interface, we would need a ready signal on the OBI data interface, then one on the interconnect, etc.

Of course, the X interface has been defined as a point-to-point protocol, so in principle the issue you describe cannot occur. If you want to build an interconnect nonetheless, you can use the memory (request/response) interface (which has both valid and ready signals) to limit the number of x_mem_result_valid transactions that can occur, and you can provide the necessary buffering inside your interconnect to make sure you can always immediately accept an x_mem_result_valid.
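
A minimal sketch of that credit scheme, with hypothetical module and port names: the interconnect grants at most DEPTH outstanding memory requests per coprocessor port, so a DEPTH-entry result FIFO can never overflow and every x_mem_result_valid can be accepted immediately.

```systemverilog
// Hypothetical per-port credit logic for such an interconnect.
module xif_mem_credit #(
  parameter int unsigned DEPTH = 4   // result-buffer depth = max outstanding requests
) (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic mem_valid_i,          // memory request from the coprocessor
  output logic mem_ready_o,          // granted only while a result slot is free
  input  logic result_pop_i          // coprocessor consumed one buffered result
);
  // One credit per result-buffer slot: reserved when a request is granted,
  // released when the buffered result is consumed. The FIFO itself (pushed
  // on x_mem_result_valid) is omitted; it cannot overflow because at most
  // DEPTH results are ever in flight.
  logic [$clog2(DEPTH+1)-1:0] outstanding_q;

  assign mem_ready_o = (outstanding_q < DEPTH);

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      outstanding_q <= '0;
    end else begin
      case ({mem_valid_i && mem_ready_o, result_pop_i})
        2'b10:   outstanding_q <= outstanding_q + 1'b1;  // request granted
        2'b01:   outstanding_q <= outstanding_q - 1'b1;  // result consumed
        default: ;                                       // both or neither: no change
      endcase
    end
  end
endmodule
```

Because the back-pressure is applied on the request channel, which already has a ready, nothing about the core-facing side of the protocol would need to change.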

@moimfeld (Member) commented May 7, 2022

Hi @Silabs-ArjanB
I disagree that it would be difficult to add this in the cv32e40p (and probably also in the cv32e40x). You could add a buffer for X-interface memory result transactions on the core side, and only accept a certain number of memory requests from the X-interface (the number of memory requests that can be accepted should equal the buffer depth). However, this buffering would add unwanted/unnecessary logic in the case where we only use the core in a point-to-point situation, or where the coprocessor is always ready.

But I like your solution of handling the buffering / adding the handshaking at the interconnect level, so that the core-side implementation stays lightweight.
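
The core-side variant described above would be the same credit idea moved inside the core, roughly as follows (all names hypothetical):

```systemverilog
// Accept a memory request from the X-interface only while the core-internal
// result buffer has a free slot, so every returning memory result is
// guaranteed a place.
assign x_mem_ready = (outstanding_mem_q < RESULT_BUF_DEPTH);

// The invariant this maintains, as a concurrent assertion:
assert property (@(posedge clk_i) outstanding_mem_q <= RESULT_BUF_DEPTH);
```

Making RESULT_BUF_DEPTH a parameter would at least let point-to-point integrations keep the otherwise-unwanted buffer minimal.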

@michael-platzer (Contributor)

Hi,

I agree with @moimfeld that it would be possible to implement an x_mem_result_ready by buffering the memory results within the main core. However, for systems with long memory access latency that could either have an impact on performance or consume a significant amount of resources, particularly if the attached coprocessor is a vector unit, which might issue a bunch of memory requests in a row.

If the memory result buffer within the main core is small, then the main core will only be able to accept a few memory requests before de-asserting mem_ready, thus stalling further requests until the first memory results are received from the memory bus. For a vector coprocessor this would imply frequent stalling during vector loads and stores and thus reduced performance.

If the memory result buffer within the main core is large, such that a series of contiguous memory requests can be accepted without de-asserting mem_ready, then performance would not be degraded. However, that buffer could consume a significant amount of resources, particularly if the memory bus is wider than 32 bits.
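
To put illustrative numbers on that trade-off (assumed values, not taken from this thread): with a memory round-trip latency of 40 cycles and a vector unit issuing one request per cycle, avoiding stalls needs roughly 40 buffer entries; at 64 bits of load data per entry plus a few bits of ID and error metadata, that is on the order of 40 × 70 ≈ 2800 flip-flops for the result buffer alone, whereas a small 4-entry buffer costs little but stalls the request channel for most of every burst.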

@andermattmarco I am wondering in which situation it is beneficial to connect the XIF memory interface to multiple cores? I guess that the memory requests of the individual cores will end up being routed to the same memory bus, so maybe it makes more sense to have only one core attached to the XIF memory interface that takes care of all XIF memory requests? I understand that in a multi-core system you might want to have several cores that use the XIF issue, commit and result interfaces, but the memory and memory result interfaces could be connected to one core only.
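
Roughly the topology being suggested, as a sketch (which core owns the memory interfaces is an arbitrary choice here):

```
core 0 ── issue/commit/result ──┐
core 1 ── issue/commit/result ──┼──► shared coprocessor
core 2 ── issue/commit/result ──┘        │
                                         │ mem + mem_result
core 0 ◄─────────────────────────────────┘
         (one core handles all XIF memory traffic)
```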

@davideschiavone (Contributor)

What you suggest actually makes sense, @michael-platzer! However, it still makes sense to me to also explore sharing of the mem interface; I would invite @andermattmarco to take your option into account.

@Silabs-ArjanB (Contributor)

If one core were to handle the XIF memory requests that are related to the issue interface of another core, then we would be completely changing the protocol. I still think that the most logical place to address the issue is within the interconnect (which introduced the issue in the first place).

Silabs-ArjanB removed their assignment Jun 14, 2022
christian-herber-nxp added the post-v1.0 and memory-if labels Feb 21, 2024