
[RFC][DNM] DMM: Skeleton for a DMA Memory Management subsystem #57602

Closed

Conversation

carlocaione
Collaborator

@carlocaione carlocaione commented May 5, 2023

This is a hopefully-to-be-discussed proposal for an actual implementation of what was discussed in #57220 (please read that before reading further).

TL;DR: Zephyr is currently missing a way to allocate and prepare buffers to be used by DMA engines and DMA-enabled peripherals on platforms with a complex memory organization. This PR is a proposal for a subsystem interface dealing with that.

In memory bus architectures, the CPU running software is one of the masters
accessing the bus, while DMA engines are other masters accessing the same
(or another, connected) bus.

In basic MCUs there is a simple bus architecture, where CPU and DMA
engines can access the whole memory range, no caches are present, and
memory addressing is consistent between all the bus masters.

The memory access in these devices can be summarized with the following
rules:

 * CPU and DMA have access to the whole memory space
 * the memory address used by the CPU to point to any memory location A is
   equal to the memory address used by DMA to point to the same location A
 * any data stored in memory by DMA is immediately visible to the CPU
 * any data stored in memory by the CPU is immediately visible to DMA

In systems with a more complex memory architecture, the interaction
between the CPU and DMA, or between multiple CPUs, is more complicated:

 * DMA can be restricted to access only part of the system memory (or
   only part of the memory can be accessed efficiently by DMA)
 * CPU may need to translate its buffer address (virtual or physical) to
   an address (or a set of addresses if the buffer is not contiguous in
   the bus memory space) usable by DMA before passing it to a DMA engine
   (to take into account cache line alignment, memory region allocation,
   etc...)
 * CPU caches can contain stale data and the CPU may need to invalidate
   them to read the data that DMA or another CPU updated in memory
 * Data intended by a CPU to be stored in memory can get stuck in the
   CPU's write-back cache and be invisible to DMA or other CPUs, so the
   CPU has to flush its caches before informing DMA or other CPUs about
   data availability in memory

All of the discussed challenges must be addressed by the software
running in a system with a complex memory architecture.

In this patch I'm proposing to add a new sub-system called DMM - DMA
Memory Management - with the following responsibilities (a driver-side
usage sketch follows the list):

 * allocating and freeing "bounce buffers" if the buffer provided by a
   driver user cannot be used by the DMA engine (memory accessible by
   DMA, aligned to DMA requirements, if cacheable: aligned and padded to
   cache lines),
 * copying data to and from the bounce buffers
 * translation of CPU buffer addresses (virtual or physical) to
   DMA-usable addresses
 * cache management if buffers are allocated in cacheable memory
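
A minimal, hedged sketch of the intended driver-side flow (the exact
signatures, the flags and the type of the DMA handle are still open
questions in this PR, so everything below - including the
start_dma_transfer()/wait_for_dma_done() placeholders - is an assumption,
not the final API):

static int xyz_drv_tx(const struct device *dev, const uint8_t *user_buf,
		      size_t len)
{
	void *dma_handle;
	int ret;

	/* May hand back the original address, a translated bus address, or a
	 * bounce buffer with the user data already copied in and the cache
	 * written back.
	 */
	ret = dmm_get_buffer_for_dma_out(dev, 0 /* flags, TBD */,
					 &dma_handle, user_buf, len);
	if (ret < 0) {
		return ret;
	}

	start_dma_transfer(dev, dma_handle, len);	/* placeholder, vendor-specific */
	wait_for_dma_done(dev);				/* placeholder */

	/* Must be called once the transfer has finished, otherwise a bounce
	 * buffer (if one was allocated) is leaked.
	 */
	return dmm_release_buffer_for_dma_out(dev, 0, dma_handle, user_buf, len);
}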

Signed-off-by: Carlo Caione <[email protected]>
@carlocaione carlocaione marked this pull request as ready for review May 5, 2023 13:01
@carlocaione carlocaione linked an issue May 5, 2023 that may be closed by this pull request
@carlocaione carlocaione self-assigned this May 5, 2023
@carlocaione carlocaione added RFC Request For Comments: want input from the community DNM This PR should not be merged (Do Not Merge) labels May 5, 2023
* - copy the data from the user-provided buffer to the bounce buffers
* - translate CPU buffer addresses to DMA addresses if needed
* - do the proper cache management if buffers are allocated in cacheable
* memory (for example flushing cache lines associated with the memory
Member

nit: flushing is not necessary

Suggested change
* memory (for example flushing cache lines associated with the memory
* memory (for example writing back cache lines associated with the memory

* memory (for example flushing cache lines associated with the memory
* backing the DMA accessible buffers)
*
* @param[in] dev Device pointer
Member

Which device is this? The device that is to perform the DMA transfer from the buffer pointed to by dma_handle?

Collaborator Author

Correct, that is the idea.

Member

Could you add this information somewhere to the docstring to make it clear?

Collaborator Author

I'll do.

* backing the DMA accessible buffers)
*
* @param[in] dev Device pointer
* @param[in] flags Flags
Member

Are these flags defined somewhere?

Collaborator Author

no, TBD.

Member

Could you clarify here with something like

Suggested change
* @param[in] flags Flags
* @param[in] flags Flags reserved for future use

* This function is used to translate the user-provided buffer into something
* that is usable for a DMA outbound data transfer and return that to the user
* as an opaque DMA handle (usually a physical/bus address).
*
Member

I would add a paragraph stating that the user cannot access the buffer pointed to by buffer or dma_handle from the time this function is called until dmm_release_buffer_for_dma_out() is called. Access includes reading from this buffer.

Also I would clarify that dmm_release_buffer_for_dma_out() must be called after the DMA transfer is finished. Not calling this function can lead to memory leaks.

Collaborator Author

ack

Comment on lines +79 to +80
* @param[in] dev Device pointer
* @param[in] flags Flags
Member

The same ambiguity as in the outbound transfer function

* This function is used to translate the user-provided buffer into something
* that is usable for a DMA inbound data transfer and return that to the user
* as an opaque DMA handle (usually a physical/bus address).
*
Member

I would add a paragraph stating that the user cannot access the buffer pointed to by buffer or dma_handle from the time this function is called until dmm_release_buffer_for_dma_in() is called.

Also I would clarify that dmm_release_buffer_for_dma_in() must be called after the DMA transfer is finished, before data is read by CPU.

Collaborator Author

ack

Collaborator

@teburd teburd left a comment

The inherent problem seems to be that someone wants to use a buffer that isn't aligned or in the right region for DMA. Shouldn't we make that easier to sort out statically rather than defining a new allocation interface for this task?

Can we use device tree to describe the requirements for a buffer with a particular peripheral + dma pair, I wonder? Something like...
DT_SHARED_BUF(dma0, spi0) uint8_t some_spi_buf[128];
Where all the correct compiler attributes are supplied for alignment, placement, and size.

@andyross
Contributor

andyross commented May 5, 2023

Can we use device tree to describe the requirement for a buffer

That idea I like very much. Puts a lot of preprocessor work onto the platform/controller layer, but cleanly extends to all sorts of nonsense like alternate memory spaces (even madness like iommu's or 24 bit ISA DMA!) that are otherwise really hard to abstract. And it falls back to obvious behavior for "it's just memory" systems that tend to be the common case.

@hubertmis
Member

hubertmis commented May 5, 2023

@teburd, do I understand correctly that you suggest moving the responsibility for managing memory buffers from a device driver to its user? If that's the case, let's consider the following application code:

struct spi_buf_set prepare_spi_tx_buffer(void) {
  static const struct spi_buf buf = {
    .buf = "Hello world",
    .len = sizeof("Hello world"),
  };

  struct spi_buf_set set = {
    .buffers = &buf,
    .count = 1,
  };

  return set;
}

If the SPI device driver takes care of copying the buf content to a bounce buffer accessible by the SPI's DMA, as @carlocaione suggested, then this code is portable. However, if the SPI API user were responsible for managing memory buffers, it would need to be modified to something like:

struct spi_buf_set prepare_spi_tx_buffer(void) {
  static struct spi_buf buf = {
    .len = sizeof("Hello world"),
  };

  buf.buf = allocate_spi_buf(sizeof("Hello world"));

  if (!buf.buf) {
    // Handle fault
  }

  memcpy(buf.buf, "Hello world", sizeof("Hello world"));

  cache_write_back(buf.buf, sizeof("Hello world"));

  buf.buf = translate_cpu_address_to_bus_address(buf.buf);

  struct spi_buf_set set = {
    .buffers = &buf,
    .count = 1,
  };

  return set;
}

void release_spi_tx_buffer(struct spi_buf_set set) {
  void *cpu_address = translate_bus_address_to_cpu_address(set.buffers[0].buf); // If such translation is possible
  free_spi_buf(cpu_address);
}

Portability is an important software attribute, especially in hardware agnostic projects like Zephyr. That's why I think the responsibility for managing memory buffers for DMA should be hidden from application code and implemented in device drivers.

Applications should have the possibility to improve performance by using buffer allocation methods that meet the buffers' memory management requirements, preventing device drivers from copying data to bounce buffers. But it should be an option, not a requirement, for an application.

@andyross
Contributor

andyross commented May 5, 2023

@hubertmis the problem with that kind of scheme is that once we decide on a dynamic allocation API we can't go back. This isn't Linux, and we don't get to assume we're running on devices with copious virtual memory available for stuff. Existing Zephyr paradigms work very hard (to be fair, occasionally too hard) to enable static allocation for everything an application might want.

@teburd's idea is very much in that vein: it's an API that allows an app to say "I want a buffer that will work for DMA with this device", and get it, without having to do much more than know that the device exists and call whatever acquire/release API we decide on around code that mutates it.

@hubertmis
Member

@teburd , could you draft your proposal? I would like to understand it better to be able to compare properties of both solutions

@carlocaione carlocaione requested a review from teburd May 11, 2023 07:56
@teburd
Collaborator

teburd commented May 11, 2023

@teburd , could you draft your proposal? I would like to understand it better to be able to compare properties of both solutions

I think looking at the i.MX RT685 is useful here, as it has many bus masters, including two DMAs, USB HS, SDHC, one Arm Cortex-M33 core, and one Xtensa HiFi4 core.

E.g., let's consider configuring dma0 with i2s0 (flexcomm) and now add some buffer requirements to a device tree node...

i2s0: &flexcomm1 {
    status = "okay";
    compatible = "nxp,lpc-i2s";
    #address-cells = <1>;
    #size-cells = <0>;
    dmas = <&dma0 2>;
    dma-names = "rx";
    buf-align = <4>; /* byte alignment */
    buf-size-align = <4>; /* byte size alignment, e.g. must be a multiple of X bytes in length */
    buf-region = <&sram0 4>; /* choose a bank of sram for placement, or maybe this uses a span of addresses in the sram, but usually it's banks in the data sheets */
    buf-cache = <cache-flush cache-invalidate>; /* What to do when shared with the device and from the device */
    pinctrl-0 = <&pinmux_flexcomm1_i2s>;
    pinctrl-names = "default";
};

Then build some macros

/**
 * @brief Shared buffer
 */
#define DT_SHARED_BUF(bus_ctlr, bus_periph) \
	__attribute__((aligned(DT_PROP_OR(bus_periph, buf_align, DT_PROP_OR(bus_ctlr, buf_align, WORD_ALIGN))), \
		       section(DT_SRAM_SECTION_OR(bus_periph, buf_region, DT_SRAM_SECTION_OR(bus_ctlr, buf_region, BUF_DEFAULT_SECTION)))))

/* Later on can be used, without requiring bounce buffers... */
DT_BUF(dma0, i2s0) uint16_t audio_rx_buf[128];

/* Configure receive with above described buffer */
dma_config(dma_dev, ....);

I'm not being very precise here, and I very much doubt any of this works... I hope the idea is clear here though.

Describe what the hardware demands of a buffer in device tree, use that to set up statically allocated buffers or block allocators. That's it. Caching could be a totally separate thing, or maybe it's considered an SRAM region thing (often you can configure uncached/cached regions). Or perhaps the buffer is placed in a struct container with some metadata about caching and state. Would be interesting to look at the SRAM bank properties, if provided in DT, as the source of truth on this.

No allocation of bounce buffers needed, and tests that utilize DMA no longer have to have ifdefs for part specifics about this sort of thing as they do today. Caching is understood by the peripheral/dma/cpu in some manner to flush and invalidate when most appropriate or by a new API like you've proposed here.

@teburd
Collaborator

teburd commented May 11, 2023

@hubertmis
Member

Thanks @teburd for these details. I like the idea of describing memory buffer requirements in the device tree and creating macros to easily create buffers meeting them.

It looks like the main open question in this discussion is about the device driver model. I don't think Zephyr's device driver model currently puts any requirements on the memory buffers passed to the drivers' functions by their users: users can pass any buffers allocated in any memory (writable or not, cacheable or not, aligned or not, dynamically or statically allocated, ...) and they expect the drivers to correctly perform the requested operations using the passed memory buffers.

It seems we are discussing several options on how to incorporate DMA requirements regarding used memory buffers in a unified way for all device drivers in Zephyr:

  1. Keep the device driver model as it is. Create responsibility for drivers of devices with DMA with specific requirements to meet these requirements.
  2. Modify the device driver model to require driver users to pass to the drivers only the memory buffers created with a designated (static or dynamic) allocator.
  3. Keep the device driver model as it is. Create an opportunity for applications to allocate buffers meeting DMA requirements. Create responsibility for device drivers to check if the passed buffer meets the requirements, if not the driver would be responsible for copying the buffer's content to a bounce buffer.

The proposed device tree properties and macros can be used in each of these solutions. The difference would be which layer in the system would be able and/or responsible to use the macros to allocate the buffers correctly.

Is this summary correct? Do you happen to see any other options I didn't list?

Once we agree the list of options is close to complete, we should start listing properties of these options that would help us find out which one is most aligned with Zephyr's needs.

@teburd
Collaborator

teburd commented May 12, 2023

Thanks @teburd for these details. I like the idea of describing memory buffer requirements in the device tree and creating macros to easily create buffers meeting them.

It looks like the main open question in this discussion is about the device driver model. I don't think Zephyr's device driver model currently puts any requirements on the memory buffers passed to the drivers' functions by their users: users can pass any buffers allocated in any memory (writable or not, cacheable or not, aligned or not, dynamically or statically allocated, ...) and they expect the drivers to correctly perform the requested operations using the passed memory buffers.

It seems we are discussing several options on how to incorporate DMA requirements regarding used memory buffers in a unified way for all device drivers in Zephyr:

1. Keep the device driver model as it is. Create responsibility for drivers of devices with DMA with specific requirements to meet these requirements.

I think this is mostly the right approach, the interfaces mostly remain the same, and with asserts enabled it should be considered a programming error to give an invalid buffer pointer.

2. Modify the device driver model to require driver users to pass to the drivers only the memory buffers created with a designated (static or dynamic) allocator.

This could be accomplished by supplying asserts and perhaps helper macros to validate buffer X
is aligned and placed in the right location. We have a great cache API now with the work of @carlocaione that drivers can use to flush/invalidate buffer passed in or out.

3. Keep the device driver model as it is. Create an opportunity for applications to allocate buffers meeting DMA requirements. Create responsibility for device drivers to check if the passed buffer meets the requirements, if not the driver would be responsible for copying the buffer's content to a bounce buffer.

A bounce buffer is the wrong approach in my humble opinion. It should be a programming error if this is needed.

The proposed device tree properties and macros can be used in each of these solutions. The difference would be which layer in the system would be able and/or responsible to use the macros to allocate the buffers correctly.

I would place the responsibility for correctly allocating buffers on the application. With helpers for setting up static buffers and block allocators. Assertions can validate that the buffer pointer is aligned and in the correct region.
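
Purely for illustration, a tiny sketch of the kind of assertion helper that
implies, assuming the alignment and region constraints end up as
devicetree-generated constants (DT_XYZ_BUF_ALIGN, DT_XYZ_MEM_START and
DT_XYZ_MEM_SIZE are made-up names):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, assumed to be generated from devicetree. */
#define DT_XYZ_BUF_ALIGN  4U
#define DT_XYZ_MEM_START  0x20010000UL
#define DT_XYZ_MEM_SIZE   0x8000UL

static inline bool xyz_dma_buf_ok(const void *buf, size_t len)
{
	uintptr_t addr = (uintptr_t)buf;

	/* Aligned to the DMA requirement and fully inside the DMA-capable region. */
	return (addr % DT_XYZ_BUF_ALIGN) == 0 &&
	       addr >= DT_XYZ_MEM_START &&
	       (addr + len) <= (DT_XYZ_MEM_START + DT_XYZ_MEM_SIZE);
}

A driver could then do __ASSERT(xyz_dma_buf_ok(buf, len), "buffer not usable for DMA"); at the top of its transfer path when CONFIG_ASSERT=y, and the check compiles away otherwise.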

Is this summary correct? Do you happen to see any other options I didn't list?

Once we agree the list of options is close to complete, we should start listing properties of these options that would help us find out which one is most aligned with Zephyr's needs.

I think those options are right. I believe pretty strongly that memory should be managed by the application for Zephyr. Any sort of automatic memory management should be entirely opt-in and not the default. It's unclear why a bounce buffer would ever be needed in Zephyr if the buffers are correctly set up, and that should be a goal.

@hubertmis
Member

hubertmis commented May 15, 2023

@teburd, thanks for your comment.

User code with modified device driver model

I'm looking at selected Zephyr samples to assess changes needed if we decided to modify the device driver model so that the application is responsible for correct buffer allocation.

spi_bitbang

uint16_t buff[5] = { 0x0101, 0x00ff, 0x00a5, 0x0000, 0x0102};
int len = 5 * sizeof(buff[0]);
struct spi_buf tx_buf = { .buf = buff, .len = len };

Currently buff is used directly in the tx_buf structure. I understand with your recommendation this sample (and all other code using SPI driver API) would need to be modified to something like

 DT_BUF(dma0, dev) uint16_t buff[5] = { 0x0101, 0x00ff, 0x00a5, 0x0000, 0x0102}; 
 int len = 5 * sizeof(buff[0]); 
  
 struct spi_buf tx_buf = { .buf = buff, .len = len }; 

  dt_buf_acquire(buff); // To do cache management

I'm not sure if the DT_BUF() macro definition can somehow describe the padding of buff, so it might be necessary to use another macro:

DT_BUF(dma0, dev) uint16_t buff[DT_BUF_SIZE(dma0, dev, 5)] = { ... };

Using sizeof(buff) would not return the intended size of the buffer anymore. This change of the sizeof() return value could result in obscure bugs.

uart/echo_bot

This sample includes both RX and TX paths of UART communication.

RX

It's pretty easy because the following code

static char rx_buf[MSG_SIZE];

would need to be modified to

DT_BUF(dma0, uart_dev) static char rx_buf[DT_BUF_SIZE(dma0, uart_dev, MSG_SIZE)];

assuming these macros would work correctly for global static objects.
And some _acquire() / _release() calls for proper cache management.

TX

We can't use anymore just any char *buf to be printed like in

void print_uart(char *buf)
{
	int msg_len = strlen(buf);

	for (int i = 0; i < msg_len; i++) {
		uart_poll_out(uart_dev, buf[i]);
	}
}

To use this kind of API the sample code would need to allocate a bounce buffer meeting the UART TX requirements and use it here:

DT_BUF(dma1, uart_dev) static char tx_buf[DT_BUF_SIZE(dma1, uart_dev, 1)];

void print_uart(char *buf)
{
	int msg_len = strlen(buf);

	for (int i = 0; i < msg_len; i++) {
		tx_buf[0] = buf[i];
		dt_buf_tx_acquire(tx_buf);
		uart_poll_out(uart_dev, tx_buf);
		dt_buf_tx_release(tx_buf);
	}
}

This bounce buffer would be allocated by the application on any platform, even if the given UART does not have DMA, or includes a DMA capable of printing any application buffer. This potentially unnecessary bounce buffer makes usage of driver APIs harder and is wasteful.

blinky_pwm

Some hardware peripherals like PWM, even though it's a rare case, can also include DMA. To be consistent with the device driver model and potentially handle DMA here, the API user would need to pass period and pulse arguments to the pwm_set_dt() function in a buffer meeting DMA requirements. So we could again allocate a bounce buffer in the application to do that. However, it's not obvious how the period and pulse fields shall be placed in the buffer, because each hardware peripheral might require a different data structure.

uart/echo_bot again

Assume we have a UART peripheral that uses DMA to read data from RAM in the following structure:

Byte 1: the size of the transfer
Byte 2: the 1st byte to be transferred
Byte 3: the 2nd byte to be transferred
Byte 4: ...

The solution discussed above with DT_BUF(dma1, uart_dev) static char tx_buf[DT_BUF_SIZE(dma1, uart_dev, 1)]; would not work with a peripheral like this. The tx_buf buffer would miss byte 1 containing the size of the transfer. The device driver would anyway allocate another bounce buffer and copy data around to add byte 1 at the beginning.

Conclusion

It seems impossible to impose responsibility on an application to allocate correct buffers for each potential DMA usage in any system. It would require an application (or a subsystem) to structure data in a buffer in such a way that is required for its target platform, rendering the application non-portable.

This makes option 2 I described in #57602 (comment) invalid.

Do you agree?

@teburd
Collaborator

teburd commented May 15, 2023

Comments inline, but no, I don't agree, and I don't think we've fully explored this. I realized when re-reading this that the dma node probably isn't needed in the macros, only the device tree node representing the peripheral. That should alleviate portability concerns. That, and providing a define macro rather than macros to generate attributes.

@teburd, thanks for your comment.

User code with modified device driver model

I'm looking at selected Zephyr samples to assess changes needed if we decided to modify the device driver model so that the application is responsible for correct buffer allocation.

spi_bitbang

uint16_t buff[5] = { 0x0101, 0x00ff, 0x00a5, 0x0000, 0x0102};
int len = 5 * sizeof(buff[0]);
struct spi_buf tx_buf = { .buf = buff, .len = len };

Currently buff is used directly in the tx_buf structure. I understand with your recommendation this sample (and all other code using SPI driver API) would need to be modified to something like

 DT_BUF(dma0, dev) uint16_t buff[5] = { 0x0101, 0x00ff, 0x00a5, 0x0000, 0x0102}; 
 int len = 5 * sizeof(buff[0]); 
  
 struct spi_buf tx_buf = { .buf = buff, .len = len }; 

  dt_buf_acquire(buff); // To do cache management

I'm not sure if the DT_BUF() macro definition can somehow describe the padding of buff, so it might be necessary to use another macro:

DT_BUF(dma0, dev) uint16_t buff[DT_BUF_SIZE(dma0, dev, 5)] = { ... };

Using sizeof(buff) would not return the intended size of the buffer anymore. This change of the sizeof() return value could result in obscure bugs.

uart/echo_bot

This sample includes both RX and TX paths of UART communication.

RX

It's pretty easy because the following code

static char rx_buf[MSG_SIZE];

would need to be modified to

DT_BUF(dma0, uart_dev) static char rx_buf[DT_BUF_SIZE(dma0, uart_dev, MSG_SIZE)];

assuming these macros would work correctly for global static objects. And some _acquire() / _release() calls for proper cache management.

TX

We can't use anymore just any char *buf to be printed like in

void print_uart(char *buf)
{
	int msg_len = strlen(buf);

	for (int i = 0; i < msg_len; i++) {
		uart_poll_out(uart_dev, buf[i]);
	}
}

To use this kind of API the sample code would need to allocate a bounce buffer meeting the UART TX requirements and use it here:

DT_BUF(dma1, uart_dev) static char tx_buf[DT_BUF_SIZE(dma1, uart_dev, 1)];

void print_uart(char *buf)
{
	int msg_len = strlen(buf);

	for (int i = 0; i < msg_len; i++) {
		tx_buf[0] = buf[i];
		dt_buf_tx_acquire(tx_buf);
		uart_poll_out(uart_dev, tx_buf);
		dt_buf_tx_release(tx_buf);
	}
}

This bounce buffer would be allocated by the application on any platform, even if the given UART does not have DMA, or includes a DMA capable of printing any application buffer. This potentially unnecessary bounce buffer makes usage of driver APIs harder and is wasteful.

blinky_pwm

Some hardware peripherals like PWM, even though it's a rare case, can also include DMA. To be consistent with the device driver model and potentially handle DMA here, the API user would need to pass period and pulse arguments to the pwm_set_dt() function in a buffer meeting DMA requirements. So we could again allocate a bounce buffer in the application to do that. However, it's not obvious how the period and pulse fields shall be placed in the buffer, because each hardware peripheral might require a different data structure.

uart/echo_bot again

Assume we have a UART peripheral that uses DMA to read data from RAM in the following structure:

Byte 1: the size of the transfer Byte 2: the 1st byte to be transferred Byte 3: the 2nd byte to be transferred Byte 4: ...

The solution discussed above with DT_BUF(dma1, uart_dev) static char tx_buf[DT_BUF_SIZE(dma1, uart_dev, 1)]; would not work with a peripheral like this. The tx_buf buffer would miss byte 1 containing the size of the transfer. The device driver would anyway allocate another bounce buffer and copy data around to add byte 1 at the beginning.

Then change the macro

/**
 * Defines a buffer with a given symbolic name such that it is aligned, placed, sized, and
 * cached appropriately. The symbol buf_name may not be the actual start address of the buffer
 * if the device requires a prefixed space for metadata about the operation.
 */
#define DT_BUF_DEFINE(node, buf_name, buf_len) \
    DT_BUF_ATTRS(node) uint8_t _##buf_name[DT_BUF_LEN(node, buf_len)]; \
    uint8_t *buf_name = &_##buf_name[DT_BUF_OFFSET(node)]

Just like that, I have a macro appropriately created. How could we verify the buffer is correctly set up for that device, you ask? A marker (a well-understood sequence like a stack sentinel, e.g. 0xDEADBEAD) could be prefixed or suffixed and asserted on when CONFIG_ASSERT=y.
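
A rough sketch of that sentinel check (the helper name is invented here; the
real DT_BUF_DEFINE() would have to be the thing that actually places the
marker in front of the payload):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DT_BUF_SENTINEL 0xDEADBEADu

/* Assumes DT_BUF_DEFINE() stored the sentinel immediately before the payload. */
static inline bool dt_buf_sentinel_ok(const uint8_t *buf)
{
	uint32_t marker;

	memcpy(&marker, buf - sizeof(marker), sizeof(marker));
	return marker == DT_BUF_SENTINEL;
}

A driver would assert on dt_buf_sentinel_ok() only when CONFIG_ASSERT=y, so the check costs nothing in production builds.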

Conclusion

It seems impossible to impose responsibility on an application to allocate correct buffers for each potential DMA usage in any system. It would require an application (or a subsystem) to structure data in a buffer in such a way that is required for its target platform, rendering the application non-portable.

That's the whole point of applying macros and device tree to the problem. To make it portable.

This makes option 2 I described in #57602 (comment) invalid.

Do you agree?

No I do not agree, I don't think we've fully explored options to statically allocate buffers correctly.

@hubertmis
Member

I have a macro appropriately created

Thanks, indeed it would cover many more cases.

I'll reiterate some and discuss a few more examples to see if we have a common understanding of how this macro would change the Zephyr codebase.

blinky_pwm

I believe if a PWM peripheral uses DMA then the PWM API user could not create a buffer and fill it with period and pulse because the buffer structure is unknown to the user.

It would mean that the allocation of buffers by the API user would be available only for the device drivers that are streaming data (UART, SPI, I2C, USB, radios, ...), but for other drivers, it would be the driver's responsibility to allocate a bounce buffer and fill necessary data like period and pulse.

UART (again) with a dynamic payload offset in DMA buffer

Some peripherals have a dynamic structure of data consumed by DMA. One example would be UART which requires the following structure if the transfer is shorter than 255:

{length, byte 0, byte 1, byte 2, ...}

and following structure if the transfer is longer:

{0xff, length 0, length 1, length 2, length 3, payload 0, payload 1, ...}

Do you think we could somehow capture it in a macro, or would we need to treat such peripherals as exceptions and let device drivers allocate bounce buffers?

UART transferring constant data

Let's discuss an application that transfers constant byte-strings using UART. Currently, the application code would define these byte-strings as something like:

static const uint8_t str1[] = {0xde, 0xad, 0xbe, 0xef};
static const uint8_t str2[] = {0xca, 0xfe, 0xfa, 0xce};

Potentially placing data in NVM where some DMA have access, but others not necessarily.

With the new macro definition, I understand the code would need to be refactored to something like

DT_BUF_DEFINE(dev, str1_buf, sizeof(str1));
DT_BUF_DEFINE(dev, str2_buf, sizeof(str2));

static void copy_buf_content(void) {
  memcpy(str1_buf, str1, sizeof(str1));
  memcpy(str2_buf, str2, sizeof(str2));
}

which is functionally correct. However, it would cause unnecessary RAM buffer allocation and memcpy overhead for platforms where the UART can access NVM directly.

spi_flash

SPI flash sample uses SPI (potentially with DMA) indirectly: main -> drivers/flash -> drivers/spi.
I'm not sure which module should allocate SPI DMA buffers in this sample.

I don't think it should be done by main, because main is decoupled from the const struct device *spi - it is aware of const struct device *flash only. So it cannot define a buffer like DT_BUF_DEFINE(spi, spi_buf, size);.

An alternative would be to define a bounce buffer in drivers/flash and copy the payload passed by main, but I understand your goal is to avoid bounce buffers. Do you think we can somehow achieve it in a more complex DMA usage like this one?

@teburd
Collaborator

teburd commented May 16, 2023

It seems like there's some special metadata header that needs to exist inline with the data in the cases you've laid out. Fair enough, let's work through it.

I have a macro appropriately created

Thanks, indeed it would cover much more cases.

I'll reiterate some and discuss a few more examples to see if we have a common understanding of how this macro would change the Zephyr codebase.

blinky_pwm

I believe if a PWM peripheral uses DMA then the PWM API user could not create a buffer and fill it with period and pulse because the buffer structure is unknown to the user.

This seems like a pretty special case, and perhaps a specialized PWM set of buffer setup macros would be useful that understand the period and pulse. We have C and while that limits our ability to express some compile time things, I think this could be done with PWM specific macros. Is there some datasheet/ref manual or driver in the tree that does this today I could look at?

It would mean that the allocation of buffers by the API user would be available only for the device drivers that are streaming data (UART, SPI, I2C, USB, radios, ...), but for other drivers, it would be the driver's responsibility to allocate a bounce buffer and fill necessary data like period and pulse.

UART (again) with a dynamic payload offset in DMA buffer

Some peripherals have a dynamic structure of data consumed by DMA. One example would be UART which requires the following structure if the transfer is shorter than 255:

{length, byte 0, byte 1, byte 2, ...}

and following structure if the transfer is longer:

{0xff, length 0, length 1, length 2, length 3, payload 0, payload 1, ...}

Do you think we could somehow capture it in a macro, or would we need to treat such peripherals as exceptions and let device drivers allocate bounce buffers?

Why is a bounce buffer needed here vs multiple length prefixed transfers in a list? Heap allocations and memcpy aren't free. Is there some hardware datasheets or drivers I can look at?

UART transferring constant data

Let's discuss an application that transfers constant byte-strings using UART. Currently, the application code would define these byte-strings as something like:

static const uint8_t str1[] = {0xde, 0xad, 0xbe, 0xef};
static const uint8_t str2[] = {0xca, 0xfe, 0xfa, 0xce};

Potentially placing data in NVM where some DMA have access, but others not necessarily.

With the new macro definition, I understand the code would need to be refactored to something like

DT_BUF_DEFINE(dev, str1_buf, sizeof(str1));
DT_BUF_DEFINE(dev, str2_buf, sizeof(str2));

static void copy_buf_content(void) {
  memcpy(str1_buf, str1, sizeof(str1));
  memcpy(str2_buf, str2, sizeof(str2));
}

which is functionally correct. However, it would cause unnecessary RAM buffer allocation and memcpy overhead for platforms where the UART can access NVM directly.

That's true today too with bounce buffers, isn't it?

spi_flash

SPI flash sample uses SPI (potentially with DMA) indirectly: main -> drivers/flash -> drivers/spi. I'm not sure which module should allocate SPI DMA buffers in this sample.

I don't think it should be done by main, because main is decoupled from the const struct device *spi - it is aware of const struct device *flash only. So it cannot define a buffer like DT_BUF_DEFINE(spi, spi_buf, size);.

An alternative would be to define a bounce buffer in drivers/flash and copy the payload passed by main, but I understand your goal is to avoid bounce buffers. Do you think we can somehow achieve it in a more complex DMA usage like this one?

In this case wouldn't the flash device tree node define things appropriately for applications? I'm a little lost on this example in particular.

@hubertmis
Member

blinky_pwm

This seems like a pretty special case,

I would like to highlight here that every single peripheral in the system could be a special case like this. It is possible that any peripheral, including basic ones like PWM, GPIO, or timers, can use DMA internally to offload a CPU from accessing peripheral registers.

perhaps a specialized PWM set of buffer setup macros would be useful that understand the period and pulse

Do I understand correctly that it would require deprecating pwm_set_dt() and similar functions and replacing them with something like

DT_PWM_DATA(dev, name);

void set_pwm(uint32_t period, uint32_t pulse) {
  DT_PWM_SET_PERIOD(name, period);
  DT_PWM_SET_PULSE(name, pulse);
  pwm_set_apply(dev, name);
}

Solutions like this would cause an overhead I described in "UART transferring constant data" for every single device driver.

Is there some datasheet/ref manual or driver in the tree that does this today I could look at?

An example of PWM using DMA to load pulse values is here: https://infocenter.nordicsemi.com/topic/ps_nrf5340/pwm.html?cp=4_0_0_6_22
However, nRF53 has a basic memory hierarchy so no special handling of memory access is needed there (no cache: no need for alignment and padding and writing-back; common addressing across the system: no need for address translation; PWM's EasyDMA can access all memories in the system). If the memory hierarchy was more complicated, e.g. caches present, then it would be more difficult to handle and currently we are missing a systematic way in Zephyr to do that.

UART (again) with a dynamic payload offset in DMA buffer

Why is a bounce buffer needed here vs multiple length prefixed transfers in a list?

I'm not sure if a bounce buffer is needed. However, if the transfer size is determined at run-time, I cannot imagine a macro that could set the size and the offsets in the static buffer correctly. Maybe it would be possible with more complicated macros.

Is there some hardware datasheets or drivers I can look at?

I'm not aware of any publicly available datasheet or driver. Assuming an 802.15.4 radio's DMA expects PHR immediately followed by PSDU in RAM (like nRF5 radios), for 802.15.4 MSK PHY described in 17.1.1 of IEEE 802.15.4-2020, the length of PHR (encoding of the frame length) is one or two bytes long depending on the payload (PSDU) length. It's similar to the example I drafted in the previous comment.

UART transferring constant data

However, it would cause unnecessary RAM buffers allocation and memcpy overhead for platforms where UART can access NVM directly.

That's true today too with bounce buffers, isn't it?

Some SoCs require special handling of RAM buffers for DMAs, others do not. Currently, Zephyr works perfectly fine with devices that do not require any special handling. DMAs in devices like this can access static const uint8_t str1[] = {0xde, 0xad, 0xbe, 0xef}; buffers directly, and Zephyr's driver model allows them to do that.

The RFC created by @carlocaione and this PR aim to design a systematic way to support SoCs requiring special handling of DMA RAM buffers. To answer your question: Yes, in these more complicated devices the additional buffer overhead and memcpy is most likely required for both bounce buffers and application-managed buffers solutions.

But my point is that we should avoid creating this overhead for SoCs which work fine without it.

I'm curious if we could define a driver API that allows its user to avoid the discussed overhead in simple SoCs, but still statically allocate buffers to avoid bounce buffers in complex SoCs.
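
One way to picture that (purely illustrative, not a concrete API proposal):
the driver keeps its current API, checks the passed buffer against the
constraints generated from devicetree, and only pays for a bounce buffer on
the SoCs/buffers that actually need one. xyz_dma_buf_ok(), xyz_bounce_alloc()
and xyz_start_dma() are made-up names.

static int xyz_write(const struct device *dev, const uint8_t *buf, size_t len)
{
	const uint8_t *dma_src = buf;
	uint8_t *bounce = NULL;

	if (!xyz_dma_buf_ok(buf, len)) {
		/* Slow path: only reached on SoCs where this buffer cannot be
		 * used by DMA as-is. On simple SoCs this branch never runs.
		 */
		bounce = xyz_bounce_alloc(len);
		if (bounce == NULL) {
			return -ENOMEM;
		}
		memcpy(bounce, buf, len);
		dma_src = bounce;
	}

	/* The bounce buffer (if any) is released on transfer completion (not shown). */
	return xyz_start_dma(dev, dma_src, len);
}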

spi_flash

In this case wouldn't the flash device tree node define things appropriately for applications? I'm a little lost on this example in particular.

I'll try to show my concern in a slightly more detailed example.

Currently, the spi_flash sample contains code like:

const uint8_t expected[] = { 0x55, 0xaa, 0x66, 0x99 };
uint8_t buf[sizeof(expected)];
rc = flash_write(flash_dev, SPI_FLASH_TEST_REGION_OFFSET, expected, len);
rc = flash_read(flash_dev, SPI_FLASH_TEST_REGION_OFFSET, buf, len);

The expected and buf buffers are later translated by a chosen flash driver to something like data below (using spi_nor.c as an example):

        uint8_t buf[5] = { 0 };
	struct spi_buf spi_buf[2] = {
		{
			.buf = buf,
			.len = 1,
		},
		{
			.buf = data,
			.len = length
		}
	};

I understand the spi_nor.c code would need to be modified a little to use a static allocator like DT_BUF_DEFINE() for the buf buffer:

        DT_BUF_DEFINE(&driver_cfg->spi, buf, 5);
        memset(buf, 0, 5);

	struct spi_buf spi_buf[2] = {
		{
			.buf = buf,
			.len = 1,
		},
		{
			.buf = data,
			.len = length
		}
	};

However, it's not clear to me how the data buffer could be allocated with DT_BUF_DEFINE().

One option would be to allocate a buffer in the main.c:

const uint8_t expected[] = { 0x55, 0xaa, 0x66, 0x99 };
DT_BUF_DEFINE(/* how to get spi dev, if it's decoupled from flash API user? */, expected_buf, sizeof(expected));
memcpy(expected_buf, expected, sizeof(expected));

rc = flash_write(flash_dev, SPI_FLASH_TEST_REGION_OFFSET, expected_buf, len);

But to make it work we would need to increase coupling between software modules so that flash API user gets access to the bus device to which the flash module is connected. I don't think increasing coupling like this is a good idea.

Another solution would be to define a bounce buffer in spi_nor.c:

        DT_BUF_DEFINE(&driver_cfg->spi, bounce_buf, length);
        memcpy(bounce_buf, data, length);

But this solution relies on a bounce buffer, which is exactly what we are discussing how to avoid.

@carlocaione
Collaborator Author

My 2 cents.

In short, I think that the problem here is that there are cases where the application layer has no idea about the driver/hardware-level requirements for buffer allocation / overhead / metadata / etc...

Now, in many cases this information can indeed be obtained from DT, but there are cases where (for example the overhead, the internal buffer organization, etc...) the information is only known at run-time and not compile-time. For all these cases a static buffer allocation is probably not optimal (because we would end up introducing a different macro for each exception) or not achievable at all, so we must resort to a dynamic allocation of sorts (aka bounce buffers).

Now, I understand @andyross's and @teburd's concerns but I don't see an easy way out. If the buffer cannot be statically defined at the app layer, then it must be allocated by drivers and filled from a user buffer provided by the app (when needed, plus all the other needed DMA bits and pieces).

@teburd
Collaborator

teburd commented May 17, 2023

My 2 cents.

In short, I think that the problem here is that there are cases where the application layer has no idea about the driver/hardware-level requirements for buffer allocation / overhead / metadata / etc...

That's what should be solved, making that available to the application.

Now, in many cases this information can indeed be obtained from DT, but there are cases where (for example the overhead, the internal buffer organization, etc...) the information is only known at run-time and not compile-time. For all these cases a static buffer allocation is probably not optimal (because we would end up introducing a different macro for each exception) or not achievable at all, so we must resort to a dynamic allocation of sorts (aka bounce buffers).

All the cases in Zephyr I've seen thus far can be solved with a macro that does just that. Please help me understand where this is coming from.

Now, I understand @andyross's and @teburd's concerns but I don't see an easy way out. If the buffer cannot be statically defined at the app layer, then it must be allocated by drivers and filled from a user buffer provided by the app (when needed, plus all the other needed DMA bits and pieces).

To repeat what @andyross said, this isn't Linux. To have a setup where the default would be to malloc+memcpy silently on a peripheral API call just because the application author didn't have the tools to set up a buffer right is massively wasteful and I'll -1 it all day. If you have special hardware with special needs because it has some unique logic as described here, well, that to me seems like a vendor-specific problem, not a Zephyr problem. That should be solved in a vendor-specific way, not in a broad way that implies this is how Zephyr wishes to handle DMA buffers generally by default.

I'm happy to be wrong if examples can be found and given or some docs can be linked so I can better understand what problem is trying to be solved here. I haven't seen any good examples in the tree after some grepping around.

UART:

  • nrf has a memcpy from the source to a ram buffer, seemingly to avoid doing DMA from flash? Solvable with DT_BUF seemingly.

SPI:

  • esp32_spim mallocs and copies if the buffer isn't aligned. Solvable with DT_BUF, should not be done this way.
  • nrfx_spim copies if not in RAM. Solvable with DT_BUF

I2C:

  • nrfx_twim has a memcpy to seemingly work with easydma (which I believe does not have a traditional transfer list). Unsolvable in a general way. Unclear exactly to me the reasoning.

FLASH: (harder to grep for memcpy as many peripherals allow for mapping the flash into a bus address space...)

  • nrf_qspi does a memcpy to seemingly add a prefix. Solvable with DT_BUF potentially.
  • xmc4xxx does a memcpy to align. DT_BUF solvable
  • nios2 qspi does a memcpy to add a length prefix from a quick glance. Potentially DT_BUF solvable with a flash wrapper (as would nrf_qspi).

Happy to be shown something differently but I just don't see where this problem actually is. I do see the problem with setting up aligned and placed buffers. That is a well trod problem I think anyone who's written a test case with DMA involved can attest to. But a macro could solve it. I've linked to ones I know of first hand already.

@hubertmis
Member

malloc+memcpy silently on a peripheral API call just because the application author didn't have the tools to setup a buffer right is massively wasteful and I'll -1 it all day.

Agreed. But from the analysis I presented in previous comments, it seems forcing the application authors to use DT_BUF on the platforms in which such allocation is not necessary is also massively wasteful, so I could -1 it all day :)

I'm curious if we can modify any of these solutions or find a new one that is not wasteful in systems where DMA can use any buffers, and in systems where DMAs have special needs.

@teburd teburd requested review from hakehuang and erwango May 17, 2023 14:39
@andyross
Contributor

Driving by again: I don't have the DMA depth of expertise, but I'm still with @tomb here. This seems like a needlessly heavyweight solution for a problem that's well-constrained and solvable statically for almost everything we're talking about.

Note also that there's a natural abstraction layering here that's being violated: it's possible (if not trivially) to take an API designed for static allocation and indirect it at runtime with a dynamic buffer. It's not possible to magically make a runtime allocation into a static one in the general case (c.f. the undecidability of the Halting Problem, Marvin Minsky et. al.). This is what I meant upthread about "not being able to go back".

My general worry is that we'll continue to see use on devices and applications where static buffer allocation is a natural fit (SOF is exactly such an app, even though right now it has its own heap to allocate them), and they'll continue to use some kind of backdoor anyway to avoid the dynamic overhead. Let's not tempt them?

@hubertmis
Member

Maybe it will be easier to find common ground if we decide on the basic expected properties of the system we would like to get.

I believe for simplicity we can discuss an application that sends through UART two strings:

  1. "Hello world": static const uint8_t hello_world[] = "Hello world";
  2. a generated random number: static uint8_t rand_num[RN_LEN];

Both buffers are configured statically by an application.

Now, let's consider two types of SoCs for which we compile this application

  1. STM32 where UART driver sends data byte by byte using USART's TDR (Transmit data register)
  2. nRF52 which contains UARTE requiring data to be placed in RAM
  3. I would love to discuss a device with a data cache, but could not find one supported by Zephyr yet.

For STM32 we would like to:

  1. Have hello_world buffer stored in ro-data in flash. The UART device driver copies data byte by byte from flash to USART's TDR.
  2. Have rand_num buffer stored in bss in RAM. The UART device driver copies data byte by byte from this buffer to USART's TDR.

For nRF52 we would like to:

  1. Copy hello_world from flash to RAM and configure UARTE's EasyDMA engine to point to the copy.
  2. Have rand_num buffer stored in bss in RAM. The UARTE's EasyDMA engine points directly to rand_num buffer without copying.

Do you agree these are the system properties we would like to get in a single portable application?
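
For reference, the portable application code I have in mind stays as simple
as this (using the asynchronous UART API; waiting for UART_TX_DONE and error
handling are omitted for brevity):

static const uint8_t hello_world[] = "Hello world";
static uint8_t rand_num[RN_LEN];

void send_both(const struct device *uart_dev)
{
	/* Property 1: hello_world may live in flash; the driver reads it
	 * byte by byte from flash (STM32 case) or copies it to RAM first
	 * (nRF52 UARTE case).
	 */
	uart_tx(uart_dev, hello_world, sizeof(hello_world), SYS_FOREVER_US);

	/* Property 2: rand_num is already in RAM and usable directly. */
	uart_tx(uart_dev, rand_num, sizeof(rand_num), SYS_FOREVER_US);
}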

* @retval 0 on success
* @retval other errno codes depending on the implementation
*/
int dmm_release_buffer_for_dma_out(const struct device *dev, uint32_t flags,
Collaborator

If I am releasing a buffer, why do I need to know the flags?

* @retval 0 on success
* @retval other errno codes depending on the implementation
*/
int dmm_get_buffer_for_dma_out(const struct device *dev, uint32_t flags,
Collaborator

Would it be possible to define where the buffer comes from? Say a memory pool pointer, or just a fixed physical address?

* @param[out] dma_handle DMA handle.
* @param[in] buffer User-provided buffer.
* @param[in] buffer_size Size of the user-provided buffer.
*
Collaborator

Same as for out: can we add a parameter for where the buffer is coming from?

@hubertmis
Member

I was thinking a little about the discussion we had during the last arch meeting.

We would like to achieve multiple quality attributes of the system:

  • low memory consumption
  • stability and accessibility
  • performance
  • developer-friendly API
  • and more

Changing priorities of these attributes leads to different solutions.

I'm wondering if we could merge (make available) all proposed solutions to make it possible for the application developer to choose which of the attributes is the highest priority for the application.

Here is what came to my mind looking at the example we discussed: an audio application getting audio data from SPI and sending it to an I2S output.

Proposal 1: drivers responsible for managing buffers

The application code would be something like this:

uint8_t audio_frame[AUDIO_FRAME_SIZE];

spi_read(spi_dev, audio_frame, AUDIO_FRAME_SIZE);
i2s_send(i2s_dev, audio_frame, AUDIO_FRAME_SIZE);

User-friendliness is perfect - code is easy to write, understand, and maintain. Performance varies: in some SoCs spi_read() and/or i2s_send() would need to allocate bounce buffers and memcpy() data around; in other SoCs the audio_frame buffer could be used directly by DMAs. Memory consumption and stability vary, depending on whether bounce buffer allocators are static or dynamic.

Proposal 2: application responsible for managing buffers

Code example:

DMM_BUFFER(spi_dev, spi_frame, AUDIO_FRAME_SIZE);
DMM_BUFFER(i2s_dev, i2s_frame, AUDIO_FRAME_SIZE);

spi_read(spi_dev, spi_frame, AUDIO_FRAME_SIZE);
memcpy(i2s_frame, spi_frame, AUDIO_FRAME_SIZE);
i2s_send(i2s_dev, i2s_frame, AUDIO_FRAME_SIZE);

User-friendliness is a little worse - using macros to allocate buffers is not intuitive. Performance is worse: we always have at least one memcpy() even if it's not needed in a target SoC. Stability is perfect: with static memory allocation we cannot run out of buffers. Memory consumption may be better or worse than in proposal 1 - it depends on whether the target SoC would require bounce buffers.

Merged proposal: the application may manage buffers using macros, but the responsibility is in the drivers.

With this approach, the application could be implemented in multiple ways. One example would be exactly like "Proposal 1", another exactly like "Proposal 2", sharing all of its attributes.

One more which I think is the most interesting would be:

// Allocate statically buffer for SPI, because buffers written by DMA are usually more restricted than those read by DMA
DMM_BUFFER(spi_dev, audio_frame, AUDIO_FRAME_SIZE);

// Application ensures that the buffer sent to read SPI data can be used by SPI's DMA
spi_read(spi_dev, audio_frame, AUDIO_FRAME_SIZE);
i2s_send(i2s_dev, audio_frame, AUDIO_FRAME_SIZE);
// The buffer passed to I2S is not guaranteed to be usable by I2S's DMA

User-friendliness is similar to "Proposal 2" - using unintuitive macros to allocate buffers. Performance is perfect - the SoC-specific code in the drivers will ensure that we avoid memcpy() if possible. Stability and memory consumption vary between platforms depending on the buffer allocation strategy for I2S.

But what I think is most important in this solution is that we do not enforce "stability over performance" or "performance over user-friendliness". It's the application developer who decides which property is the most important for their application.

Other benefits of the merged proposal are:

  • no changes in the device driver model and existing device driver APIs
  • no changes in existing device drivers
  • no changes needed in existing samples or subsystems

Required changes:

  • new macros in device driver APIs to allow applications to allocate DMA-compliant buffers

@github-actions

github-actions bot commented Sep 3, 2023

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Sep 3, 2023
@github-actions github-actions bot closed this Sep 18, 2023

Successfully merging this pull request may close these issues.

[RFC] Proposing a new DMA Memory Management (DMM) subsystem