Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: i2c: add an option to skip auto-sending stop on last message #76654

Merged
merged 1 commit into from
Sep 23, 2024

Conversation

fabiobaltieri
Copy link
Member

Hey folks, bumped into a bit of an issue with some legacy code after #75801. I understand the motivation but I suspect I may not be the only one depending on this and some use case is particularly hard to convert: I happen to have some code that really wants to build the transactions a bit at a time. :-(

How about a flag to restore the old behavior with a note that things may break real bad?


The I2C transfer API has been recently changed to always automatically set a STOP on the last message, which was well documented but implemented only by few drivers.

Unfortunately, while documented, this is a change in the current behavior and it turns out that some applications depended on it for some complex operations.

Add a flag to allow to explicitly request skipping the last message stop and request the previous behavior, with a note that that may result in leaving the bus in an unexpected state.

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Aug 2, 2024

This I belive can be a bit of an issue for I2C controllers which can't leave the bus in an unknown state, like the nRF TWIM, AFAIK it can't "pause" a transfer, it will send a stop cond...

Is this something that can't be fixed in whatever is using the API? If we are changing stuff I have other requests as well to remove some ambiguity :)

@fabiobaltieri
Copy link
Member Author

fabiobaltieri commented Aug 2, 2024

Is this something that can't be fixed in whatever is using the API? If we are changing stuff I have other requests as well to remove some ambiguity :)

Tried pretty hard the whole day but it's a bit of an issue, basically I have some code path that changes conditions mid transaction and applies it on subsequent messages, which is impossible to reproduce, plus I have some pretty tangled emulation testing code that depends on it, and this is all for hardware I don't quite have access to for testing. I think it could be fixed, but I would not mind an easy way out rather than risk causing a regression in production for some mysterious combination of hardware and options.

I'd happily take your ambiguiation removal requests though, sounds like it should be mentioned that not all controller supports it, as a start.

@teburd
Copy link
Collaborator

teburd commented Aug 2, 2024

Hey folks, bumped into a bit of an issue with some legacy code after #75801. I understand the motivation but I suspect I may not be the only one depending on this and some use case is particularly hard to convert: I happen to have some code that really wants to build the transactions a bit at a time. :-(

How about a flag to restore the old behavior with a note that things may break real bad?

The I2C transfer API has been recently changed to always automatically set a STOP on the last message, which was well documented but implemented only by few drivers.

Unfortunately, while documented, this is a change in the current behavior and it turns out that some applications depended on it for some complex operations.

Add a flag to allow to explicitly request skipping the last message stop and request the previous behavior, with a note that that may result in leaving the bus in an unexpected state.

This is the expected behavior, and most drivers implemented it this way. There were a few that did not implement the implied stop for each transfer which should have been considered buggy/out of compliance.

How would this work in the case of mulitple devices accessing a single i2c bus controller if the stop is not sent? There's no external locking for an i2c bus controller.

This is a fix being applied in 4.x an as yet unreleased development branch. I'm not sure how best to deal with this. Have a Kconfig that allows for the buggy behavior to be turned on, but defaulted to off? How long should we keep this buggy behavior around?

@fabiobaltieri
Copy link
Member Author

fabiobaltieri commented Aug 2, 2024

How would this work in the case of mulitple devices accessing a single i2c bus controller if the stop is not sent? There's no external locking for an i2c bus controller.

Yeah, that code path I have has explicit locking. I could mention it in more details.

@fabiobaltieri
Copy link
Member Author

This is a fix being applied in 4.x an as yet unreleased development branch. I'm not sure how best to deal with this. Have a Kconfig that allows for the buggy behavior to be turned on, but defaulted to off? How long should we keep this buggy behavior around?

Well this would be a bit that says "unsafe, use at your own risk", unless one uses it nothing changes.

@teburd
Copy link
Collaborator

teburd commented Aug 2, 2024

How would this work in the case of mulitple devices accessing a single i2c bus controller if the stop is not sent? There's no external locking for an i2c bus controller.

Yeah, that code path I have has explicit locking. I could mention it in more details.

The problem here is then every i2c device would need to have a way of locking around the i2c bus controller. The implied atomic action of i2c_transfer is no longer possible if the bus is left in a non-idle state as this would allow.

I'm ok with a Kconfig to disable the enforcement if that's whats needed, but frankly it seems like something else needs fixing here and the Kconfig should be viewed entirely as a big ole bandaid to allow for the existing broken behavior to continue temporarily while things are fixed. I do not want to carry this more than one release.

Copy link
Collaborator

@teburd teburd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please supply a Kconfig to disable the behavior of adding stop instead with a big fat warning it will be removed for 4.1, adding code here means all the correctly behaving firmware pays this cost.

@bjarki-andreasen
Copy link
Collaborator

This note

* @note Not all scatter/gather transactions can be supported by all
allows the application create a tranfer for a single consequtive write of say, 10 bytes, and split it into 10 separate i2c_msg structs of len 1. This should not be allowed...

It adds a ton of overhead for drivers which either just start 10 separate transactions, and run them with a delay in between (using a DMA, that implies preparing it 10 times...), or either for efficientcy or need (the TWIM for example) have to copy and stitch together messages before doing the ONE transfer...

It makes little sense to impose such overhead on drivers, especially when not all hardware even can do it.

@fabiobaltieri
Copy link
Member Author

This note

* @note Not all scatter/gather transactions can be supported by all

allows the application create a tranfer for a single consequtive write of say, 10 bytes, and split it into 10 separate i2c_msg structs of len 1. This should not be allowed...

Ok but I'm not sure I understand what it has to do with this.

@teburd
Copy link
Collaborator

teburd commented Aug 2, 2024

This note

* @note Not all scatter/gather transactions can be supported by all

allows the application create a tranfer for a single consequtive write of say, 10 bytes, and split it into 10 separate i2c_msg structs of len 1. This should not be allowed...
It adds a ton of overhead for drivers which either just start 10 separate transactions, and run them with a delay in between (using a DMA, that implies preparing it 10 times...), or either for efficientcy or need (the TWIM for example) have to copy and stitch together messages before doing the ONE transfer...

It makes little sense to impose such overhead on drivers, especially when not all hardware even can do it.

Like all things here we have a variety of hardware and are fighting to find a common behavior. Please propose a change in an issue/PR if you wish for this to change. I don’t disagree with the sentiment. Agree with @fabiobaltieri seems like a comment best served in an issue.

@fabiobaltieri
Copy link
Member Author

Please supply a Kconfig to disable the behavior of adding stop instead with a big fat warning it will be removed for 4.1, adding code here means all the correctly behaving firmware pays this cost.

Sure let's give it a shot, converted to a Kconfig option, I'll try to rework our drivers see if I get to the bottom of it, my fear is that some other user would bump into the same issue as well but then this would give us a chance to find out.

@fabiobaltieri fabiobaltieri changed the title drivers: i2c: add a flag to skip auto-sending stop on last message drivers: i2c: add an option to skip auto-sending stop on last message Aug 2, 2024
teburd
teburd previously approved these changes Aug 2, 2024
@bjarki-andreasen
Copy link
Collaborator

Your suggestion is changing the "stated" promise of the API, which clearly indicates that there is disagreement/misalignment between API and drivers. This is a good time to agree fully and unambiguously on all of it. That is why I think we should discuss the entire API here before making changes.

I think the NO_STOP flag should be a kconfig option, which is marked as deprecated immediately so users can adapt to the actual promise of the API within this release.

The NO_STOP behavior can not be implemented by TWIM, which means users/drivers setting this flag will not work with nordic chips, its a major issue...

Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking since it breaks compatibility with in-tree drivers (TWIM)

@fabiobaltieri
Copy link
Member Author

Your suggestion is changing the "stated" promise of the API, which clearly indicates that there is disagreement/misalignment between API and drivers. This is a good time to agree fully and unambiguously on all of it. That is why I think we should discuss the entire API here before making changes.

Oh ok I see, I'd prefer if we can get this through and ridiscuss the whole API seprately, happy to change anything related to this option.

As for the NO_STOP specifically: yes I'm not trying to argue that it would have been a good API to have the set bit, my point was that the change introduced a regression. Now that's because some application code depended on the opposite of what the API was supposed to do and there's good reason for that, but that does not take it from the fact that now some code will break and needs substatal rework. I would have been ok with a "let it burn I know the risks flag" (which would have inevitably messed me up down the road anyway, I think right now it works for us because we have dedicated busses), but a Kconfig option to buy time to sort things out should do.

By the way another intresting thing of the new API I bumped into is that it modifies the user supplied data structure by adding the stop bit, which I haven't found to be quite intuitive. If a code path prepares a message set, sends only part of it, then extends it and sends it again without re-setting the flags on the previously-last message, it's going to find a stop that it has not set there.

@fabiobaltieri
Copy link
Member Author

Blocking since it breaks compatibility with in-tree drivers (TWIM)

Guess I can invert the logic and depends on !I2C_NRFX_TWIM?

@fabiobaltieri
Copy link
Member Author

Swapped the option, added a "DEPRECATED" text, added few depends on on drivers that were modified in the original PR.

teburd
teburd previously approved these changes Aug 2, 2024
@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Aug 3, 2024

By the way another intresting thing of the new API I bumped into is that it modifies the user supplied data structure by adding the stop bit, which I haven't found to be quite intuitive. If a code path prepares a message set, sends only part of it, then extends it and sends it again without re-setting the flags on the previously-last message, it's going to find a stop that it has not set there.

Yeah, the msgs should be const, and the user should be passing valid transfers (setting the stop bit of the last msg for example).

I will create a new issue/pr to hopefully make this change as well :)

@fabiobaltieri
Copy link
Member Author

Yeah, the msgs should be const, and the user should be passing valid transfers (setting the stop bit of the last msg for example).

And then just ensuring that the bit is set rather than changing it automatically? Sounds good to me, way more defensive.

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Aug 7, 2024

@teburd if RTIO included a timeout action similar to the callback action, to be included in a sequence, could one theoretically define a sequence which performs a transfer which ends with the bus busy/not stopped, waits for a timeout, then releases the bus?

@teburd
Copy link
Collaborator

teburd commented Aug 7, 2024

@teburd if RTIO included a timeout action similar to the callback action, to be included in a sequence, could one theoretically define a sequence which performs a transfer which ends with the bus busy/not stopped, waits for a timeout, then releases the bus?

Yes

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Aug 8, 2024

On a high level, creating a sequence of actions to be performed on the bus, where each sequence is "complete", starting when the bus is idle, and ending with the bus idle, should fix the locking/sync issue. (RTIO seems perfect here if we add a timeout action)

If this can be combined with a set of "supported" flags which bus controllers can expose, the entere sequencecan be validated before passing it to the controller, to provide a simple failure path, removing such checks from the device drivers themselves :)

@fabiobaltieri @teburd @henrikbrixandersen WDYT?

@fabiobaltieri
Copy link
Member Author

fabiobaltieri commented Aug 9, 2024

If we need to add a feature that allows for non-implied stop bit being sent, fine... but I think all bets are off in terms of how much help and portability you are going to get from drivers.

Sure, this is just stuff we have to deal with as part of generic API design. Look at GPIO controllers, half of them don't support either trigger on both front or level trigger interrupt, would you imagine changing the doc saying that this not supported at API level and then blocking it entirely? Some device use it, if your combination of device and mcu does not support it, you get an error and have to figure a way around it.

The change here is now breking any potential implementation of an smbus block read apparently, which may not have any in-tree usage but there's apparently applications out there the relies on it, and sure it's not just @henrikbrixandersen and I, we are just the first to have hit it. Now I understand that the previous API had plenty of race condition potential, but then that's the problem that should be fixed, not restrict it in a case that is going to potentially break downstream applications that were previously working for whatever reasons.

On a high level, creating a sequence of actions to be performed on the bus, where each sequence is "complete", starting when the bus is idle, and ending with the bus idle, should fix the locking/sync issue. (RTIO seems perfect here if we add a timeout action)

Yeah sure, can't quite picture what you have in mind but I'm sure it'll work. Or have a locking API like the one for SPI. My question is whether it's be cleaner to start from before #75801 or not, considering that as it turns out the documented behavior was only part of 3.7 and the actual enforcing is only live now. Plus the more I look at it the more I think that silently modifying the user supplied transfer struct is pretty bad api design, it should really return an error if the provided operation is not supported.

@bjarki-andreasen I think you mentioned that this is not supported by the Nordic TWAI controller but it sounds like it was? Just reading back more issues and bumped into #75771 and #69488

@fabiobaltieri fabiobaltieri removed the DNM This PR should not be merged (Do Not Merge) label Sep 11, 2024
@fabiobaltieri
Copy link
Member Author

fabiobaltieri commented Sep 11, 2024

@teburd @bjarki-andreasen hey I should have some time in the next few days to work on a proposal for the i2c stop dilemma, that said I'm now thinking it make sense to have this one as a temporary stop gap, rebased and removed dnm, will clean it up myself asap.

@fabiobaltieri
Copy link
Member Author

Sorry there was a question in my last message: you mind if we still go ahead with this one while I work on the rework? Saves me from holding it back downstream and some other user may use it as well, I expect the rework to be under discussion for a bit anyway.

@bjarki-andreasen
Copy link
Collaborator

Sorry there was a question in my last message: you mind if we still go ahead with this one while I work on the rework? Saves me from holding it back downstream and some other user may use it as well, I expect the rework to be under discussion for a bit anyway.

oh, misunderstood the comment, yes, we should merge this as a stop gap :) I though you where going to add the rework to this PR :)

teburd
teburd previously approved these changes Sep 12, 2024
@fabiobaltieri
Copy link
Member Author

Thanks folks.

ubieda
ubieda previously approved these changes Sep 12, 2024
The I2C transfer API has been recently changed to always automatically
set a STOP on the last message, which was well documented but
implemented only by few drivers.

Unfortunately, while documented, this is a change in the current
behavior and it turns out that some applications depended on it for some
complex operations.

Add a flag to temporarily restore the old behavior, buying time to fix
the application code depending on this.

Signed-off-by: Fabio Baltieri <[email protected]>
@fabiobaltieri
Copy link
Member Author

@teburd can you stamp this again please?

@fabiobaltieri fabiobaltieri merged commit 61f4ba2 into zephyrproject-rtos:main Sep 23, 2024
23 checks passed
@fabiobaltieri fabiobaltieri deleted the i2c-nostop branch September 23, 2024 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants