Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SoundWire locking issues #5052

Open
plbossart opened this issue Jun 10, 2024 · 8 comments
Open

SoundWire locking issues #5052

plbossart opened this issue Jun 10, 2024 · 8 comments

Comments

@plbossart
Copy link
Member

plbossart commented Jun 10, 2024

There are 2 theoretical cases where the current locking is broken.

stream locking

the use of sdw_acquire_bus_lock() can lead to an AB/BA deadlock.

If two streams use the same link and the the m_rt information is added in opposite order, this loop would dead-lock.
Example Stream A uses link0 and link1, and Stream B uses link1 and link0.

The following loop would result in
StreamA lock link0
Stream B link link1
Deadlock.

        /* Iterate for all Master(s) in Master list */
	list_for_each_entry(m_rt, &stream->master_list, stream_node) {
		bus = m_rt->bus;

		mutex_lock(&bus->bus_lock);
	}

SyncArm/SyncGo locks on Intel platforms

On Intel platforms, the mechanism is to do a 'Sync Arm' in the pre-bank switch for all m_rt, then do a 'Sync Go' in the post_bank_switch callback of the first m_rt.

Assuming Stream A uses link 0 and link1, and stream B uses link2 and link3, it's theoretically possible to have the following sequence:

Stream A link0 sync arm
Stream A link1 sync arm
Stream B link2 sync arm
Stream A link0 sync go -> link 0, 1, 2 would start when only link0 and link1 should start...
Stream B link3 sync arm
Stream B link2 sync go

It's anyone's guess when might happen after the first sync go. At worst, link3 never starts. At best, link3 starts but wouldn't be synchronized with link2. Both are bad options.

It's not clear if these two cases explain the problems we're seeing in #4823 but it's rather of a concern that this locking looks rather broken.

@bardliao @ujfalusi let me know if this makes any sense.

@plbossart
Copy link
Member Author

plbossart commented Jun 10, 2024

https://sof-ci.ostc.intel.com/#/result/planresultdetail/42223?model=LNLM_SDW_AIOC&testcase=check-playback-3times is also interesting, we can see a regular transfer starting before a bank switch completes

[  397.255338] kernel: soundwire sdw-master-0-0: do_transfer_defer
[  397.255342] kernel: soundwire_intel soundwire_intel.link.0: intel_sync_go_unlocked: started
[  397.255348] kernel: soundwire_intel soundwire_intel.link.0: intel_sync_go_unlocked: done
[  397.255349] kernel: soundwire_intel soundwire_intel.link.0: intel_post_bank_switch: sync_go wait start
[  397.257453] kernel: soundwire_intel soundwire_intel.link.0: intel_post_bank_switch: sync_go wait done
[  397.279332] kernel: soundwire sdw-master-0-0: sdw_transfer  <<<<  THIS SHOULD NOT HAPPEN!!!!
[  400.270302] kernel: soundwire sdw-master-0-0: Controller Timed out on bank switch
[  400.270416] kernel: soundwire sdw-master-0-0: multi link bank switch failed: -110
[  400.270474] kernel: _sdw_prepare_stream: do_bank_switch failed for stream subdevice #0-Playback: -110
[  400.270620] kernel: sdw_release_bus_lock: releasing locks for stream subdevice #0-Playback
[  400.270629] kernel: sdw_release_bus_lock: released locks for stream subdevice #0-Playback
[  400.270633] kernel:  SDW0-Playback: ASoC: error at snd_soc_link_prepare on SDW0-Playback: -110
[  400.270737] kernel:  SDW0-Playback: ASoC: error at __soc_pcm_prepare on SDW0-Playback: -110
[  400.270768] kernel:  Jack Out: ASoC: error at dpcm_be_dai_prepare on Jack Out: -110
[  400.270798] kernel:  Jack Out: ASoC: error at dpcm_fe_dai_prepare on Jack Out: -110

@plbossart
Copy link
Member Author

plbossart commented Jun 11, 2024

Another possible locking issue:

a) the bank switch starts and the bus takes the msg->lock
b) an interrupt happens and the workqueue schedules a read from IntStat0 register, which is blocked by the msg->lock
c) the manager writes the bank switch command, which is blocked by the read that won't go out.

So I think we need to disable interrupts coming from the peripherals during a bank switch.

Edit: after talking with @bardliao this isn't really possible.

@bardliao
Copy link
Collaborator

Another possible locking issue:

a) the bank switch starts and the bus takes the msg->lock b) an interrupt happens and the workqueue schedules a read from IntStat0 register, which is blocked by the msg->lock c) the manager writes the bank switch command, which is blocked by the read that won't go out.

So I think we need to disable interrupts coming from the peripherals during a bank switch.

I added 3 sec sleep after do_transfer_defer, and keep pressing button key while headphone playback. It looks like the sdw_transfer will be called after bank switch in normal case. Not sure what happens in the failed case.

[   61.834921] sdw_acquire_bus_lock: acquiring locks for stream subdevice #0-Playback
[   61.834923] sdw_acquire_bus_lock: acquired locks for stream subdevice #0-Playback
[   61.834925] soundwire sdw-master-0-0: sdw_transfer
[   61.835091] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.835166] soundwire sdw-master-0-0: sdw_transfer
[   61.835382] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.835446] soundwire sdw-master-0-0: sdw_transfer
[   61.835505] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.835522] soundwire sdw-master-0-0: sdw_transfer
[   61.835608] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.835624] soundwire sdw-master-0-0: sdw_transfer
[   61.835677] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.835692] soundwire sdw-master-0-0: sdw_transfer
[   61.835805] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.835867] soundwire sdw-master-0-0: sdw_transfer
[   61.835983] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.836046] soundwire sdw-master-0-0: sdw_transfer
[   61.836125] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.836185] soundwire sdw-master-0-0: sdw_transfer
[   61.836286] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.836365] soundwire sdw-master-0-0: sdw_transfer
[   61.836580] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.836590] soundwire sdw-master-0-0: sdw_transfer
[   61.836650] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   61.836673] soundwire sdw-master-0-0: do_transfer_defer
[   61.836679] soundwire sdw-master-0-0: bard: before sleep 3 sec
[   64.838910] soundwire sdw-master-0-0: bard: after sleep 3 sec
[   64.839577] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   64.839654] soundwire sdw-master-0-0: sdw_transfer
[   64.840052] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   64.840133] sdw_release_bus_lock: releasing locks for stream subdevice #0-Playback
[   64.840174] sdw_release_bus_lock: released locks for stream subdevice #0-Playback

@bardliao
Copy link
Collaborator

Same test as #5052 (comment) with #5047

[   43.664000] sdw_acquire_bus_lock: acquiring locks for stream subdevice #0-Playback
[   43.664002] sdw_acquire_bus_lock: acquired locks for stream subdevice #0-Playback
[   43.664006] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x130 flags 0x1
[   43.664009] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.664211] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.664281] soundwire sdw-master-0-0: sdw_transfer: done
[   43.664296] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x102 flags 0x0
[   43.664301] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.664477] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.664553] soundwire sdw-master-0-0: sdw_transfer: done
[   43.664556] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x102 flags 0x1
[   43.664561] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.664611] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.664621] soundwire sdw-master-0-0: sdw_transfer: done
[   43.664623] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x103 flags 0x1
[   43.664625] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.664678] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.664687] soundwire sdw-master-0-0: sdw_transfer: done
[   43.664689] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x132 flags 0x1
[   43.664690] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.664768] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.664830] soundwire sdw-master-0-0: sdw_transfer: done
[   43.664834] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x134 flags 0x1
[   43.664837] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.665017] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.665083] soundwire sdw-master-0-0: sdw_transfer: done
[   43.665088] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x138 flags 0x1
[   43.665092] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.665322] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.665394] soundwire sdw-master-0-0: sdw_transfer: done
[   43.665398] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x135 flags 0x1
[   43.665402] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.665470] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.665481] soundwire sdw-master-0-0: sdw_transfer: done
[   43.665482] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x137 flags 0x1
[   43.665484] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.665532] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.665542] soundwire sdw-master-0-0: sdw_transfer: done
[   43.665543] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x133 flags 0x1
[   43.665544] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.665595] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.665604] soundwire sdw-master-0-0: sdw_transfer: done
[   43.665605] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x136 flags 0x1
[   43.665607] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   43.665664] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   43.665673] soundwire sdw-master-0-0: sdw_transfer: done
[   43.665682] soundwire_intel soundwire_intel.link.0: intel_sync_arm: started
[   43.665687] soundwire_intel soundwire_intel.link.0: intel_sync_arm: done
[   43.665691] soundwire sdw-master-0-0: do_transfer_defer
[   43.665699] soundwire sdw-master-0-0: bard: before sleep 3 sec
[   46.726949] soundwire sdw-master-0-0: bard: after sleep 3 sec
[   46.726983] soundwire_intel soundwire_intel.link.0: intel_sync_go_unlocked: started
[   46.727002] soundwire_intel soundwire_intel.link.0: intel_sync_go_unlocked: done
[   46.727004] soundwire_intel soundwire_intel.link.0: intel_post_bank_switch: sync_go wait start
[   46.727537] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   46.729114] soundwire_intel soundwire_intel.link.0: intel_post_bank_switch: sync_go wait done
[   46.729124] soundwire sdw-master-0-0: sdw_transfer: dev 6 addr 0x120 flags 0x1
[   46.729128] soundwire sdw-master-0-0: sdw_transfer: after mutex
[   46.729277] soundwire_intel soundwire_intel.link.0: sdw_cdns_irq int_status 0x80000004
[   46.729323] soundwire sdw-master-0-0: sdw_transfer: done
[   46.729327] sdw_release_bus_lock: releasing locks for stream subdevice #0-Playback
[   46.729330] sdw_release_bus_lock: released locks for stream subdevice #0-Playback

@bardliao
Copy link
Collaborator

a) the bank switch starts and the bus takes the msg->lock
b) an interrupt happens and the workqueue schedules a read from IntStat0 register, which is blocked by the msg->lock
c) the manager writes the bank switch command, which is blocked by the read that won't go out.

Are you referring to mutex_lock(&bus->msg_lock);? mutex_lock(&bus->msg_lock); is used by do_bank_switch and sdw_transfer. So in theory, sdw_transfer will wait for do_bank_switch when an interrupt happens, right?

@plbossart
Copy link
Member Author

In theory yes, but there's a corner case as described above

a) the bank switch starts and the bus takes the msg->lock
b) an interrupt happens and the workqueue schedules a read from IntStat0 register, which is blocked by the msg->lock
c) the manager writes the bank switch command, which is blocked by the read that won't go out.

b) is blocked by a) and c) is blocked by b)

@plbossart
Copy link
Member Author

wait @bardliao are you saying that with a 3s sleep time you never see any bank switch errors?

@bardliao
Copy link
Collaborator

wait @bardliao are you saying that with a 3s sleep time you never see any bank switch errors?

No, I tested it manually which can not reproduce the issue anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants