Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lock() and unlock() #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

MarkRivers
Copy link

GSECARS borrowed a Lambda detector from the detector pool as few weeks ago. We had some issues with the detector hanging up. I looked at the driver code today and I found some issues with mutex locking. These are the issues:

  • This driver uses 2 EPICS mutexes.

    • The first is the mutex inherited from asynPortDriver. It is controlled with this->lock() and this->unlock().
    • The second is the dequeLock mutex created in this driver.

    A very important rule when threads are using more than one mutex is that all threads must lock and unlock these mutexes in the same order. If they do not then a deadlock can occur. This is a problem in this driver.

  • All calls to the asynPortDriver parameter library (setIntegerParam(), callParamCallbacks(), etc.) must me made with the main asynPortDriver mutex locked. However, this driver does not have that mutex locked when it calls the parameter library in the following places:

I have no way to test this PR. There may be places that you do want to release the lock for computationally expensive operations when no memory shared with other threads is being accessed. But this must be done very carefully and only when needed.

@keenanlang
Copy link

A very important rule when threads are using more than one mutex is that all threads must lock and unlock these mutexes in the same order. If they do not then a deadlock can occur. This is a problem in this driver

While lock hierarchies are necessary with multiple mutexes, they only apply when multiple locks are being acquired at the same time. At any one point, you should not attempt to acquire a lock higher in the hierarchy than any lock that you already have, but you don't have to lock every lock in the hierarchy to get to the one that guards the resource you want.

Absent anything else in the code that could cause stalls, a thread that goes:

Lock A
Lock B
Code 1
Unlock B
Unlock A

and a thread going

Lock B
Code 2
Unlock B
Lock A
Code 3
Unlock A

does not have the possibility of a deadlock. Either thread holding any of the locks that they can will eventually make it to the point where they unlock the mutex, there is never a point where one thread will be waiting for a lock that wont eventually be unlocked (again, assuming that there is nothing in code 1, 2, or 3 that attempts to wait on A or B).

All calls to the asynPortDriver parameter library (setIntegerParam(), callParamCallbacks(), etc.) must me made with the main asynPortDriver mutex locked. However, this driver does not have that mutex locked when it calls the parameter library in the following places:

I've moved the locks around to fix the variable setting that was being done in the spots you mentioned. d630718.

@MarkRivers
Copy link
Author

@keenanlang my apologies you are correct there was not a problem with a deadlock.

However, your fix still leaves several problems in exportThread:

  • You are accessing class member variables (this->connected, this->pImage) in lines 509-521 without taking the lock to guard these against modification by other threads.

  • You are accessing the parameter library in lines 529-531 without the lock held. As I said previously this is a mistake.

In my opinion the correct way to write exportThread is to hold the lock the entire time except when calling epicsThreadSleep in line 511. Why not do this?

@keenanlang
Copy link

You are accessing class member variables (this->connected, this->pImage) in lines 509-521 without taking the lock to guard these against modification by other threads.

pImage isn't being modified by any other thread, so that isn't a problem; export_queue is protected by the mutex; and while connected isn't guarded, the worst that would happen from a race condition is that a single extra frame might be exported if someone tried to disconnect, but that isn't exactly a problem.

In my opinion the correct way to write exportThread is to hold the lock the entire time except when calling epicsThreadSleep in line 511. Why not do this?

While it wouldn't be a whole lot of extra code going on, I'd like to lock on as minimal amount of work as possible as the multi-module lambdas need as much processing time available for the stitching code as possible, especially as speeds and sizes increase.

You are accessing the parameter library in lines 529-531 without the lock held.

Adjusted.

@MarkRivers
Copy link
Author

The lock is also not being taken at the beginning of acquireThread where the parameter library is being accessed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants