Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add way to escape 'Another set in progress'. #901

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

danielballan
Copy link
Member

@danielballan danielballan commented Sep 1, 2020

See #757. This is one possible approach.

TO DO

  • Needs broad input about whether this is the right way to address Revisit 'another set in progress' #757
  • Needs tests
  • As written this leaves a polling thread dangling. To fix that properly, we'd need set_and_wait to accept a poison pill threading.Event or some other means of signaling that it should stop waiting.

ophyd/signal.py Outdated
Escape 'Another set in progress'.
"""
warnings.warn(
"A previous set() operaiton is being ignored. Only this do this "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: "operation"

@prjemian
Copy link
Contributor

prjemian commented Sep 1, 2020

So far, this PR makes a clean way to dump the thread in the same way we do now. We need to terminate those dangling threads. As is, console session does not exit cleanly, requiring one or more ^C to fully exit the process.

@danielballan
Copy link
Member Author

danielballan commented Sep 1, 2020

Good point, @prjemian. I took it further in this next commit. I tested against this pathological caproto IOC which takes forever to set:

#!/usr/bin/env python3
from caproto.server import pvproperty, PVGroup, ioc_arg_parser, run


class Stubborn(PVGroup):
    """
    When a PV is written to, write the new value into a file as a string.
    """

    async def my_write(self, instance, value):
        print(f"Ignored request to write {value}")
        return 0

    A = pvproperty(put=my_write, value=0)
    B = pvproperty(put=my_write, value=0)


if __name__ == '__main__':
    ioc_options, run_options = ioc_arg_parser(
        default_prefix='stubborn:',
        desc='Run an IOC with PVs that ignores write requests')
    ioc = Stubborn(**ioc_options)
    run(ioc.pvdb, **run_options)

Before, console does not exit cleanly:

In [1]: from ophyd import EpicsSignal                                                                                                                                                         

In [2]: s = EpicsSignal("stubborn:A", name='s')                                                                                                                                               

In [3]: s.set(1)                                                                                                                                                                              
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', timestamp=1598986036.873889, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)

In [4]: _.done                                                                                                                                                                                
Out[4]: False

In [5]: s.set(2)                                                                                                                                                                              
<snipped>
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [6]: s.clear_set()                                                                                                                                                                         
/home/dallan/Repos/bnl/ophyd/ophyd/signal.py:318: UserWarning: A previous set() operaiton is being ignored. Only this do this when debugging or recovering from a hardware failure.
  "A previous set() operaiton is being ignored. Only this do this "

In [7]: exit                                                                                                                                                                                  
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 269, in set_thread
    rtol=self.rtolerance)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 263, in set_and_wait
    current_value = signal.get()
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1108, in get
    self._read_pv, timeout, connection_timeout, as_string, form)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1066, in _get_with_timeout
    info = pv.get_with_metadata(as_string=as_string, form=form, timeout=timeout)
  File "/home/dallan/.conda/envs/py37/lib/python3.7/site-packages/epics/pv.py", line 46, in wrapped
    raise RuntimeError('Expected CA context is unset')
RuntimeError: Expected CA context is unset

After, clear_set() causes an AbandonedSet exception to be raised in the in the background thread and waits for the thread to join() before returning, so now the console exits cleanly.

In [1]: from ophyd import EpicsSignal                                                                                                                                                         

In [2]: s = EpicsSignal("stubborn:A", name='s')                                                                                                                                               

In [3]: s.set(1)                                                                                                                                                                              
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', value=1, timestamp=1598986283.0511467, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)

In [4]: _.done                                                                                                                                                                                
Out[4]: False

In [5]: s.set(2)                                                                                                                                                                              
<snipped>
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [6]: s.clear_set()                                                                                                                                                                         
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 272, in set_thread
    rtol=self.rtolerance, poison_pill=poison_pill)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet
ophyd.utils.epics_pvs.AbandonedSet
/home/dallan/Repos/bnl/ophyd/ophyd/signal.py:328: UserWarning: A previous set() operaiton is being ignored. Only this do this when debugging or recovering from a hardware failure.
  "A previous set() operaiton is being ignored. Only this do this "

In [7]: exit  # no errors at exit

@ZLLentz
Copy link
Member

ZLLentz commented Sep 2, 2020

I think this kind of event wait poison pill approach is great for killing these background set threads.
We've run into this a few times at SLAC, but in our case this indicated a problem with either the IOC or the device class- can't quite remember the details.

Do you think signals should automatically call this at python exit? It might smooth out the shutdown of some of the ipython sessions.

@danielballan
Copy link
Member Author

We've run into this a few times at SLAC, but in our case this indicated a problem with either the IOC or the device class-

Yes, I think that will be the case in general. My only reservation about adding this is that it might be used routinely, rather than as intended role: allowing you to salvage the ophyd object instance but prompting you to look into the device or IOC issue. But, the current situation, where if you have bad failed set you have to remake your instance or (as is common in practice) restart your whole process, seem worse.

Do you think signals should automatically call this at python exit? It might smooth out the shutdown of some of the ipython sessions.

Great suggestion. Thanks. Will add now....

@prjemian
Copy link
Contributor

prjemian commented Sep 2, 2020

Will add now....

... realizing that the current implementation will be changed from the current technique of putting black tape over the blinking red light. (That is, it sets the object reference to None.)

@danielballan
Copy link
Member Author

Before, unclean exit because _set_thread is dangling:

In [1]: from ophyd import EpicsSignal                                                                                                                                                         

In [2]: s = EpicsSignal("stubborn:A", name='s')                                                                                                                                               

In [3]: s.set(1)                                                                                                                                                                              
Out[3]: Status(obj=EpicsSignal(read_pv='stubborn:A', name='s', timestamp=1599074325.10357, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False), done=False, success=False)

In [4]: _.done                                                                                                                                                                                
Out[4]: False

In [5]: exit                                                                                                                                                                                  
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 272, in set_thread
    rtol=self.rtolerance, poison_pill=poison_pill)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 274, in set_and_wait
    current_value = signal.get()
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1117, in get
    self._read_pv, timeout, connection_timeout, as_string, form)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 1075, in _get_with_timeout
    info = pv.get_with_metadata(as_string=as_string, form=form, timeout=timeout)
  File "/home/dallan/.conda/envs/py37/lib/python3.7/site-packages/epics/pv.py", line 46, in wrapped
    raise RuntimeError('Expected CA context is unset')
RuntimeError: Expected CA context is unset

After, clear_set called at exit by a weakref.finalizer gives a clear error message:

In [5]: exit         
Traceback (most recent call last):
  File "/home/dallan/Repos/bnl/ophyd/ophyd/signal.py", line 274, in set_thread
    rtol=self.rtolerance, poison_pill=poison_pill)
  File "/home/dallan/Repos/bnl/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1599075154.647881, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.

@ZLLentz
Copy link
Member

ZLLentz commented Sep 2, 2020

My only reservation about adding this is that it might be used routinely

This is a very fair concern. In the past we've talked a lot about not leaving traps for our users.

I think this PR does a good job of making the failure at least very verbose with the warnings and exceptions raised, hopefully verbose enough that someone investigates the root cause of the issue rather than patching over it indefinitely.

In the context of "make it work for the user experiment" though, I think it's important to provide tools like this, else we end up doing some sort of kluge that makes the experiment work but is otherwise an accrual of technical debt. I'll take a standard bandaid over roll-your-own bandaid any day of the week.

@danielballan
Copy link
Member Author

Well said.

@prjemian
Copy link
Contributor

prjemian commented Sep 3, 2020

@danielballan : Your comment above shows exactly the same sort of error I was seeing recently. In that comment from USAXS, there is an attached text file with the thousands of lines output from the error dump on exiting the console session.

In a loop (with bps.repeat), I ran a plan that included six different bp.scan runs. Four of the scans involve a motor with the .STUP field as an added signal:

class GuardSlitMotor(EpicsMotor):
    status_update = Component(EpicsSignal, ".STUP")

To clear the problems previously reported, I include at the end of a set of scans a call to the yield from bps.abs_set(MOTOR.status_update, 1). The EPICS motor record receives that, does the status update in the motor record, and then sets .STUP back to zero. Ophyd is left waiting on the status_update signal so the status object for the .set() is waiting in a thread. To kill this thread, I call MOTOR.status_update._set_thread = None on each of these (4) motors. After some repetitions of this set of 6 scans, the time between starts of the set becomes noticeably longer as well as start times of each scan become longer, too. Observe in a plot that the longer this problem went, the slower the scan repeated itself.

I'm concerned that the fix of MOTOR.status_update._set_thread = None is not enough. I assume that what I see shown in the plot (cited in the details) as increasingly longer times to repeat the same plan is due to accumulated resource leakage with these abandoned threads.

@ZLLentz
Copy link
Member

ZLLentz commented Sep 3, 2020

@prjemian unless I'm misreading something here, this PR ensures that the set thread will die using a threading.Event to send it a kill signal, either when requested by the user or upon finalization. So I think this fix as implemented will help with console exit error dump quite a bit.

@danielballan
Copy link
Member Author

Right. In the earliest draft of this PR I just dropped the reference to the thread (“put black tape over the red warning light”) to prompt discussion on the question “Do we even want to expose this API?” But in the current draft I first ensure the thread will stop, as @ZLLentz describes, and only then drop the reference.

@prjemian
Copy link
Contributor

prjemian commented Sep 3, 2020

Ah, I see the additional code now. Should I patch locally and give it a try? Seems I have a stress test demonstrated above.

@danielballan
Copy link
Member Author

That would be great!

@prjemian
Copy link
Contributor

Ok, NeXus conferences are done, back in the Bluesky saddle again. Tested with caproto IOC per above using this test code:

from ophyd import EpicsSignal

s = EpicsSignal("stubborn:A", name='s')
st = s.set(1)
st.done
st = s.set(2)
st.done
exit
  1. Verified same result when using ophyd/master branch (1.5.4) that same result was received:
  2. Using danielballan:escape-another-set-in-progress branch, did not get a clean exit:
(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ git branch
* escape-another-set-in-progress
  master
(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ ipython
Python 3.8.2 (default, Mar 26 2020, 15:53:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ophyd import EpicsSignal 
   ...:  
   ...: s = EpicsSignal("stubborn:A", name='s') 
   ...: st = s.set(1) 
   ...: st.done 
   ...: st = s.set(2) 
   ...: st.done 
   ...: exit 
   ...:                                                                                                                                                      
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-548872e6ae6b> in <module>
      4 st = s.set(1)
      5 st.done
----> 6 st = s.set(2)
      7 st.done
      8 exit

~/Documents/projects/Bluesky/ophyd/ophyd/signal.py in set(self, value, timeout, settle_time)
   1706         '''
   1707         if not self._put_complete:
-> 1708             return super().set(value, timeout=timeout, settle_time=settle_time)
   1709 
   1710         # using put completion:

~/Documents/projects/Bluesky/ophyd/ophyd/signal.py in set(self, value, timeout, settle_time)
    305 
    306         if self._set_thread is not None:
--> 307             raise RuntimeError(
    308                 f"Another set() call is still in progress for {self.name}. "
    309                 "If this is due to some transient failure, verify that the "

RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [2]: st.done 
   ...: exit 
   ...:                                                                                                                                                      
Out[2]: <IPython.core.autocall.ExitAutocall at 0x7f2acb4969a0>

In [3]: exit                                                                                                                                                 
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603738132.896743, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ 

FYI

(bluesky_2020_9) prjemian@poof ~/.../Bluesky/ophyd $ conda list ophyd
# packages in environment at /home/prjemian/Apps/anaconda/envs/bluesky_2020_9:
#
# Name                    Version                   Build  Channel
ophyd                     1.5.3.post4+g14a4fd4           dev_0    <develop>

@prjemian
Copy link
Contributor

But, adding the call to clear_set() after the failed set() allows clean exit:

~/Documents/projects/Bluesky/ophyd/ophyd/signal.py in set(self, value, timeout, settle_time)
    305 
    306         if self._set_thread is not None:
--> 307             raise RuntimeError(
    308                 f"Another set() call is still in progress for {self.name}. "
    309                 "If this is due to some transient failure, verify that the "

RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.

In [2]: s.clear_set()                                                                                                                                        
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603738591.861509, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py:334: UserWarning: A previous set() operation is being ignored. Only this do this when debugging or recovering from a hardware failure.
  warnings.warn(

In [3]: exit                                                                                                                                                 

@prjemian
Copy link
Contributor

So, this code is noisy (can't suppress the comments) and it goes against the advice provided, BUT it continues to operate past the reported problems:

from ophyd import EpicsSignal
from ophyd.utils.epics_pvs import AbandonedSet

s = EpicsSignal("stubborn:A", name='s')

for cycle in range(3):
    st = s.set(1)
    print(f"cycle {cycle}, status: {st.done}")

    s.clear_set()
    print(f"cycle {cycle}, status: {st.done}")

s.clear_set()

exit()

ipython session:

(bluesky_2020_9) prjemian@poof /tmp $ ipython
Python 3.8.2 (default, Mar 26 2020, 15:53:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ophyd import EpicsSignal 
   ...: from ophyd.utils.epics_pvs import AbandonedSet 
   ...:  
   ...: s = EpicsSignal("stubborn:A", name='s') 
   ...:  
   ...: for cycle in range(3): 
   ...:     st = s.set(1) 
   ...:     print(f"cycle {cycle}, status: {st.done}") 
   ...:  
   ...:     s.clear_set() 
   ...:     print(f"cycle {cycle}, status: {st.done}") 
   ...:  
   ...: s.clear_set() 
   ...:  
   ...: exit() 
   ...:                                                                                                                                                      
cycle 0, status: False
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603747329.535178, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py:334: UserWarning: A previous set() operation is being ignored. Only this do this when debugging or recovering from a hardware failure.
  warnings.warn(
cycle 0, status: True
cycle 1, status: False
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603747329.537657, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
cycle 1, status: False
cycle 2, status: False
set_and_wait(value=1, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='stubborn:A', name='s', value=0, timestamp=1603747329.539203, auto_monitor=False, string=False, write_pv='stubborn:A', limits=False, put_complete=False) was set to 1 but it does not seem to have finished. We are no longer watching for it.
cycle 2, status: False
(bluesky_2020_9) prjemian@poof /tmp $ 

@prjemian
Copy link
Contributor

Now for the stress test.

Code

#!/usr/bin/env python3

from ophyd import EpicsSignal
from ophyd.utils.epics_pvs import AbandonedSet
import time

s = EpicsSignal("sky:m1.STUP", name='s', string=True)

for cycle in range(3):
    print(">"*40)
    st = s.set("ON")
    print(f"cycle {cycle}, status: {st.done}")
    time.sleep(0.05)

    s.clear_set()
    print(f"cycle {cycle}, status: {st.done}")
    print("<"*40)

s.clear_set()

exit()

without s.clear_set()

(bluesky_2020_9) prjemian@poof /tmp $ ./test_motor.py 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 0, status: False
cycle 0, status: False
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Traceback (most recent call last):
  File "./test_motor.py", line 11, in <module>
    st = s.set("ON")
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 1708, in set
    return super().set(value, timeout=timeout, settle_time=settle_time)
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 307, in set
    raise RuntimeError(
RuntimeError: Another set() call is still in progress for s. If this is due to some transient failure, verify that the device is configured the way you expect, and use clear_set() to ignore and abandon the previous set() operation.
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603748190.873643, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.

with s.clear_set()

(bluesky_2020_9) prjemian@poof /tmp $ ./test_motor.py 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 0, status: False
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603748220.059571, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.
/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py:334: UserWarning: A previous set() operation is being ignored. Only this do this when debugging or recovering from a hardware failure.
  warnings.warn(
cycle 0, status: False
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 1, status: False
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='BUSY', timestamp=1603748220.081061, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.
cycle 1, status: False
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
cycle 2, status: False
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 16:37:00.132979121
..................................................................
set_and_wait(value=ON, timeout=None, atol=None, rtol=None)
Traceback (most recent call last):
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/signal.py", line 273, in set_thread
    set_and_wait(self, value, timeout=timeout, atol=self.tolerance,
  File "/home/prjemian/Documents/projects/Bluesky/ophyd/ophyd/utils/epics_pvs.py", line 271, in set_and_wait
    raise AbandonedSet(
ophyd.utils.epics_pvs.AbandonedSet: The signal EpicsSignal(read_pv='sky:m1.STUP', name='s', value='BUSY', timestamp=1603748220.081061, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False) was set to ON but it does not seem to have finished. We are no longer watching for it.
cycle 2, status: True
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Conclusion

The stress test passes with this addition. It is definitely noisy (lots of unpreventable console output). But the primary problem is solved.

@prjemian
Copy link
Contributor

Calling set() with an explicit timeout is less noisy:

In [30]: st = s.set("ON", timeout=0.1)                                                                                                                       

set_and_wait(value=ON, timeout=0.1, atol=None, rtol=None)
In [31]: st = s.set("ON", timeout=0.1)                                                                                                                       

set_and_wait(value=ON, timeout=0.1, atol=None, rtol=None)
In [32]: st = s.set("ON", timeout=0.1)                                                                                                                       

set_and_wait(value=ON, timeout=0.1, atol=None, rtol=None)

In the why didn't I think of that sooner category, isn't setting a timeout parameter the right way? Can the timeout=0.05 kwarg be used with bps.mv?

def plan9(signal):
    for _itr in range(3):
        try:
            yield from bps.mv(signal, "ON", timeout=0.01)
        except FailedStatus:
            pass

Yup, it works (if the exception is trapped):

In [13]: RE(plan9(s1))                                                                                                                                       
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 17:22:34.108184434
..................................................................
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
Out[13]: ()

@prjemian
Copy link
Contributor

The result can be quieter by suppressing the warning-level messages from the signal:

In [24]: s1.log.setLevel("ERROR")                                                                                                                            

In [25]: RE(plan9(s1))                                                                                                                                       
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 17:28:45.578939646
..................................................................
Out[25]: ()

In [26]: s1.log.setLevel("WARNING")                                                                                                                          

In [27]: RE(plan9(s1))                                                                                                                                       
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
CA.Client.Exception...............................................
    Warning: "Channel write request failed"
    Context: "op=1, channel=sky:m1.STUP, type=DBR_ENUM, count=1, ctx="sky:m1""
    Source File: ../oldChannelNotify.cpp line 158
    Current Time: Mon Oct 26 2020 17:28:56.273481707
..................................................................
set_and_wait(value=ON, timeout=0.01, atol=None, rtol=None)
Out[27]: ()

In [28]:                                                                                                                                                     

@prjemian
Copy link
Contributor

@danielballan @tacaswell
I'm sensing victory here by adding a timeout kwarg to the bps.mv() calls for metadata. We can suppress warnings (via signal.log.setLevel()) if they become a nuisance.

@prjemian
Copy link
Contributor

But now I ask the question why the timeout kwarg in EpicsSignal .set() method does not default to the .timeout value? Looking here:

st = Status(self, timeout=timeout, settle_time=settle_time)

This .set() will default the timeout kwarg to None. Since I've set the default timeout for the signal to be 10 s, I was expecting s.timeout to be 10 (and it is).

(bluesky_2020_9) prjemian@poof /tmp $ ipython
Python 3.8.2 (default, Mar 26 2020, 15:53:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from ophyd.signal import EpicsSignalBase                                                                                  

In [2]: EpicsSignalBase.set_default_timeout(timeout=10, connection_timeout=5)                                                     

In [3]: from ophyd import EpicsSignal                                                                                             

In [4]: s = EpicsSignal("sky:m1.STUP", name='s', string=True)                                                                     

In [5]: s.timeout                                                                                                                 
Out[5]: 10

Now, try st = s.set("ON") and wait 15 seconds (5 seconds beyond timeout) before printing the status. Expecting it be done.

import time
st = s.set("ON")
time.sleep(15)
print(st)

Expecting done, but no.

In [6]: import time 
   ...: st = s.set("ON") 
   ...: time.sleep(15) 
   ...: print(st) 
   ...:                                                                                                                           
Status(obj=EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603771932.302131, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False), done=False, success=False)

Repeat but use a timeout=10 kwarg with .set() in a new ipython session:

In [1]: from ophyd import EpicsSignal 
   ...: import time 
   ...: s = EpicsSignal("sky:m1.STUP", name='s', string=True) 
   ...: st = s.set("ON", timeout=10) 
   ...: time.sleep(15) 
   ...: print(st) 
   ...:                                                                                                                           
set_and_wait(value=ON, timeout=10, atol=None, rtol=None)
Status(obj=EpicsSignal(read_pv='sky:m1.STUP', name='s', value='OFF', timestamp=1603772109.341146, auto_monitor=False, string=True, write_pv='sky:m1.STUP', limits=False, put_complete=False), done=True, success=False)

@tacaswell
Copy link
Contributor

This seems 👍 to me. It adds a bunch of complexity, but I think that this is the minimum complexity we can have if we are going to have background status threads like this.

@danielballan
Copy link
Member Author

Thanks for pushing on this @prjemian .

@prjemian
Copy link
Contributor

prjemian commented Jun 10, 2021 via email

prjemian added a commit to danielballan/ophyd that referenced this pull request Jun 10, 2021
prjemian added a commit to danielballan/ophyd that referenced this pull request Jun 10, 2021
@prjemian
Copy link
Contributor

I'm confused, .set() defines the poison_pill while .put() does not. Will there be troubles with .put()?

@prjemian
Copy link
Contributor

Why is the CI taking 6 hours to run?

@danielballan
Copy link
Member Author

Will there be troubles with .put()?

No, this is semi-internal. It operates one layer below Status objects so it should not have anything to do with "poison pill".

@danielballan
Copy link
Member Author

@tacaswell Can you help @prjemian get un-blocked here? This is close. I don't want to lose momentum on it again.

@tacaswell
Copy link
Contributor

The implementation of this reads reasonably, looking into if I can recreate the hang locally.

@prjemian
Copy link
Contributor

Based on the age of this PR, is this still viable?

@prjemian
Copy link
Contributor

We (@cooleyv and I) think we may be seeing this again.

@tacaswell tacaswell force-pushed the escape-another-set-in-progress branch from d2cc7bc to 82dadb2 Compare October 16, 2024 01:12
@tacaswell
Copy link
Contributor

did the mechanical rebase (90% just whitespace and quotes due to auto-formatting), lets see if the hangs have gone away.

@tacaswell tacaswell force-pushed the escape-another-set-in-progress branch from 9f50cfc to 711be9f Compare October 16, 2024 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants