Alabos hot restart #75

Open
2 of 3 tasks
odartsi opened this issue Jul 2, 2024 · 2 comments
Labels: bug (not working as it should be), coding standards (implement good coding practice), feature request (new feature to implement)

Comments

@odartsi
Contributor

odartsi commented Jul 2, 2024

Steps to follow:

  • 1. Remove https://github.com/CederGroupHub/alabos/blob/main/alab_management/device_manager.py completely. We will create the instances of devices in each task every time the task occupies the device. (@idocx )

    1. Possible issues: currently some devices have background threads running, e.g., LabmanQuadrant. In general, this is a poor way to handle background tasks because it makes debugging hard.
    2. The device manager is only useful for monitoring the state of devices, such as the glovebox (robotbox) argon flow. Therefore, in the future we will implement a separate thread to do this. Each device will have a method called “check_status()” that checks whether all parameters are in the correct range.
    3. Steps:
      1. Packages affected:
        1. DeviceManager: /scripts/launch_lab.py
          1. Find everything that connects to DeviceManager
        2. DeviceClient: /lab_view.py → not removed, just refactored
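
The check_status() idea from step 1 could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Glovebox` class, the `ParameterRange` helper, the `argon_flow` limits, and the `monitor_once` function are hypothetical and are not taken from alabos or alab_one.

```python
from dataclasses import dataclass


@dataclass
class ParameterRange:
    """Allowed range for one monitored parameter (hypothetical helper)."""

    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


class Glovebox:
    """Hypothetical device; the parameter name and limits are illustrative only."""

    # Assumed limits, not taken from alab_one
    limits = {"argon_flow": ParameterRange(1.0, 5.0)}

    def read_parameters(self) -> dict[str, float]:
        # A real device would query its hardware here
        return {"argon_flow": 3.2}

    def check_status(self) -> dict[str, bool]:
        """Return, per parameter, whether the current reading is within range."""
        readings = self.read_parameters()
        return {
            name: self.limits[name].contains(value)
            for name, value in readings.items()
        }


def monitor_once(devices) -> list[str]:
    """Check every device once and list the parameters that are out of range."""
    failures = []
    for device in devices:
        for name, ok in device.check_status().items():
            if not ok:
                failures.append(f"{type(device).__name__}.{name}")
    return failures
```

A monitoring thread could call `monitor_once` periodically, so the devices themselves would not need background threads.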
  • 2. Implement a reload option for importing the alab_one package. Currently, the alab_one package is imported into the AlabOS process via https://github.com/CederGroupHub/alabos/blob/main/alab_management/utils/module_ops.py#L12. We will need to implement something similar to the importlib.reload function. The new function should have the following signature. (@bernardusrendy)

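    import importlib
    import sys
    from pathlib import Path


    def import_module_from_path(
        path: str | Path,
        parent_package: str | None = None,
        reload: bool = False,
    ):
        ...
        # get package_name from path
        package_name = ...
        # importlib.reload takes a module object, not a name, so look it up first
        if reload and package_name in sys.modules:
            importlib.reload(sys.modules[package_name])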
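
As a rough illustration of the reload behaviour, here is a self-contained sketch. It is not the module_ops.py implementation: the helper `load_or_reload`, the throwaway module `fake_alab_one`, and the `HEATING_TIME` constant are all hypothetical. The one library fact it relies on is that `importlib.reload` expects a module object that is already in `sys.modules`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_or_reload(module_name: str, reload: bool = False):
    """Import module_name, or re-execute it if already loaded and reload=True."""
    if reload and module_name in sys.modules:
        # importlib.reload takes the module object, not its name
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)


def demo() -> tuple[int, int]:
    """Import a throwaway module, edit its source, reload it, and compare."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep stale .pyc files out of this demo
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "fake_alab_one.py"  # hypothetical module name
        source.write_text("HEATING_TIME = 720\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = load_or_reload("fake_alab_one")
            before = module.HEATING_TIME
            source.write_text("HEATING_TIME = 1440\n")
            module = load_or_reload("fake_alab_one", reload=True)
            after = module.HEATING_TIME
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_alab_one", None)
            sys.dont_write_bytecode = old_flag
    return before, after
```

One caveat worth keeping in mind: reloading re-executes the module in place, but names that other modules bound earlier with `from module import name` still point at the old objects.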
  • 3. Implement process restart for AlabOS. This will be done via https://github.com/CederGroupHub/alabos/blob/main/alab_management/scripts/launch_lab.py#L70. Currently, there are four processes running. We will only need to restart them at a regular interval by adding a live_time argument to each manager class, e.g., (@odartsi )

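import time


class TaskManager:
    def __init__(self, live_time: float | None = None):
        ...
        self.live_time = live_time

    def run(self):
        start = time.time()
        # live_time=None means "run until stopped" instead of failing the comparison
        while self.live_time is None or (time.time() - start) < self.live_time:
            self._loop()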

Then, in the launch_lab function, we will need to restart each process if it exits normally.
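
The restart logic could be sketched as follows. This is only an illustration under stated assumptions: it uses `subprocess` with a tiny stand-in script rather than whatever process handling launch_lab.py actually uses, and the names `MANAGER_CODE`, `manager_command`, `supervise`, and `max_restarts` are hypothetical.

```python
import subprocess
import sys

# Stand-in for a manager process: loop for live_time seconds, then exit with code 0.
MANAGER_CODE = """
import sys, time
live_time = float(sys.argv[1])
start = time.time()
while (time.time() - start) < live_time:
    time.sleep(0.01)
"""


def manager_command(live_time: float) -> list[str]:
    """Build the command line for the stand-in manager (hypothetical helper)."""
    return [sys.executable, "-c", MANAGER_CODE, str(live_time)]


def supervise(command: list[str], max_restarts: int) -> int:
    """Rerun command each time it exits with code 0; stop on any other exit code.

    Returns how many times the command ran to a normal exit.
    """
    normal_exits = 0
    while normal_exits < max_restarts:
        result = subprocess.run(command)
        if result.returncode != 0:
            break  # abnormal exit: do not restart blindly
        normal_exits += 1
    return normal_exits
```

The `max_restarts` bound exists only so the sketch terminates; a real supervisor would presumably loop until the lab is shut down. Treating a non-zero exit code differently from a normal `live_time` expiry keeps a crashing manager from being restarted in a tight loop.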

@odartsi added the bug, coding standards, and feature request labels Jul 2, 2024
@bernardusrendy
Collaborator

bernardusrendy commented Jul 4, 2024

New problem:
Tasks that have already been created and are in the WAITING/READY status have not yet been run by the dramatiq actor run_task.

Note that load_definition has not been called for those tasks.

Therefore, for these WAITING tasks there is a risk that the task parameters given at submission no longer match the task definition that is loaded when they run.

For example, suppose we submit a sample with Heating(time=720) and it is WAITING. If we then update Heating so that it no longer accepts the time argument, the old sample will run into an error.

Proposed solution:
This problem is fundamentally about versioning. We will solve it by keeping a local copy of each older version whenever a task is updated.
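
One possible shape for this, as a minimal sketch only: record the task version at submission time and look up the matching definition at run time. The registry, the `register` decorator, the version numbers, and the `duration_min` parameter are all hypothetical; the issue does not specify how the local copies will be stored.

```python
from typing import Callable

# Hypothetical registry: (task name, version) -> callable implementing that version
TASK_VERSIONS: dict[tuple[str, int], Callable[..., dict]] = {}


def register(name: str, version: int):
    """Decorator that stores one version of a task definition."""

    def decorator(func):
        TASK_VERSIONS[(name, version)] = func
        return func

    return decorator


@register("Heating", 1)
def heating_v1(time: int) -> dict:
    # Old definition: accepts `time`
    return {"task": "Heating", "duration_min": time}


@register("Heating", 2)
def heating_v2(duration_min: int) -> dict:
    # Updated definition: `time` is no longer accepted
    return {"task": "Heating", "duration_min": duration_min}


def latest_version(name: str) -> int:
    return max(version for task, version in TASK_VERSIONS if task == name)


def submit(name: str, **params) -> dict:
    """Record the version that is current at submission time with the parameters."""
    return {"name": name, "version": latest_version(name), "params": params}


def run(record: dict) -> dict:
    """Run a stored task against the version it was submitted with."""
    definition = TASK_VERSIONS[(record["name"], record["version"])]
    return definition(**record["params"])
```

With this layout, a WAITING record submitted as `Heating(time=720)` under version 1 still runs after version 2 is registered, because `run` resolves the definition by the stored version rather than the latest one.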

@bernardusrendy
Collaborator

TODOs:

  1. Following this update, the alab_one device definitions should be updated so that they no longer contain any threading.
  2. More to come..
