
feat(api): Do not enqueue json commands on protocol load #14759

Merged
TamarZanzouri merged 22 commits into edge from EXEC-352-do-not-enqueue-json-protocol-commands on Apr 4, 2024

Conversation

@TamarZanzouri TamarZanzouri (Contributor) commented Mar 29, 2024

Overview

Closes https://opentrons.atlassian.net/browse/EXEC-352.
First step towards fixit commands: do not enqueue JSON protocol commands.

Test Plan

Tested with JSON protocols and Postman:

  • Make sure loading a protocol and executing it happen in order.
  • Make sure GET run /commands returns the list properly with successful commands (see the sketch after this list).
  • Make sure loading a failed protocol fails the run and fails the command.
  • Make sure GET run /commands for a failed run returns the list of commands, with the last command being the failed command.
  • Fixed e2e tests to comply with these changes.
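For context, a rough sketch of the kind of Postman-style check described above, written with requests. The port, the version header, and the response envelope are assumptions about the robot-server HTTP API, not part of this PR:

import requests

# Assumptions: the robot-server listens on localhost:31950, requires an
# Opentrons-Version header, and wraps list responses in a "data" envelope.
BASE_URL = "http://localhost:31950"
HEADERS = {"Opentrons-Version": "*"}

run_id = "example-run-id"  # hypothetical run ID
response = requests.get(f"{BASE_URL}/runs/{run_id}/commands", headers=HEADERS)
commands = response.json()["data"]

# After this change, the list stays empty until the run starts; for a failed
# run, the last command returned should be the failed one.
print([command["status"] for command in commands])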

Changelog

  • Do not enqueue commands in PE for JSON protocols upon load.
  • Execute commands one by one when the run gets started, the same way we do for Python protocols (sketched below).
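In other words, a simplified sketch of the flow change (add_and_execute_command is the real engine method discussed in the review thread below; everything else here is illustrative):

# Illustrative sketch, not the PR's literal code.

async def load(json_commands, queued):
    # Before: every JSON command was enqueued in Protocol Engine here, so
    # GET /commands returned the full list as soon as the protocol loaded.
    # After: the runner only remembers the commands at load time...
    queued.extend(json_commands)

async def run(protocol_engine, queued):
    # ...and dispatches them one at a time once the run starts, the same
    # way Python protocols already execute.
    for command in queued:
        await protocol_engine.add_and_execute_command(command)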

Review requests

Do the changes make sense?
GET run /commands will not return the full list of commands if the run has not started. This is a change we are making so JSON protocols run like Python protocols; are we OK with this?

Risk assessment

Medium. We need to run smoke tests for JSON protocols and make sure these changes do not affect anything.

@TamarZanzouri TamarZanzouri changed the title from "Exec 352 do not enqueue json protocol commands" to "chore(api): Do not enqueue json commands on protocol load" Mar 29, 2024
@TamarZanzouri TamarZanzouri marked this pull request as ready for review March 29, 2024 20:38
@TamarZanzouri TamarZanzouri requested review from a team as code owners March 29, 2024 20:38
codecov bot commented Mar 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.19%. Comparing base (0fbb4c7) to head (123f313).
Report is 41 commits behind head on edge.

Additional details and impacted files


@@            Coverage Diff             @@
##             edge   #14759      +/-   ##
==========================================
+ Coverage   67.17%   67.19%   +0.02%     
==========================================
  Files        2495     2495              
  Lines       71483    71405      -78     
  Branches     9020     8992      -28     
==========================================
- Hits        48020    47984      -36     
+ Misses      21341    21305      -36     
+ Partials     2122     2116       -6     
Flag Coverage Δ
g-code-testing 92.43% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files Coverage Δ
...i/src/opentrons/protocol_runner/protocol_runner.py 100.00% <ø> (ø)

... and 2 files with indirect coverage changes

@sfoster1 sfoster1 (Member) left a comment

Looks great to me! Nice work.

Comment on lines 360 to 368
async def _add_command_and_execute(self) -> None:
    for command in self._queued_commands:
        # Hand each stored JSON command to Protocol Engine and wait for it
        # to complete before dispatching the next one.
        result = await self._protocol_engine.add_and_execute_command(command)
        # Surface a command failure as a run-failing error.
        if result.error:
            raise ProtocolCommandFailedError(
                original_error=result.error,
                message=f"{result.error.errorType}: {result.error.detail}",
            )

@SyntaxColoring SyntaxColoring (Contributor) commented Apr 1, 2024

Ignoring queued commands for a moment, this doesn't look like it matches the existing implementation based on QueueWorker + get_next_to_execute(). This does not have:

  • Prioritization of "intent": "setup" commands. But according to the tests, this is still working, somehow?
  • Special handling of RunStoppedError to break instead of raising an error.
  • Pausing on recoverable errors (added in #14646). You can see #14753 for how I'm doing this for Python protocols. If we're making JSON behave more like Python now, maybe this should work the same way.
  • Yielding to the event loop on each command.

Do we need to handle those things, or are they already handled elsewhere in some way that I'm missing?

@TamarZanzouri TamarZanzouri (Contributor, Author) commented Apr 1, 2024

> Prioritization of "intent": "setup" commands

Setup commands are added to the queue in PE but are executed after the initial home command instead of at the end of the run. Adding prioritization will happen in a follow-up PR.

> Special handling of RunStoppedError to break instead of raising an error
> Pausing on recoverable errors (added in #14646)

Will add logic in this PR.

> Yielding to the event loop on each command

I do not think we need this because add_and_execute_command is async and add_command is sync, but I will test this to make sure we are not blocking anything.

@SyntaxColoring SyntaxColoring (Contributor) commented Apr 1, 2024

> I do not think we need this because add_and_execute_command is async and add_command is sync, but I will test this to make sure we are not blocking anything.

We do need it, unfortunately, for the reasons described in this comment in the old implementation:

await self._command_executor.execute(command_id=command_id)
# Yield to the event loop in case we're executing a long sequence of commands
# that never yields internally. For example, a long sequence of comment commands.
await asyncio.sleep(0)

Unlike JavaScript, Python's await isn't guaranteed to yield to the event loop. It will only do so when it calls something that goes back to the event loop, like network I/O or an asyncio.sleep().
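To make that concrete, here is a self-contained demonstration (illustrative names, not code from this repo): a task that awaits only trivial coroutines starves every other task until it adds an explicit asyncio.sleep(0).

import asyncio
import time

async def noop() -> None:
    return None  # completes without ever touching the event loop

async def busy(duration: float) -> None:
    # Simulates a long sequence of commands whose awaits never yield.
    end = time.monotonic() + duration
    while time.monotonic() < end:
        await noop()
        # await asyncio.sleep(0)  # uncommenting this cures the starvation

async def heartbeat() -> None:
    while True:
        print("heartbeat")
        await asyncio.sleep(0.2)

async def main() -> None:
    task = asyncio.create_task(heartbeat())
    await busy(1.0)  # prints no heartbeats unless busy() yields
    task.cancel()

asyncio.run(main())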


Other than that, that all makes sense, thanks!

@TamarZanzouri TamarZanzouri (Contributor, Author) commented

I see. I tested this by adding a background task that prints and then does time.sleep(3), and I was able to see the printing sequence while executing the commands. So is this test not sufficient? @SyntaxColoring @sfoster1

@sfoster1 sfoster1 (Member) commented

I don't think we need the await because, crucially, what we're talking about here is in a different async stack than the lines you linked to, @SyntaxColoring. This PR does not replace or even touch the execution queue worker.

The way the engine works, just so we agree, is that there are fundamentally at least three async stacks: the execution stack, which is controlled by the execution queue linked above, is owned by the engine, and awaits command implementations; the runner task-queue stack, which is what is changed here, is owned by the runner, and (now) dispatches commands via add_and_execute_command; and the stack that calls the runner's primary interface, which is, I think, owned by the server's run data manager.

The execution stack, as you mention, definitely needs an explicit yield in case it's running a bunch of commands that have no yield points. So it has one: it's in what you linked above, which is still what's used and which this PR does not touch.

The server stack does not need an explicit yield as long as the runner uses the task queue, since in that case it awaits self._task_queue.join(), which awaits a future, which is a yield point.

The task-queue stack (which is what's changed here) does not need an explicit yield if it's using something that eventually uses wait_for, because wait_for awaits an asyncio.Event, which is a yield point.
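A tiny sketch of that last point (illustrative, not engine code): awaiting an asyncio.Event suspends the awaiting task on a future, so control returns to the event loop.

import asyncio

async def waiter(done: asyncio.Event) -> None:
    await done.wait()  # suspends on a future: a genuine yield point
    print("woke up")

async def main() -> None:
    done = asyncio.Event()
    task = asyncio.create_task(waiter(done))
    await asyncio.sleep(0)  # let the waiter start and suspend
    done.set()  # the event loop wakes the waiter
    await task

asyncio.run(main())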

@SyntaxColoring SyntaxColoring (Contributor) commented Apr 2, 2024

Ah, that makes sense, thanks. I missed that this was still going through the QueueWorker under the hood.

@SyntaxColoring SyntaxColoring (Contributor) commented

Which means I was also wrong about this:

> Special handling of RunStoppedError to break instead of raising an error

Per the documentation of ProtocolEngine.add_and_execute_command(), if the run is stopped, it actually doesn't raise a RunStoppedError; it returns the command still queued.
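A hedged sketch of what that implies for the runner's loop (the status attribute and its "queued" value are assumptions about the command model, not code from this PR):

# Hypothetical handling; the real runner may differ.
async def run_commands(protocol_engine, queued_commands) -> None:
    for command in queued_commands:
        result = await protocol_engine.add_and_execute_command(command)
        if result.status == "queued":
            # The run was stopped before the engine picked this command up,
            # so stop dispatching instead of treating it as a failure.
            break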

@TamarZanzouri TamarZanzouri (Contributor, Author) commented Apr 2, 2024

> Which means I was also wrong about this:

@SyntaxColoring I had a feeling, but I wanted to prove it through a test that I am trying to add :-)

@TamarZanzouri TamarZanzouri (Contributor, Author) commented Apr 4, 2024

@SyntaxColoring I added e2e tests to prove that Python and JSON protocols now behave the same when stopping a protocol mid-run.

@mjhuff mjhuff (Contributor) left a comment

Disregard my terrifying comments; I tested, and this does work. App & ODD don't appear to be affected either!

@sfoster1 sfoster1 (Member) left a comment

Looks excellent!

@SyntaxColoring SyntaxColoring (Contributor) left a comment

This looks good to me to merge, thanks. Here are some comments on the tests that we've talked about in Slack.

@SyntaxColoring SyntaxColoring (Contributor) left a comment

Nice, TY!

@SyntaxColoring SyntaxColoring changed the title from "chore(api): Do not enqueue json commands on protocol load" to "feat(api): Do not enqueue json commands on protocol load" Apr 4, 2024
@SyntaxColoring SyntaxColoring (Contributor) commented

Oh, and I renamed this from chore: ... to feat: ... because it's a deliberate change in the HTTP API, and we'll have to remember to describe it in the release notes.

@TamarZanzouri TamarZanzouri merged commit 65885b2 into edge Apr 4, 2024
23 checks passed
@TamarZanzouri TamarZanzouri deleted the EXEC-352-do-not-enqueue-json-protocol-commands branch April 4, 2024 17:47
Carlos-fernandez pushed a commit that referenced this pull request May 20, 2024