feat: configure performance-metrics to work on robot #15316
Conversation
Commits: make everything that is not TrackingFunctions private; wrap analyze in protocol_analyzer; wrap the getting-cached-protocol logic.
Changes requested for performance-metrics/Makefile and robot-server/Pipfile (unless I'm missing something). Other than that, LGTM. Thanks!
protocol_store=protocol_store,
analysis_store=analysis_store,
analyses_manager=analyses_manager,
@TrackingFunctions.track_getting_cached_protocol_analysis
This looks like it'll work for now, but it seems like it's turning out that we want the trackers to work as context managers instead of decorators, right? In other words, we shouldn't have to turn a chunk of code into an inner function just for the sake of packaging it up and handing it to track_getting_cached_protocol_analysis(). See #14834 (comment) for a previous discussion about this. Let me know what you think.
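For illustration, here is a minimal sketch of the two shapes being compared; `track_decorator` and `track` are hypothetical stand-ins, not the real performance-metrics API:

```python
import functools
import time
from contextlib import contextmanager
from typing import Any, Callable, Iterator, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def track_decorator(label: str) -> Callable[[F], F]:
    """Hypothetical decorator-style tracker."""
    def deco(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                print(f"{label}: {time.perf_counter() - start:.6f}s")
        return wrapper  # type: ignore[return-value]
    return deco


@contextmanager
def track(label: str) -> Iterator[None]:
    """Hypothetical context-manager tracker."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.6f}s")


def create_protocol_with_decorator() -> None:
    # Current shape: the block must become an inner function so the
    # decorator has something to wrap.
    @track_decorator("GETTING_CACHED_PROTOCOL_ANALYSIS")
    def _get_cached_protocol_analysis() -> None:
        ...  # the real cached-analysis lookup goes here

    _get_cached_protocol_analysis()


def create_protocol_with_context_manager() -> None:
    # Suggested shape: the block stays inline.
    with track("GETTING_CACHED_PROTOCOL_ANALYSIS"):
        ...  # the real cached-analysis lookup goes here
```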
I also think you and @sfoster1 have been trying hard to avoid performance_metrics introducing even small overhead. I'm not sure if this is what we're doing now, but, for example, at one point you were checking the feature flag once on first import to avoid the overhead of an if/else every time the function under measurement was called. Is that still a goal?
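As a sketch of that trade-off (the flag name and helpers below are made up for illustration):

```python
import os
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical flag name. Evaluated once, at import time: tracked calls
# pay no per-call check, but a flag change is only picked up after the
# process restarts (hence the restart question below).
_SHOULD_TRACK: bool = os.environ.get("OT_PERFORMANCE_METRICS_FF") == "true"


def maybe_wrap(func: F) -> F:
    """Import-time decision: when the flag is off, hand back func
    untouched so call sites pay zero added overhead."""
    if not _SHOULD_TRACK:
        return func
    return func  # the real version would return a tracking wrapper here


def should_track_now() -> bool:
    """The per-call alternative: always current, but adds an environment
    lookup and a branch to every tracked call."""
    return os.environ.get("OT_PERFORMANCE_METRICS_FF") == "true"
```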
@SyntaxColoring, you're correct that it should not be checking every time. But because _should_track is evaluated at import time, you have to restart the robot-server if you ever want to update the flag. Should I maybe set the feature flag's restart_required to True?
Thank you for the background!
So to back up: you've gone to pains in the design of performance_metrics to avoid introducing overhead. But then it looks like the overhead is getting added here anyway?
Specifically (and I have not profiled this), if every call to create_protocol() defines an inner function _get_cached_protocol_analysis(), then decorates it, and then calls it, I'm skeptical that all of that will be cheaper than it would have been to just enter and exit a context manager. Note that this particular decoration does not run at file parse time the way #14834 (comment) describes; it runs every time the block defining _get_cached_protocol_analysis() runs.
You could avoid this by refactoring the _get_cached_protocol_analysis() inner function to be a module-level private function; then the decoration really would only run once, at module parse time.
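A sketch of that refactor, with a stand-in decorator in place of the real TrackingFunctions.track_getting_cached_protocol_analysis:

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def track_getting_cached_protocol_analysis(func: F) -> F:
    """Stand-in for the real tracking decorator."""
    return func


# Hoisted to module level: the decoration below runs exactly once, when
# this module is imported...
@track_getting_cached_protocol_analysis
def _get_cached_protocol_analysis(protocol_id: str) -> dict:
    return {"protocol_id": protocol_id}  # placeholder for the real lookup


def create_protocol(protocol_id: str) -> dict:
    # ...so each call here is a plain function call, with no fresh
    # function definition or decoration per request.
    return _get_cached_protocol_analysis(protocol_id)
```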
But my larger point, which I've broached before and which I still believe, is: are these pains really worth it? Are simpler approaches, like entering context managers and accessing os.environ "too frequently," really not good enough? I mean, we're measuring protocol analysis here. The latency of starting the background thread alone is probably going to drown out whatever microseconds we shaved by doing the decorator thing. I believe there are use cases where it could matter, but I can easily live without them, personally.
I think that I'm in both camps. I think the default approach, and the structural approach to implementing the reusable analysis side, should be oriented towards zero-cost abstractions even if they're inconvenient. But I think that at the point of integration, i.e. right here, sometimes it's okay to say "this is not zero-cost, but the cost is marginal." For instance, here it's not zero-cost (like @SyntaxColoring says, you're creating an inner function each call), but this is not a hot path; it's a top-level call that will last a long time, not something called many times in a loop. We would jump through many more hoops than this to keep a zero-cost integration in some core protocol engine function, for a counterexample. And we need to have those hoops available to jump through, which is why we designed the metrics system as a function transformer that can run at import time.
I think it is reasonable to set restart_required to True for this feature flag in the general case.
To make sure I understand: leave the creation of the inner function as is, because it is not something we call a lot, and even if we did, it is a call that takes a while, so the impact would be negligible. If we were wrapping logic that is called many times, then the wrapping should be done in a way that is as performant as possible.
Yay, thank you! Looking forward to using it.
Overview
Adds packaging scripts and configuration to enable pushing and running performance metrics on a robot.
Currently, running analysis and getting a cached analysis are the only things tracked on the robot.
Closes https://opentrons.atlassian.net/browse/EXEC-497
Test Plan
- Follow the steps in the README to set up the env
- Run robot analysis for 2-3 different protocols by going to the setup screen for each protocol
- Repeat running analysis for each protocol
- SSH into the robot and go to /data/performance_metrics_data/
- There should be 2 files
- Check their content
- After you are finished, run make unset-performance-metrics-ff
- Run a protocol and make sure performance metrics are no longer being recorded. Also make sure you can still run analysis and get a cached analysis
- Verify analysis and cached analysis on the OT-2 dev server
- Verify analysis and cached analysis on the Flex dev server
- Verify analysis and cached analysis on OT-2
- Verify analysis and cached analysis on Flex Dev Kit
- Verify analysis and cached analysis on Flex @skowalski08
Changelog
- performance_helpers.py: make everything that is not TrackingFunctions private
- performance-metrics/Makefile: add update-robot-version (hacks the robot into thinking it is on the same version as the app), plus set-performance-metrics-ff and unset-performance-metrics-ff
- performance-metrics/setup.py: remove shared-data as a dep for performance-metrics, as it does not use it
- robot-server/Pipfile: remove performance-metrics as a dependency, as robot_server is not using it directly
- robot-server/robot_server/protocols/protocol_analyzer.py: wrap analyze and label it ANALYZING_PROTOCOL
- robot-server/robot_server/protocols/router.py: wrap the logic that gets a cached analysis and label it GETTING_CACHED_PROTOCOL_ANALYSIS
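Roughly, the two wrapped call sites might look like this (stand-in decorator; only the label names come from the changelog above):

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def _track(label: str) -> Callable[[F], F]:
    """Stand-in for the labeled tracking decorators in performance-metrics."""
    def deco(func: F) -> F:
        return func  # the real version records the duration under `label`
    return deco


@_track("ANALYZING_PROTOCOL")
async def analyze(protocol_id: str) -> None:
    ...  # protocol_analyzer's analyze body


@_track("GETTING_CACHED_PROTOCOL_ANALYSIS")
async def _get_cached_protocol_analysis(protocol_id: str) -> None:
    ...  # router logic that fetches a cached analysis
```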
Review requests
/opt/opentrons-robot-server?
Risk assessment
Medium? performance-metrics is now wrapping production code. But the package is not being built and installed on robots; it has to be pushed by a dev. Furthermore, nothing will run without setting the feature flag.