-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/maestro ai #1906
Feature/maestro ai #1906
Conversation
7b35e60
to
6ba3d0a
Compare
65fd303
to
2b29c53
Compare
9d929f8
to
d23a25f
Compare
675db0a
to
b76d5e1
Compare
fc919e5
to
52ffba1
Compare
52ffba1
to
33f0c84
Compare
0aed452
to
ba69273
Compare
maestro-cli/src/main/java/maestro/cli/report/HtmlAITestSuiteReporter.kt
Outdated
Show resolved
Hide resolved
* maestro-ai-demo foo_bad_1.png | ||
* ``` | ||
* | ||
* ### Input format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be the expected output? would be helpful to document that as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added expected output.
Also, I think we should add a link where the fixtures can be downloaded from. Otherwise the barrier entry to this DemoApp is very high, because of the need to take the screenshots manually.
Currently the fixtures dataset I use is from https://github.com/mobile-dev-inc/copilot/pull/188.
I will upload that dataset to GCP and paste the link here. Do you see any problem with it? I think the customer apps that are in https://github.com/mobile-dev-inc/copilot/pull/188 are okay with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not public the GCS link for now, you can make it work for any storage link. Lets write a runbook internally for this in case some one from us wants to evaluate. This repository should ideally have access to general data (from open source apps) instead of customer apps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points, though I don't agree.
I'd prefer to have this dataset public, internal runbooks tend to rot and no one uses them unless really needed. But if we made this testing dataset public, anybody could play around with LLM outptus and submit a PR that improves it.
If this requires removing apps from our customers, and using screenshots from more popular, well known apps instead (like Uber, Bolt), then I still think it's worth the effort. We should strive to make working on Maestro possible and easy to people outside of @mobile-dev-inc.
PS 1 The perfect situation would be that generating screenshots for each app with takeScreenshot
. But I think it's too much work to automate this, and what we have now is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said: I will take no action for now.
maestro-cli/src/main/java/maestro/cli/report/HtmlAITestSuiteReporter.kt
Outdated
Show resolved
Hide resolved
view = resultView, | ||
commands = commands, | ||
debugOutput = FlowDebugOutput(), | ||
// TODO(bartekpacia): make AI outputs work in continuous mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to create issue as well after release!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #1972 to track this
2584b86
to
674a717
Compare
1efa308
to
0bfac77
Compare
9455f47
to
8668cad
Compare
#1222 (comment)
Intro
This PR adds 2 new commands:
assertWithAI
andassertNoDefectsWithAI
.Documentation PR here. Read it for intro to these 2 commands.
Notes
at least for now, we don't consider AI output to be part of the test results. It's an additional stream of information, just like logs or screenshots. Reason: we're not sure of relationship of AI output with e.g. JUnit - how it should be serialized into that format.
Data structure of AI output
In the longer run, I'd like to integrate normal test results with AI test results (integrate meaning e.g. link to each other from HTML reports. Or, preferably, maybe merge their HTML reports)
To test
works in Maestro Cloudtodo in the near futureOff-top things noticed
Output directories
coroutines
Maestro uses a lot of
runBlocking { }
, which sometimes is justified, but often it's a bad practice.I think most of code in
maestro-client
andmaestro-orchestra
(Maestro
,Driver
) should be using suspending functions.