Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/maestro ai #1906

Merged
merged 53 commits into from
Aug 29, 2024
Merged

Feature/maestro ai #1906

merged 53 commits into from
Aug 29, 2024

Conversation

bartekpacia
Copy link
Contributor

@bartekpacia bartekpacia commented Aug 12, 2024

#1222 (comment)

Intro

This PR adds 2 new commands: assertWithAI and assertNoDefectsWithAI.

Documentation PR here. Read it for intro to these 2 commands.

Notes

  • at least for now, we don't consider AI output to be part of the test results. It's an additional stream of information, just like logs or screenshots. Reason: we're not sure of relationship of AI output with e.g. JUnit - how it should be serialized into that format.

    Data structure of AI output
    flow 1 \
       assertVisualAI 1
         screenshot
           - defect 1
           - defect 2
           - defect 3
    flow 2 \
       assertVisualAI 1
         screenshot
           - defect 1
           - defect 2
           - defect 3
       assertVisualAI 2
         screenshot
           - defect 1
           - defect 2
           - defect 3
    
    
  • In the longer run, I'd like to integrate normal test results with AI test results (integrate meaning e.g. link to each other from HTML reports. Or, preferably, maybe merge their HTML reports)

To test

  • works in Maestro Studio
  • works in Maestro Cloud todo in the near future
  • works in continuous mode

Off-top things noticed

Output directories
  • Maestro test output + AI output should be placed in the root directory of the project. This will allow to use relative paths in e.g. test outputs. Benefit: possible to move files out of CI without breaking paths.
  • Maestro debug output can stay where it is right now.
coroutines

Maestro uses a lot of runBlocking { }, which sometimes is justified, but often it's a bad practice.

I think most of code in maestro-client and maestro-orchestra (Maestro, Driver) should be using suspending functions.

@bartekpacia bartekpacia force-pushed the feature/maestro_ai branch 4 times, most recently from 7b35e60 to 6ba3d0a Compare August 13, 2024 12:41
@bartekpacia bartekpacia force-pushed the feature/maestro_ai branch 2 times, most recently from fc919e5 to 52ffba1 Compare August 19, 2024 16:06
maestro-ai/README.md Show resolved Hide resolved
maestro-ai/README.md Outdated Show resolved Hide resolved
maestro-ai/src/main/java/maestro/ai/AI.kt Show resolved Hide resolved
maestro-ai/src/main/java/maestro/ai/AI.kt Outdated Show resolved Hide resolved
* maestro-ai-demo foo_bad_1.png
* ```
*
* ### Input format
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be the expected output? would be helpful to document that as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added expected output.

Also, I think we should add a link where the fixtures can be downloaded from. Otherwise the barrier entry to this DemoApp is very high, because of the need to take the screenshots manually.

Currently the fixtures dataset I use is from https://github.com/mobile-dev-inc/copilot/pull/188.

I will upload that dataset to GCP and paste the link here. Do you see any problem with it? I think the customer apps that are in https://github.com/mobile-dev-inc/copilot/pull/188 are okay with that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not public the GCS link for now, you can make it work for any storage link. Lets write a runbook internally for this in case some one from us wants to evaluate. This repository should ideally have access to general data (from open source apps) instead of customer apps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, though I don't agree.

I'd prefer to have this dataset public, internal runbooks tend to rot and no one uses them unless really needed. But if we made this testing dataset public, anybody could play around with LLM outptus and submit a PR that improves it.

If this requires removing apps from our customers, and using screenshots from more popular, well known apps instead (like Uber, Bolt), then I still think it's worth the effort. We should strive to make working on Maestro possible and easy to people outside of @mobile-dev-inc.

PS 1 The perfect situation would be that generating screenshots for each app with takeScreenshot. But I think it's too much work to automate this, and what we have now is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said: I will take no action for now.

maestro-cli/src/main/java/maestro/cli/App.kt Show resolved Hide resolved
view = resultView,
commands = commands,
debugOutput = FlowDebugOutput(),
// TODO(bartekpacia): make AI outputs work in continuous mode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to create issue as well after release!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #1972 to track this

@bartekpacia bartekpacia merged commit 582bdcc into main Aug 29, 2024
3 checks passed
@bartekpacia bartekpacia deleted the feature/maestro_ai branch August 29, 2024 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants