[RFC] CI scopes #78403
Comments
The focus of this RFC is CI, which needs to be addressed, but any solution needs to deal with the fact that developers should start using twister before pushing to CI, and should be able to run twister without having to wait for hours and hog the system. So this needs to go beyond CI and address the developer experience of running twister on a local machine, encouraging people to use twister locally more often and thus reducing the load on CI. Putting CI aside and thinking about coverage levels that can be invoked on the command line with increasing coverage is a better way to approach this; CI will simply benefit from that.

Also, it is very important to highlight the goal of testing and running twister for the zephyr project and as part of the zephyr project. The goal is feature testing and testing zephyr itself; it should never be seen as platform validation or quality control. That part needs to be supported, but it should be mostly about feature testing. If I can verify a feature or test a change on one target, there is no need for me to run the same test on the remaining targets. I still need to be able to run the same tests on other simulation and hw targets if applicable, but the benefit of "building" hello world on 1000 targets without actually running anything is close to zero, especially if we validate it on one or a few in CI through emulators. If a board is broken or does not build, this can be validated with a dedicated test (we have build_on_all to catch exactly this type of problem).
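As a side note, a minimal sketch of the build_on_all mechanism mentioned above, in a hypothetical testcase.yaml (the test name is illustrative):

```yaml
# Hypothetical build-only test: catches boards that fail to build
# at all, without pretending to validate per-board behavior.
tests:
  sample.basic.build:
    build_only: true
    build_on_all: true
```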
We do not need this, and it does not really fix the problem as described. We already have most tests performing as expected using the default platform coverage, which can be improved further using platform_key, for example. If you use integration_platforms to solve the issue, then integration_platforms loses the value it has right now, and many tests will have to fill it with most of the simulators we have. How is this going to help? What would such a test look like?
if all tests/samples are going to end up like this:
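(The original example is not reproduced here; below is a hedged sketch of the anti-pattern being described, with illustrative platform names:)

```yaml
# A kernel test bloated by enumerating most available simulators
# in integration_platforms -- exactly what we want to avoid.
tests:
  kernel.common:
    integration_platforms:
      - qemu_x86
      - qemu_cortex_m3
      - qemu_riscv32
      - qemu_riscv64
      - native_sim
      # ...plus one entry per every other architecture someone cares about
```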
we did not solve any problem. The main problem driving any change in this area, as we can see from the linked PRs above, is that everyone will want their architecture or platform to be listed, and you end up with a huge and unmanageable list of platforms. A kernel test like the above is interesting precisely because of how minimal it is...
We see many problems in CI (on push) when minimal testing is done on core features. Unfortunately, there are variations in architectures and levels of support, and something that passes CI in a PR might fail when coverage is expanded on push. So switching those (the majority of tests) to use integration platforms might be a bad idea if not done on a test-by-test basis and with a lot of consideration.
There is no intention to get rid of filter. However, filter does not make sense in some cases, especially if the filter ends up delivering the same platform over and over again (or, in some cases, the filter is on something enabled in the test/sample itself, which is funny). See an example:
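(The original snippet is not reproduced here; a hypothetical sketch of the pattern, with CONFIG_FOO standing in for the real option:)

```yaml
# testcase.yaml (hypothetical): the filter keys on an option that the
# sample's own prj.conf enables, so it matches nearly everywhere while
# still costing a partial cmake run per candidate platform.
tests:
  sample.foo:
    filter: CONFIG_FOO
```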
Here we have a filter, and filters are not perfect: they will not guarantee that a platform matching a filter will actually build/run a sample or pass the test. We need some other mechanism, and while you do not like it, the purpose of the linked PRs was to limit coverage to only those platforms that a sample was written for, and where those platforms are documented. I know
Two options:
yeah, that is exactly what zephyr/doc/develop/test/twister.rst (line 507 in 86af9c4) describes
Using samples as tests is the other issue we have. We now have guidelines that try to limit the abuse and also address some of the issues above; this needs to be taken into consideration.
Also generating build artifacts that aren't subsequently used by Twister, e.g. UF2 files for any board that has
As pointed out, "integration_platforms" might not be the solution to all problems, and doing it for each test would be a greater effort. However, I believe we agreed on the benefits of using it wherever it makes sense to directly tell twister how to limit the scope:
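For instance, a minimal sketch (the test name and platform choice are illustrative):

```yaml
# Tell twister directly where the integration run should verify this
# test; broader (e.g. weekly) runs can still pick other platforms.
tests:
  sample.sensor.demo:
    integration_platforms:
      - native_sim
```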
I am for the change. I would just be more cautious with "pick only 1 (customizable) from the list". From my understanding, the idea behind integration platforms is to provide the "minimal scope to get proper coverage". IMO it is fine if there is more than one platform in integration, but it must mean that each platform there increases the coverage in a meaningful way, e.g. it was found that candidates from multiple architectures are needed due to differences. This is clearly not the case in many existing tests, and those will have to be cleaned up.

I am also not against the "depends_on"/"supported" keywords. I just don't like the way they are currently used, i.e. without any control at the "project" level. E.g. how can anyone make sure their platforms are up to date with PRs like this: #77823 (comment)? To make those keywords reliable we should e.g. publish in the docs (and update automatically) a list of all existing depends_on keywords in tests and samples, probably with links to their usage. We did this some time ago on our side downstream to figure out which keywords are used and whether we have them, but with keywords being constantly added it is becoming hard to maintain.

As for samples, I agree with your point that we should have tests covering the things that samples do. However, that is sometimes not the case. I personally find executing tests for samples (if available) a nice "acceptance level" test, and we found many errors thanks to including samples in our on-target scope. However, I also agree that in the upstream scope there is no need to test samples on so many platforms. IMO integration_platforms could serve well there (instead of platform_allow).
Introduction
This RFC is meant to serve as a place to discuss how our twister-based CI should work: what the scope (which platforms) should be for testing PRs, on-push builds, and scheduled (weekly) builds. It also proposes a solution to optimize the platform scope based on integration_platforms and platform_key.
Problem description
CI runs take too long. This reaches an extreme with the weekly build, which tries to build everything on every platform and lasts ~20h: https://github.com/zephyrproject-rtos/zephyr/actions/workflows/twister.yaml?query=event%3Aschedule. Much of the time is spent on redundant work, in particular building the same tests on multiple (even hundreds of) platforms, where the majority of builds would not cover any new area. Moreover, if the scope is broad (the extreme being the scheduled run with all platforms), a lot of time is spent on cmake-based filtration (the "filter" entry in yaml): twister has to run a partial cmake (with package helpers) for each platform in scope (several seconds per check) to figure out whether the board should be filtered out or not.
Proposed change
weekly run:
on PR run:
on push:
Detailed RFC
Using the "filter" keyword is taxing for twister performance, but it also provides a quite robust filtering mechanism based on actual devicetree/Kconfig state, in contrast to e.g. "depends_on", which relies on ad hoc strings placed by developers in board descriptions (or not placed, if forgotten, or if the developer is not even aware that such a string relates to anything in the tree: #77823 (comment)).
That is why I believe we should not get rid of "filter", but rather focus on how to run twister in a more targeted way.
Testing with twister has leaned towards a "fishing with a net" workflow: we drop a net and afterwards check which platforms were caught. Being more targeted can be achieved with "integration_platforms" and integration mode, where the test tells twister where to verify it. What is more, skips are turned into errors on integration platforms, making sure such tests will not slip through the cracks (e.g. due to typos causing skips). Another great, indirect feature of integration_platforms is that it shifts the responsibility of guessing the proper testing scope away from twister (and hence CI/twister maintainers), giving agency to the maintainers of a given area.
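A sketch of how this plays out, assuming a hypothetical test name and a run with twister's --integration flag:

```yaml
# With `twister --integration`, only the platforms below are selected
# for this test, and a skip on any of them is reported as an error
# rather than silently shrinking the coverage.
tests:
  drivers.uart.async:
    integration_platforms:
      - qemu_x86
      - native_sim
```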
As for the weekly build, IMO the main question is "what is its purpose?". The current approach was to try "all on all" (all tests and samples in the tree on all platforms) to verify as much as possible. As we can see, this is no longer feasible.
Anas had a valid observation that, from the zephyr project perspective, we should focus on "verifying that zephyr is working", not on "whether all platforms in the tree work with all zephyr tests/samples". With this in mind, maybe we should not run "all on all", but only a minimal subset of platforms that provides the best coverage. Do we know how to achieve this?
My proposal is to find some combination of platform properties (categories) that nicely fills the space of possible variability that could affect the result of a test, and then add to twister a global-level "platform_key" functionality (pass-through arguments in the twister call, affecting all tests in a session, instead of individual keys defined in individual yamls for individual tests).
With the current yamls, nothing finer than "architecture" can be used as a key, which is rather coarse-grained. The goal would be to get better granularity, e.g. to build on a single platform per SoC. SoCs should be easy to get from board.yaml, and it is already recognized that we want to use those (or some mutual variant of the yamls used by twister and the build system) with twister: #74142.
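For reference, the per-test form that twister supports today (test name illustrative); the proposal is, in effect, to let one twister invocation apply such a key session-wide and to extend the key space with an SoC attribute:

```yaml
# Per-test platform_key: of all platforms that pass filtering, keep
# one per unique (arch, simulation) value combination.
tests:
  sample.basic.helloworld:
    platform_key:
      - arch
      - simulation
```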
To limit blind spots, we could make the platform selection random (with a known seed).
The PR scope should be more targeted. This is already achieved by the test_plan.py script, which dynamically selects which tests to run based on the changes in the PR, combined with integration mode. I believe there is still a lot to gain if integration_platforms are added directly to every single test and sample.
Dependencies
Add a section to the docs explaining the CI workflows. Also add a section on how contributors should use twister to get meaningful results.
Codify the rules for integration_platforms and add validation checks, e.g. no more than two integration platforms per test; use a simulator/emulator where possible; etc.
Concerns and Unresolved Questions
Alternatives
Use platform_allow to limit the scope, as in https://github.com/zephyrproject-rtos/zephyr/pull/78024/files#diff-0b73ca0ba59ca0865fef3b2346e9911f6bf2610237eb9ec6619e50271cd18862R77-R78 and in #77839.
I think using platform_allow to limit sample execution is a problematic way to go. It would move us back to where we were 4 years ago (#25104): people would not be able to run samples on their boards with twister unless those boards are in platform_allow (short of using --force-platforms, which would introduce further issues). integration_platforms were introduced to resolve exactly that issue: #26382.
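The difference, sketched with hypothetical sample entries (board name illustrative):

```yaml
tests:
  # platform_allow: a hard allow-list; other boards cannot be selected
  # by twister for this sample unless forced.
  sample.blinky.allowlist:
    platform_allow:
      - nrf52840dk/nrf52840

  # integration_platforms: narrows the integration-mode scope only;
  # any otherwise-compatible board can still build/run the sample.
  sample.blinky.integration:
    integration_platforms:
      - nrf52840dk/nrf52840
```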
Another option is to analyze coverage reports and select the combination of tests/samples/boards that provides the greatest total coverage. This solution would require more effort in collecting the coverage data and analyzing it.