In practice: artifacts

Let's start by getting one practical aspect out of the way: the new API doesn't specify how to store artifacts, or "files" in general. It just requires a publicly-available URL to access them. So they may be hosted as GitLab artifacts, although it's not completely trivial to get the URL for them, or with third-party storage, but then the upload and retrieval have to be implemented in the GitLab CI pipeline itself. So a choice has to be made here; in any case I would advocate for some helpers to facilitate this. For example, a …
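To make this more concrete, here is a minimal sketch of one possible helper along these lines, assuming the artifacts are produced by a regular GitLab CI job on a public project. It builds the raw artifact download URL from GitLab's documented `jobs/artifacts` endpoint and the predefined CI variables; the helper name and the example paths are just placeholders, not part of any existing tooling.

```python
import os
from urllib.parse import quote


def gitlab_artifact_url(artifact_path, job_name=None, ref=None):
    """Build the raw download URL for a file kept as a GitLab CI job artifact.

    Uses GitLab's documented "download a single artifact file by job name"
    endpoint together with the predefined CI variables available inside a
    pipeline job. The URL is only publicly reachable if the project itself
    is public (or the artifacts are otherwise exposed).
    """
    api = os.environ["CI_API_V4_URL"]          # e.g. https://gitlab.com/api/v4
    project = os.environ["CI_PROJECT_ID"]
    ref = ref or os.environ["CI_COMMIT_REF_NAME"]
    job_name = job_name or os.environ["CI_JOB_NAME"]
    return (
        f"{api}/projects/{project}/jobs/artifacts/"
        f"{quote(ref, safe='')}/raw/{quote(artifact_path)}?job={quote(job_name)}"
    )


# Example: the kernel image produced by a hypothetical "build" job
# print(gitlab_artifact_url("output/bzImage", job_name="build"))
```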
In practice: meta-data

The standard way of using the new API is to send all the meta-data to the central server. So for a more robust and reliable pipeline, to avoid the situation where developers have perfectly valid changes that can't get merged because of some unplugged cable or something, local files are better. It's not entirely implemented with the new tooling though (having something like …). But then of course, sharing the results with the API server is very valuable too. It means the results can be compared pre-merge and post-merge, when the Git tree then gets tested later outside GitLab CI in linux-next or anywhere else. The results could be linked, and if the revision in linux-next is failing but not the pre-merge CI pipeline, then it means it only started failing in combination with some other change - you get the idea. How much data should be kept locally or in the central API database? That's up to each pipeline implementation to decide; in a way it's another manifestation of the "continuum" principle, from a local, self-contained tool to a public, shared automated service.
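As an illustration of the local-files vs central-API trade-off, here is a rough Python sketch. The `/node` endpoint and the shape of the result payload are assumptions about the new API rather than its exact schema, and `store_result` is a hypothetical helper; the point is simply that the same record can either go to the server or into a JSON file kept as a pipeline artifact.

```python
import json

import requests


def store_result(result, api_url=None, token=None, path="results.json"):
    """Record a build/test result either in the central API or locally.

    The /node endpoint and the payload layout are assumptions based on the
    new KernelCI API design, not its exact schema. The local JSON file is
    the fallback that keeps the pipeline working when the API is down.
    """
    if api_url:
        resp = requests.post(
            f"{api_url}/node",
            headers={"Authorization": f"Bearer {token}"},
            json=result,
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()
    # Local mode: append to a JSON file kept as a pipeline artifact
    try:
        with open(path) as f:
            results = json.load(f)
    except FileNotFoundError:
        results = []
    results.append(result)
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return result
```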
In practice: orchestration

Following the same principles, the GitLab CI pipeline stages can be orchestrated either by GitLab, as in a regular pipeline, or via the API events mechanism. When using local JSON files, I believe only the regular pipeline approach can really be used, unless you start generating custom events via the API, but that would seem a bit over the top. When sending all the intermediate results to the API, it's possible to get events and have pipeline stages waiting for such events to be received. So all the pipeline steps could be started in parallel and GitLab wouldn't need to know which ones depend on which; all of that would be delegated to the API events.

That would also seem a bit over the top, but there is some value in doing it when other external tools are involved. For example, you could do kernel builds on a GitLab CI runner and then wait for runtime results from somewhere else: the runtime jobs would get triggered by an event when the kernel build is ready, and the pipeline would then wait for the next events when the runtime test results are in. It would seem much cleaner this way than polling some particular lab infrastructure like Mesa CI does with known LAVA instances - which works very well but is not very flexible or scalable.
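Here is a rough sketch of what such a waiting stage could look like, written as a plain polling loop for simplicity (the new API's pub/sub event channel is what would actually make this clean and avoid polling). The `/nodes` query parameters, the response shape and the `wait_for_event` helper are assumptions for illustration, not the API's documented interface.

```python
import time

import requests


def wait_for_event(api_url, token, kind, checkout_id, timeout=3600, poll=30):
    """Block a pipeline stage until matching result nodes appear in the API.

    The query parameters (kind, parent, state) and the "items" field in the
    response are assumptions about how nodes are filtered and returned; the
    real API also offers an event subscription channel instead of polling.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(
            f"{api_url}/nodes",
            headers={"Authorization": f"Bearer {token}"},
            params={"kind": kind, "parent": checkout_id, "state": "done"},
            timeout=30,
        )
        resp.raise_for_status()
        nodes = resp.json().get("items", [])
        if nodes:
            return nodes
        time.sleep(poll)
    raise TimeoutError(f"No '{kind}' results for {checkout_id} within {timeout}s")


# A GitLab CI job could run something like this to wait for runtime results
# produced outside GitLab, e.g.:
#   wait_for_event(API_URL, TOKEN, "test", checkout_node_id)
```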
Nice overview, Guillaume! Regarding "Commit message quality", there are open GitLab issues about this: "Ability to start discussions on commit message" and "Product discovery: increase commit message visibility in merge requests". No recent activity on them, though.
The topic of using modern DevOps tools such as GitHub, GitLab and Gerrit for upstream Linux kernel development has been discussed for almost as long as these tools have existed. Probably due to personal taste or historical reasons, GitLab CI tends to be the most popular example, so I picked it as the topic of this discussion. For completeness, let's not forget that Chromium OS kernel development uses Gerrit (downstream but public) and U-Boot uses Azure Pipelines on GitHub, even though the main workflow is still based on emails.
Issues with adopting GitLab CI
While the overall mindset is evolving and some subsystems such as drm already have a long history of using GitLab CI, several issues keep being brought up every time the topic is mentioned. They are well founded and pretty well understood, only the solutions aren't trivial. It seems mostly a matter of time until a DevOps system can be adopted more broadly, once the known blocking issues have been addressed. Let's take a look at what I believe to be the main ones.
Short-lived tooling
Emails, patch files and tarballs are based on established standards and are pretty much going to stay forever, or at least for as long as there is a Linux kernel. The Git SCM was created especially for the kernel development workflow, so it's assumed it will also always be there, or at least that it wouldn't be abandoned without a new tool to replace it. These things are universal, portable and not owned by a particular corporation or private group.
On the other hand, GitHub is now owned by Microsoft, so it may go offline or change dramatically if it were acquired again - basically there's nothing to guarantee it will always stay the same. It also requires users to create an account with this single provider. Gerrit can be self-hosted but is a centralised system maintained by Google, pretty much like GitLab. So these are similar to Patchwork in this sense: they may be used, but none of them can be enforced yet as the main tool for upstream kernel development. This is why there are workflows that run a CI "on the side", to still rely only on emails as a common denominator for code reviews and plain Git repositories for applying changes, while having some amount of automated testing going on in parallel. How could this gap be closed? We'll come back to that a bit later; let's look at the other issues first.
Commit message quality
When you receive a patch over email, you actually have to read it as an email, which means it needs to be properly written, and longer series often come with a cover letter. This is especially important for large and complex projects such as the Linux kernel. On systems like GitLab, there is a vast number of small projects where few people actually take a close look at the Git history, so the trend is to rely more on the web UI with review comments. Still, the changes get merged and there is value in having a self-descriptive Git history: developers can follow what the changes were about without having to dig out a closed merge request.
It's of course possible to try and enforce a higher standard for commit messages on GitLab, but there's no way to actually put review comments on the commit messages themselves. They also tend to be hidden behind the merge request description and other things in the web UI. I've hit this problem many times on GitHub too, while trying to apply the same quality standard as for kernel commits to make KernelCI more kernel developer-friendly, but had to resort to doing things like quoting bits from the commit message in a comment. I don't think fixing this in GitLab or even GitHub is very technically challenging; rather, it's a different approach to doing code reviews, with the web UI as the primary tool rather than the text of the patches as with emails. So that's another hurdle that gives these tools bad press in the kernel community. What else?
Products don't run upstream
There isn't a single commercial product out there running a plain mainline kernel, or even an unmodified upstream LTS kernel. So enforcing a CI pass before changes get merged is hard to justify. If a CI check fails and it needs to pass in order to release a software update in the field for actual users, then it's pretty clear that it's a problem, as nobody wants users to hit that issue. But when people know that their changes are looking fine and really want them merged for the next kernel release, they can easily find it frustrating to hit a red CI check, and can easily argue that it's not important, or could be fixed in -rc2, or it's a flake, or a problem with the test, or...
That's because until now, it's always been up to particular people to judge and decide what goes into the kernel and when to declare the mainline kernel ready to be tagged with a new version. And we all know that it's not guaranteed to work in any way - I like to quote Torvalds' email from when v5.17 was released.
In short: here's a newly released kernel, now go and test it! That's the polar opposite of what a CI loop is all about, i.e. testing and getting to a particular quality level before making the release, in a mechanical way.
What can we do about them?
There are probably several other common persistent "problems" reported by kernel developers and maintainers, but let's see how KernelCI with its new API can already address the ones described above. I'll put a comment for each idea to allow discussion in threads. I'm essentially basing this on my kernel testing continuum blog post, which was based on a topic for Kernel Recipes, which in turn was based on many prior discussions with the community over the years. Anyone can of course start additional ones, and I would also like to look into the practicalities of doing a proof-of-concept with GitLab CI and KernelCI / `kci`, like @khilman did a while ago with the legacy system and the `kci_build` and `kci_test` command line tools.