ahal changed the title from "Support customization of cache names" to "Make run-task and docker-image hashes optional in cache names" on May 13, 2024
Currently Taskgraph adds the hashes of both run-task and the docker-image tasks to cache names (if those things are being used):
taskgraph/src/taskgraph/transforms/task.py
Line 519 in e556578
This helps ensure correctness: it almost guarantees that we won't get errors from different versions of tools being used across the same set of files. However, it comes at the cost of more cache misses!
For example, in Gecko we typically have a ton of tasks coming in for any given docker-image. Furthermore, pools tend to only run tasks with certain images, so this feature makes a lot of sense.
On the other hand, mozilla-vpn-client has only a single pool that runs a wide array of tasks with docker-images. Further, pushes come in infrequently, so workers aren't very long lived. This means we almost never have cache hits.
Another point is the type of cache. Checkout caches tend to be more susceptible to this (especially with Mercurial), but something like a dotfile cache might not be (maybe?). The point is that different kinds of caches carry different levels of risk here.
I propose that instead of automatically adding the run-task and docker-image hashes to all cache names, we use them as values that can be interpolated into the cache name. I.e., a cache name could be checkouts-{run_task}-{docker_image}, and these values would be included in the cache name. Or it could just be checkouts, and then they wouldn't. This allows individual projects, and even individual caches within a project, to set up cache names however is best for that context.
There's definitely an open question around whether one or both of these hashes should be included by default, as well as how hard we should try to preserve backwards compatibility.
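To make the proposal concrete, here is a minimal sketch of how the interpolation could behave. It is not Taskgraph's actual implementation: the function name resolve_cache_name, the ALLOWED_FIELDS set, and the choice to reject unknown placeholders are all assumptions for illustration. Only the placeholder names {run_task} and {docker_image} come from the example above.

```python
import string

# Placeholder names taken from the example in this issue; the set itself
# is a hypothetical helper, not part of Taskgraph.
ALLOWED_FIELDS = {"run_task", "docker_image"}


def resolve_cache_name(template, run_task_hash, docker_image_hash):
    """Fill optional {run_task} / {docker_image} placeholders in a cache name.

    Hypothetical helper: a template without placeholders is returned
    unchanged, so the hashes only affect caches that opt in.
    """
    # Reject unknown placeholders early so a typo in a cache name fails
    # loudly instead of raising a less obvious KeyError from str.format.
    for _, field, _, _ in string.Formatter().parse(template):
        if field is not None and field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown cache name placeholder: {field!r}")
    return template.format(run_task=run_task_hash, docker_image=docker_image_hash)


# A cache that opts in to both hashes.
print(resolve_cache_name("checkouts-{run_task}-{docker_image}", "abc123", "def456"))
# A cache that opts out entirely, so it is shared across images and run-task versions.
print(resolve_cache_name("checkouts", "abc123", "def456"))
```

With this shape, the default question raised above becomes a question of what template is used when a project does not specify one.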