Skip to content

Commit

Permalink
Add pull-through caching
Browse files Browse the repository at this point in the history
closes #507
  • Loading branch information
lubosmj committed Jan 17, 2024
1 parent 6d1aa47 commit cbaa073
Show file tree
Hide file tree
Showing 19 changed files with 1,263 additions and 143 deletions.
3 changes: 3 additions & 0 deletions CHANGES/507.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Added support for pull-through caching. Users can now configure a dedicated distribution and remote
linked to an external registry without the need to create and mirror repositories in advance. Pulp
downloads missing content automatically if requested and acts as a caching proxy.
3 changes: 2 additions & 1 deletion docs/tech-preview.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,5 +3,6 @@ Tech previews

The following features are currently being released as part of a tech preview:

* Build an OCI image from a Containerfile
* Building an OCI image from a Containerfile.
* Support for hosting Flatpak content in OCI format.
* Pull-through caching (i.e., proxy cache) for upstream registries.
42 changes: 42 additions & 0 deletions docs/workflows/host.rst
Original file line number Diff line number Diff line change
Expand Up @@ -117,3 +117,45 @@ Docker Output::
In general, the automatic conversion cannot be performed when the content is not available
in the storage. Therefore, it may be successful only if the content was previously synced
with the ``immediate`` policy.


Pull-Through Caching
--------------------

.. warning::
This feature is provided as a tech preview and could change in backwards incompatible
ways in the future.

The Pull-Through Caching feature offers an alternative way to host content by leveraging a **remote
registry** as the source of truth. This eliminates the need for in-advance repository
synchronization because Pulp acts as a **caching proxy** and stores images, after they have been
pulled by an end client, in a local repository.

Configuring the caching::

# initialize a pull-through remote (the concept of upstream-name is not applicable here)
REMOTE_HREF=$(http ${BASE_ADDR}/pulp/api/v3/remotes/container/pull-through/ name=docker-cache url=https://registry-1.docker.io | jq -r ".pulp_href")

# create a pull-through distribution linked to the initialized remote
http ${BASE_ADDR}/pulp/api/v3/distributions/container/pull-through/ remote=${REMOTE_HREF} name=docker-cache base_path=docker-cache

Pulling content::

podman pull localhost:24817/docker-cache/library/busybox

In the example above, the image "busybox" is pulled from *DockerHub* through the "docker-cache"
distribution, acting as a transparent caching layer.

By incorporating the Pull-Through Caching feature into standard workflows, users **do not need** to
pre-configure a new repository and sync it to facilitate the retrieval of the actual content. This
speeds up the whole process of shipping containers from its early management stages to distribution.
Similarly to on-demand syncing, the feature also **reduces external network dependencies**, and
ensures a more reliable container deployment system in production environments.

.. note::
During the pull-through operation, Pulp creates a local repository that maintains a single
version for pulled images. For instance, when pulling an image like "debian:10," a local
repository named "debian" with the tag "10" is created. Subsequent pulls, such as "debian:11,"
generate a new repository version that incorporates both the "10" and "11" tags, automatically
removing the previous version. Repositories and their content remain manageable through standard
Pulp API endpoints. The repositories are read-only and public by default.
17 changes: 12 additions & 5 deletions pulp_container/app/cache.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,9 @@
from django.core.exceptions import ObjectDoesNotExist
from django.db.models import F, Value

from pulpcore.plugin.cache import CacheKeys, AsyncContentCache, SyncContentCache

from pulp_container.app.models import ContainerDistribution
from pulp_container.app.models import ContainerDistribution, ContainerPullThroughDistribution
from pulp_container.app.exceptions import RepositoryNotFound

ACCEPT_HEADER_KEY = "accept_header"
Expand Down Expand Up @@ -69,11 +70,17 @@ def find_base_path_cached(request, cached):
return path
else:
try:
distro = ContainerDistribution.objects.select_related(
"repository", "repository_version"
).get(base_path=path)
distro = ContainerDistribution.objects.get(base_path=path)
except ObjectDoesNotExist:
raise RepositoryNotFound(name=path)
distro = (
ContainerPullThroughDistribution.objects.annotate(path=Value(path))
.filter(path__startswith=F("base_path"))
.order_by("-base_path")
.first()
)
if not distro:
raise RepositoryNotFound(name=path)

return distro.base_path


Expand Down
10 changes: 6 additions & 4 deletions pulp_container/app/content.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,11 @@
registry = Registry()

app.add_routes(
[web.get(r"/pulp/container/{path:.+}/blobs/sha256:{digest:.+}", registry.get_by_digest)]
)
app.add_routes(
[web.get(r"/pulp/container/{path:.+}/manifests/sha256:{digest:.+}", registry.get_by_digest)]
[
web.get(
r"/pulp/container/{path:.+}/{content:(blobs|manifests)}/sha256:{digest:.+}",
registry.get_by_digest,
)
]
)
app.add_routes([web.get(r"/pulp/container/{path:.+}/manifests/{tag_name}", registry.get_tag)])
23 changes: 22 additions & 1 deletion pulp_container/app/downloaders.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
import re

from aiohttp.client_exceptions import ClientResponseError
from collections import namedtuple
from logging import getLogger
from multidict import MultiDict
from urllib import parse
Expand All @@ -15,6 +16,11 @@

log = getLogger(__name__)

HeadResult = namedtuple(
"HeadResult",
["status_code", "path", "artifact_attributes", "url", "headers"],
)


class RegistryAuthHttpDownloader(HttpDownloader):
"""
Expand All @@ -31,6 +37,7 @@ def __init__(self, *args, **kwargs):
Initialize the downloader.
"""
self.remote = kwargs.pop("remote")

super().__init__(*args, **kwargs)

async def _run(self, handle_401=True, extra_data=None):
Expand Down Expand Up @@ -95,7 +102,12 @@ async def _run(self, handle_401=True, extra_data=None):
return await self._run(handle_401=False, extra_data=extra_data)
else:
raise
to_return = await self._handle_response(response)

if http_method == "head":
to_return = await self._handle_head_response(response)
else:
to_return = await self._handle_response(response)

await response.release()
self.response_headers = response.headers

Expand Down Expand Up @@ -173,6 +185,15 @@ def auth_header(token, basic_auth):
return {"Authorization": basic_auth}
return {}

async def _handle_head_response(self, response):
return HeadResult(
status_code=response.status,
path=None,
artifact_attributes=None,
url=self.url,
headers=response.headers,
)


class NoAuthSignatureDownloader(HttpDownloader):
"""A downloader class suited for signature downloads."""
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Generated by Django 4.2.8 on 2023-12-12 21:15

from django.db import migrations, models
import django.db.models.deletion
import pulpcore.app.models.access_policy


class Migration(migrations.Migration):

dependencies = [
('core', '0116_alter_remoteartifact_md5_alter_remoteartifact_sha1_and_more'),
('container', '0036_containerpushrepository_pending_blobs_manifests'),
]

operations = [
migrations.CreateModel(
name='ContainerPullThroughRemote',
fields=[
('remote_ptr', models.OneToOneField(auto_created=True, on_delete=django.db.models.deletion.CASCADE, parent_link=True, primary_key=True, serialize=False, to='core.remote')),
],
options={
'permissions': [('manage_roles_containerpullthroughremote', 'Can manage role assignments on pull-through container remote')],
'default_related_name': '%(app_label)s_%(model_name)s',
},
bases=('core.remote', pulpcore.app.models.access_policy.AutoAddObjPermsMixin),
),
migrations.AddField(
model_name='containerrepository',
name='pending_blobs',
field=models.ManyToManyField(to='container.blob'),
),
migrations.AddField(
model_name='containerrepository',
name='pending_manifests',
field=models.ManyToManyField(to='container.manifest'),
),
migrations.CreateModel(
name='ContainerPullThroughDistribution',
fields=[
('distribution_ptr', models.OneToOneField(auto_created=True, on_delete=django.db.models.deletion.CASCADE, parent_link=True, primary_key=True, serialize=False, to='core.distribution')),
('private', models.BooleanField(default=False, help_text='Restrict pull access to explicitly authorized users. Related distributions inherit this value. Defaults to unrestricted pull access.')),
('description', models.TextField(null=True)),
('namespace', models.ForeignKey(null=True, on_delete=django.db.models.deletion.CASCADE, related_name='container_pull_through_distributions', to='container.containernamespace')),
],
options={
'permissions': [('manage_roles_containerpullthroughdistribution', 'Can manage role assignments on pull-through cache distribution')],
'default_related_name': '%(app_label)s_%(model_name)s',
},
bases=('core.distribution', pulpcore.app.models.access_policy.AutoAddObjPermsMixin),
),
migrations.AddField(
model_name='containerdistribution',
name='pull_through_distribution',
field=models.ForeignKey(null=True, on_delete=django.db.models.deletion.CASCADE, related_name='distributions', to='container.containerpullthroughdistribution'),
),
]
70 changes: 70 additions & 0 deletions pulp_container/app/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -413,6 +413,25 @@ class Meta:
]


class ContainerPullThroughRemote(Remote, AutoAddObjPermsMixin):
"""
A remote for pull-through caching, omitting the requirement for the upstream name.
This remote is used for instantiating new regular container remotes with the upstream name.
Configuring credentials and everything related to container workflows can be therefore done
from within a single instance of this remote.
"""

class Meta:
default_related_name = "%(app_label)s_%(model_name)s"
permissions = [
(
"manage_roles_containerpullthroughremote",
"Can manage role assignments on pull-through container remote",
),
]


class ManifestSigningService(SigningService):
"""
Signing service used for creating container signatures.
Expand Down Expand Up @@ -486,6 +505,8 @@ class ContainerRepository(
manifest_signing_service = models.ForeignKey(
ManifestSigningService, on_delete=models.SET_NULL, null=True
)
pending_blobs = models.ManyToManyField(Blob)
pending_manifests = models.ManyToManyField(Manifest)

class Meta:
default_related_name = "%(app_label)s_%(model_name)s"
Expand All @@ -509,6 +530,15 @@ def finalize_new_version(self, new_version):
"""
remove_duplicates(new_version)
validate_repo_version(new_version)
self.remove_pending_content(new_version)

def remove_pending_content(self, repository_version):
"""Remove pending blobs and manifests when committing the content to the repository."""
added_content = repository_version.added(
base_version=repository_version.base_version
).values_list("pk")
self.pending_blobs.remove(*Blob.objects.filter(pk__in=added_content))
self.pending_manifests.remove(*Manifest.objects.filter(pk__in=added_content))


class ContainerPushRepository(Repository, AutoAddObjPermsMixin):
Expand Down Expand Up @@ -565,6 +595,39 @@ def remove_pending_content(self, repository_version):
self.pending_manifests.remove(*Manifest.objects.filter(pk__in=added_content))


class ContainerPullThroughDistribution(Distribution, AutoAddObjPermsMixin):
"""
A distribution for pull-through caching, referencing normal distributions.
"""

TYPE = "pull-through"

namespace = models.ForeignKey(
ContainerNamespace,
on_delete=models.CASCADE,
related_name="container_pull_through_distributions",
null=True,
)
private = models.BooleanField(
default=False,
help_text=_(
"Restrict pull access to explicitly authorized users. "
"Related distributions inherit this value. "
"Defaults to unrestricted pull access."
),
)
description = models.TextField(null=True)

class Meta:
default_related_name = "%(app_label)s_%(model_name)s"
permissions = [
(
"manage_roles_containerpullthroughdistribution",
"Can manage role assignments on pull-through cache distribution",
),
]


class ContainerDistribution(Distribution, AutoAddObjPermsMixin):
"""
A container distribution defines how a repository version is distributed by Pulp's webserver.
Expand Down Expand Up @@ -595,6 +658,13 @@ class ContainerDistribution(Distribution, AutoAddObjPermsMixin):
)
description = models.TextField(null=True)

pull_through_distribution = models.ForeignKey(
ContainerPullThroughDistribution,
related_name="distributions",
on_delete=models.CASCADE,
null=True,
)

def get_repository_version(self):
"""
Returns the repository version that is supposed to be served by this ContainerDistribution.
Expand Down
Loading

0 comments on commit cbaa073

Please sign in to comment.