
Implementing Multi-Container Capability #346

Open

mattwahl wants to merge 2 commits into master

Conversation

mattwahl

I've done some work to enable running multiple instances of the container against the same set of watch directories using shared volumes. The driver for this change is that I want to be able to run this image as a service within a Docker Swarm cluster (or k8s, I don't believe there is a major difference in philosophy there) and scale it up to perform multiple conversions at the same time.
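For context, the kind of deployment I'm targeting looks roughly like the sketch below. The service name, image name, and host paths are placeholders for illustration, not the real ones:

```sh
# Hypothetical deployment sketch: two replicas of the same image sharing the
# same /watch and /config mounts. Image name and host paths are placeholders.
docker service create \
  --name converter \
  --replicas 2 \
  --mount type=bind,source=/srv/media/watch,target=/watch \
  --mount type=bind,source=/srv/media/config,target=/config \
  jlesage/<image>
```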

Under the covers I'm using flock to lock a hidden file that acts as a mutex, which prevents multiple concurrent processes from interacting with the file at the same time. This allows a container to safely "claim" a file from the watch directory while the others wait until they can acquire the lock, at which point they either claim the next file in the loop or exit the processing loop when there is no work left to do.
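Roughly, the claim step looks like the sketch below. The paths and file names here are illustrative, not the exact ones used in the commits:

```sh
#!/bin/sh
# Illustrative sketch of the flock-based claim, not the exact code in the PR.
# The hidden mutex file and claim list paths are assumptions.
LOCK_FILE="/watch/.conversion.lock"
CLAIM_LIST="/watch/.claimed"

claim_next_file() {
  (
    # Block until this replica holds the lock; other replicas wait here.
    flock 9

    # While holding the lock, pick the first file nobody has claimed yet and
    # record the claim so other replicas skip it.
    for f in /watch/*; do
      [ -f "$f" ] || continue
      if ! grep -qxF "$f" "$CLAIM_LIST" 2>/dev/null; then
        echo "$f" >> "$CLAIM_LIST"
        echo "$f"    # hand the claimed file back to the caller
        exit 0
      fi
    done
    exit 1           # no unclaimed files left
  ) 9>"$LOCK_FILE"
}
```

Each replica would call something like `file="$(claim_next_file)"` at the top of its processing loop; because the claim list is only updated while the lock is held, two replicas can never claim the same file.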

I have tested the implementation with great success in the following configurations:

  • multiple containers on the same host
  • one container each on two separate VMs, hosted on the same server (using glusterfs as a distributed file system)
  • one container each on two separate VMs, hosted on separate servers (using glusterfs as a distributed file system)

This is still fairly early work, but I wanted to make sure the approach was aligned with the project owner before completing it.

Remaining work, in my opinion:

  1. a feature flag to disable the capability (or possibly the inverse, a flag to enable it)
  2. a mechanism for tracking containers that no longer exist and releasing the files they had claimed

…xecute against the same watch directories without double-processing. technically complete, but not functionally complete as there still needs to be logic to check the health of containers and allow for restarting of files that were claimed by a container that no longer exists
@mattwahl (Author)

@jlesage I wasn't sure how to add you to the pull request as a reviewer (this is my first pull request on GitHub), so I'm just tagging you here for my own peace of mind. No rush on review/feedback.

@jlesage (Owner) commented May 25, 2024

Thank you for this. I have a few comments:

  • flock is not supported by all filesystems. This can be a problem.
  • Why does the lock occur on /config? As I understand it, /config is not the shared folder; /watch would be the one. Sharing the same /config folder across multiple containers can be a problem, because it contains per-instance data such as logs and caches (see the log and xdg folders).

@mattwahl (Author)

Regarding flock, I will do some investigating. Do you know of any filesystems that do not support flock? I'll do some digging, but if you know of one, that would help jumpstart my research. I'm a bit of a novice with some of the low-level details (I'm a Java guy 😄). I recall seeing something about using a directory as a locking mechanism, so I can explore that as time allows.
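For reference, the directory-based alternative I mentioned relies on mkdir being atomic (it either creates the directory or fails), so it does not depend on filesystem locking support. A rough sketch, with illustrative paths:

```sh
# Rough sketch of a mkdir-based mutex; mkdir succeeds or fails atomically,
# so only one replica can hold the "lock" at a time.
LOCK_DIR="/watch/.lock.d"   # path is an assumption

acquire_lock() {
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    sleep 1                 # another replica holds the lock; retry
  done
}

release_lock() {
  rmdir "$LOCK_DIR"
}
```

The obvious downside is that a replica that dies while holding the lock leaves the directory behind, so this would need the same stale-claim cleanup called out in the remaining-work list above.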

With regard to the lock-tracking file and the mutex file living in the /config directory: I was following the pattern used for the successful and failed processing files. I'm a little unsure what you mean by "not the shared folder", since the use case here is running the container as a service à la Kubernetes or Docker Swarm, where the mounts are shared across the replicas of the service, which makes all of the mounts shared.

I'm running this right now as a Docker Swarm service with two replicas on different VMs and have not had any issues with per-instance caching or logging. Both processes write to the same files, which may be a bit confusing as it stands, since the logging framework does not prepend machine information to the log lines, but I have not found it to be a problem so far. I have not tried the web/VNC interface, though, so that could be where the implementation runs into problems.

So in the context of multiple instances sharing all of the mounts, including the config mount, what would your recommendation be here? I'd be glad to continue to iterate as time permits (assuming that you find value in this feature to begin with).
