
Implementing Multi-Container Capability #346

Open

mattwahl wants to merge 2 commits into master

Conversation

mattwahl

I've done some work to enable running multiple instances of the container against the same set of watch directories using shared volumes. The driver for this change is that I want to be able to run this image as a service within a Docker Swarm cluster (or k8s, I don't believe there is a major difference in philosophy there) and scale it up to perform multiple conversions at the same time.
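For context, the kind of deployment I'm targeting looks roughly like the sketch below. The service name, image name, and host paths are placeholders for illustration, not the real ones:

```sh
# Hypothetical deployment sketch: two replicas of the same image sharing the
# same /watch and /config mounts. Image name and host paths are placeholders.
docker service create \
  --name converter \
  --replicas 2 \
  --mount type=bind,source=/srv/media/watch,target=/watch \
  --mount type=bind,source=/srv/media/config,target=/config \
  jlesage/<image>
```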

Under the covers I'm using flock to lock a hidden file that acts as a mutex, which prevents multiple concurrent processes from interacting with the file at the same time. This allows a container to safely "claim" a file from the watch directory while the others wait until they can acquire the lock, at which point they either claim the next file in the loop or exit the processing loop when there is no work left to do.
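Roughly, the claim step looks like the sketch below. The paths and file names here are illustrative, not the exact ones used in the commits:

```sh
#!/bin/sh
# Illustrative sketch of the flock-based claim, not the exact code in the PR.
# The hidden mutex file and claim list paths are assumptions.
LOCK_FILE="/watch/.conversion.lock"
CLAIM_LIST="/watch/.claimed"

claim_next_file() {
  (
    # Block until this replica holds the lock; other replicas wait here.
    flock 9

    # While holding the lock, pick the first file nobody has claimed yet and
    # record the claim so other replicas skip it.
    for f in /watch/*; do
      [ -f "$f" ] || continue
      if ! grep -qxF "$f" "$CLAIM_LIST" 2>/dev/null; then
        echo "$f" >> "$CLAIM_LIST"
        echo "$f"    # hand the claimed file back to the caller
        exit 0
      fi
    done
    exit 1           # no unclaimed files left
  ) 9>"$LOCK_FILE"
}
```

Each replica would call something like `file="$(claim_next_file)"` at the top of its processing loop; because the claim list is only updated while the lock is held, two replicas can never claim the same file.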

I have tested the implementation with great success in the following configurations:

  • multiple containers on the same host
  • one container each on two separate VMs, hosted on the same server (using glusterfs as a distributed file system)
  • one container each on two separate VMs, hosted on separate servers (using glusterfs as a distributed file system)

This is still fairly early work, but I wanted to make sure the approach was aligned with the project owner before completing it.

Remaining work, in my opinion:

  1. a feature flag to disable the capability (or possibly the inverse, a flag to enable it)
  2. a mechanism for tracking containers that no longer exist and releasing the files they had claimed

…xecute against the same watch directories without double-processing. technically complete, but not functionally complete as there still needs to be logic to check the health of containers and allow for restarting of files that were claimed by a container that no longer exists
@mattwahl (Author)

@jlesage I wasn't sure how to add you to the pull request as a reviewer (this is my first pull request on GitHub), so I'm just tagging you here for my own peace of mind. No rush on review/feedback.

@jlesage (Owner) commented May 25, 2024

Thank you for this. I have a few comments:

  • flock is not supported by all filesystems. This can be a problem.
  • Why does the lock occur on /config? As I understand it, /config is not the shared folder; /watch would be the one. Sharing the same /config folder across multiple containers can be a problem, because it contains per-instance data such as logs and caches (see the log and xdg folders).

@mattwahl (Author)

Regarding flock, I will do some investigating. Do you know of any filesystems that do not support flock? I'll do some digging, but if you know of one, that would help jumpstart my research. I'm a bit of a novice with some of the low-level details (I'm a Java guy 😄). I recall seeing something about using a directory as a locking mechanism, so I can explore that as time allows.
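For reference, the directory-based alternative I mentioned relies on mkdir being atomic (it either creates the directory or fails), so it does not depend on filesystem locking support. A rough sketch, with illustrative paths:

```sh
# Rough sketch of a mkdir-based mutex; mkdir succeeds or fails atomically,
# so only one replica can hold the "lock" at a time.
LOCK_DIR="/watch/.lock.d"   # path is an assumption

acquire_lock() {
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    sleep 1                 # another replica holds the lock; retry
  done
}

release_lock() {
  rmdir "$LOCK_DIR"
}
```

The obvious downside is that a replica that dies while holding the lock leaves the directory behind, so this would need the same stale-claim cleanup called out in the remaining-work list above.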

With regard to the lock-tracking file and the mutex file living in the /config directory: I was following the pattern used for the successful and failed processing files. I'm a little unsure what you mean by "not the shared folder", since the use case here is running the container as a service à la Kubernetes or Docker Swarm, where the mounts are shared across the replicas of the service, which makes all of the mounts shared.

I'm running this right now as a Docker Swarm service with two replicas on different VMs and have not had any issues with per-instance caching or logging. Both processes write to the same files, which may be a bit confusing as it stands, since the logging framework does not prepend machine information to the log lines, but I have not found it to be a problem so far. I have not tried the web/VNC interface, though, so that could be where the implementation runs into problems.

So in the context of multiple instances sharing all of the mounts, including the config mount, what would your recommendation be here? I'd be glad to continue to iterate as time permits (assuming that you find value in this feature to begin with).
