cuda: limit gpu memory #280

Open
jbarth-ubhd opened this issue Oct 6, 2021 · 6 comments

jbarth-ubhd commented Oct 6, 2021

From tensorflow docs: »...to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit on the total memory to allocate on the GPU.«

Found in eynollah.py: #gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=7.7, allow_growth=True)

Could the GPU be utilized better with a GPU memory limit?

[figure: GPU memory usage over time while running the three processors]

Y = memory usage in MB, logarithmic scale (minimum 10 MB). Sampled at 3 s intervals with nvidia-smi.

red = sbb-binarize, blue = eynollah-segment, green = calamari-recognize (no GPU)
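
For illustration, the hard limit described in the TF docs would look roughly like this (TF 2.x; a sketch only, the 4096 MB value is just an example, not anything eynollah currently uses):

    import tensorflow as tf

    # Sketch only: cap the GPU memory TF may allocate (value is an example).
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])  # in MB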

bertsky (Collaborator) commented Oct 6, 2021

Note: per_process_gpu_memory_fraction is a floating-point ratio, so 7.7 means "oversubscribe the memory of a single GPU by using slow unified memory". Here's the full documentation:

Fraction of the available GPU memory to allocate for each process.
1 means to allocate all of the GPU memory, 0.5 means the process
allocates up to ~50% of the available GPU memory.
GPU memory is pre-allocated unless the allow_growth option is enabled.
If greater than 1.0, uses CUDA unified memory to potentially oversubscribe
the amount of memory available on the GPU device by using host memory as a
swap space. Accessing memory not available on the device will be
significantly slower as that would require memory transfer between the host
and the device. Options to reduce the memory requirement should be
considered before enabling this option as this may come with a negative
performance impact. Oversubscription using the unified memory requires
Pascal class or newer GPUs and it is currently only supported on the Linux
operating system. See
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements
for the detailed requirements.

If you want to share GPUs, set it between 0 and 1. (But of course, you don't know how much memory is available on the target, so fixed ratios are problematic. I recommend making this configurable at runtime, perhaps via envvars.)
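
For example (a sketch only; the envvar name OCRD_GPU_MEM_FRACTION is made up here, not an existing option):

    import os
    import tensorflow as tf

    # Hypothetical envvar; name and default are illustrative only.
    fraction = float(os.environ.get('OCRD_GPU_MEM_FRACTION', '1.0'))

    gpu_options = tf.compat.v1.GPUOptions(
        per_process_gpu_memory_fraction=fraction,  # 0 < fraction <= 1 to share the GPU
        allow_growth=True)                         # allocate lazily, not all up front
    config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
    tf.compat.v1.keras.backend.set_session(tf.compat.v1.Session(config=config))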

bertsky (Collaborator) commented Feb 23, 2024

Also: limiting CPU-side RSS would be useful. We could do this from the outside (on the docker run cmdline or in docker-compose.yml) – e.g. --memory 2G or --ulimit rss=2000000:4000000 (resp. deploy.resources.limits.memory: 2). Or via ulimit inside the image (profile.d etc).

(This came up in OCR-D/quiver-benchmarks#22.)

Unfortunately, there is no way to set the amount of shared GPU memory from the outside. So all our processors must be programmed in a cooperative way.

mikegerber (Contributor) commented:

Or via ulimit inside the image (profile.d etc).

/etc/profile.d might not work as expected; I believe it's only sourced for login shells, so I'd suggest working with the Docker options if possible.

(I wouldn't be surprised if I'm wrong about /etc/profile.d, I just tend to test this stuff thoroughly because it's easy to mess up.)

bertsky (Collaborator) commented Feb 23, 2024

/etc/profile.d might not work as expected; I believe it's only sourced for login shells,

Indeed. For non-interactive bash, one can pass BASH_ENV to be sourced, but when invoked as sh even that won't happen. Besides, our Docker containers generally will not be run in a shell. But perhaps there's another way?

There may also be a middle ground in setting limits for all containers in the Docker daemon config.

mikegerber (Contributor) commented:

Besides, our Docker containers generally will not be run in a shell. But perhaps there's another way?
There may also be a middle ground in setting limits for all containers in the Docker daemon config.

I'm not up-to-date: How are the containers run currently? Is it recommended that users do the docker run themselves or is there some kind of tooling we could change?

Looking at ocrd_all's Dockerfile, there's nothing there we could use.

https://docs.python.org/3.9/library/resource.html looks promising, maybe we could use it in core, but I'm not sure if I understood it properly.
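
Something along these lines, assuming we go for an address-space cap (RLIMIT_RSS exists as a constant but isn't enforced on current Linux kernels, so RLIMIT_AS is the practical choice; the 2 GiB value is just an example):

    import resource

    def limit_memory(max_bytes):
        """Cap the address space of the current process; allocations beyond it raise MemoryError."""
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # the new soft limit must not exceed the existing hard limit
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))

    limit_memory(2 * 1024 ** 3)  # e.g. 2 GiB (illustrative value)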

bertsky (Collaborator) commented Feb 23, 2024

I'm not up-to-date: How are the containers run currently? Is it recommended that users do the docker run themselves or is there some kind of tooling we could change?

Yes, it's still basically whatever the user comes up with. (We wrap them in Docker Compose services in another Docker stage, but other users still run the CLIs, some of them after converting to Singularity.)

But that's rapidly changing with the service containers which @joschrew is implementing. There we could simply add specs to the (generated) Docker Compose file, maybe after reading some .env variables.

https://docs.python.org/3.9/library/resource.html looks promising, maybe we could use it in core, but I'm not sure if I understood it properly.

Indeed, that's yet another route we could go!
