
Thrice as much memory for AlexNet on Caltech256 in Julia as in Python. Why? #261

Open
hesseltuinhof opened this issue Jun 23, 2017 · 3 comments


@hesseltuinhof

I am having a severe problem training AlexNet (see alexnet.jl) in Julia (0.5.2) on my GPU (12 GB of memory).

I am training on the Caltech256 dataset (see main.jl).

Julia variant: I run out of memory right when training starts. See the following log:

julia> include("main.jl")
[15:36:03] src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: ../../../data/caltech256-train.rec, use 4 threads for decoding..
[15:36:03] src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: ../../../data/caltech256-val.rec, use 4 threads for decoding..
INFO: Start training on MXNet.mx.Context[GPU0]
INFO: Initializing parameters...
INFO: Creating KVStore...
[15:36:09] src/operator/././cudnn_algoreg-inl.h:65: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO: TempSpace: Total 1345 MB allocated on GPU0
INFO: Start training...
[15:36:12] /home/antholzer/mxnet/dmlc-core/include/dmlc/./logging.h:304: [15:36:12] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

Now if I run the same Python variant (alexnet.py, main.py), I have no memory problems at all. With a batch size of 128 it runs at about 3 GB of memory, and with a batch size of 256 at around 4 GB.

Note: At least I was able to train the Julia variant with a batch size of 16.

I wonder why the Julia variant blows up its memory... ❓ Does anyone have an idea about this or experience similar issues?

@pluskid
Member

pluskid commented Jun 23, 2017

One potential reason that comes to mind immediately is the difference in memory/resource handling between Julia and Python. Python uses a reference counter and aggressively releases resources as soon as the count drops to zero, while Julia uses a GC. The problem is that the GC may only run once in a while or when memory runs low, and GPU memory is invisible to the Julia GC. So many of the NDArrays created on the GPU might not be released.

Currently I do not know of a better way to handle external resources in Julia. Maybe you can try explicitly calling the Julia GC after each batch to see if it helps? (It will probably slow down training a bit.)
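
Roughly what I mean, as an illustration only: this sketch assumes the usual mx.zeros(shape, ctx) NDArray constructor and the mx.gpu() context helper; the loop and array sizes are made up and are not from your main.jl.

using MXNet

# Illustration only: the tiny Julia-side NDArray wrappers put almost no
# pressure on the Julia heap, so the GC has little reason to run, while
# the GPU buffers they own keep piling up.
for i in 1:1000
    a = mx.zeros((1024, 1024), mx.gpu())  # ~4 MB of GPU memory each
    # ... use `a` for one batch ...
end

# Forcing a collection finalizes the NDArrays that are no longer
# reachable and releases their GPU buffers back to MXNet.
gc()  # Julia 0.5/0.6 spelling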

@hesseltuinhof
Author

hesseltuinhof commented Jun 28, 2017

Thanks for your proposal. I implemented it with the following:

mem_fun(x...) = gc()
mem_fix = mx.every_n_batch(mem_fun, 1)
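
For reference, this is roughly how the callback gets hooked into training. It assumes the usual callbacks keyword of mx.fit; model, optimizer, and train_provider are placeholders for the objects set up in main.jl, not their actual names.

# Hypothetical wiring: pass the batch callback to fit so that gc()
# runs after every batch.
mx.fit(model, optimizer, train_provider,
       n_epoch   = 10,
       callbacks = [mem_fix])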

I hope there will be a better solution for handling this in the future. Honestly, this is kind of a deal-breaker for new people who want to try out Julia and MXNet...

@pluskid
Member

pluskid commented Jun 28, 2017

@hesseltuinhof Thanks! Glad to hear that it works. Unfortunately, this is a limitation of Julia. Python has reference counting, which is great for managing not only memory but also other resources. Julia relies on GC for memory management, which is not very good for managing other kinds of resources. :(
