
Thrice as much memory for AlexNet on Caltech256 in Julia as in Python. Why? #261

Open
hesseltuinhof opened this issue Jun 23, 2017 · 3 comments


@hesseltuinhof

I am having a severe problem training AlexNet (see alexnet.jl) in Julia (0.5.2) on my GPU (12 GB of memory).

I am training on the Caltech256 dataset (see main.jl).

Julia variant: I run out of memory right when training starts. See the following log:

julia> include("main.jl")
[15:36:03] src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: ../../../data/caltech256-train.rec, use 4 threads for decoding..
[15:36:03] src/io/iter_image_recordio_2.cc:135: ImageRecordIOParser2: ../../../data/caltech256-val.rec, use 4 threads for decoding..
INFO: Start training on MXNet.mx.Context[GPU0]
INFO: Initializing parameters...
INFO: Creating KVStore...
[15:36:09] src/operator/././cudnn_algoreg-inl.h:65: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO: TempSpace: Total 1345 MB allocated on GPU0
INFO: Start training...
[15:36:12] /home/antholzer/mxnet/dmlc-core/include/dmlc/./logging.h:304: [15:36:12] src/storage/./pooled_storage_manager.h:84: cudaMalloc failed: out of memory

Now if I run the same Python variant (alexnet.py, main.py), I have no memory problems at all. With a batch size of 128 it runs at about 3 GB of memory, and with a batch size of 256 at around 4 GB.

Note: At least I was able to train the Julia variant with a batch size of 16.

I wonder why the Julia variant blows up its memory... ❓ Does anyone have an idea about this or experience similar issues?

@pluskid
Member

pluskid commented Jun 23, 2017

One potential reason that comes to mind immediately is the difference in memory/resource handling between Julia and Python. Python uses a reference counter and aggressively releases resources as soon as the count drops to zero, while Julia uses a GC. The problem is that the GC may only run once in a while or when memory runs low, and GPU memory is invisible to the Julia GC. So many of the NDArrays created on the GPU might not be released.

Currently I do not know of a better way to handle external resources in Julia. Maybe you can try explicitly calling the Julia GC after each batch to see if it helps? (It will probably slow down training a bit.)
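
Roughly what I mean, as an illustration only: this sketch assumes the usual mx.zeros(shape, ctx) NDArray constructor and the mx.gpu() context helper; the loop and array sizes are made up and are not from your main.jl.

using MXNet

# Illustration only: the tiny Julia-side NDArray wrappers put almost no
# pressure on the Julia heap, so the GC has little reason to run, while
# the GPU buffers they own keep piling up.
for i in 1:1000
    a = mx.zeros((1024, 1024), mx.gpu())  # ~4 MB of GPU memory each
    # ... use `a` for one batch ...
end

# Forcing a collection finalizes the NDArrays that are no longer
# reachable and releases their GPU buffers back to MXNet.
gc()  # Julia 0.5/0.6 spelling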

@hesseltuinhof
Author

hesseltuinhof commented Jun 28, 2017

Thanks for your proposal. I implemented it with the following:

mem_fun(x...) = gc()
mem_fix = mx.every_n_batch(mem_fun, 1)
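
For reference, this is roughly how the callback gets hooked into training. It assumes the usual callbacks keyword of mx.fit; model, optimizer, and train_provider are placeholders for the objects set up in main.jl, not their actual names.

# Hypothetical wiring: pass the batch callback to fit so that gc()
# runs after every batch.
mx.fit(model, optimizer, train_provider,
       n_epoch   = 10,
       callbacks = [mem_fix])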

I hope there will be a better solution for handling this in the future. Honestly, this is kind of a deal-breaker for new people who want to try out Julia and MXNet...

@pluskid
Member

pluskid commented Jun 28, 2017

@hesseltuinhof Thanks! Glad to hear that it works. Unfortunately, this is a limitation of Julia. Python has reference counting, which is great for managing not only memory but also other resources. Julia relies on GC for memory management, which is not very good for managing other kinds of resources. :(
