Machine translation crash: memory issue #93

Open
afarajian opened this issue Apr 11, 2016 · 7 comments
@afarajian

Hi,

I am trying to run the machine translation example, but I get the following memory allocation error:
ImportError: ('The following error happened while compiling the node, Elemwise{Composite{Switch(i0, i1, Switch(AND(LT((i2 + i3), i1), GT(i4, i1)), i5, minimum((i2 + i3), i6)))}}(Elemwise{le,no_inplace}.0, TensorConstant{0}, TensorConstant{-1}, Elemwise{Composite{Switch(LT(Composite{Switch(LT(i0, i1), i1, i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10), Composite{Switch(LT(i0, i1), i1, i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10)}}.0, Elemwise{sub,no_inplace}.0, Elemwise{sub,no_inplace}.0, Elemwise{switch,no_inplace}.0), \n, /hltsrv0/farajian/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-redhat-7.2-Nitrogen-x86_64-2.7.11-64/tmpPi8duk/94aaeae5119dfd3722a2721c3fce5069.so: failed to map segment from shared object: Cannot allocate memory\n\nOriginal exception:\n\tImportError: The following error happened while compiling the node, Elemwise{Composite{Switch(i0, i1, Switch(AND(LT((i2 + i3), i1), GT(i4, i1)), i5, minimum((i2 + i3), i6)))}}(Elemwise{le,no_inplace}.0, TensorConstant{0}, TensorConstant{-1}, Elemwise{Composite{Switch(LT(Composite{Switch(LT(i0, i1), i1, i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10), Composite{Switch(LT(i0, i1), i1, i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10)}}.0, Elemwise{sub,no_inplace}.0, Elemwise{sub,no_inplace}.0, Elemwise{switch,no_inplace}.0), \n, /hltsrv0/farajian/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-redhat-7.2-Nitrogen-x86_64-2.7.11-64/tmpPi8duk/94aaeae5119dfd3722a2721c3fce5069.so: failed to map segment from shared object: Cannot allocate memory', Elemwise{Composite{Switch(i0, i1, Switch(AND(LT((i2 + i3), i1), GT(i4, i1)), i5, minimum((i2 + i3), i6)))}}(Elemwise{le,no_inplace}.0, TensorConstant{0}, TensorConstant{-1}, Elemwise{Composite{Switch(LT(Composite{Switch(LT(i0, i1), i1, i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10), Composite{Switch(LT(i0, i1), i1, 
i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10)}}.0, Elemwise{sub,no_inplace}.0, Elemwise{sub,no_inplace}.0, Elemwise{switch,no_inplace}.0), '\n', '/hltsrv0/farajian/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-redhat-7.2-Nitrogen-x86_64-2.7.11-64/tmpPi8duk/94aaeae5119dfd3722a2721c3fce5069.so: failed to map segment from shared object: Cannot allocate memory')

I tried different settings, but with no success. I also checked the maximum memory used while running the script and found that it was less than 1 GB (Max vmem = 830.188M).
Any idea what causes this problem and how it can be solved?

@abergeron

Do you have a limit set on the process? That is probably the only thing that could cause an mmap to fail like that.

You can check your current limits with "ulimit -a".
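
The same limits can also be inspected from inside the Python process, for example with the standard resource module (a minimal sketch; RLIMIT_AS is the address-space limit that a failed mmap like this usually runs into):

```python
import resource

# RLIMIT_AS caps the process's virtual address space; a low cap here can make
# mmap() fail with "Cannot allocate memory" even while actual RSS stays small.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("virtual memory (soft, hard):", soft, hard)  # RLIM_INFINITY means unlimited

# A few other limits that commonly affect loading compiled modules:
for name in ("RLIMIT_DATA", "RLIMIT_STACK", "RLIMIT_MEMLOCK"):
    print(name, resource.getrlimit(getattr(resource, name)))
```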

@afarajian (Author)

I am running my experiments on our cluster, and to submit a job I need to specify the amount of memory my process will need. I have used 10 GB, 20 GB, and 30 GB so far, and nothing has changed.
About ulimit -a: the virtual memory size is exactly the amount of memory I requested for the experiment, so it is 10, 20, or 30 GB depending on the run. Here is the output of ulimit -a from one of the experiments:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514573
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 514573
virtual memory (kbytes, -v) 20971520
file locks (-x) unlimited
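
Note that ulimit -v reports kilobytes, so the 20971520 above corresponds to a 20 GiB cap on the process's virtual address space (a quick sanity check):

```python
# ulimit -v is in kilobytes: 20971520 kB is exactly 20 GiB of address space.
vmem_kb = 20971520
print(vmem_kb / 1024.0 / 1024.0, "GiB")  # -> 20.0
```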

@abergeron

I can't really help any more, then. It works locally and there is no memory leak, so the problem somehow comes from the cluster configuration. I admit I have no other ideas.

@afarajian (Author)

Thank you very much.
The problem was actually caused by incompatibilities between some of the libraries. It is solved now that I have removed all of them and installed updated versions.

But now I have another issue, which I think is again related to memory.
With vocabularies of size 80,000 and higher, the process crashes. I believe this is because increasing the vocabulary size also increases the model size, so the whole model can no longer be kept in memory.
Is there any way to work with larger vocabularies?

Thanks.
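
To get a rough sense of how the vocabulary drives the model size: the embedding and output-softmax matrices both grow linearly with it. A back-of-the-envelope estimate, using illustrative dimensions rather than the example's actual configuration:

```python
# Rough, illustrative estimate of the vocabulary-dependent parameters in an
# encoder-decoder NMT model; the dimensions below are assumptions, not the
# machine translation example's real settings.
vocab = 80000
embedding_dim = 620   # typical embedding size in Bahdanau-style models
hidden_dim = 1000     # recurrent state size

embeddings = 2 * vocab * embedding_dim   # source + target embedding tables
output_softmax = vocab * hidden_dim      # readout matrix into the vocabulary
params = embeddings + output_softmax     # 179.2M parameters at these sizes

bytes_per_param = 4                      # float32
print(params * bytes_per_param / 2.0**20, "MiB")  # ~684 MiB for these terms alone
```

Gradients and any optimizer accumulators multiply this several times over, so it is plausible that memory runs out around this vocabulary size.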

@nouiz

nouiz commented Apr 13, 2016

Can you give the error message?

In the Theano documentation there is a section about the speed/memory trade-off.
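
For example, Theano's garbage-collection flags trade speed for memory (a sketch; these flags exist in Theano's configuration, but whether they are enough to avoid this crash is an assumption):

```python
import os

# Must be set before theano is imported. allow_gc frees intermediate results
# between ops, scan.allow_gc does the same inside scan loops, and float32
# halves the footprint relative to float64.
os.environ["THEANO_FLAGS"] = "floatX=float32,allow_gc=True,scan.allow_gc=True"

import theano
print(theano.config.allow_gc)  # sanity check that the flag was picked up
```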

@rizar (Contributor)

rizar commented Apr 13, 2016

Using large vocabularies in NMT is an open problem; there are some papers on that, see e.g. http://arxiv.org/abs/1412.2007.
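
The core idea in that line of work is to approximate the full softmax by normalizing over a sampled subset of the target vocabulary during training. A toy numpy sketch of the idea (illustrative only; not the exact estimator from the paper, and the function name is hypothetical):

```python
import numpy as np

def sampled_softmax_nll(hidden, W, b, target, n_samples=500, rng=np.random):
    """Approximate softmax negative log-likelihood over a vocabulary subset.

    hidden: (hidden_dim,) decoder state; W: (vocab, hidden_dim); b: (vocab,).
    Toy illustration of the large-vocabulary trick, not Jean et al.'s estimator.
    """
    vocab = W.shape[0]
    # Draw negative candidates and always keep the true target in the subset.
    negatives = rng.choice(vocab, size=n_samples, replace=False)
    subset = np.unique(np.append(negatives, target))
    logits = W[subset].dot(hidden) + b[subset]   # scores over the subset only
    logits -= logits.max()                       # numerical stability
    log_z = np.log(np.exp(logits).sum())
    target_logit = logits[subset == target][0]
    return log_z - target_logit

# Only O(n_samples) rows of W are touched per step instead of all 80,000+.
```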

@afarajian (Author)

@rizar: Thank you for the paper.
@nouiz: Sorry for my late reply. I am currently busy with something else, so for now I am running the experiments with the 30K vocabulary. I will try larger vocabularies next week and send you the errors then.
