This library doesn't work for large embeddings #11

Open
ruanchaves opened this issue Dec 23, 2019 · 0 comments
Issue description

I tried to execute a slightly modified version of this script (no significant changes were made) for an embedding with a large vocabulary and 600 dimensions:

import numpy as np
from nncompress import EmbeddingCompressor

# Load my embedding matrix
matrix = np.load("data/glove.6B.300d.npy")

# Initialize the compressor
compressor = EmbeddingCompressor(32, 16, "data/mymodel")

# Train the quantization model
compressor.train(matrix)

# Evaluate
distance = compressor.evaluate(matrix)
print("Mean euclidean distance:", distance)

# Export the codes and codebook
compressor.export(matrix, "data/mymodel")

But then, this is what I got:

Traceback (most recent call last):
  File "compress.py", line 82, in <module>
    pipe\
  File "compress.py", line 70, in train
    compressor.train(matrix)
  File "/home/user/summer/smallnilc/nncompress/embed_compress.py", line 159, in train
    word_ids_var, loss_op, train_op, maxp_op = self.build_training_graph(embed_matrix)
  File "/home/user/summer/smallnilc/nncompress/embed_compress.py", line 114, in build_training_graph
    input_matrix = tf.constant(embed_matrix, name="embed_matrix")
  File "/home/user/summer/smallnilc/small/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 180, in constant_v1
    allow_broadcast=False)
  File "/home/user/summer/smallnilc/small/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 284, in _constant_impl
    allow_broadcast=allow_broadcast))
  File "/home/user/summer/smallnilc/small/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 537, in make_tensor_proto
    "Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.

TensorFlow devs have answered similar issues by saying that the only solution is to rewrite the code so that it doesn't hit the hard 2 GB limit that protobuf imposes on the serialized graph.
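For context, the workaround usually suggested in those TensorFlow threads is to feed the matrix at run time through a placeholder instead of baking it into the graph with tf.constant, so the data never gets serialized into the GraphDef proto. A rough sketch of that idea (the row_norms op below is just a stand-in for the real training graph, not part of nncompress):

import numpy as np
import tensorflow as tf

embed_matrix = np.load("data/glove.6B.300d.npy")  # can exceed 2 GB without issue

# Instead of: input_matrix = tf.constant(embed_matrix, name="embed_matrix")
input_matrix = tf.placeholder(tf.float32, shape=embed_matrix.shape, name="embed_matrix")

# Build the rest of the graph on top of the placeholder as usual;
# here a toy op stands in for the actual training graph:
row_norms = tf.norm(input_matrix, axis=1)

with tf.Session() as sess:
    # The matrix is fed at run time, so it is never embedded in the GraphDef.
    norms = sess.run(row_norms, feed_dict={input_matrix: embed_matrix})

Presumably the same change inside build_training_graph in embed_compress.py would avoid the 2 GB limit, at the cost of passing the matrix in via feed_dict on every session run.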

Steps to reproduce the issue

Simply try to compress an embedding with more than 300 dimensions (e.g. 600 or 1000 dimensions).
