Two things that Theano doesn't really do, and that would be really useful for sequential data and NLP applications, perhaps enough to make me take the jump :)
In Theano, ragged arrays require workarounds with padding and masking, which, aside from being quite ugly and making the code less intuitive, can also hurt performance unless you do a bunch of extra preprocessing to bucket sequences of similar lengths together into minibatches.
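For concreteness, here is a minimal numpy sketch of that padding-and-masking workaround (the sequence contents and the sort-by-length bucketing are made up purely for illustration):

```python
import numpy as np

# Hypothetical minibatch of variable-length token-id sequences.
sequences = [[4, 9, 2], [7, 1], [3, 8, 6, 5, 2]]

# Sorting by length lumps similar-length sequences into the same
# minibatch, so less computation is wasted on padding positions.
sequences = sorted(sequences, key=len)

max_len = max(len(s) for s in sequences)
batch = np.zeros((len(sequences), max_len), dtype="int64")   # padded token ids
mask = np.zeros((len(sequences), max_len), dtype="float32")  # 1.0 = real token, 0.0 = padding

for i, seq in enumerate(sequences):
    batch[i, :len(seq)] = seq
    mask[i, :len(seq)] = 1.0

# Downstream, every per-timestep quantity (losses, hidden states used for
# pooling, etc.) has to be multiplied by `mask` so padding doesn't contribute.
```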
Sparse vectors and matrices are also very useful and something that Theano has, at best, second-class support for. For the common case of neural models with dense weight matrices, sparse-by-dense dot products are probably the most useful thing to have implemented with efficient sparse gradients. Common operations in NLP neural models can be seen as sparse-by-dense dot products, e.g. a lookup table (sparse one-hot vector times dense embedding matrix), or a "continuous bag of words" sum of word embeddings (sparse count vector times dense embedding matrix). Noise-contrastive estimation (useful for large softmax output layers) also relies for its speed advantage on efficiently backpropagating a sparse error vector from the output layer.
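To make the sparse-by-dense picture concrete, here is a small scipy.sparse sketch (matrix sizes and indices are arbitrary) showing that a one-hot row recovers an embedding lookup and a count vector recovers a CBOW sum:

```python
import numpy as np
import scipy.sparse as sp

vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
E = rng.standard_normal((vocab_size, embed_dim)).astype("float32")  # dense embedding matrix

# Lookup table: a sparse one-hot row times E selects a single embedding row.
one_hot = sp.csr_matrix(([1.0], ([0], [3])), shape=(1, vocab_size))
assert np.allclose(one_hot.dot(E), E[3])

# Continuous bag of words: a sparse count vector times E gives the
# (count-weighted) sum of embeddings for the words that actually occur.
counts = sp.csr_matrix(([2.0, 1.0], ([0, 0], [3, 7])), shape=(1, vocab_size))
assert np.allclose(counts.dot(E), 2.0 * E[3] + E[7])

# Note the gradient of a loss w.r.t. E through such a product is itself
# sparse: only rows 3 and 7 receive non-zero updates, which is exactly what
# an efficient sparse-gradient implementation would exploit.
```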