CSCMatrix OpMulMatrix uses O(numRows) memory, which is too much for some applications #767

Open
darkjh opened this issue Dec 15, 2019 · 5 comments

Comments

@darkjh
Contributor

darkjh commented Dec 15, 2019

In the v1.0 release, the OpMulMatrix implementation for CSCMatrix changed:
https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/operators/CSCMatrixOps.scala#L725

The multiplication now allocates a dense array. For large sparse matrices (which CSCMatrix is designed for) this does not work ...

@dlwh
Member

dlwh commented Dec 26, 2019

Hi @darkjh, almost every sparse matrix multiply algorithm I know of uses O(numRows) (or equivalent) temporary memory for a matrix multiply. For example, here's SciPy: https://github.com/scipy/scipy/blob/f2ec91c4908f9d67b5445fbfacce7f47518b35d1/scipy/sparse/sparsetools/csr.h#L533

And here's CSparse (which powers MATLAB's sparse routines, IIRC):
https://people.sc.fsu.edu/~jburkardt/c_src/csparse/csparse.c (cs_multiply).

Can you say a bit more about your use case? I can look into a blocked approach or something along those lines.
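
For readers following along, the O(numRows) temporary mentioned above can be illustrated with a minimal Python sketch of a column-by-column sparse multiply that uses a dense accumulator. This is not the Breeze, SciPy, or CSparse code; the tuple layout `(n_rows, n_cols, col_ptr, row_idx, values)` and the function name are assumptions made up for this example.

```python
from typing import List, Tuple

# Hypothetical CSC layout for this sketch: (n_rows, n_cols, col_ptr, row_idx, values).
CSC = Tuple[int, int, List[int], List[int], List[float]]


def csc_matmul_dense_workspace(a: CSC, b: CSC) -> CSC:
    """Compute C = A * B column by column using dense per-row temporaries."""
    a_rows, a_cols, a_ptr, a_idx, a_val = a
    b_rows, b_cols, b_ptr, b_idx, b_val = b
    assert a_cols == b_rows, "inner dimensions must match"

    # These two arrays are the O(numRows) temporaries discussed in the thread.
    workspace = [0.0] * a_rows  # accumulated values for the current output column
    marker = [-1] * a_rows      # marker[i] == j means row i was already touched in column j

    c_ptr, c_idx, c_val = [0], [], []
    for j in range(b_cols):
        touched = []  # rows that receive a contribution in output column j
        for p in range(b_ptr[j], b_ptr[j + 1]):
            k, b_kj = b_idx[p], b_val[p]
            # Column j of C accumulates b_kj times column k of A.
            for q in range(a_ptr[k], a_ptr[k + 1]):
                i = a_idx[q]
                if marker[i] != j:
                    marker[i] = j
                    workspace[i] = 0.0
                    touched.append(i)
                workspace[i] += a_val[q] * b_kj
        for i in sorted(touched):
            c_idx.append(i)
            c_val.append(workspace[i])
        c_ptr.append(len(c_idx))
    return a_rows, b_cols, c_ptr, c_idx, c_val
```

The per-column work is proportional to the number of contributing nonzeros, but `workspace` and `marker` are sized by the row count of the left operand regardless of how sparse it is.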

@darkjh
Contributor Author

darkjh commented Dec 27, 2019

@dlwh AFAIK a compressed format can be sparse in only one direction; for CSC that is the rows, not the columns. We use CSC for our ML algorithms, and each CSC matrix contains one partition of our dataset. Each column is a feature vector, which is very sparse because we use hashing to handle all the features.
Basically our CSC matrix is P x N, with N being the partition size and P a very large number, say Int.MaxValue.
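
To make the constraint concrete, here is a minimal Python sketch of one possible alternative: replacing the dense per-row temporary with a hash map, so that temporary memory scales with the nonzeros of one output column rather than with P. It assumes the tall P x N matrix is the left operand of the product, which the thread does not state. It is not Breeze's implementation and not a fix that was tried in this issue; the tuple layout `(n_rows, n_cols, col_ptr, row_idx, values)` and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical CSC layout for this sketch: (n_rows, n_cols, col_ptr, row_idx, values).
CSC = Tuple[int, int, List[int], List[int], List[float]]


def csc_matmul_hash_accumulator(a: CSC, b: CSC) -> CSC:
    """Compute C = A * B column by column without allocating O(n_rows) memory."""
    a_rows, a_cols, a_ptr, a_idx, a_val = a
    b_rows, b_cols, b_ptr, b_idx, b_val = b
    assert a_cols == b_rows, "inner dimensions must match"

    c_ptr, c_idx, c_val = [0], [], []
    for j in range(b_cols):
        # Maps row index -> accumulated value; holds only rows that are touched.
        acc: Dict[int, float] = {}
        for p in range(b_ptr[j], b_ptr[j + 1]):
            k, b_kj = b_idx[p], b_val[p]
            for q in range(a_ptr[k], a_ptr[k + 1]):
                i = a_idx[q]
                acc[i] = acc.get(i, 0.0) + a_val[q] * b_kj
        # CSC conventionally keeps row indices sorted within each column.
        for i in sorted(acc):
            c_idx.append(i)
            c_val.append(acc[i])
        c_ptr.append(len(c_idx))
    return a_rows, b_cols, c_ptr, c_idx, c_val
```

The trade-off is extra hashing and sorting work per output column, in exchange for never allocating an array of length P.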

@dlwh dlwh changed the title from "CSCMatrix OpMulMatrix not usable for sparse matrix" to "CSCMatrix OpMulMatrix uses O(numRows) memory, which is too much for some applications" on Feb 14, 2021
@dlwh
Member

dlwh commented Feb 14, 2021

Sorry for being so slow on this. I did look into fixing it, but it got super tricky and I abandoned it.

@darkjh
Contributor Author

darkjh commented Feb 14, 2021

@dlwh Hi, np! Can you share some insights into the potential fix? Any links? Maybe I can put some brain power into this issue.

@dlwh
Member

dlwh commented Feb 14, 2021 via email
