Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of COO / CSR sparse format #189

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

certik
Copy link
Member

@certik certik commented May 16, 2020

The key of this is the API, as shown with examples in test_sparse.f90.
In particular, I am proposing the following functions / API:

Conversion Dense <-> COO:

  • dense2coo
  • coo2dense

Conversion COO -> CSR:

  • coo2csr
  • csr_sort_indices
  • csr_sum_duplicates
  • coo2csr_canonical (calls the previous three functions)
  • csr_has_canonical_format

CSR functionality:

  • csr_matvec
  • csr_getvalue

Dense functionality:

  • getnnz

In these algorithms, it seems it is possible to just have one (best)
implementation with one exception: sorting of indices (as implemented by
csr_sort_indices), where different algorithms give different
performance and it depends on the actual matrix. As such, it might make
sense to provide a default in csr_sort_indices that is called by
coo2csr_canonical that is as fast as we can make it for most cases (currently it
uses quicksort, but there are other faster algoritms for our use case
that we should switch over) and then provide other implementations such
as csr_sort_indices_mergesort, csr_sort_indices_timsort, etc. for
users so that they can choose the algorithm for sorting indices, or even
provide their own.

The file stdlib_experimental_sparse.f90 provides an example
implementation of these algorithms that was inspired by SciPy. We can
also discuss the details of that, but the key that I would like to
discuss is the API itself.

See #38.

The key of this is the API, as shown with examples in test_sparse.f90.
In particular, I am proposing the following functions / API:

Higher level API:

Conversion Dense <-> COO:

* dense2coo
* coo2dense

Conversion COO -> CSR:

* coo2csr

CSR functionality:

* csr_matvec
* csr_getvalue

Dense functionality:

* getnnz

Lower level API:

* csr_has_canonical_format
* csr_sum_duplicates
* csr_sort_indices

In these algorithms, it seems it is possible to just have one (best)
implementation with one exception: sorting of indices (as implemented by
`csr_sort_indices`), where different algorithms give different
performance and it depends on the actual matrix. As such, it might make
sense to provide a default in `csr_sort_indices` that is called by
`coo2csr` that is as fast as we can make it for most cases (currently it
uses quicksort, but there are other faster algoritms for our use case
that we should switch over) and then provide other implementations such
as `csr_sort_indices_mergesort`, `csr_sort_indices_timsort`, etc. for
users so that they can choose the algorithm for sorting indices, or even
provide their own.

The file stdlib_experimental_sparse.f90 provides an example
implementation of these algorithms that was inspired by SciPy. We can
also discuss the details of that, but the key that I would like to
discuss is the API itself.
@certik certik requested review from milancurcic and jvdp1 May 16, 2020 15:31
@certik certik mentioned this pull request May 16, 2020
@certik
Copy link
Member Author

certik commented May 16, 2020

@sfilippone I would really be interested in your feedback on the API, based on your work on psblas3. This is a serial API, which I think we should have, in addition to a parallel API that we'll tackle later.

Copy link

@zerothi zerothi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I have a somewhat related project https://github.com/siesta-project/buds
which also has CSR matrices and some quite nice features I think.

I would be very happy to move this into stdlib in one way or the other.

Some notes:

  • I think such a complex data structure should be hidden in a type (class) to allow more stuff (discussions in link)
  • Some times it may be beneficial if the data contained isn't necessarily a single dimension (consider a 3D matrix with 2D sparsity and 1D dense).
  • Many external interfaces (in C) require 0-based indexing, and hence it would be nice with a retrieval method of the type to a 0-based index (basically only doing this on the index arrays and returning with a pointer to the actual data array)
  • My implementation allows very efficient addition of elements by having an additional counter. This allows one to have empty elements in each row and thus only re-allocate when one tries to add more terms than this. If some forms of the csr matrix becomes an issue it may be beneficial to have a 2nd format that has a list (each row) of lists (with data). In this way one need only reallocate the row that needs to be bigger.

I probably have some more comments if you want?

real(dp), intent(in) :: Ax(:), x(:)
real(dp) :: y(size(Ap)-1)
integer :: i
!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think omp should be orphaned. In that way it is the user who controls the parallelization, and not the stdlib.

Copy link
Member Author

@certik certik Jun 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment. Let's discuss openmp in #213.

I forgot that I used openmp here, I am completely fine to remove it, that is an orthogonal issue to sparse matrices. If we decide to use openmp in stdlib, we can keep it here, if we decide not to, we can remove it. And I am happy to remove it to merge this PR, and we can always add it back in. Let's discuss more in #213.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. :)

end do
end subroutine

integer function getnnz(B) result(nnz)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, perhaps these methods should allow integer(8) to allow very large matrices.

Perhaps this should be get_nnz?

!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i)
!$omp do
do i = 1, size(Ap)-1
y(i) = dot_product(Ax(Ap(i):Ap(i+1)-1), x(Aj(Ap(i):Ap(i+1)-1)))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will create a temporary which could potentially be very large. I would try to avoid this since the temporary still needs to traverse the copied elements.

end function


real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

csr_get_value?

I have never really liked get when intent is obvious, I may be wrong.
But csr_value is just as clear to me. ;)

@certik
Copy link
Member Author

certik commented May 16, 2020 via email

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different API look good to me.

My major concern is about the declaration of the pointer array of CSR. This array can easily contain values larger than integer(int32). Therefore, I wonder if it would not be advisable to declare it as integer(int64).

end subroutine

integer function getnnz(B) result(nnz)
real(dp), intent(in) :: B(:, :)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for dense matrices.
Since we are in a module for sparse matrices, should it return nnz only for sparse matrices?

integer :: n
B = 0
do n = 1, size(Ai)
B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that all elements in Ai and Aj are filled, which is not always the case because COO is often used as a temporary format.

integer, intent(in) :: Ai(:), Aj(:)
real(dp), intent(in) :: Ax(:)
real(dp), intent(out) :: B(:, :)
integer :: n
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to always use integer(int64) when going through a sparse matrix.

! Duplicate entries are carried over to the CSR representation.
integer, intent(in) :: Ai(:), Aj(:)
real(dp), intent(in) :: Ax(:)
integer, intent(out) :: Bp(:), Bj(:)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to declare Bp(:) as integer(int64) to avoid any troubles with large sparse matrices.
However, we may want to be compatible to e.g., psblas3 (I didn't check what is the default declaration in psblas3).

verbose_ = .false.
if (present(verbose)) verbose_ = verbose
allocate(Bp(maxval(Ai)+1))
if (verbose_) print *, "coo2csr"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optval can be used here ;)

call csr_sum_duplicates(Bp, Bj_, Bx_)
if (verbose_) print *, "done"
nnz = Bp(size(Bp))-1
allocate(Bj(nnz), Bx(nnz))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed with Fortran2003 and>, right?


subroutine csr_sum_duplicates(Ap, Aj, Ax)
! Sum together duplicate column entries in each row of CSR matrix A
! The column indicies within each row must be in sorted order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
! The column indicies within each row must be in sorted order.
! The column indices within each row must be in sorted order.

end do
end subroutine

function csr_matvec(Ap, Aj, Ax, x) result(y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to follow the API of Sparse BLAS.

r = 0
end function

pure elemental subroutine swap_int(x,y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this subroutine should be moved to another module of stdlib.

call check(all(abs(Bx - [5, 7, 1, 3]) < 1e-12_dp))
call check(csr_has_canonical_format(Bp, Bj))

call check(abs(csr_getvalue(Bp, Bj, Bx, 1, 1) - 5) < tiny(1._dp))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By experience, tiny(1._dp) can be too severe.

@sfilippone
Copy link

sfilippone commented May 17, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented May 17, 2020

Re: int32 vs int64. You'll need both. Let me elaborate. The most common application of sparse matrices is in the inner steps of some PDE solver, dealing with a discretization on a mesh. Such problems can indeed reach sizes that need indices with int64; however in those cases they are tackled through parallel solution strategies mostly based on MPI. The parallelization strategy is to assign a subdomain to each MPI process; each subdomain is most likely of a size that fits comfortably in an int32, provided that you apply a local numbering scheme, as well as a global numbering scheme. And you occasionally need to convert the indices of the local portion into global numbering for some processing, and back; examples include data exchanges among processes within a transpose operation. The solution that I have now in the development branch of PSBLAS is the following: there are 4 integer kinds: int32==psb_mpk_ <= psb_ipk_ <= psb_lpk_ <= psb_epk_== int64 IPK is the integer kind used for *LOCAL indices, which are the ones for the local portion of the matrices involved in numerical operators such as Matrix-Vector product, LPK is the integer kind used for GLOBAL indices, which are used in pre- and post-processing. IPK and LPK can be specified at configure time within the constraints abov, with the default being IPK=int32 and LPK=int64. Salvatore

Thank you Salvatore for your input. I should have a closer look to psblas.

For stdlib: would it not be of interest to integrate psblas3 (if allowed) in stdlib directly? Or through fpm? In comparison to the other implementations, psblas3 is well tested, is associated to peer-reviewed publications and is parallelized (also with Coarray Fortran?) if we want to support parallelization in stdlib. If something is missing, we could identify it and maybe collaborate with psblas to implement the missing parts there?
This is just a question to be sure that we don't start into something that is similar to reinventing the wheel.

@sfilippone
Copy link

sfilippone commented May 17, 2020 via email

@certik
Copy link
Member Author

certik commented May 17, 2020

Finally got to my computer. Thanks everybody for the feedback, I'll first address the high level feedback: do we want this at all, or should stdlib just use psblas3?

My own view is that stdlib should not be competing with or reimplementing well established libraries, whether LAPACK, psblas3, or others.

Just linking against psblas3 but not providing any additional functionality or API I think is also not worth it. As the Fortran Package Manager becomes more successful, it will be very easy to depend on any library such as psblas3 in users projects, and we should be encouraging people to do exactly that. It will be good for users, good for developers of such libraries and the whole ecosystem.

Where I see the role of stdlib is to standardize API and provide a reference implementation to functionalities that are repeated from project to project with slightly different API each. There are many implementations of CSR and other functionality, see #38 for a partial list, and there are many codes that use this in some form. We should see if there is a way to figure out an API that each of these codes could use.

The other aspect where I see stdlib as contributing is when people just want to quickly use some functionality in their research / new projects, or later on interactively in the Jupyter notebook for example, having a nice and well documented API that is very easy to use for new users would be beneficial. Later on, for big specialized production codes, one can always use specialized libraries. As an example, most electronic structure codes have their own custom specialized eigensolver. But I think it is still worth having a nice API to dense eigensolver via Lapack, and a sparse eigensolver on top of this sparse functionality, so that people who just want to prototype some algorithm in Fortran can do so as easily as with SciPy, Matlab or Julia.

In order to move the conversation forward, we have to have both non-OO low level api as well as a high level OO API. The reason is that I know many people (myself included) who do not want to use the OO API, just simple subroutines that do the job. But I also know a lot of people who would prefer the OO API. By providing both, we can capture most of the Fortran community and make stdlib useful for almost everybody.

@sfilippone at the very least I can see that we can use your sorting algorithms. I am hoping there is more that we can figure out how to reuse.

Also, I just want to start with a serial implementation, we can overload the subroutines to work with both int32 and int64 etc. I am really hoping we can figure out an API that we would all agree would make sense to have in stdlib.

@jvdp1
Copy link
Member

jvdp1 commented May 17, 2020

But I think it is still worth having a nice API to dense eigensolver via Lapack, and a sparse eigensolver on top of this sparse functionality, so that people who just want to prototype some algorithm in Fortran can do so as easily as with SciPy, Matlab or Julia.

@certik I totally agree with your comments.However, I wonder if this could be achieved by interfacing with the serial version of psblas, even partially. If it is not possible (e.g., because of no non-OO low level API), then I am fine with this, and I will help as good as I can.

@milancurcic
Copy link
Member

I don't think that stdlib should just gulp up existing libraries and expose them under different name.

First, decide if sparse matrix algebra is in scope or not. I'm not the target user (although I use SMM through ESMF under the hood daily), but if we agreed that a SciPy-like functionality is in scope, then this is too.

Second, if you do add it to stdlib, I think that it should provide added value over any existing library. In this case, I see that the added value could be ease of installation, and ease of use (simpler API).

This resonated with me personally the most:

The other aspect where I see stdlib as contributing is when people just want to quickly use some functionality in their research / new projects, or later on interactively in the Jupyter notebook for example, having a nice and well documented API that is very easy to use for new users would be beneficial. Later on, for big specialized production codes, one can always use specialized libraries.

@milancurcic
Copy link
Member

Just to be clear, I personally think that SciPy-like scope is included in the scope of Fortran stdlib.

Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can go forward, pending any unresolved requests for changes from other reviewers.

@awvwgk awvwgk added the waiting for OP This patch requires action from the OP label Apr 17, 2021
@awvwgk
Copy link
Member

awvwgk commented May 31, 2021

What is the current status of this patch? We have a couple of merge conflicts in the build files, which need a merge/rebase against the default branch. Also, there are a couple of unaddressed review comments. @certik, is there anything we can do to help you move this pull request forward?

@milancurcic
Copy link
Member

@certik would you like to continue working on this or should we close it for the time being?

@certik
Copy link
Member Author

certik commented Aug 26, 2021

@milancurcic are you against this approach to sparse arrays?

If not, then let's keep it open. While I don't have time to finish this myself right now, others should help. If we close it, then it will be harder for people to know about it.

@milancurcic
Copy link
Member

We can keep it open. What I'm asking is whether you intend to continue working on this eventually, when you have more time. I ask this because it wasn't clear to me whether you thought this PR can go forward with the changes requested.

In essence, is this PR stale due to lack of time or lack of consensus?

What should be tackled next if someone else were to pick it up?

Would you be able to review the changes if we found a person to take over this PR?

@certik
Copy link
Member Author

certik commented Aug 27, 2021

I reread all comments. I don't see any major disagreements. What has to be done:

  • Provide int32 and int64 versions or just do int64 for now and add int32 later; possibly generate this using fypp
  • Address every comment
  • keep this low level non-OO interface
  • benchmark against scipy (and other libraries). If I remember well, the bottleneck was the sorting algorithm, but overall the performance should be decent. But it would be good to have numbers, to ensure the API allows solid performance and to see how much more (if at all) we have to improve.
  • Later we can discuss how to design an OO interface built on top.

I don't have the time right now, but I want to finish this, or help others finish it. This will be a nice selling point of stdlib to have these basic sparse routines in, nicely documented and very performing.

@awvwgk awvwgk added topic: mathematics linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ... wontfix This will not be worked on and removed waiting for OP This patch requires action from the OP labels Sep 18, 2021
@awvwgk
Copy link
Member

awvwgk commented Sep 18, 2021

I marked this patch as wontfix since you are not intending to continue working on it. But the implementation here is of course free to use for other contributors to continue working on this project.

@certik
Copy link
Member Author

certik commented Sep 18, 2021

I wrote

I don't have the time right now, but I want to finish this, or help others finish it.

So I disagree with the statement:

since you are not intending to continue working on it.

Furthermore, I think marking things as "wontfix" is not encouraging to others (or me!) to pick it up and finish it. For this reason, and given that my stated intention is to finish this, I have removed the "wontfix" label.

We need to help each other finish these things and get them into stdlib to move stdlib forward.

You don't want such COO functionality in there?

If you do, then let's plan how to do it.

@certik certik removed the wontfix This will not be worked on label Sep 18, 2021
@awvwgk
Copy link
Member

awvwgk commented Sep 18, 2021

Sorry, that was not what I meant. This patch seemed stale and open for others to pickup on, at least that was the situation I usually applied this label for my own patches.

@awvwgk awvwgk added the help wanted Extra attention is needed label Sep 18, 2021
@milancurcic
Copy link
Member

FWIW, the "wontfix" label sounds demotivating to me as well. Like the whole project is yelling at me "we don't want this". I don't mind a PR staying open for a long time, if that's what it takes to find people to see it through.

@awvwgk
Copy link
Member

awvwgk commented Sep 18, 2021

Be assured that I want to move project forward. Frankly, I don't think this patch can be moved forward in its current form, even after resolving the merge conflicts.

I have been looking into this patch a couple of times now, and stopped again because it has significantly diverged from the latest default branch, which makes it quite involved to work with the implementation because a lot of changes in the general structure and build files are just missing.

Instead I would suggest to salvage the implementation and start fresh from the latest default branch, addressing the feedback provided already. We have done so in the past with other patches like the stringlist and were quite successful with this approach.

@certik
Copy link
Member Author

certik commented Sep 20, 2021

@awvwgk awesome. I know that's not what you meant and that you want to move the project forward. But that is how I perceive it and I think many others too. To move forward, I think the label that you just used "help wanted" is perfect. Let's keep this PR open until a better one is up.

@vickysharma0812
Copy link

@certik

In the following

if (abs(B(i, j)) < tiny(1._dp)) cycle

instead of using tiny(1.0)

please declare a parameter:

real(dp), parameter ::  zero=tiny(1.0_dp)
...

if (abs(B(i, j)) < zero )  cycle

...

@vickysharma0812
Copy link

@certik

In the following subroutine

subroutine coo2dense(Ai, Aj, Ax, B)
integer, intent(in) :: Ai(:), Aj(:)
real(dp), intent(in) :: Ax(:)
real(dp), intent(out) :: B(:, :)
integer :: n
B = 0
do n = 1, size(Ai)
B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
end do
end subroutine

you can exploit do concurrent

do concurrent(n = 1:size(Ai))
    B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
end do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed topic: mathematics linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants