Initial implementation of COO / CSR sparse format #189

certik · 2020-05-16T15:31:35Z

The key of this is the API, as shown with examples in test_sparse.f90.
In particular, I am proposing the following functions / API:

Conversion Dense <-> COO:

dense2coo
coo2dense

Conversion COO -> CSR:

coo2csr
csr_sort_indices
csr_sum_duplicates
coo2csr_canonical (calls the previous three functions)
csr_has_canonical_format

CSR functionality:

csr_matvec
csr_getvalue

Dense functionality:

getnnz

In these algorithms, it seems it is possible to just have one (best)
implementation with one exception: sorting of indices (as implemented by
csr_sort_indices), where different algorithms give different
performance and it depends on the actual matrix. As such, it might make
sense to provide a default in csr_sort_indices that is called by
coo2csr_canonical that is as fast as we can make it for most cases (currently it
uses quicksort, but there are other faster algoritms for our use case
that we should switch over) and then provide other implementations such
as csr_sort_indices_mergesort, csr_sort_indices_timsort, etc. for
users so that they can choose the algorithm for sorting indices, or even
provide their own.

The file stdlib_experimental_sparse.f90 provides an example
implementation of these algorithms that was inspired by SciPy. We can
also discuss the details of that, but the key that I would like to
discuss is the API itself.

See #38.

The key of this is the API, as shown with examples in test_sparse.f90. In particular, I am proposing the following functions / API: Higher level API: Conversion Dense <-> COO: * dense2coo * coo2dense Conversion COO -> CSR: * coo2csr CSR functionality: * csr_matvec * csr_getvalue Dense functionality: * getnnz Lower level API: * csr_has_canonical_format * csr_sum_duplicates * csr_sort_indices In these algorithms, it seems it is possible to just have one (best) implementation with one exception: sorting of indices (as implemented by `csr_sort_indices`), where different algorithms give different performance and it depends on the actual matrix. As such, it might make sense to provide a default in `csr_sort_indices` that is called by `coo2csr` that is as fast as we can make it for most cases (currently it uses quicksort, but there are other faster algoritms for our use case that we should switch over) and then provide other implementations such as `csr_sort_indices_mergesort`, `csr_sort_indices_timsort`, etc. for users so that they can choose the algorithm for sorting indices, or even provide their own. The file stdlib_experimental_sparse.f90 provides an example implementation of these algorithms that was inspired by SciPy. We can also discuss the details of that, but the key that I would like to discuss is the API itself.

certik · 2020-05-16T15:36:34Z

@sfilippone I would really be interested in your feedback on the API, based on your work on psblas3. This is a serial API, which I think we should have, in addition to a parallel API that we'll tackle later.

zerothi

FYI, I have a somewhat related project https://github.com/siesta-project/buds
which also has CSR matrices and some quite nice features I think.

I would be very happy to move this into stdlib in one way or the other.

Some notes:

~~I think such a complex data structure should be hidden in a type (class) to allow more stuff~~ (discussions in link)
Some times it may be beneficial if the data contained isn't necessarily a single dimension (consider a 3D matrix with 2D sparsity and 1D dense).
Many external interfaces (in C) require 0-based indexing, and hence it would be nice with a retrieval method of the type to a 0-based index (basically only doing this on the index arrays and returning with a pointer to the actual data array)
My implementation allows very efficient addition of elements by having an additional counter. This allows one to have empty elements in each row and thus only re-allocate when one tries to add more terms than this. If some forms of the csr matrix becomes an issue it may be beneficial to have a 2nd format that has a list (each row) of lists (with data). In this way one need only reallocate the row that needs to be bigger.

I probably have some more comments if you want?

zerothi · 2020-05-16T17:26:02Z

src/stdlib_experimental_sparse.f90

+real(dp), intent(in) :: Ax(:), x(:)
+real(dp) :: y(size(Ap)-1)
+integer :: i
+!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i)


Generally I think omp should be orphaned. In that way it is the user who controls the parallelization, and not the stdlib.

Good comment. Let's discuss openmp in #213.

I forgot that I used openmp here, I am completely fine to remove it, that is an orthogonal issue to sparse matrices. If we decide to use openmp in stdlib, we can keep it here, if we decide not to, we can remove it. And I am happy to remove it to merge this PR, and we can always add it back in. Let's discuss more in #213.

zerothi · 2020-05-16T17:26:57Z

src/stdlib_experimental_sparse.f90

+end do
+end subroutine
+
+integer function getnnz(B) result(nnz)


Also, perhaps these methods should allow integer(8) to allow very large matrices.

Perhaps this should be get_nnz?

zerothi · 2020-05-16T17:30:08Z

src/stdlib_experimental_sparse.f90

+!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i)
+!$omp do
+do i = 1, size(Ap)-1
+    y(i) = dot_product(Ax(Ap(i):Ap(i+1)-1), x(Aj(Ap(i):Ap(i+1)-1)))


This will create a temporary which could potentially be very large. I would try to avoid this since the temporary still needs to traverse the copied elements.

zerothi · 2020-05-16T17:31:33Z

src/stdlib_experimental_sparse.f90

+end function
+
+
+real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r)


csr_get_value?

I have never really liked get when intent is obvious, I may be wrong.
But csr_value is just as clear to me. ;)

certik · 2020-05-16T19:19:44Z

Great comments! Thank you. I'll reply later, I'll just point out a general design that we figured out that will satisfy almost everybody in the community: have a low level API that is procedural with no side effects (some people call it functional) as in this PR. And then have a high level API that is object oriented and hides complexity in a class or derived types and that uses the low level API to do the work. If you want, you can help suggest some good high level API for this.

…

On Sat, May 16, 2020, at 11:38 AM, Nick R. Papior wrote: ***@***.**** commented on this pull request. FYI, I have a somewhat related project https://github.com/siesta-project/buds which also has CSR matrices and some quite nice features I think. I would be very happy to move this into stdlib in one way or the other. Some notes: * I think such a *complex* data structure should be hidden in a type (class) to allow more stuff * Some times it may be beneficial if the data contained isn't necessarily a single dimension (consider a 3D matrix with 2D sparsity and 1D dense). * Many external interfaces (in C) require 0-based indexing, and hence it would be nice with a retrieval method of the `type` to a 0-based index (basically only doing this on the index arrays and returning with a pointer to the actual data array) * My implementation allows very efficient addition of elements by having an additional counter. This allows one to have *empty* elements in each row and thus only re-allocate when one tries to add more terms than this. If some forms of the csr matrix becomes an issue it *may* be beneficial to have a 2nd format that has a list (each row) of lists (with data). In this way one need only reallocate the row that needs to be bigger. I probably have some more comments if you want? In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > + r1 = Ap(i) + r2 = Ap(i+1)-1 + l = r2-r1+1 + idx(:l) = iargsort_quicksort(Aj(r1:r2)) + Aj(r1:r2) = Aj(r1+idx(:l)-1) + Ax(r1:r2) = Ax(r1+idx(:l)-1) +end do +end subroutine + +function csr_matvec(Ap, Aj, Ax, x) result(y) +! Compute y = A*x for CSR matrix A and dense vectors x, y +integer, intent(in) :: Ap(:), Aj(:) +real(dp), intent(in) :: Ax(:), x(:) +real(dp) :: y(size(Ap)-1) +integer :: i +!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) Generally I think `omp` should be orphaned. In that way it is the user who controls the parallelization, and not the stdlib. In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +integer, intent(out) :: Ai(:), Aj(:) +real(dp), intent(out) :: Ax(:) +integer :: i, j, idx +idx = 1 +do j = 1, size(B, 2) + do i = 1, size(B, 1) + if (abs(B(i, j)) < tiny(1._dp)) cycle + Ai(idx) = i + Aj(idx) = j + Ax(idx) = B(i, j) + idx = idx + 1 + end do +end do +end subroutine + +integer function getnnz(B) result(nnz) Also, perhaps these methods should allow `integer(8)` to allow very large matrices. Perhaps this should be `get_nnz`? In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > + idx(:l) = iargsort_quicksort(Aj(r1:r2)) + Aj(r1:r2) = Aj(r1+idx(:l)-1) + Ax(r1:r2) = Ax(r1+idx(:l)-1) +end do +end subroutine + +function csr_matvec(Ap, Aj, Ax, x) result(y) +! Compute y = A*x for CSR matrix A and dense vectors x, y +integer, intent(in) :: Ap(:), Aj(:) +real(dp), intent(in) :: Ax(:), x(:) +real(dp) :: y(size(Ap)-1) +integer :: i +!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) +!$omp do +do i = 1, size(Ap)-1 + y(i) = dot_product(Ax(Ap(i):Ap(i+1)-1), x(Aj(Ap(i):Ap(i+1)-1))) This will create a temporary which *could* potentially be very large. I would try to avoid this since the temporary still needs to traverse the copied elements. In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +l = 1 +i = size(A) +! Now we always have A(l) < val; A(i) >= val and we must make sure that "i" is +! the lowest possible such index. +do while (l + 1 < i) + idx = (l+i) / 2 + if (A(idx) < val) then + l = idx + else + i = idx + end if +end do +end function + + +real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r) `csr_get_value`? I have never really liked `get` when intent is obvious, I may be wrong. But `csr_value` is just as clear to me. ;) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#189 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAFAWDCCRCJHEQQTQ6PAFTRR3FSPANCNFSM4NC62JUA>.

jvdp1

The different API look good to me.

My major concern is about the declaration of the pointer array of CSR. This array can easily contain values larger than integer(int32). Therefore, I wonder if it would not be advisable to declare it as integer(int64).

jvdp1 · 2020-05-16T21:01:57Z

src/stdlib_experimental_sparse.f90

+end subroutine
+
+integer function getnnz(B) result(nnz)
+real(dp), intent(in) :: B(:, :)


This is for dense matrices.
Since we are in a module for sparse matrices, should it return nnz only for sparse matrices?

jvdp1 · 2020-05-16T21:05:52Z

src/stdlib_experimental_sparse.f90

+integer :: n
+B = 0
+do n = 1, size(Ai)
+    B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)


This implies that all elements in Ai and Aj are filled, which is not always the case because COO is often used as a temporary format.

jvdp1 · 2020-05-16T21:05:55Z

src/stdlib_experimental_sparse.f90

+integer, intent(in) :: Ai(:), Aj(:)
+real(dp), intent(in) :: Ax(:)
+real(dp), intent(out) :: B(:, :)
+integer :: n


I would suggest to always use integer(int64) when going through a sparse matrix.

jvdp1 · 2020-05-16T21:08:47Z

src/stdlib_experimental_sparse.f90

+! Duplicate entries are carried over to the CSR representation.
+integer, intent(in) :: Ai(:), Aj(:)
+real(dp), intent(in) :: Ax(:)
+integer, intent(out) :: Bp(:), Bj(:)


I suggest to declare Bp(:) as integer(int64) to avoid any troubles with large sparse matrices.
However, we may want to be compatible to e.g., psblas3 (I didn't check what is the default declaration in psblas3).

jvdp1 · 2020-05-16T21:14:14Z

src/stdlib_experimental_sparse.f90

+verbose_ = .false.
+if (present(verbose)) verbose_ = verbose
+allocate(Bp(maxval(Ai)+1))
+if (verbose_) print *, "coo2csr"


optval can be used here ;)

jvdp1 · 2020-05-16T21:16:12Z

src/stdlib_experimental_sparse.f90

+call csr_sum_duplicates(Bp, Bj_, Bx_)
+if (verbose_) print *, "done"
+nnz = Bp(size(Bp))-1
+allocate(Bj(nnz), Bx(nnz))


This line can be removed with Fortran2003 and>, right?

jvdp1 · 2020-05-16T21:17:39Z

src/stdlib_experimental_sparse.f90

+
+subroutine csr_sum_duplicates(Ap, Aj, Ax)
+! Sum together duplicate column entries in each row of CSR matrix A
+! The column indicies within each row must be in sorted order.


Suggested change

! The column indicies within each row must be in sorted order.

! The column indices within each row must be in sorted order.

jvdp1 · 2020-05-16T21:23:37Z

src/stdlib_experimental_sparse.f90

+end do
+end subroutine
+
+function csr_matvec(Ap, Aj, Ax, x) result(y)


I suggest to follow the API of Sparse BLAS.

jvdp1 · 2020-05-16T21:26:18Z

src/stdlib_experimental_sparse.f90

+r = 0
+end function
+
+pure elemental subroutine swap_int(x,y)


I think this subroutine should be moved to another module of stdlib.

jvdp1 · 2020-05-16T21:28:40Z

src/tests/sparse/test_sparse.f90

+call check(all(abs(Bx - [5, 7, 1, 3]) < 1e-12_dp))
+call check(csr_has_canonical_format(Bp, Bj))
+
+call check(abs(csr_getvalue(Bp, Bj, Bx, 1, 1) - 5) < tiny(1._dp))


By experience, tiny(1._dp) can be too severe.

sfilippone · 2020-05-17T06:51:38Z

Re: int32 vs int64. You'll need both. Let me elaborate. The most common application of sparse matrices is in the inner steps of some PDE solver, dealing with a discretization on a mesh. Such problems can indeed reach sizes that need indices with int64; however in those cases they are tackled through parallel solution strategies mostly based on MPI. The parallelization strategy is to assign a subdomain to each MPI process; each subdomain is most likely of a size that fits comfortably in an int32, provided that you apply a local numbering scheme, as well as a global numbering scheme. And you occasionally need to convert the indices of the local portion into global numbering for some processing, and back; examples include data exchanges among processes within a transpose operation. The solution that I have now in the development branch of PSBLAS is the following: there are 4 integer kinds: int32==psb_mpk_ <= psb_ipk_ <= psb_lpk_ <= psb_epk_== int64 IPK is the integer kind used for *LOCAL indices, which are the ones for the local portion of the matrices involved in numerical operators such as Matrix-Vector product, LPK is the integer kind used for GLOBAL indices, which are used in pre- and post-processing. IPK and LPK can be specified at configure time within the constraints abov, with the default being IPK=int32 and LPK=int64. Salvatore

…

On Sat, May 16, 2020 at 11:32 PM Jeremie Vandenplas < ***@***.***> wrote: ***@***.**** commented on this pull request. The different API look good to me. My major concern is about the declaration of the pointer array of CSR. This array can easily contain values larger than integer(int32). Therefore, I wonder if it would not be advisable to declare it as integer(int64). ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +real(dp), intent(out) :: Ax(:) +integer :: i, j, idx +idx = 1 +do j = 1, size(B, 2) + do i = 1, size(B, 1) + if (abs(B(i, j)) < tiny(1._dp)) cycle + Ai(idx) = i + Aj(idx) = j + Ax(idx) = B(i, j) + idx = idx + 1 + end do +end do +end subroutine + +integer function getnnz(B) result(nnz) +real(dp), intent(in) :: B(:, :) This is for dense matrices. Since we are in a module for sparse matrices, should it return nnz only for sparse matrices? ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > + if (abs(B(i, j)) < tiny(1._dp)) cycle + nnz = nnz + 1 + end do +end do +end function + +! COO + +subroutine coo2dense(Ai, Aj, Ax, B) +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +real(dp), intent(out) :: B(:, :) +integer :: n +B = 0 +do n = 1, size(Ai) + B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n) This implies that all elements in Ai and Aj are filled, which is not always the case because COO is often used as a temporary format. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +nnz = 0 +do j = 1, size(B, 2) + do i = 1, size(B, 1) + if (abs(B(i, j)) < tiny(1._dp)) cycle + nnz = nnz + 1 + end do +end do +end function + +! COO + +subroutine coo2dense(Ai, Aj, Ax, B) +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +real(dp), intent(out) :: B(:, :) +integer :: n I would suggest to always use integer(int64) when going through a sparse matrix. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +real(dp), intent(in) :: Ax(:) +real(dp), intent(out) :: B(:, :) +integer :: n +B = 0 +do n = 1, size(Ai) + B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n) +end do +end subroutine + +subroutine coo2csr(Ai, Aj, Ax, Bp, Bj, Bx) +! Converts from COO (Ai, Aj, Ax) into CSR (Bp, Bj, Bx) +! Row and column indices are *not* assumed to be ordered. +! Duplicate entries are carried over to the CSR representation. +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +integer, intent(out) :: Bp(:), Bj(:) I suggest to declare Bp(:) as integer(int64) to avoid any troubles with large sparse matrices. However, we may want to be compatible to e.g., psblas3 (I didn't check what is the default declaration in psblas3). ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +! Converts from COO (Ai, Aj, Ax) into CSR (Bp, Bj, Bx) +! Row and column indices are *not* assumed to be ordered. +! Duplicate entries are summed up and the indices are ordered. +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +integer, allocatable, intent(out) :: Bp(:), Bj(:) +real(dp), allocatable, intent(out) :: Bx(:) +logical, optional, intent(in) :: verbose +integer :: Bj_(size(Ai)) +real(dp) :: Bx_(size(Ai)) +integer :: nnz +logical :: verbose_ +verbose_ = .false. +if (present(verbose)) verbose_ = verbose +allocate(Bp(maxval(Ai)+1)) +if (verbose_) print *, "coo2csr" optval can be used here ;) ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +integer :: Bj_(size(Ai)) +real(dp) :: Bx_(size(Ai)) +integer :: nnz +logical :: verbose_ +verbose_ = .false. +if (present(verbose)) verbose_ = verbose +allocate(Bp(maxval(Ai)+1)) +if (verbose_) print *, "coo2csr" +call coo2csr(Ai, Aj, Ax, Bp, Bj_, Bx_) +if (verbose_) print *, "csr_sort_indices" +call csr_sort_indices(Bp, Bj_, Bx_) +if (verbose_) print *, "csr_sum_duplicates" +call csr_sum_duplicates(Bp, Bj_, Bx_) +if (verbose_) print *, "done" +nnz = Bp(size(Bp))-1 +allocate(Bj(nnz), Bx(nnz)) This line can be removed with Fortran2003 and>, right? ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +! conditions facilitate faster matrix computations. +integer, intent(in) :: Ap(:), Aj(:) +integer :: i, j +r = .false. +do i = 1, size(Ap)-1 + if (Ap(i) > Ap(i+1)) return + do j = Ap(i)+1, Ap(i+1)-1 + if (Aj(j-1) >= Aj(j)) return + end do +end do +r = .true. +end function + +subroutine csr_sum_duplicates(Ap, Aj, Ax) +! Sum together duplicate column entries in each row of CSR matrix A +! The column indicies within each row must be in sorted order. ⬇️ Suggested change -! The column indicies within each row must be in sorted order. +! The column indices within each row must be in sorted order. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +subroutine csr_sort_indices(Ap, Aj, Ax) +! Sort CSR column indices inplace +integer, intent(inout) :: Ap(:), Aj(:) +real(dp), intent(inout) :: Ax(:) +integer :: i, r1, r2, l, idx(size(Aj)) +do i = 1, size(Ap)-1 + r1 = Ap(i) + r2 = Ap(i+1)-1 + l = r2-r1+1 + idx(:l) = iargsort_quicksort(Aj(r1:r2)) + Aj(r1:r2) = Aj(r1+idx(:l)-1) + Ax(r1:r2) = Ax(r1+idx(:l)-1) +end do +end subroutine + +function csr_matvec(Ap, Aj, Ax, x) result(y) I suggest to follow the API of Sparse BLAS. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +real(dp), intent(in) :: Ax(:) +integer, intent(in) :: i, j +integer :: row_start, row_end, offset +row_start = Ap(i) +row_end = Ap(i+1)-1 +offset = lower_bound(Aj(row_start:row_end), j) + row_start - 1 +if (offset <= row_end) then + if (Aj(offset) == j) then + r = Ax(offset) + return + end if +end if +r = 0 +end function + +pure elemental subroutine swap_int(x,y) I think this subroutine should be moved to another module of stdlib. ------------------------------ In src/tests/sparse/test_sparse.f90 <#189 (comment)>: > +call check(all(Bj == [1, 1, 3, 3, 4, 2])) +call check(all(abs(Bx - [1, 4, 2, 5, 1, 3]) < 1e-12_dp)) +call csr_sum_duplicates(Bp, Bj, Bx) +nnz = Bp(size(Bp))-1 +call check(all(Bp == [1, 2, 4, 5])) +call check(all(Bj(:nnz) == [1, 3, 4, 2])) +call check(all(abs(Bx(:nnz) - [5, 7, 1, 3]) < 1e-12_dp)) +call check(csr_has_canonical_format(Bp, Bj(:nnz))) + +call coo2csr_canonical(Ai, Aj, Ax, Bp, Bj, Bx) +call check(all(Bp == [1, 2, 4, 5])) +call check(all(Bj == [1, 3, 4, 2])) +call check(all(abs(Bx - [5, 7, 1, 3]) < 1e-12_dp)) +call check(csr_has_canonical_format(Bp, Bj)) + +call check(abs(csr_getvalue(Bp, Bj, Bx, 1, 1) - 5) < tiny(1._dp)) By experience, tiny(1._dp) can be too severe. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#189 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AD274T4OGJKXS6GJTVKGM3TRR4A7RANCNFSM4NC62JUA> .

jvdp1 · 2020-05-17T07:18:58Z

Re: int32 vs int64. You'll need both. Let me elaborate. The most common application of sparse matrices is in the inner steps of some PDE solver, dealing with a discretization on a mesh. Such problems can indeed reach sizes that need indices with int64; however in those cases they are tackled through parallel solution strategies mostly based on MPI. The parallelization strategy is to assign a subdomain to each MPI process; each subdomain is most likely of a size that fits comfortably in an int32, provided that you apply a local numbering scheme, as well as a global numbering scheme. And you occasionally need to convert the indices of the local portion into global numbering for some processing, and back; examples include data exchanges among processes within a transpose operation. The solution that I have now in the development branch of PSBLAS is the following: there are 4 integer kinds: int32==psb_mpk_ <= psb_ipk_ <= psb_lpk_ <= psb_epk_== int64 IPK is the integer kind used for *LOCAL indices, which are the ones for the local portion of the matrices involved in numerical operators such as Matrix-Vector product, LPK is the integer kind used for GLOBAL indices, which are used in pre- and post-processing. IPK and LPK can be specified at configure time within the constraints abov, with the default being IPK=int32 and LPK=int64. Salvatore

Thank you Salvatore for your input. I should have a closer look to psblas.

For stdlib: would it not be of interest to integrate psblas3 (if allowed) in stdlib directly? Or through fpm? In comparison to the other implementations, psblas3 is well tested, is associated to peer-reviewed publications and is parallelized (also with Coarray Fortran?) if we want to support parallelization in stdlib. If something is missing, we could identify it and maybe collaborate with psblas to implement the missing parts there?
This is just a question to be sure that we don't start into something that is similar to reinventing the wheel.

sfilippone · 2020-05-17T07:45:43Z

I am OK with integrating PSBLAS in stdlib (with proper handling of copyright for me and my collaborators), at least the parts that make sense today (I suspect I have a lot of stuff that should not be included just yet). In the project there are a set of facilities marked at SERIAL (admittedly, they should be packaged better because they are split in two-three subdirs) which are the ones you are probably most interested in. Note that I also use a home-grown set of sed scripts to have a crude emulation of macros/templates to avoid rewriting things multiple times; if that could be morphed into something robust I'd be more than happy, while waiting for the Fortran standard committee to decide on a template facility (I have seen more than one proposal in this direction, but it's not yet decided AFAIK). A few additional comments: 1. Storage formats. I have a set of additional storage formats in psblas3-ext; they also support CUDA, accessed through the C interoperability. 2. Coarrays. I have a preliminary version with CoArrays (in fact, I am one of the founders of the OpenCoArrays project which is used to provide CoArrays in GNU). However, there is a big issue and that is constraints C825. The effect of the constraint is that the set of entities that can have a coarray component is fixed at compile time; in other words, you cannot have an allocatable array of derived types that contains a(n allocatable) coarray component. I have reported this to the Fortran standard committee (see https://j3-fortran.org/doc/year/18/18-280r1.txt ) and indeed my proposal has been approved; however there is no telling when it will appear in compilers, hence I am not going to use this in production code any time soon. Salvatore

…

On Sun, May 17, 2020 at 9:19 AM Jeremie Vandenplas ***@***.***> wrote: Re: int32 vs int64. You'll need both. Let me elaborate. The most common application of sparse matrices is in the inner steps of some PDE solver, dealing with a discretization on a mesh. Such problems can indeed reach sizes that need indices with int64; however in those cases they are tackled through parallel solution strategies mostly based on MPI. The parallelization strategy is to assign a subdomain to each MPI process; each subdomain is most likely of a size that fits comfortably in an int32, provided that you apply a local numbering scheme, as well as a global numbering scheme. And you occasionally need to convert the indices of the local portion into global numbering for some processing, and back; examples include data exchanges among processes within a transpose operation. The solution that I have now in the development branch of PSBLAS is the following: there are 4 integer kinds: int32==psb_mpk_ <= psb_ipk_ <= psb_lpk_ <= psb_epk_== int64 IPK is the integer kind used for *LOCAL indices, which are the ones for the local portion of the matrices involved in numerical operators such as Matrix-Vector product, LPK is the integer kind used for GLOBAL indices, which are used in pre- and post-processing. IPK and LPK can be specified at configure time within the constraints abov, with the default being IPK=int32 and LPK=int64. Salvatore Thank you Salvatore for your input. I should have a closer look to psblas. For stdlib: would it not be of interest to integrate psblas3 (if allowed) in stdlib directly? Or through fpm? In comparison to the other implementations, psblas3 is well tested, is associated to peer-reviewed publications and is parallelized (also with Coarray Fortran?) if we want to support parallelization in stdlib. If something is missing, we could identify it and maybe collaborate with psblas to implement the missing parts there? This is just a question to be sure that we don't start into something that is similar to reinventing the wheel. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#189 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AD274T4E5LRQPHUNASHTN53RR6FW3ANCNFSM4NC62JUA> .

certik · 2020-05-17T13:46:34Z

Finally got to my computer. Thanks everybody for the feedback, I'll first address the high level feedback: do we want this at all, or should stdlib just use psblas3?

My own view is that stdlib should not be competing with or reimplementing well established libraries, whether LAPACK, psblas3, or others.

Just linking against psblas3 but not providing any additional functionality or API I think is also not worth it. As the Fortran Package Manager becomes more successful, it will be very easy to depend on any library such as psblas3 in users projects, and we should be encouraging people to do exactly that. It will be good for users, good for developers of such libraries and the whole ecosystem.

Where I see the role of stdlib is to standardize API and provide a reference implementation to functionalities that are repeated from project to project with slightly different API each. There are many implementations of CSR and other functionality, see #38 for a partial list, and there are many codes that use this in some form. We should see if there is a way to figure out an API that each of these codes could use.

The other aspect where I see stdlib as contributing is when people just want to quickly use some functionality in their research / new projects, or later on interactively in the Jupyter notebook for example, having a nice and well documented API that is very easy to use for new users would be beneficial. Later on, for big specialized production codes, one can always use specialized libraries. As an example, most electronic structure codes have their own custom specialized eigensolver. But I think it is still worth having a nice API to dense eigensolver via Lapack, and a sparse eigensolver on top of this sparse functionality, so that people who just want to prototype some algorithm in Fortran can do so as easily as with SciPy, Matlab or Julia.

In order to move the conversation forward, we have to have both non-OO low level api as well as a high level OO API. The reason is that I know many people (myself included) who do not want to use the OO API, just simple subroutines that do the job. But I also know a lot of people who would prefer the OO API. By providing both, we can capture most of the Fortran community and make stdlib useful for almost everybody.

@sfilippone at the very least I can see that we can use your sorting algorithms. I am hoping there is more that we can figure out how to reuse.

Also, I just want to start with a serial implementation, we can overload the subroutines to work with both int32 and int64 etc. I am really hoping we can figure out an API that we would all agree would make sense to have in stdlib.

jvdp1 · 2020-05-17T15:11:23Z

But I think it is still worth having a nice API to dense eigensolver via Lapack, and a sparse eigensolver on top of this sparse functionality, so that people who just want to prototype some algorithm in Fortran can do so as easily as with SciPy, Matlab or Julia.

@certik I totally agree with your comments.However, I wonder if this could be achieved by interfacing with the serial version of psblas, even partially. If it is not possible (e.g., because of no non-OO low level API), then I am fine with this, and I will help as good as I can.

milancurcic · 2020-05-17T15:16:33Z

I don't think that stdlib should just gulp up existing libraries and expose them under different name.

First, decide if sparse matrix algebra is in scope or not. I'm not the target user (although I use SMM through ESMF under the hood daily), but if we agreed that a SciPy-like functionality is in scope, then this is too.

Second, if you do add it to stdlib, I think that it should provide added value over any existing library. In this case, I see that the added value could be ease of installation, and ease of use (simpler API).

This resonated with me personally the most:

The other aspect where I see stdlib as contributing is when people just want to quickly use some functionality in their research / new projects, or later on interactively in the Jupyter notebook for example, having a nice and well documented API that is very easy to use for new users would be beneficial. Later on, for big specialized production codes, one can always use specialized libraries.

milancurcic · 2020-05-17T15:31:55Z

Just to be clear, I personally think that SciPy-like scope is included in the scope of Fortran stdlib.

milancurcic

I think this can go forward, pending any unresolved requests for changes from other reviewers.

awvwgk · 2021-05-31T08:35:13Z

What is the current status of this patch? We have a couple of merge conflicts in the build files, which need a merge/rebase against the default branch. Also, there are a couple of unaddressed review comments. @certik, is there anything we can do to help you move this pull request forward?

milancurcic · 2021-08-20T17:30:22Z

@certik would you like to continue working on this or should we close it for the time being?

certik · 2021-08-26T20:15:26Z

@milancurcic are you against this approach to sparse arrays?

If not, then let's keep it open. While I don't have time to finish this myself right now, others should help. If we close it, then it will be harder for people to know about it.

milancurcic · 2021-08-26T22:09:42Z

We can keep it open. What I'm asking is whether you intend to continue working on this eventually, when you have more time. I ask this because it wasn't clear to me whether you thought this PR can go forward with the changes requested.

In essence, is this PR stale due to lack of time or lack of consensus?

What should be tackled next if someone else were to pick it up?

Would you be able to review the changes if we found a person to take over this PR?

certik · 2021-08-27T18:12:07Z

I reread all comments. I don't see any major disagreements. What has to be done:

Provide int32 and int64 versions or just do int64 for now and add int32 later; possibly generate this using fypp
Address every comment
keep this low level non-OO interface
benchmark against scipy (and other libraries). If I remember well, the bottleneck was the sorting algorithm, but overall the performance should be decent. But it would be good to have numbers, to ensure the API allows solid performance and to see how much more (if at all) we have to improve.
Later we can discuss how to design an OO interface built on top.

I don't have the time right now, but I want to finish this, or help others finish it. This will be a nice selling point of stdlib to have these basic sparse routines in, nicely documented and very performing.

awvwgk · 2021-09-18T10:09:56Z

I marked this patch as wontfix since you are not intending to continue working on it. But the implementation here is of course free to use for other contributors to continue working on this project.

certik · 2021-09-18T15:55:18Z

I wrote

I don't have the time right now, but I want to finish this, or help others finish it.

So I disagree with the statement:

since you are not intending to continue working on it.

Furthermore, I think marking things as "wontfix" is not encouraging to others (or me!) to pick it up and finish it. For this reason, and given that my stated intention is to finish this, I have removed the "wontfix" label.

We need to help each other finish these things and get them into stdlib to move stdlib forward.

You don't want such COO functionality in there?

If you do, then let's plan how to do it.

awvwgk · 2021-09-18T17:07:31Z

Sorry, that was not what I meant. This patch seemed stale and open for others to pickup on, at least that was the situation I usually applied this label for my own patches.

milancurcic · 2021-09-18T17:19:17Z

FWIW, the "wontfix" label sounds demotivating to me as well. Like the whole project is yelling at me "we don't want this". I don't mind a PR staying open for a long time, if that's what it takes to find people to see it through.

awvwgk · 2021-09-18T17:48:44Z

Be assured that I want to move project forward. Frankly, I don't think this patch can be moved forward in its current form, even after resolving the merge conflicts.

I have been looking into this patch a couple of times now, and stopped again because it has significantly diverged from the latest default branch, which makes it quite involved to work with the implementation because a lot of changes in the general structure and build files are just missing.

Instead I would suggest to salvage the implementation and start fresh from the latest default branch, addressing the feedback provided already. We have done so in the past with other patches like the stringlist and were quite successful with this approach.

certik · 2021-09-20T18:59:51Z

@awvwgk awesome. I know that's not what you meant and that you want to move the project forward. But that is how I perceive it and I think many others too. To move forward, I think the label that you just used "help wanted" is perfect. Let's keep this PR open until a better one is up.

vickysharma0812 · 2022-12-16T17:09:08Z

@certik

In the following

stdlib/src/stdlib_experimental_sparse.f90

Line 36 in 89feb91

if (abs(B(i, j)) < tiny(1._dp)) cycle

instead of using tiny(1.0)

please declare a parameter:

real(dp), parameter ::  zero=tiny(1.0_dp)
...

if (abs(B(i, j)) < zero )  cycle

...

vickysharma0812 · 2022-12-16T17:12:03Z

@certik

In the following subroutine

stdlib/src/stdlib_experimental_sparse.f90

Lines 44 to 53 in 89feb91

    
           subroutine coo2dense(Ai, Aj, Ax, B) 
        
           integer, intent(in) :: Ai(:), Aj(:) 
        
           real(dp), intent(in) :: Ax(:) 
        
           real(dp), intent(out) :: B(:, :) 
        
           integer :: n 
        
           B = 0 
        
           do n = 1, size(Ai) 
        
               B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n) 
        
           end do 
        
           end subroutine

you can exploit do concurrent

do concurrent(n = 1:size(Ai))
    B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
end do

certik requested review from milancurcic and jvdp1 May 16, 2020 15:31

certik mentioned this pull request May 16, 2020

Sparse matrix support #38

Open

zerothi reviewed May 16, 2020

View reviewed changes

jvdp1 reviewed May 16, 2020

View reviewed changes

milancurcic approved these changes Jun 23, 2020

View reviewed changes

awvwgk added the waiting for OP This patch requires action from the OP label Apr 17, 2021

zoziha mentioned this pull request Aug 27, 2021

[Proposal] add swap subroutines: swap the values of two variables of the same type ? #462

Open

awvwgk added topic: mathematics linear algebra, sparse matrices, special functions, FFT, random numbers, statistics, ... wontfix This will not be worked on and removed waiting for OP This patch requires action from the OP labels Sep 18, 2021

certik removed the wontfix This will not be worked on label Sep 18, 2021

awvwgk added the help wanted Extra attention is needed label Sep 18, 2021

jalvesz mentioned this pull request Jan 10, 2024

Sparse algebra support with OOP API #760

Merged

jvdp1 mentioned this pull request Jan 14, 2024

Support for linear algebra #749

Open

		end function


		real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r)

	! The column indicies within each row must be in sorted order.
	! The column indices within each row must be in sorted order.

Initial implementation of COO / CSR sparse format #189

Are you sure you want to change the base?

Initial implementation of COO / CSR sparse format #189

Conversation

certik commented May 16, 2020 • edited Loading

certik commented May 16, 2020

zerothi left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

certik Jun 20, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

certik commented May 16, 2020 via email

jvdp1 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sfilippone commented May 17, 2020 via email

jvdp1 commented May 17, 2020

sfilippone commented May 17, 2020 via email

certik commented May 17, 2020 • edited Loading

jvdp1 commented May 17, 2020

milancurcic commented May 17, 2020

milancurcic commented May 17, 2020

milancurcic left a comment

Choose a reason for hiding this comment

awvwgk commented May 31, 2021

milancurcic commented Aug 20, 2021

certik commented Aug 26, 2021

milancurcic commented Aug 26, 2021

certik commented Aug 27, 2021

awvwgk commented Sep 18, 2021

certik commented Sep 18, 2021 • edited Loading

awvwgk commented Sep 18, 2021

milancurcic commented Sep 18, 2021

awvwgk commented Sep 18, 2021

certik commented Sep 20, 2021

vickysharma0812 commented Dec 16, 2022

vickysharma0812 commented Dec 16, 2022

certik commented May 16, 2020 •

edited

Loading

zerothi left a comment •

edited

Loading

certik Jun 20, 2020 •

edited

Loading

certik commented May 17, 2020 •

edited

Loading

certik commented Sep 18, 2021 •

edited

Loading