-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial implementation of COO / CSR sparse format #189
base: master
Are you sure you want to change the base?
Conversation
The key of this is the API, as shown with examples in test_sparse.f90. In particular, I am proposing the following functions / API: Higher level API: Conversion Dense <-> COO: * dense2coo * coo2dense Conversion COO -> CSR: * coo2csr CSR functionality: * csr_matvec * csr_getvalue Dense functionality: * getnnz Lower level API: * csr_has_canonical_format * csr_sum_duplicates * csr_sort_indices In these algorithms, it seems it is possible to just have one (best) implementation with one exception: sorting of indices (as implemented by `csr_sort_indices`), where different algorithms give different performance and it depends on the actual matrix. As such, it might make sense to provide a default in `csr_sort_indices` that is called by `coo2csr` that is as fast as we can make it for most cases (currently it uses quicksort, but there are other faster algoritms for our use case that we should switch over) and then provide other implementations such as `csr_sort_indices_mergesort`, `csr_sort_indices_timsort`, etc. for users so that they can choose the algorithm for sorting indices, or even provide their own. The file stdlib_experimental_sparse.f90 provides an example implementation of these algorithms that was inspired by SciPy. We can also discuss the details of that, but the key that I would like to discuss is the API itself.
@sfilippone I would really be interested in your feedback on the API, based on your work on psblas3. This is a serial API, which I think we should have, in addition to a parallel API that we'll tackle later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I have a somewhat related project https://github.com/siesta-project/buds
which also has CSR matrices and some quite nice features I think.
I would be very happy to move this into stdlib in one way or the other.
Some notes:
I think such a complex data structure should be hidden in a type (class) to allow more stuff(discussions in link)- Some times it may be beneficial if the data contained isn't necessarily a single dimension (consider a 3D matrix with 2D sparsity and 1D dense).
- Many external interfaces (in C) require 0-based indexing, and hence it would be nice with a retrieval method of the
type
to a 0-based index (basically only doing this on the index arrays and returning with a pointer to the actual data array) - My implementation allows very efficient addition of elements by having an additional counter. This allows one to have empty elements in each row and thus only re-allocate when one tries to add more terms than this. If some forms of the csr matrix becomes an issue it may be beneficial to have a 2nd format that has a list (each row) of lists (with data). In this way one need only reallocate the row that needs to be bigger.
I probably have some more comments if you want?
real(dp), intent(in) :: Ax(:), x(:) | ||
real(dp) :: y(size(Ap)-1) | ||
integer :: i | ||
!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I think omp
should be orphaned. In that way it is the user who controls the parallelization, and not the stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good comment. Let's discuss openmp in #213.
I forgot that I used openmp here, I am completely fine to remove it, that is an orthogonal issue to sparse matrices. If we decide to use openmp in stdlib, we can keep it here, if we decide not to, we can remove it. And I am happy to remove it to merge this PR, and we can always add it back in. Let's discuss more in #213.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. :)
end do | ||
end subroutine | ||
|
||
integer function getnnz(B) result(nnz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, perhaps these methods should allow integer(8)
to allow very large matrices.
Perhaps this should be get_nnz
?
!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) | ||
!$omp do | ||
do i = 1, size(Ap)-1 | ||
y(i) = dot_product(Ax(Ap(i):Ap(i+1)-1), x(Aj(Ap(i):Ap(i+1)-1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will create a temporary which could potentially be very large. I would try to avoid this since the temporary still needs to traverse the copied elements.
end function | ||
|
||
|
||
real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
csr_get_value
?
I have never really liked get
when intent is obvious, I may be wrong.
But csr_value
is just as clear to me. ;)
Great comments! Thank you.
I'll reply later, I'll just point out a general design that we figured out that will satisfy almost everybody in the community: have a low level API that is procedural with no side effects (some people call it functional) as in this PR. And then have a high level API that is object oriented and hides complexity in a class or derived types and that uses the low level API to do the work.
If you want, you can help suggest some good high level API for this.
…On Sat, May 16, 2020, at 11:38 AM, Nick R. Papior wrote:
***@***.**** commented on this pull request.
FYI, I have a somewhat related project https://github.com/siesta-project/buds
which also has CSR matrices and some quite nice features I think.
I would be very happy to move this into stdlib in one way or the other.
Some notes:
* I think such a *complex* data structure should be hidden in a type
(class) to allow more stuff
* Some times it may be beneficial if the data contained isn't
necessarily a single dimension (consider a 3D matrix with 2D sparsity
and 1D dense).
* Many external interfaces (in C) require 0-based indexing, and hence
it would be nice with a retrieval method of the `type` to a 0-based
index (basically only doing this on the index arrays and returning with
a pointer to the actual data array)
* My implementation allows very efficient addition of elements by
having an additional counter. This allows one to have *empty* elements
in each row and thus only re-allocate when one tries to add more terms
than this. If some forms of the csr matrix becomes an issue it *may* be
beneficial to have a 2nd format that has a list (each row) of lists
(with data). In this way one need only reallocate the row that needs to
be bigger.
I probably have some more comments if you want?
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> + r1 = Ap(i)
+ r2 = Ap(i+1)-1
+ l = r2-r1+1
+ idx(:l) = iargsort_quicksort(Aj(r1:r2))
+ Aj(r1:r2) = Aj(r1+idx(:l)-1)
+ Ax(r1:r2) = Ax(r1+idx(:l)-1)
+end do
+end subroutine
+
+function csr_matvec(Ap, Aj, Ax, x) result(y)
+! Compute y = A*x for CSR matrix A and dense vectors x, y
+integer, intent(in) :: Ap(:), Aj(:)
+real(dp), intent(in) :: Ax(:), x(:)
+real(dp) :: y(size(Ap)-1)
+integer :: i
+!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i)
Generally I think `omp` should be orphaned. In that way it is the user
who controls the parallelization, and not the stdlib.
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +integer, intent(out) :: Ai(:), Aj(:)
+real(dp), intent(out) :: Ax(:)
+integer :: i, j, idx
+idx = 1
+do j = 1, size(B, 2)
+ do i = 1, size(B, 1)
+ if (abs(B(i, j)) < tiny(1._dp)) cycle
+ Ai(idx) = i
+ Aj(idx) = j
+ Ax(idx) = B(i, j)
+ idx = idx + 1
+ end do
+end do
+end subroutine
+
+integer function getnnz(B) result(nnz)
Also, perhaps these methods should allow `integer(8)` to allow very
large matrices.
Perhaps this should be `get_nnz`?
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> + idx(:l) = iargsort_quicksort(Aj(r1:r2))
+ Aj(r1:r2) = Aj(r1+idx(:l)-1)
+ Ax(r1:r2) = Ax(r1+idx(:l)-1)
+end do
+end subroutine
+
+function csr_matvec(Ap, Aj, Ax, x) result(y)
+! Compute y = A*x for CSR matrix A and dense vectors x, y
+integer, intent(in) :: Ap(:), Aj(:)
+real(dp), intent(in) :: Ax(:), x(:)
+real(dp) :: y(size(Ap)-1)
+integer :: i
+!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i)
+!$omp do
+do i = 1, size(Ap)-1
+ y(i) = dot_product(Ax(Ap(i):Ap(i+1)-1), x(Aj(Ap(i):Ap(i+1)-1)))
This will create a temporary which *could* potentially be very large. I
would try to avoid this since the temporary still needs to traverse the
copied elements.
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +l = 1
+i = size(A)
+! Now we always have A(l) < val; A(i) >= val and we must make sure that "i" is
+! the lowest possible such index.
+do while (l + 1 < i)
+ idx = (l+i) / 2
+ if (A(idx) < val) then
+ l = idx
+ else
+ i = idx
+ end if
+end do
+end function
+
+
+real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r)
`csr_get_value`?
I have never really liked `get` when intent is obvious, I may be wrong.
But `csr_value` is just as clear to me. ;)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#189 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAFAWDCCRCJHEQQTQ6PAFTRR3FSPANCNFSM4NC62JUA>.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The different API look good to me.
My major concern is about the declaration of the pointer array of CSR. This array can easily contain values larger than integer(int32)
. Therefore, I wonder if it would not be advisable to declare it as integer(int64)
.
end subroutine | ||
|
||
integer function getnnz(B) result(nnz) | ||
real(dp), intent(in) :: B(:, :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for dense matrices.
Since we are in a module for sparse matrices, should it return nnz only for sparse matrices?
integer :: n | ||
B = 0 | ||
do n = 1, size(Ai) | ||
B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that all elements in Ai
and Aj
are filled, which is not always the case because COO is often used as a temporary format.
integer, intent(in) :: Ai(:), Aj(:) | ||
real(dp), intent(in) :: Ax(:) | ||
real(dp), intent(out) :: B(:, :) | ||
integer :: n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to always use integer(int64)
when going through a sparse matrix.
! Duplicate entries are carried over to the CSR representation. | ||
integer, intent(in) :: Ai(:), Aj(:) | ||
real(dp), intent(in) :: Ax(:) | ||
integer, intent(out) :: Bp(:), Bj(:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to declare Bp(:)
as integer(int64)
to avoid any troubles with large sparse matrices.
However, we may want to be compatible to e.g., psblas3 (I didn't check what is the default declaration in psblas3).
verbose_ = .false. | ||
if (present(verbose)) verbose_ = verbose | ||
allocate(Bp(maxval(Ai)+1)) | ||
if (verbose_) print *, "coo2csr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optval
can be used here ;)
call csr_sum_duplicates(Bp, Bj_, Bx_) | ||
if (verbose_) print *, "done" | ||
nnz = Bp(size(Bp))-1 | ||
allocate(Bj(nnz), Bx(nnz)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be removed with Fortran2003 and>, right?
|
||
subroutine csr_sum_duplicates(Ap, Aj, Ax) | ||
! Sum together duplicate column entries in each row of CSR matrix A | ||
! The column indicies within each row must be in sorted order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
! The column indicies within each row must be in sorted order. | |
! The column indices within each row must be in sorted order. |
end do | ||
end subroutine | ||
|
||
function csr_matvec(Ap, Aj, Ax, x) result(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to follow the API of Sparse BLAS.
r = 0 | ||
end function | ||
|
||
pure elemental subroutine swap_int(x,y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this subroutine should be moved to another module of stdlib
.
call check(all(abs(Bx - [5, 7, 1, 3]) < 1e-12_dp)) | ||
call check(csr_has_canonical_format(Bp, Bj)) | ||
|
||
call check(abs(csr_getvalue(Bp, Bj, Bx, 1, 1) - 5) < tiny(1._dp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By experience, tiny(1._dp)
can be too severe.
Re: int32 vs int64.
You'll need both. Let me elaborate.
The most common application of sparse matrices is in the inner steps of
some PDE solver, dealing with a discretization on a mesh. Such problems can
indeed reach sizes that need indices with int64; however in those cases
they are tackled through parallel solution strategies mostly based on MPI.
The parallelization strategy is to assign a subdomain to each MPI process;
each subdomain is most likely of a size that fits comfortably in an int32,
provided that you apply a local numbering scheme, as well as a global
numbering scheme. And you occasionally need to convert the indices of the
local portion into global numbering for some processing, and back; examples
include data exchanges among processes within a transpose operation.
The solution that I have now in the development branch of PSBLAS is the
following: there are 4 integer kinds:
int32==psb_mpk_ <= psb_ipk_ <= psb_lpk_ <= psb_epk_== int64
IPK is the integer kind used for *LOCAL indices, which are the ones for the
local portion of the matrices involved in numerical operators such as
Matrix-Vector product,
LPK is the integer kind used for GLOBAL indices, which are used in pre- and
post-processing.
IPK and LPK can be specified at configure time within the constraints abov,
with the default being IPK=int32 and LPK=int64.
Salvatore
…On Sat, May 16, 2020 at 11:32 PM Jeremie Vandenplas < ***@***.***> wrote:
***@***.**** commented on this pull request.
The different API look good to me.
My major concern is about the declaration of the pointer array of CSR.
This array can easily contain values larger than integer(int32).
Therefore, I wonder if it would not be advisable to declare it as
integer(int64).
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +real(dp), intent(out) :: Ax(:)
+integer :: i, j, idx
+idx = 1
+do j = 1, size(B, 2)
+ do i = 1, size(B, 1)
+ if (abs(B(i, j)) < tiny(1._dp)) cycle
+ Ai(idx) = i
+ Aj(idx) = j
+ Ax(idx) = B(i, j)
+ idx = idx + 1
+ end do
+end do
+end subroutine
+
+integer function getnnz(B) result(nnz)
+real(dp), intent(in) :: B(:, :)
This is for dense matrices.
Since we are in a module for sparse matrices, should it return nnz only
for sparse matrices?
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> + if (abs(B(i, j)) < tiny(1._dp)) cycle
+ nnz = nnz + 1
+ end do
+end do
+end function
+
+! COO
+
+subroutine coo2dense(Ai, Aj, Ax, B)
+integer, intent(in) :: Ai(:), Aj(:)
+real(dp), intent(in) :: Ax(:)
+real(dp), intent(out) :: B(:, :)
+integer :: n
+B = 0
+do n = 1, size(Ai)
+ B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
This implies that all elements in Ai and Aj are filled, which is not
always the case because COO is often used as a temporary format.
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +nnz = 0
+do j = 1, size(B, 2)
+ do i = 1, size(B, 1)
+ if (abs(B(i, j)) < tiny(1._dp)) cycle
+ nnz = nnz + 1
+ end do
+end do
+end function
+
+! COO
+
+subroutine coo2dense(Ai, Aj, Ax, B)
+integer, intent(in) :: Ai(:), Aj(:)
+real(dp), intent(in) :: Ax(:)
+real(dp), intent(out) :: B(:, :)
+integer :: n
I would suggest to always use integer(int64) when going through a sparse
matrix.
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +real(dp), intent(in) :: Ax(:)
+real(dp), intent(out) :: B(:, :)
+integer :: n
+B = 0
+do n = 1, size(Ai)
+ B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
+end do
+end subroutine
+
+subroutine coo2csr(Ai, Aj, Ax, Bp, Bj, Bx)
+! Converts from COO (Ai, Aj, Ax) into CSR (Bp, Bj, Bx)
+! Row and column indices are *not* assumed to be ordered.
+! Duplicate entries are carried over to the CSR representation.
+integer, intent(in) :: Ai(:), Aj(:)
+real(dp), intent(in) :: Ax(:)
+integer, intent(out) :: Bp(:), Bj(:)
I suggest to declare Bp(:) as integer(int64) to avoid any troubles with
large sparse matrices.
However, we may want to be compatible to e.g., psblas3 (I didn't check
what is the default declaration in psblas3).
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +! Converts from COO (Ai, Aj, Ax) into CSR (Bp, Bj, Bx)
+! Row and column indices are *not* assumed to be ordered.
+! Duplicate entries are summed up and the indices are ordered.
+integer, intent(in) :: Ai(:), Aj(:)
+real(dp), intent(in) :: Ax(:)
+integer, allocatable, intent(out) :: Bp(:), Bj(:)
+real(dp), allocatable, intent(out) :: Bx(:)
+logical, optional, intent(in) :: verbose
+integer :: Bj_(size(Ai))
+real(dp) :: Bx_(size(Ai))
+integer :: nnz
+logical :: verbose_
+verbose_ = .false.
+if (present(verbose)) verbose_ = verbose
+allocate(Bp(maxval(Ai)+1))
+if (verbose_) print *, "coo2csr"
optval can be used here ;)
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +integer :: Bj_(size(Ai))
+real(dp) :: Bx_(size(Ai))
+integer :: nnz
+logical :: verbose_
+verbose_ = .false.
+if (present(verbose)) verbose_ = verbose
+allocate(Bp(maxval(Ai)+1))
+if (verbose_) print *, "coo2csr"
+call coo2csr(Ai, Aj, Ax, Bp, Bj_, Bx_)
+if (verbose_) print *, "csr_sort_indices"
+call csr_sort_indices(Bp, Bj_, Bx_)
+if (verbose_) print *, "csr_sum_duplicates"
+call csr_sum_duplicates(Bp, Bj_, Bx_)
+if (verbose_) print *, "done"
+nnz = Bp(size(Bp))-1
+allocate(Bj(nnz), Bx(nnz))
This line can be removed with Fortran2003 and>, right?
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +! conditions facilitate faster matrix computations.
+integer, intent(in) :: Ap(:), Aj(:)
+integer :: i, j
+r = .false.
+do i = 1, size(Ap)-1
+ if (Ap(i) > Ap(i+1)) return
+ do j = Ap(i)+1, Ap(i+1)-1
+ if (Aj(j-1) >= Aj(j)) return
+ end do
+end do
+r = .true.
+end function
+
+subroutine csr_sum_duplicates(Ap, Aj, Ax)
+! Sum together duplicate column entries in each row of CSR matrix A
+! The column indicies within each row must be in sorted order.
⬇️ Suggested change
-! The column indicies within each row must be in sorted order.
+! The column indices within each row must be in sorted order.
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +subroutine csr_sort_indices(Ap, Aj, Ax)
+! Sort CSR column indices inplace
+integer, intent(inout) :: Ap(:), Aj(:)
+real(dp), intent(inout) :: Ax(:)
+integer :: i, r1, r2, l, idx(size(Aj))
+do i = 1, size(Ap)-1
+ r1 = Ap(i)
+ r2 = Ap(i+1)-1
+ l = r2-r1+1
+ idx(:l) = iargsort_quicksort(Aj(r1:r2))
+ Aj(r1:r2) = Aj(r1+idx(:l)-1)
+ Ax(r1:r2) = Ax(r1+idx(:l)-1)
+end do
+end subroutine
+
+function csr_matvec(Ap, Aj, Ax, x) result(y)
I suggest to follow the API of Sparse BLAS.
------------------------------
In src/stdlib_experimental_sparse.f90
<#189 (comment)>:
> +real(dp), intent(in) :: Ax(:)
+integer, intent(in) :: i, j
+integer :: row_start, row_end, offset
+row_start = Ap(i)
+row_end = Ap(i+1)-1
+offset = lower_bound(Aj(row_start:row_end), j) + row_start - 1
+if (offset <= row_end) then
+ if (Aj(offset) == j) then
+ r = Ax(offset)
+ return
+ end if
+end if
+r = 0
+end function
+
+pure elemental subroutine swap_int(x,y)
I think this subroutine should be moved to another module of stdlib.
------------------------------
In src/tests/sparse/test_sparse.f90
<#189 (comment)>:
> +call check(all(Bj == [1, 1, 3, 3, 4, 2]))
+call check(all(abs(Bx - [1, 4, 2, 5, 1, 3]) < 1e-12_dp))
+call csr_sum_duplicates(Bp, Bj, Bx)
+nnz = Bp(size(Bp))-1
+call check(all(Bp == [1, 2, 4, 5]))
+call check(all(Bj(:nnz) == [1, 3, 4, 2]))
+call check(all(abs(Bx(:nnz) - [5, 7, 1, 3]) < 1e-12_dp))
+call check(csr_has_canonical_format(Bp, Bj(:nnz)))
+
+call coo2csr_canonical(Ai, Aj, Ax, Bp, Bj, Bx)
+call check(all(Bp == [1, 2, 4, 5]))
+call check(all(Bj == [1, 3, 4, 2]))
+call check(all(abs(Bx - [5, 7, 1, 3]) < 1e-12_dp))
+call check(csr_has_canonical_format(Bp, Bj))
+
+call check(abs(csr_getvalue(Bp, Bj, Bx, 1, 1) - 5) < tiny(1._dp))
By experience, tiny(1._dp) can be too severe.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#189 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AD274T4OGJKXS6GJTVKGM3TRR4A7RANCNFSM4NC62JUA>
.
|
Thank you Salvatore for your input. I should have a closer look to psblas. For |
I am OK with integrating PSBLAS in stdlib (with proper handling of
copyright for me and my collaborators), at least the parts that make sense
today (I suspect I have a lot of stuff that should not be included just
yet). In the project there are a set of facilities marked at SERIAL
(admittedly, they should be packaged better because they are split in
two-three subdirs) which are the ones you are probably most interested in.
Note that I also use a home-grown set of sed scripts to have a crude
emulation of macros/templates to avoid rewriting things multiple times; if
that could be morphed into something robust I'd be more than happy, while
waiting for the Fortran standard committee to decide on a template facility
(I have seen more than one proposal in this direction, but it's not yet
decided AFAIK).
A few additional comments:
1. Storage formats.
I have a set of additional storage formats in psblas3-ext; they also
support CUDA, accessed through the C interoperability.
2. Coarrays.
I have a preliminary version with CoArrays (in fact, I am one of the
founders of the OpenCoArrays project which is used to provide CoArrays in
GNU). However, there is a big issue and that is constraints C825. The
effect of the constraint is that the set of entities that can have a
coarray component is fixed at compile time; in other words, you cannot have
an allocatable array of derived types that contains a(n allocatable)
coarray component.
I have reported this to the Fortran standard committee (see
https://j3-fortran.org/doc/year/18/18-280r1.txt ) and indeed my proposal
has been approved; however there is no telling when it will appear in
compilers, hence I am not going to use this in production code any time
soon.
Salvatore
…On Sun, May 17, 2020 at 9:19 AM Jeremie Vandenplas ***@***.***> wrote:
Re: int32 vs int64. You'll need both. Let me elaborate. The most common
application of sparse matrices is in the inner steps of some PDE solver,
dealing with a discretization on a mesh. Such problems can indeed reach
sizes that need indices with int64; however in those cases they are tackled
through parallel solution strategies mostly based on MPI. The
parallelization strategy is to assign a subdomain to each MPI process; each
subdomain is most likely of a size that fits comfortably in an int32,
provided that you apply a local numbering scheme, as well as a global
numbering scheme. And you occasionally need to convert the indices of the
local portion into global numbering for some processing, and back; examples
include data exchanges among processes within a transpose operation. The
solution that I have now in the development branch of PSBLAS is the
following: there are 4 integer kinds: int32==psb_mpk_ <= psb_ipk_ <=
psb_lpk_ <= psb_epk_== int64 IPK is the integer kind used for *LOCAL
indices, which are the ones for the local portion of the matrices involved
in numerical operators such as Matrix-Vector product, LPK is the integer
kind used for GLOBAL indices, which are used in pre- and post-processing.
IPK and LPK can be specified at configure time within the constraints abov,
with the default being IPK=int32 and LPK=int64. Salvatore
Thank you Salvatore for your input. I should have a closer look to psblas.
For stdlib: would it not be of interest to integrate psblas3 (if allowed)
in stdlib directly? Or through fpm? In comparison to the other
implementations, psblas3 is well tested, is associated to peer-reviewed
publications and is parallelized (also with Coarray Fortran?) if we want to
support parallelization in stdlib. If something is missing, we could
identify it and maybe collaborate with psblas to implement the missing
parts there?
This is just a question to be sure that we don't start into something that
is similar to reinventing the wheel.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#189 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AD274T4E5LRQPHUNASHTN53RR6FW3ANCNFSM4NC62JUA>
.
|
Finally got to my computer. Thanks everybody for the feedback, I'll first address the high level feedback: do we want this at all, or should stdlib just use psblas3? My own view is that stdlib should not be competing with or reimplementing well established libraries, whether LAPACK, psblas3, or others. Just linking against psblas3 but not providing any additional functionality or API I think is also not worth it. As the Fortran Package Manager becomes more successful, it will be very easy to depend on any library such as psblas3 in users projects, and we should be encouraging people to do exactly that. It will be good for users, good for developers of such libraries and the whole ecosystem. Where I see the role of stdlib is to standardize API and provide a reference implementation to functionalities that are repeated from project to project with slightly different API each. There are many implementations of CSR and other functionality, see #38 for a partial list, and there are many codes that use this in some form. We should see if there is a way to figure out an API that each of these codes could use. The other aspect where I see stdlib as contributing is when people just want to quickly use some functionality in their research / new projects, or later on interactively in the Jupyter notebook for example, having a nice and well documented API that is very easy to use for new users would be beneficial. Later on, for big specialized production codes, one can always use specialized libraries. As an example, most electronic structure codes have their own custom specialized eigensolver. But I think it is still worth having a nice API to dense eigensolver via Lapack, and a sparse eigensolver on top of this sparse functionality, so that people who just want to prototype some algorithm in Fortran can do so as easily as with SciPy, Matlab or Julia. In order to move the conversation forward, we have to have both non-OO low level api as well as a high level OO API. The reason is that I know many people (myself included) who do not want to use the OO API, just simple subroutines that do the job. But I also know a lot of people who would prefer the OO API. By providing both, we can capture most of the Fortran community and make stdlib useful for almost everybody. @sfilippone at the very least I can see that we can use your sorting algorithms. I am hoping there is more that we can figure out how to reuse. Also, I just want to start with a serial implementation, we can overload the subroutines to work with both int32 and int64 etc. I am really hoping we can figure out an API that we would all agree would make sense to have in stdlib. |
@certik I totally agree with your comments.However, I wonder if this could be achieved by interfacing with the serial version of psblas, even partially. If it is not possible (e.g., because of no non-OO low level API), then I am fine with this, and I will help as good as I can. |
I don't think that stdlib should just gulp up existing libraries and expose them under different name. First, decide if sparse matrix algebra is in scope or not. I'm not the target user (although I use SMM through ESMF under the hood daily), but if we agreed that a SciPy-like functionality is in scope, then this is too. Second, if you do add it to stdlib, I think that it should provide added value over any existing library. In this case, I see that the added value could be ease of installation, and ease of use (simpler API). This resonated with me personally the most:
|
Just to be clear, I personally think that SciPy-like scope is included in the scope of Fortran stdlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can go forward, pending any unresolved requests for changes from other reviewers.
What is the current status of this patch? We have a couple of merge conflicts in the build files, which need a merge/rebase against the default branch. Also, there are a couple of unaddressed review comments. @certik, is there anything we can do to help you move this pull request forward? |
@certik would you like to continue working on this or should we close it for the time being? |
@milancurcic are you against this approach to sparse arrays? If not, then let's keep it open. While I don't have time to finish this myself right now, others should help. If we close it, then it will be harder for people to know about it. |
We can keep it open. What I'm asking is whether you intend to continue working on this eventually, when you have more time. I ask this because it wasn't clear to me whether you thought this PR can go forward with the changes requested. In essence, is this PR stale due to lack of time or lack of consensus? What should be tackled next if someone else were to pick it up? Would you be able to review the changes if we found a person to take over this PR? |
I reread all comments. I don't see any major disagreements. What has to be done:
I don't have the time right now, but I want to finish this, or help others finish it. This will be a nice selling point of stdlib to have these basic sparse routines in, nicely documented and very performing. |
I marked this patch as wontfix since you are not intending to continue working on it. But the implementation here is of course free to use for other contributors to continue working on this project. |
I wrote
So I disagree with the statement:
Furthermore, I think marking things as "wontfix" is not encouraging to others (or me!) to pick it up and finish it. For this reason, and given that my stated intention is to finish this, I have removed the "wontfix" label. We need to help each other finish these things and get them into stdlib to move stdlib forward. You don't want such COO functionality in there? If you do, then let's plan how to do it. |
Sorry, that was not what I meant. This patch seemed stale and open for others to pickup on, at least that was the situation I usually applied this label for my own patches. |
FWIW, the "wontfix" label sounds demotivating to me as well. Like the whole project is yelling at me "we don't want this". I don't mind a PR staying open for a long time, if that's what it takes to find people to see it through. |
Be assured that I want to move project forward. Frankly, I don't think this patch can be moved forward in its current form, even after resolving the merge conflicts. I have been looking into this patch a couple of times now, and stopped again because it has significantly diverged from the latest default branch, which makes it quite involved to work with the implementation because a lot of changes in the general structure and build files are just missing. Instead I would suggest to salvage the implementation and start fresh from the latest default branch, addressing the feedback provided already. We have done so in the past with other patches like the stringlist and were quite successful with this approach. |
@awvwgk awesome. I know that's not what you meant and that you want to move the project forward. But that is how I perceive it and I think many others too. To move forward, I think the label that you just used "help wanted" is perfect. Let's keep this PR open until a better one is up. |
In the following stdlib/src/stdlib_experimental_sparse.f90 Line 36 in 89feb91
instead of using please declare a parameter: real(dp), parameter :: zero=tiny(1.0_dp)
...
if (abs(B(i, j)) < zero ) cycle
... |
In the following subroutine stdlib/src/stdlib_experimental_sparse.f90 Lines 44 to 53 in 89feb91
you can exploit do concurrent(n = 1:size(Ai))
B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
end do |
The key of this is the API, as shown with examples in test_sparse.f90.
In particular, I am proposing the following functions / API:
Conversion Dense <-> COO:
Conversion COO -> CSR:
CSR functionality:
Dense functionality:
In these algorithms, it seems it is possible to just have one (best)
implementation with one exception: sorting of indices (as implemented by
csr_sort_indices
), where different algorithms give differentperformance and it depends on the actual matrix. As such, it might make
sense to provide a default in
csr_sort_indices
that is called bycoo2csr_canonical
that is as fast as we can make it for most cases (currently ituses quicksort, but there are other faster algoritms for our use case
that we should switch over) and then provide other implementations such
as
csr_sort_indices_mergesort
,csr_sort_indices_timsort
, etc. forusers so that they can choose the algorithm for sorting indices, or even
provide their own.
The file stdlib_experimental_sparse.f90 provides an example
implementation of these algorithms that was inspired by SciPy. We can
also discuss the details of that, but the key that I would like to
discuss is the API itself.
See #38.