
Simplified user interface for tensorOps functions #227

Open
rrsettgast opened this issue Mar 22, 2021 · 8 comments

@rrsettgast
Member

rrsettgast commented Mar 22, 2021

The tensorOps functions typically require one or more integer template arguments to set the bounds of the operation. Take, for example:

template< std::ptrdiff_t ISIZE, typename VECTOR >
LVARRAY_HOST_DEVICE CONSTEXPR_WITHOUT_BOUNDS_CHECK inline
auto l2NormSquared( VECTOR const & vector )
{
  static_assert( ISIZE > 0, "ISIZE must be greater than zero." );
  internal::checkSizes< ISIZE >( vector );

  auto norm = vector[ 0 ] * vector[ 0 ];
  for( std::ptrdiff_t i = 1; i < ISIZE; ++i )
  {
    norm = norm + vector[ i ] * vector[ i ];
  }
  return norm;
}

The call looks something like:

l2NormSquared<3>(array)

where the size (3) is always required and the type VECTOR is deduced. This is a little bit clunky imo. We could instead deduce both the size and the type, like so:

#include <iostream>
#include <type_traits>
#include <utility>  // std::declval

template< typename T >
struct HasMemberFunction_size_impl
{
private:
  // Selected when CLASS has a static constexpr size() convertible to int.
  // Note the decltype checks CLASS::size() (a static call) so that SFINAE
  // matches what the body actually does.
  template< typename CLASS >
  static constexpr auto test( int ) -> decltype( static_cast< int >( CLASS::size() ) )
  { return CLASS::size(); }

  // Fallback for c-arrays: deduce the extent from sizeof.
  template< typename CLASS >
  static constexpr auto test( ... ) -> int
  { return sizeof( CLASS ) / sizeof( std::declval< CLASS >()[ 0 ] ); }

public:
  static constexpr int value = test< T >( 0 );
};

template< typename CLASS >
static constexpr int HasMemberFunction_size = HasMemberFunction_size_impl< CLASS >::value;




template< int N >
struct Tensor
{
    static constexpr int size() 
    { return N; }

    double & operator[]( int const i )
    {
        return data[i];
    }

    double const & operator[]( int const i ) const
    {
        return data[i];
    }

    double data[N];
};



template< typename T, int N>
double func_impl( T const & array )
{
    double rvalue = array[0]*array[0];
    for( int i=1 ; i<N ; ++i )
    {
        rvalue += array[i]*array[i];
    }
    return rvalue;
}

template< typename T, int N = HasMemberFunction_size<T> >
typename std::enable_if<!std::is_pointer<T>::value, double>::type func( T const & array )
{
   std::cout<<"Calling reference version"<<std::endl;
   return func_impl<T,N>(array);
}

template< int N, typename T >
typename std::enable_if<std::is_pointer<T>::value, double>::type func( T const array )
{
   std::cout<<"Calling pointer version"<<std::endl;
   return func_impl<T,N>(array);
}



////////////////////////////////////////////

int main()
{
    Tensor<3> t;

    t[0] = 1;
    t[1] = 2;
    t[2] = 3;


    double a[3] = {1,2,3};
    double * const pa = a;


    std::cout<<func(t)<<std::endl;
    std::cout<<func(a)<<std::endl;
    std::cout<<func<3>(pa)<<std::endl;

}

Here it is in a compiler explorer:
https://godbolt.org/z/K1P4T9qdE

If we were to standardize the way we expose compile-time sizes, this works fairly well. With a raw pointer you would still have to specify N. With multiple dimensions this would be more complicated...but I don't think it is prohibitive, and the usability is certainly nicer.

@corbett5 @klevzoff Thoughts?

@corbett5
Collaborator

corbett5 commented Mar 22, 2021

I think this is cleaner: it doesn't require detection and only uses one extra function instead of two.
https://godbolt.org/z/3xzMnhYc7

I'm not convinced though. If there were a way to do it without requiring the extra function to reorder N and T when deducing N, I'd be into it. But there are a lot of functions in tensorOps, and wrapping all of them would be a pain. Plus you'd still have to explicitly specify the sizes anytime one of the first few arguments is an Array class.

i.e., for matrix multiplication:

template< int ISIZE=Size0< DST_MATRIX >, int JSIZE=Size1< DST_MATRIX >, int KSIZE=Size2< MATRIX_A >, ... >
void Rij_eq_AikBkj( DST_MATRIX && dstMatrix, MATRIX_A const & matrixA, MATRIX_B const & matrixB );

you only get anything if dstMatrix is a c-array and so is matrixA.

@rrsettgast
Member Author

@corbett5 Odd...I couldn't get the overload to work when I tried it....but now it works fine. Must be my cognitive decline coming into play.

I am looking at this from a usage perspective. It is nice not to have to specify the sizes when they can be deduced. In the case of an ArraySlice/ArrayView/c-array/R1Tensor, they can be deduced. The costs are that we:

  1. have to provide inspection capabilities for the types we want to deduce;
  2. have to maintain a function for the non-deduced types to call the "impl/deduced" function;
  3. face more complexity for multidimensional objects, though it is straightforward as long as the Size0, Size1, ... inspection capabilities exist;
  4. probably can't mix deduced and non-deduced types in the same call.

Is this a fair set of statements?

@corbett5
Collaborator

ArraySlice/ArrayView cannot be deduced at compile time.

@rrsettgast
Member Author

Duh. I guess we would have to wait until we have compile-time dims added.

@rrsettgast
Member Author

Did you ever put together a proposal for compile time dims, or are we waiting for RAJA to do it first?

@corbett5
Collaborator

No, I could do it but it would take a fair bit of time. You could do it in steps though, the first of which would be to get rid of the strides.

@corbett5
Collaborator

If you keep the strides, you get the advantage of deduction and of not having to resize, and it would be much quicker to implement. But the indexing math would be unchanged.

@rrsettgast
Member Author

I still think it is worth doing, just adding to the deducible types as they become available. I just really dislike having to specify the dims for something that has fixed dims at compile time and can be deduced.
