Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements CVectorExtensionsTarget #557

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Conversation

kaushikcfd
Copy link
Collaborator

@kaushikcfd kaushikcfd commented Mar 2, 2022

/cc @sv2518

Adds support for GNU vector extensions.

TODO:

Comment on lines 702 to 771
knl = lp.make_kernel(
"{[i, j1, j2, j3]: 0<=i<10 and 0<=j1,j2,j3<4}",
"""
<> temp1[j1] = x[i, j1]
<> temp2[j2] = 2*temp1[j2] + 1 {inames=i:j2}
y[i, j3] = temp2[j3]
""",
[lp.GlobalArg("x, y", shape=lp.auto, dtype=float)],
seq_dependencies=True,
target=lp.CVectorExtensionsTarget(),
lang_version=(2018, 2))

knl = lp.tag_inames(knl, {"j1": lp.VectorizeTag(lp.OpenMPSIMDTag()),
"j2": lp.VectorizeTag(lp.OpenMPSIMDTag()),
"j3": lp.VectorizeTag(lp.OpenMPSIMDTag())})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inducer: Any big red signals with the user-facing interface of specifying fallbacks?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OpenMPSIMDTag shoudl be renamed to something more generic, and then this fallback could be automatic.

Copy link
Collaborator Author

@kaushikcfd kaushikcfd Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this has been restructured.

@kaushikcfd kaushikcfd marked this pull request as ready for review March 3, 2022 20:38
@kaushikcfd
Copy link
Collaborator Author

This is ready for a look, for a better reviewing experience please see the patch on a commit-by-commit basis.

@kaushikcfd kaushikcfd force-pushed the c_vecextensions_target branch 3 times, most recently from ad2372f to 5b0f9c2 Compare March 4, 2022 19:49
@sv2518
Copy link
Contributor

sv2518 commented Mar 8, 2022

There are two things which we definitely still need before we are able to fully automate this into the Firedrake/PyOP2 code.

  • The first one is that conditionals cannot be vectorised yet. There is an error in Firedrake with ``pymbolic.mapper.UnsupportedExpressionError: <class 'loopy.expression.VectorizabilityChecker'> cannot handle expressions of type <class 'pymbolic.primitives.If'> An example of a test where we run into that is this one: tests/extrusion/test_mixed_periodic.py::test_mixed_periodic[interval]

--> Fix in PR

  • That math functions are not vectorised is also still missing. We run into an error:
passing '__attribute__((__vector_size__(4 * sizeof(double)))) double' (vector of 4 'double' values) to parameter of incompatible type 'double'
   t0[expr_p0] = t0[expr_p0] + 6.283185307179586 * cos(expr_t1); 

An example of a test where we run into that is tests/slate/test_slate_infrastructure.py::test_arguments[dg1-mesh0]

--> Fix in PR

  • Two other I believe faster to fix issues are one that is related to complex types on AVX512. We have an error that looks like the following so I think there is some support missing for complex128. Maybe we should not vectorised for complex in Firedrake?
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/loopy/target/c_vector_extensions.py", line 107, in vector_dtype 
 vec.types[base.numpy_dtype, count], 
KeyError: (dtype('complex128'), 8)

--> This was a bug on our side. Fixed in PyOP2

  • The other one I don’t quite understand but it is a gcc compiler error
error: use of undeclared identifier 'iel_batch'
   t0[0] = t0[0] + dat0[4 * iel_outer + iel_batch + start] * dat1[4 * iel_outer + iel_batch + start]; I had a look at the C code and I think an iname has been dropped
/* bulk slab for 'iel_outer' */
 for (int32_t iel_outer = 1; iel_outer <= -2 + -1 * start + (3 + end + 3 * start) / 4; ++iel_outer)
 {
  {
   int32_t const i4 = 0;

   #pragma omp simd
   for (int32_t iel_batch = 0; iel_batch <= 3; ++iel_batch)
    (t0[0])[iel_batch] = 0.0;
  }
  /* no-op (insn=inne__start) */
  {
   int32_t const inne_i = 0;

   t0[0] = t0[0] + dat0[4 * iel_outer + iel_batch + start] * dat1[4 * iel_outer + iel_batch + start];
  }
...

It looks like the iname for iel_batch was dropped

@sv2518
Copy link
Contributor

sv2518 commented Mar 9, 2022

So I had a look at the code. I think I can fix the first issue by adding a map_if to VectorizabilityChecker that throws a UnvectorizableError(). Does that sound about right?
For the second issue I think we need to check in map_variable if the variable is one of these

func_names = set(["abs_*", "fabs_*", "cos_*", "sin_*", "exp_*", "pow_*",
                       "sqrt_*", "fmax_*", "fmin_*", "atan2_*", "log_*",
                       "tanh_*"])

(and potentially other supported math functions) and throw a UnvectorizableError() if it is.
If you agree with that I can write the code to address both.

@kaushikcfd
Copy link
Collaborator Author

It looks like the iname for iel_batch was dropped

This looks like a bug in loopy's vectorization implementation, with reductions in them.

(and potentially other supported math functions) and throw a UnvectorizableError() if it is.

I think falling back to omp-simd might make more sense. Looks like that's already being done here:

def map_call(self, expr):
# FIXME: Should implement better vectorization check for function calls
rec_pars = [
self.rec(child) for child in expr.parameters]
if any(rec_pars):
raise UnvectorizableError("fucntion calls cannot yet be vectorized")

@kaushikcfd kaushikcfd force-pushed the c_vecextensions_target branch 3 times, most recently from 56ab5dc to 6ffe97a Compare April 1, 2022 05:41
@kaushikcfd kaushikcfd marked this pull request as draft April 1, 2022 06:20
@kaushikcfd kaushikcfd force-pushed the c_vecextensions_target branch 4 times, most recently from 42390e3 to fd4ae30 Compare May 7, 2022 16:35
@kaushikcfd kaushikcfd force-pushed the c_vecextensions_target branch 3 times, most recently from 688aa94 to 4c0c013 Compare May 11, 2022 22:53
@kaushikcfd kaushikcfd marked this pull request as ready for review May 12, 2022 17:09
@kaushikcfd kaushikcfd requested a review from inducer May 13, 2022 16:16
Copy link
Owner

@inducer inducer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial thoughts from a quick scroll.

assert isinstance(inner, CodeGenerationResult)
if isinstance(inner.current_ast(novec_self),
astb.ast_comment_class):
# loop body is a comment => do not emit the loop
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is puzzling. Could you explain what leads to this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A noop instruction is emitted as a comment.

loopy/target/cuda.py Show resolved Hide resolved
loopy/target/c/__init__.py Show resolved Hide resolved
Comment on lines 128 to 93
elif filter_iname_tags_by_type(tags, OpenMPSIMDTag):
func = generate_openmp_simd_loop
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, I think this is very weird, as OpenMP is clearly target-specific. But the concept of a loop with no dependencies between iterations is universal. So maybe that's what the tag should reflect?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agreed. Bleeding OpenMP-specific things into iname tags was an abstraction failure. Restructured to specify the fallback via target attributes.

loopy/check.py Outdated
Comment on lines 496 to 616
# do not check for vec-inames as their implementation is accompanied
# with a fallback machinery
par_inames = {iname for iname in dom_inames
if (kernel.iname_tags_of_type(iname, ConcurrentTag)
and not kernel.iname_tags_of_type(iname, VectorizeTag))}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain that the fallback is "don't vectorize".

loopy/check.py Outdated Show resolved Hide resolved
Comment on lines 702 to 771
knl = lp.make_kernel(
"{[i, j1, j2, j3]: 0<=i<10 and 0<=j1,j2,j3<4}",
"""
<> temp1[j1] = x[i, j1]
<> temp2[j2] = 2*temp1[j2] + 1 {inames=i:j2}
y[i, j3] = temp2[j3]
""",
[lp.GlobalArg("x, y", shape=lp.auto, dtype=float)],
seq_dependencies=True,
target=lp.CVectorExtensionsTarget(),
lang_version=(2018, 2))

knl = lp.tag_inames(knl, {"j1": lp.VectorizeTag(lp.OpenMPSIMDTag()),
"j2": lp.VectorizeTag(lp.OpenMPSIMDTag()),
"j3": lp.VectorizeTag(lp.OpenMPSIMDTag())})
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OpenMPSIMDTag shoudl be renamed to something more generic, and then this fallback could be automatic.

@sv2518
Copy link
Contributor

sv2518 commented May 23, 2022

From the Firedrake side this looks like its good to go. Thanks for all your work on it Kaushik!
We will update our fork as soon as this is merged on your main and then merge the corresponding PyOP2 PR.

@kaushikcfd kaushikcfd force-pushed the c_vecextensions_target branch 6 times, most recently from 3bdd997 to 57e2440 Compare July 6, 2022 00:47
@kaushikcfd kaushikcfd requested a review from inducer July 6, 2022 00:48
@sv2518
Copy link
Contributor

sv2518 commented Jul 13, 2022

Some tests in the actions for the automatic vectorisation of Firedrake are currently failing due to the issue I reported in #648

@sv2518
Copy link
Contributor

sv2518 commented Oct 25, 2022

Hi! I have tested the updated branch together with the updated version of Firedrake and PyOP2 and both CIs are passing. The PRs in both components were already approved and I am not around for long anymore. It would be awesome if you could merge this into Loo.py, so that we can merge it on our side and make vectorisation available to all of our users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants