Implement SplitPytatoArrayContext that attempts a trivial parallelization strategy #216
Conversation
kaushikcfd commented on Jan 17, 2023 (edited)
- CI should pass once "Fix codegen for temporary variables declaration" (loopy#738) is in.
- ⚠️ Concurrency across reductions is not targeted, i.e. this results in poor performance when computing error norms, etc. (Maybe we should fix this before merging?) In the meshmode version, out of desperation, we froze such reductions eagerly.
  - Update: Added a transformation specifically for "reduce-to-scalar" operations to parallelize the reduction using loopy transformations.
- We need a better name than SplitPytatoArrayContext.
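The "reduce-to-scalar" strategy mentioned above can be illustrated with a plain NumPy sketch. This is only an illustration of the two-stage idea (the function name and `n_groups` parameter are hypothetical, not from the PR, which applies the split via loopy transformations on the generated kernel): each "work-group" reduces a contiguous chunk to a partial result concurrently, and a cheap second stage combines the partials.

```python
import numpy as np


def two_stage_sum(x: np.ndarray, n_groups: int = 4) -> float:
    """Sketch of a work-item-parallel reduce-to-scalar.

    Stage 1: each of the ``n_groups`` "work-groups" reduces its chunk
    to a partial sum (these would run concurrently on a GPU).
    Stage 2: a single small reduction combines the partials.
    """
    chunks = np.array_split(x, n_groups)
    partials = np.array([c.sum() for c in chunks])  # stage 1, parallel
    return float(partials.sum())                    # stage 2, cheap


x = np.arange(1000, dtype=np.float64)
assert two_stage_sum(x) == x.sum()
```

Without such a split, a scalar reduction occupies a single work-item, which is what makes unparallelized error-norm computations slow.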
force-pushed 615e2b5 to 5249a38
Making this a draft until the CI is fixed.
force-pushed adf8d2b to b736f3a
force-pushed 11583d7 to fe7517e
force-pushed be0878b to ac39271
force-pushed 41ecb8d to bb03c86
force-pushed bb03c86 to b8c991a
Thanks for working on this. LGTM other than these (fairly minor) issues.
```diff
@@ -714,7 +716,7 @@ def test_array_equal(actx_factory):
 def test_array_context_einsum_array_manipulation(actx_factory, spec):
     actx = actx_factory()

-    mat = actx.from_numpy(np.random.randn(10, 10))
+    mat = actx.from_numpy(np.random.randn(16, 16))
```
Is there a specific reason why these have to be a nice power of two now?
```python
import pytato


class SplitPytatoPyOpenCLArrayContext(PytatoPyOpenCLArrayContext):
```
I'm not sure why this thing is named "split". Maybe "generic parallelizing"/"basic parallelizing"?
```
Copyright (C) 2023 Andreas Kloeckner
Copyright (C) 2022 Matthias Diener
Copyright (C) 2022 Matt Smith
```
All these (other than @kaushikcfd) should be U of I BOT. None of us hold copyright to our work.
```diff
@@ -0,0 +1,143 @@
+"""
```
Add to docs somewhere?
```
We deliberately avoid using :class:`pytato.transform.CombineMapper` since
the mapper's caching structure would still lead to recomputing
the union of sets for the results of a revisited node.
```
Not sure I understand this comment. Could you explain? And: shouldn't we fix this in CombineMapper
(if we can)? (or at least file an issue in pytato?) (also above)
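A minimal sketch of the caching concern being discussed (toy `Node`/mapper classes, not the actual pytato API): even when a revisited node's result is served from the cache, each *parent* that references it still recomputes the set union over that cached result, so a DAG with heavy sharing pays the union cost once per reference rather than once per node.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    children: Tuple["Node", ...] = ()


class CombineToSet:
    """Toy combine-mapper: maps each node to the set of names below it."""

    def __init__(self) -> None:
        self.cache: Dict[int, FrozenSet[str]] = {}
        self.unions = 0  # counts set-union work, including over cache hits

    def rec(self, node: Node) -> FrozenSet[str]:
        try:
            return self.cache[id(node)]
        except KeyError:
            pass
        result = frozenset({node.name})
        for child in node.children:
            # Even if `child`'s result is cached, this union is
            # recomputed for every parent that references it.
            result = result | self.rec(child)
            self.unions += 1
        self.cache[id(node)] = result
        return result


shared = Node("leaf")
root = Node("root", (Node("a", (shared,)), Node("b", (shared,))))
m = CombineToSet()
m.rec(root)
```

Here `shared` is visited via two parents, so four unions are performed for a four-node DAG; with large arrays of predecessors the repeated unions dominate.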
```python
def _split_reduce_to_scalar_across_work_items(
        kernel: lp.LoopKernel,
        callables: Mapping[str, InKernelCallable],
```
A bit weird that `callables` is needed here/by `precompute_for_single_kernel`.
```python
        device: "pyopencl.Device",
        ) -> lp.LoopKernel:

    assert len({kernel.id_to_insn[insn_id].reduction_inames()
```
Could elevate reduction inames to a property of the `_LoopNest`?
```python
    # collect loop nests of instructions that assign to scalars in the array
    # program.
    insn_id_to_loop_nest: Mapping[str, _LoopNest] = {
```
If `insn_id_to_loop_nest` isn't used, why not go straight to `all_loop_nests`?
```
This routine **assumes** that the entrypoint in *t_unit* global
barriers inserted as per :func:`_get_call_kernel_insn_ids`.
```
Grammar? I'm not sure I understand.
```python
# a mapping from shape to the available base storages from temp variables
# that were dead.
shape_to_available_base_storage: Dict[int, Set[str]] = defaultdict(set)
```

Suggested change:

```suggestion
# a mapping from size in bytes to the available base storages from temp variables
# that were dead.
nbytes_to_available_base_storage: Dict[int, Set[str]] = defaultdict(set)
```
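The suggestion above keys the free-list of dead base storages by byte size rather than by shape, which lets temporaries of different shapes but equal footprint recycle the same storage. A minimal sketch of that recycling scheme (the `alloc`/`release` helpers and storage-naming are hypothetical, not from the PR):

```python
from collections import defaultdict
from typing import Dict, Set

# size in bytes -> names of base storages whose temporaries are dead
nbytes_to_available_base_storage: Dict[int, Set[str]] = defaultdict(set)
_counter = 0


def release(name: str, nbytes: int) -> None:
    """Return a dead temporary's base storage to the free-list."""
    nbytes_to_available_base_storage[nbytes].add(name)


def alloc(nbytes: int) -> str:
    """Reuse a free base storage of the same byte size, else make a new one."""
    global _counter
    free = nbytes_to_available_base_storage[nbytes]
    if free:
        return free.pop()
    _counter += 1
    return f"base_storage_{_counter}"


a = alloc(1024)   # fresh storage
release(a, 1024)  # the temporary using `a` is now dead
b = alloc(1024)   # recycled: a shape-(256,) float32 and a
                  # shape-(16, 16) float32 both fit in 1024 bytes
assert b == a
```

Keying by shape would keep those two 1024-byte temporaries in separate pools and miss the reuse.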
Otherwise Intel OpenCL gets its integer arithmetic wrong.
force-pushed b8c991a to b4488ba