WIP: Add a non-forking compiler option #614

Draft
wants to merge 8 commits into base: master

Conversation

connorjward
Collaborator

This PR:

  • Refactors compilation.py
  • Adds a new non-forking compiler class so we can compile on systems where the MPI does not allow forking a subprocess

I've fiddled with the environment variables a bit too. For example, the compiler is now determined by PYOP2_CC instead of CC, and I've moved PYOP2_CFLAGS and PYOP2_LDFLAGS out of configuration.py because they made little sense there.
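
Roughly the sort of lookup I mean (a simplified sketch with made-up helper names, not the actual code in compilation.py):

import os

def _select_compiler():
    # Sketch: prefer PYOP2_CC, fall back to CC, then a default.
    return os.environ.get("PYOP2_CC", os.environ.get("CC", "cc"))

def _extra_flags():
    # Sketch: PYOP2_CFLAGS/PYOP2_LDFLAGS read straight from the
    # environment rather than through configuration.py.
    cflags = os.environ.get("PYOP2_CFLAGS", "").split()
    ldflags = os.environ.get("PYOP2_LDFLAGS", "").split()
    return cflags, ldflags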

@wence-
Member

wence- commented Apr 29, 2021

Cool. There are a bunch of other places where we (possibly without knowing) fork subprocesses, e.g. the _version.py versioneer stuff. I wonder if we should expunge all of that too?

@connorjward
Collaborator Author

connorjward commented Apr 29, 2021

I've done a lot of reading about different ways that we could compile the code without forking, and I've found three approaches that might work:

  • cppyy is a C++ interpreter built on top of Cling, which is a well-supported project. I tried implementing this before but ran into the issue that C++ is less permissive about passing void pointers as arguments, which is a problem because we use ctypes.c_voidp all over the place.

  • DragonFFI is a Clang-based JIT. I haven't figured out how to pass in all of the linker arguments yet, but this is a really promising project. The main issue is that it is developed by a single person, although he seems to be fairly active at maintaining it.

  • TinyCCompiler/TCC has a library, libtcc, that allows you to JIT code. The performance would likely not be great, but it would still be better than nothing.

Before going any further I have two questions:

  1. Is it essential that we use the MPI wrapper for compilation? If so then I'll need to figure out a way to find the flags that the wrapper compiler adds.
  2. Is ABI compatibility a concern (i.e. if PETSc is installed with GCC then would a Clang JIT even work)? If so then DragonFFI is the only valid approach and we would have to enforce that the entire toolchain is compiled with Clang.

@connorjward connorjward marked this pull request as draft April 29, 2021 09:54
@wence-
Member

wence- commented Apr 29, 2021

Is it essential that we use the MPI wrapper for compilation? If so then I'll need to figure out a way to find the flags that the wrapper compiler adds.

Wrapper code that calls PETSc needs to link against MPI (and find the MPI headers) so I think yes. I think firedrake can grab them out of the petscvariables configuration like it does for Eigen include paths (see firedrake/slate/slac/compiler.py)
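
Something along these lines would probably do (a rough sketch; the exact keys in petscvariables may differ):

import os

def petsc_variables():
    # Sketch: parse $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/petscvariables
    # into a dict, similar to what firedrake does for the Eigen include path.
    path = os.path.join(os.environ["PETSC_DIR"], os.environ["PETSC_ARCH"],
                        "lib", "petsc", "conf", "petscvariables")
    variables = {}
    with open(path) as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep:
                variables[key.strip()] = value.strip()
    return variables

# e.g. variables["CC"] should give the MPI compiler wrapper and
# variables.get("PETSC_CC_INCLUDES", "") the include flags it would add.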

Is ABI compatibility a concern (i.e. if PETSc is installed with GCC then would a Clang JIT even work)? If so then DragonFFI is the only valid approach and we would have to enforce that the entire toolchain is compiled with Clang.

I think that GCC and Clang have the same C ABI, but maybe not.

As to the C++ void strictness, I would have assumed (but maybe I am wrong) that with these approaches we no longer use the ctypes interface to call compiled code?

@wlav

wlav commented Apr 29, 2021

Clang JIT and GCC are mostly compatible, assuming you hand them the same compiler flags (especially the math options, so your point 1 is most definitely something you will want to look into) and the same standard header files. There are corner cases, e.g. thread-local storage and typeinfo come to mind, which will fail or are major trouble when JIT-ed; and the two also have different default runtimes for OpenMP, for example.

OTOH, cppyy munches ctypes.c_voidp just fine. Try this:

import cppyy, ctypes
  
cppyy.cppdef("""\
   void f(void* p) { std::cerr << p << std::endl; }
""")

p = ctypes.c_voidp(0x1234)

cppyy.gbl.f(p)

Not b/c it's "permissive", but b/c ctypes.c_voidp is explicitly recognized internally.

The larger problem with it and MPI, though, is that many "constants" in MPI are in fact preprocessor macros and thus not available automatically through cppyy in Python-land (they're still fine to use in JIT-ed code, of course).

Is the use case for compiling programs locally when running under MPI public information? In a different context, we're advocating for more JIT-ing (and hence the need for better support) in HPC. It'd be useful for us to add another use case to the growing list.

@connorjward
Collaborator Author

@wlav thank you for joining the discussion! We definitely appreciate your expertise.

Cool. There are a bunch of other places where we (possibly without knowing) fork subprocesses, e.g. the _version.py versioneer stuff. I wonder if we should expunge all of that too?

That sounds sensible. What would the versioning stuff be replaced with?

Wrapper code that calls PETSc needs to link against MPI (and find the MPI headers) so I think yes. I think firedrake can grab them out of the petscvariables configuration like it does for Eigen include paths (see firedrake/slate/slac/compiler.py)

Yep this will work.

As to the C++ void strictness, I would have assumed (but maybe I am wrong) that with these approaches we no longer use the ctypes interface to call compiled code?

We definitely still use ctypes. The .so is loaded with ctypes.CDLL and we usually set the function argtypes to ctypes.c_voidp (see here and here).

@wlav the issue is that we want to cast these void * to, say, double *. C is perfectly happy to do an implicit cast, but C++ complains. An example of the sort of function we want to be calling is:

void wrap_expression_kernel(int32_t const start, int32_t const end, double *__restrict__ dat1, double const *__restrict__ dat0, int32_t const *__restrict__ map0)

I think getting this to work is just a case of being a bit less lazy about how we track the argument types.
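
To make that concrete, the calling pattern is roughly this (a simplified sketch; "wrapper.so" is just a stand-in for the JIT-compiled library):

import ctypes

lib = ctypes.CDLL("./wrapper.so")  # stand-in path for the compiled wrapper
kernel = lib.wrap_expression_kernel
kernel.restype = None

# What we mostly do today: declare every pointer argument as void*.
kernel.argtypes = [ctypes.c_int32, ctypes.c_int32,
                   ctypes.c_voidp, ctypes.c_voidp, ctypes.c_voidp]

# The "less lazy" version: track the real pointer types, which is what we
# would need in order to call the same kernel through a stricter C++ route.
kernel.argtypes = [ctypes.c_int32, ctypes.c_int32,
                   ctypes.POINTER(ctypes.c_double),
                   ctypes.POINTER(ctypes.c_double),
                   ctypes.POINTER(ctypes.c_int32)]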

Is the use case for compiling programs locally when running under MPI public information? In a different context, we're advocating for more JIT-ing (and hence the need for better support) in HPC. It'd be useful for us to add another use case to the growing list.

PyOP2 is used by Firedrake, so I would say that yes, the use case is public information 👍.

Yesterday I spent some time playing with DragonFFI and TinyCC and I've come to the conclusion that neither is really feature-complete. I couldn't figure out how to pass pointers into a DragonFFI function, and TinyCC doesn't support complex numbers. cppyy might well be the way to go.

@wlav

wlav commented Apr 30, 2021

I have a feeling that it may help here to think of the bindings and the JIT-ing separately. That would preserve the current ctypes code, and it also looks like all you really need is to pass a code string and some options to the Clang JIT, so your requirements on a binding to it are pretty trivial. Thus, if you keep the two concerns separate now, you can replace the current choice of JIT access later, allowing you, for example, to roll your own to directly control the selection of optimization passes you want to use.

Below is an example of what I mean: it uses cppyy for JIT access (but obviously any of the options will do), but not for the bindings. Rather, just grab the function pointer, hand it to ctypes and then let ctypes think the argument types are all void*, allowing the implicit conversion behavior you want.

import cppyy
import cppyy.ll
import ctypes

cppyy.cppdef("""\
void wrap_expression_kernel(int32_t const start, int32_t const end, double *__restrict__ dat1, double const *__restrict__ dat0, int32_t const *__restrict__ map0) {
    std::cerr << start << " " << end << " " << dat1 << " " << dat0 << " " << map0 << std::endl;
}""")

ftype = ctypes.CFUNCTYPE(None, ctypes.c_int, ctypes.c_int, ctypes.c_voidp, ctypes.c_voidp, ctypes.c_voidp)
f = ftype(cppyy.ll.cast['intptr_t'](cppyy.gbl.wrap_expression_kernel))

p = ctypes.c_voidp(0x1234)
f(0, 32, p, p, p)

@wence-
Member

wence- commented Apr 30, 2021

We're kind of willing to invest some effort to do the right thing. For example, I know that cffi has lower cross-calling overheads than ctypes (which in the limit case is not the biggest deal, but every little helps). What's the "right" way to call stuff via cppyy in that sense?
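
For reference, the cffi pattern I have in mind is the ABI-level one, sketched below with a made-up .so path:

import cffi

ffi = cffi.FFI()
# Declare the kernel signature once; cffi then builds a fast call path for it.
ffi.cdef("void wrap_expression_kernel(int32_t start, int32_t end, "
         "double *dat1, const double *dat0, const int32_t *map0);")
lib = ffi.dlopen("./wrapper.so")  # made-up path to the compiled wrapper

# numpy buffers can then be passed without copying, e.g. via
#   ffi.cast("double *", dat1.ctypes.data)
# or ffi.from_buffer("double[]", dat1).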

@wlav

wlav commented May 2, 2021

The "right" way depends on use. Assuming for example that all types are known and correct, it's pretty straightforward:

import cppyy
import numpy as np

cppyy.cppdef("""\
void wrap_expression_kernel(int32_t const start, int32_t const end, double *__restrict__ dat1, double const *__restrict__ dat0, int32_t const *__restrict__ map0) {
    std::cerr << start << " " << end << " " << dat1 << " " << dat0 << " " << map0 << std::endl;
}""")

dat1 = np.array(range(32), dtype=np.float64)
dat0 = np.array(range(32), dtype=np.float64)
map0 = np.array(range(32), dtype=np.int32)
cppyy.gbl.wrap_expression_kernel(0, 32, dat1, dat0, map0)

If the buffers don't come from Python, but from C++, cppyy will create LowLevelView objects of the right type. The best thing to do is to annotate the size (if not known already) and deal with ownership (e.g. by placing a reference on the array from the client code) immediately upon their entry into Python, unless of course these objects are never used other than for passing around the pointer. The LowLevelViews can be handed to numpy to create (zero-copy) views that act as fully functional arrays.
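
For instance (a sketch; make_data is just an illustrative helper, and the exact sizing API may vary between cppyy versions):

import cppyy
import numpy as np

cppyy.cppdef("""\
double* make_data(int n) {
    double* p = new double[n];
    for (int i = 0; i < n; ++i) p[i] = i;
    return p;
}""")

arr = cppyy.gbl.make_data(32)   # returned as a LowLevelView of unknown size
arr.reshape((32,))              # annotate the size, which C++ cannot know
view = np.frombuffer(arr, dtype=np.float64, count=32)   # zero-copy view
view[0] = 42.0                  # writes through to the C++ buffer
# Ownership stays on the C++ side here, so the caller must arrange deletion.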

Run-time CPU overhead of cppyy is close to cffi (C++ has some complications, such as overloads, that one doesn't have to deal with in C), but memory overhead is higher b/c of the presence of Clang/LLVM for the JIT (b/c of C++ being a much larger language).

Maybe this notebook from Matti is of value: https://github.com/mattip/c_from_python/blob/master/c_from_python.ipynb (the actual presentation is on YouTube).
