feat(gpu)!: GPU/OptiX support of ReParameter (#1686)

BREAKING CHANGE: to RendererServices ABI (including for CPU) and to the renderer-side setup when using OptiX. This overhauls the implementation of how interactively-editable parameters work, where they live in memory, and get it all working on GPU/OptiX so that renderers can support interactive adjustment of those params without recompiling shaders. The basic gist is as follows: * We continue work to finish making a clean separation between "interpolated" parameters and "interactive" (editable) parameters. * Interpolated params are collected and put into a separate memory area -- a separate per-group allocation on both the CPU and GPU (where applicable). It needs to remember the offset into this arena where each of the interpolated parameters resides. These allocations and eventual release are taken care of by the OSL shading system, they live in the ShaderGroup. When the group is set up, this block of memory is initialized with the correct initial values of the params and are ready to go. * The implementation of ReParameter writes to this special memory area also, that's how it works now (both CPU and GPU). * How does the OSL library know how to allocate, free, and copy to the device memory? It doesn't! Instead, we add new RendererServices methods `device_alloc()`, `device_free()`, and `copy_to_device()`. It's up to the renderer to provide those, so that the OSL library doesn't itself need to know about the Cuda runtime. These are trivial, there's really only one implementation that makes sense, and you can copy it from the ones in testshade and testrender. * Interactive parameters are NOT constant folded during runtime optimization. * The shader entry points themselves now take an extra parameter in the main call -- this will be the pointer to the beginning of the shader group's interactive parameter arena. * When JITing, references to interactive parameters know to retrieve them from their designated offset into the interactive parameter area. * This means that the renderer-side OptiX/Cuda code is responsible for adding this extra pointer parameter to the call to the shader entry points. You can see how this is done in the testshade and testrender Cuda code. * It's up to the renderer to figure out how to make the OptiX hit program aware of the interactive parameter pointer for that particular material, in order to pass it to the osl shader entry point. The way I did it in testshade and testrender is using a field in the struct that's given to each entry of the shader binding table and can be retrieved on the OptiX side via optixGetSbtDataPointer(). In testshade/testrender, a data pointer already existed which wasn't used. In a real renderer, you may need to add a field or come up with whatever other way you want to somehow get this pointer, which can be retrieved via shadingsys->getattribute(shadergroupptr, "device_interactive_params", TypeDesc::PTR, &myptr); you can see how I do that in optixraytracer.cpp (testrender) and in optixgridrender.cpp (testshade). A number of other things you will see that's worth calling out: I added a device_ptr utility template that is just a wrapper around a device side pointer that makes it hard to accidentally dereference it on the host side. Since I was changing RendererServices anyway, I also remove unused register_global, fetch_global, global_map which were unused. They were leftovers from the way we handled strings in OptiX 6.x. Encapsulate cuda global symbol name mangling into BackendLLVM::global_unique_symname(). I did this early on, turns out it wasn't necessary, but I still like that encapsulation, so I'm keeping it. I bumped the 3rd set of digits in the version to reflect that the changes in RendererServices break ABI. This is only in main, it obviously cannot be backported to a release branch. All tests pass for scalar and batch and optix. I added a new simple reparam test, and renamed the old reparem to reparam-array. Oddly, the reparam-array test doesn't work properly on optix (it had never been tried before), but it also failed in optix at main -- so it's not related to this patch! Putting that on my list of other oddities to investigate later. It may just be a quirk of testshade, I'm not really sure yet. Added to BackendLLVM (and batched) a llvm_ptr_type(TypeSpec) method that returns the LLVM type of a pointer to the specified type. Note: This patch doesn't account for the face that a parameter marked "interactive" could prevent a shader from correctly building for the GPU because it used the kind of construct that is fine in shader source code but only will work on GPU if it can be resolved to be a constant by the time we get done with the runtime optimization (as pointed out by Stephen Friedman. We'll come back to the problem later with a more robust and automatic fix -- and if we are lucky, Stephen will have the opportunity to upstream the approach he already has. Signed-off-by: Larry Gritz <[email protected]>
AcademySoftwareFoundation · May 23, 2023 · 71a9310 · 71a9310
1 parent 5c49a7c
commit 71a9310
Show file tree

Hide file tree

Showing 49 changed files with 807 additions and 324 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -4,7 +4,7 @@
 
 cmake_minimum_required (VERSION 3.12)
 
-set (OSL_VERSION "1.13.3.0")
+set (OSL_VERSION "1.13.4.0")
 set (OSL_VERSION_OVERRIDE "" CACHE STRING
      "Version override (use with caution)!")
 mark_as_advanced (OSL_VERSION_OVERRIDE)

diff --git a/src/cmake/testing.cmake b/src/cmake/testing.cmake
@@ -311,7 +311,8 @@ macro (osl_add_all_tests)
                 pragma-nowarn
                 printf-reg
                 printf-whole-array
-                raytype raytype-reg raytype-specialized regex-reg reparam
+                raytype raytype-reg raytype-specialized regex-reg
+                reparam reparam-arrays
                 render-background render-bumptest
                 render-cornell render-furnace-diffuse
                 render-mx-furnace-burley-diffuse

diff --git a/src/include/OSL/device_ptr.h b/src/include/OSL/device_ptr.h
@@ -0,0 +1,72 @@
+// Copyright Contributors to the Open Shading Language project.
+// SPDX-License-Identifier: BSD-3-Clause
+// https://github.com/AcademySoftwareFoundation/OpenShadingLanguage
+
+#pragma once
+
+#include <OSL/oslconfig.h>
+
+
+
+OSL_NAMESPACE_ENTER
+
+
+/// Wrapper class for holding a "device" pointer -- GPU or whatnot. It
+/// provides protections so that the pointer cannot easily be accessed on the
+/// host side, where presumably it would not be valid memory.
+template<class T> class device_ptr {
+public:
+    device_ptr() = default;
+
+    // Copy ctr from another device_ptr of the same type
+    device_ptr(const device_ptr& other) = default;
+
+    /// On device, device_ptr can construct from a pointer.
+    /// On host, it must be explicitly constructed -- use with caution.
+#ifdef __CUDA_ARCH__
+    device_ptr(T* ptr) : m_ptr(ptr) {}
+#else
+    explicit device_ptr(T* ptr) : m_ptr(ptr) {}
+#endif
+
+#ifdef __CUDA_ARCH__
+    /// On device, act as a pointer. None of these things are allowed on the
+    /// host.
+    T* operator->() const
+    {
+        return m_ptr;
+    }
+    T& operator*() const
+    {
+        return *m_ptr;
+    }
+#endif
+
+    /// Extract the raw device-side pointer. Use with caution! On the host,
+    /// this will not point to valid memory.
+    T* d_get() const
+    {
+        return m_ptr;
+    }
+
+    /// Evaluate as bool is a null pointer check.
+    operator bool() const noexcept
+    {
+        return m_ptr != nullptr;
+    }
+
+    /// Reset the pointer to `dptr`, which must be a device-side raw pointer,
+    /// or null. Since this device_ptr is non-owning, any previous value is
+    /// simply overwritten.
+    void reset(T* dptr = nullptr)
+    {
+        m_ptr = dptr;
+    }
+
+private:
+    T* m_ptr = nullptr;  // underlying pointer, initializes to null
+};
+
+
+
+OSL_NAMESPACE_EXIT
diff --git a/src/include/OSL/oslconfig.h.in b/src/include/OSL/oslconfig.h.in
@@ -241,6 +241,7 @@ enum class SymArena {
     Heap,               // Belongs to context heap
     Outputs,            // Belongs to output arena
     UserData,           // UserData arena
+    Interactive,        // Interactively edited variables
     // ShaderGlobals,   // RESERVED
 };
 

diff --git a/src/include/OSL/oslexec.h b/src/include/OSL/oslexec.h
@@ -509,6 +509,14 @@ class OSLEXECPUBLIC ShadingSystem {
     ///   string pickle              Retrieves a serialized representation
     ///                                 of the shader group declaration.
     ///   int llvm_groupdata_size    Size of the GroupData struct.
+    ///   ptr interactive_params     Pointer to the memory block containing
+    ///                                 host-side interactive parameter values
+    ///                                 for this shader group.
+    ///   ptr device_interactive_params
+    ///                              Pointer to the memory block containing
+    ///                                 device-side interactive parameter values
+    ///                                 for this shader group.
+    ///
     /// Note: the attributes referred to as "string" are actually on the app
     /// side as ustring or const char* (they have the same data layout), NOT
     /// std::string!

diff --git a/src/include/OSL/rendererservices.h b/src/include/OSL/rendererservices.h
@@ -400,18 +400,44 @@ class OSLEXECPUBLIC RendererServices {
     /// Return a pointer to the texture system (if available).
     virtual TextureSystem* texturesys() const;
 
-    virtual uint64_t register_global(const std::string& var_name,
-                                     uint64_t value)
+    /// Allocate `size` bytes of memory on the device that will execute the
+    /// shaders. (Equivalent to malloc() on the CPU.)
+    virtual void* device_alloc(size_t size)
     {
-        return 0;
+        return nullptr;
+        // Note: for an OptiX-based renderer, this method should be overriden
+        // with something like:
+        //
+        //     void* dptr;
+        //     auto r = cudaMalloc(&dptr, size);
+        //     return r == cudaSuccess ? dptr : nullptr;
     }
 
-    virtual bool fetch_global(const std::string& var_name, uint64_t* value)
+    /// Free a previous allocation (by `device_alloc()`) on the device that
+    /// will execute the shaders. (Equivalent to free() on the CPU.)
+    virtual void device_free(void* ptr)
     {
-        return false;
+        // Note: for an OptiX-based renderer, this method should be overriden
+        // with something like:
+        //
+        //     cudaFree(ptr);
     }
 
-
+    /// Copy `size` bytes from location `src_host` on the host/CPU (the
+    /// machine making this call) into location `dst_device` on the device
+    /// executing shaders. (Equivalent to `memcpy(dst, src, size)` on the
+    /// CPU.)
+    virtual void* copy_to_device(void* dst_device, const void* src_host,
+                                 size_t size)
+    {
+        return nullptr;
+        // Note: for an OptiX-based renderer, this method should be overriden
+        // with something like:
+        //
+        //     auto r = cudaMemcpy(dst_device, src_host, size,
+        //                         cudaMemcpyHostToDevice);
+        //     return dst_device;
+    }
 
     /// Options we use for noise calls.
     struct NoiseOpt {

diff --git a/src/include/optix_compat.h b/src/include/optix_compat.h
@@ -18,6 +18,7 @@
 #    include <stdlib.h>
 #endif
 
+#include <OSL/device_ptr.h>
 
 #if !OSL_USE_OPTIX && !defined(__CUDA_ARCH__)
 using CUdeviceptr = void*;

diff --git a/src/include/osl_pvt.h b/src/include/osl_pvt.h
@@ -632,7 +632,18 @@ class Symbol {
         OSL_ASSERT(arena == SymArena::Absolute);
         m_arena = static_cast<unsigned int>(arena);
         m_data  = ptr;
-        // m_dataoffset = static_cast<int64_t>((char*)ptr - (char*)0);
+    }
+
+    /// Specify the location of the symbol's data, relative to an arena
+    /// (which for now must be Absolute).
+    void set_dataptr(SymArena arena, void* ptr, int offset)
+    {
+        // OSL_ASSERT(arena == SymArena::Absolute);
+        m_arena      = static_cast<unsigned int>(arena);
+        m_data       = ptr;
+        m_dataoffset = offset;
+        OSL::print("setting sym {} arena {} offset {}\n", name(), int(m_arena),
+                   m_dataoffset);
     }
 
 

diff --git a/src/liboslexec/backendllvm.cpp b/src/liboslexec/backendllvm.cpp
@@ -226,6 +226,15 @@ BackendLLVM::getLLVMSymbolBase(const Symbol& sym)
                                 llvm_type(sym.typespec().elementtype()));
         return result;
     }
+    if (sym.symtype() == SymTypeParam && sym.interactive()) {
+        // Special case for interactively-edited parameters -- they live in
+        // the interactive data block for the group.
+        // Generate the pointer to this symbol by offsetting into the
+        // interactive data block.
+        int offset = group().interactive_param_offset(layer(), sym.name());
+        return ll.offset_ptr(m_llvm_interactive_params_ptr, offset,
+                             llvm_ptr_type(sym.typespec().elementtype()));
+    }
 
     if (sym.symtype() == SymTypeParam || sym.symtype() == SymTypeOutputParam) {
         // Special case for params -- they live in the group data
@@ -334,14 +343,7 @@ BackendLLVM::getOrAllocateCUDAVariable(const Symbol& sym)
     OSL_ASSERT(use_optix()
                && "This function is only supported when using OptiX!");
 
-    // We need to sanitize the symbol name for PTX compatibility. Also, if the
-    // sym name starts with a dollar sign, which are not allowed in PTX
-    // variable names, then prepend another underscore.
-    std::string sym_name = Strutil::replace(sym.name(), ".", "_", true);
-
-    std::string name
-        = fmtformat("{}{}_{}_{}_{}", sym_name.front() == '$' ? "_" : "",
-                    sym_name, group().name(), inst()->layername(), sym.layer());
+    std::string name = global_unique_symname(sym);
 
     // Return the Value if it has already been allocated
     auto it = get_const_map().find(name);

diff --git a/src/liboslexec/backendllvm.h b/src/liboslexec/backendllvm.h
@@ -217,6 +217,20 @@ class BackendLLVM final : public OSOProcessorBase {
     llvm::Value* getOrAllocateLLVMSymbol(const Symbol& sym);
 
 #if OSL_USE_OPTIX
+    /// Return a globally unique (to the JIT module) name for symbol `sym`,
+    /// assuming it's part of the currently examined layer of the group.
+    std::string global_unique_symname(const Symbol& sym)
+    {
+        // We need to sanitize the symbol name for PTX compatibility. Also, if
+        // the sym name starts with a dollar sign, which are not allowed in
+        // PTX variable names, then prepend another underscore.
+        auto sym_name = Strutil::replace(sym.name(), ".", "_", true);
+        int layer     = sym.layer();
+        const ShaderInstance* inst_ = group()[layer];
+        return fmtformat("{}{}_{}_{}_{}", sym_name.front() == '$' ? "_" : "",
+                         sym_name, group().name(), inst_->layername(), layer);
+    }
+
     /// Allocate a CUDA variable for the given OSL symbol and return a pointer
     /// to the corresponding LLVM GlobalVariable, or return the pointer if it
     /// has already been allocated.
@@ -449,6 +463,13 @@ class BackendLLVM final : public OSOProcessorBase {
         return ll.llvm_type(llvm_typedesc(typespec));
     }
 
+    /// Generate the appropriate llvm type definition for a pointer to
+    /// the type specified by the TypeSpec.
+    llvm::Type* llvm_ptr_type(const TypeSpec& typespec)
+    {
+        return ll.type_ptr(ll.llvm_type(llvm_typedesc(typespec)));
+    }
+
     /// Generate the parameter-passing llvm type definition for an OSL
     /// TypeSpec.
     llvm::Type* llvm_pass_type(const TypeSpec& typespec);
@@ -559,6 +580,7 @@ class BackendLLVM final : public OSOProcessorBase {
     std::map<const Symbol*, int> m_param_order_map;
     llvm::Value* m_llvm_shaderglobals_ptr;
     llvm::Value* m_llvm_groupdata_ptr;
+    llvm::Value* m_llvm_interactive_params_ptr;
     llvm::Value* m_llvm_userdata_base_ptr;
     llvm::Value* m_llvm_output_base_ptr;
     llvm::Value* m_llvm_shadeindex;

diff --git a/src/liboslexec/batched_backendllvm.cpp b/src/liboslexec/batched_backendllvm.cpp
@@ -448,6 +448,16 @@ BatchedBackendLLVM::getLLVMSymbolBase(const Symbol& sym)
         return result;
     }
 
+    if (sym.symtype() == SymTypeParam && sym.interactive()) {
+        // Special case for interactively-edited parameters -- they live in
+        // the interactive data block for the group.
+        // Generate the pointer to this symbol by offsetting into the
+        // interactive data block.
+        int offset = group().interactive_param_offset(layer(), sym.name());
+        return ll.offset_ptr(m_llvm_interactive_params_ptr, offset,
+                             llvm_ptr_type(sym.typespec().elementtype()));
+    }
+
     if (sym.symtype() == SymTypeParam || sym.symtype() == SymTypeOutputParam) {
         // Special case for params -- they live in the group data
         int fieldnum = m_param_order_map[&sym];

diff --git a/src/liboslexec/batched_backendllvm.h b/src/liboslexec/batched_backendllvm.h
@@ -6,16 +6,14 @@
 #pragma once
 
 #include <map>
-#include <unordered_map>
-#include <unordered_set>
 #include <vector>
 
 #include "oslexec_pvt.h"
 
 using namespace OSL;
 using namespace OSL::pvt;
 
-#include "OSL/llvm_util.h"
+#include <OSL/llvm_util.h>
 #include "runtimeoptimize.h"
 
 #include <llvm/ADT/SmallString.h>
@@ -658,6 +656,14 @@ class BatchedBackendLLVM : public OSOProcessorBase {
         return ll.llvm_vector_type(llvm_typedesc(typespec));
     }
 
+    /// Generate the appropriate llvm type definition for a pointer to
+    /// the type specified by the TypeSpec.
+    llvm::Type* llvm_ptr_type(const TypeSpec& typespec)
+    {
+        return reinterpret_cast<llvm::Type*>(
+            ll.type_ptr(ll.llvm_type(llvm_typedesc(typespec))));
+    }
+
     /// Generate the parameter-passing llvm type definition for an OSL
     /// TypeSpec.
     llvm::Type* llvm_pass_type(const TypeSpec& typespec);
@@ -783,7 +789,7 @@ class BatchedBackendLLVM : public OSOProcessorBase {
     std::map<const Symbol*, int> m_param_order_map;
     llvm::Value* m_llvm_shaderglobals_ptr;
     llvm::Value* m_llvm_groupdata_ptr;
-
+    llvm::Value* m_llvm_interactive_params_ptr;
     llvm::Value* m_llvm_wide_shadeindex_ptr;
     llvm::Value* m_llvm_userdata_base_ptr;
     llvm::Value* m_llvm_output_base_ptr;

diff --git a/src/liboslexec/batched_llvm_gen.cpp b/src/liboslexec/batched_llvm_gen.cpp
@@ -92,13 +92,17 @@ BatchedBackendLLVM::llvm_call_layer(int layer, bool unconditional)
                            << " unconditional=" << unconditional << std::endl);
     // Make code that looks like:
     //     if (! groupdata->run[parentlayer])
-    //         parent_layer (sg, groupdata);
+    //         parent_layer (sg, groupdata, wide_shadeindex_ptr,
+    //                       userdata_base_ptr, output_base_ptr,
+    //                       execution_mask, interactive_params_ptr);
     // if it's a conditional call, or
-    //     parent_layer (sg, groupdata);
+    //     parent_layer (sg, groupdata, wide_shadeindex_ptr,
+    //                   userdata_base_ptr, output_base_ptr,
+    //                   execution_mask, interactive_params_ptr);
     // if it's run unconditionally.
     // The code in the parent layer itself will set its 'executed' flag.
 
-    llvm::Value* args[6];
+    llvm::Value* args[7];
     args[0] = sg_ptr();
     args[1] = groupdata_ptr();
     args[2] = ll.void_ptr(wide_shadeindex_ptr());
@@ -120,20 +124,19 @@ BatchedBackendLLVM::llvm_call_layer(int layer, bool unconditional)
         llvm::Value* execution_required
             = ll.op_ne(lanes_requiring_execution_value, ll.constant(0));
         then_block = ll.new_basic_block(
-            llvm_debug()
-                ? std::string("then layer ").append(std::to_string(layer))
-                : std::string());
+            llvm_debug() ? Strutil::fmt::format("then layer {}", layer)
+                         : std::string());
         after_block = ll.new_basic_block(
-            llvm_debug()
-                ? std::string("after layer ").append(std::to_string(layer))
-                : std::string());
+            llvm_debug() ? Strutil::fmt::format("after layer {}", layer)
+                         : std::string());
         ll.op_branch(execution_required, then_block, after_block);
         // insert point is now then_block
     } else {
         lanes_requiring_execution_value = ll.mask_as_int(ll.shader_mask());
     }
 
     args[5] = lanes_requiring_execution_value;
+    args[6] = m_llvm_interactive_params_ptr;
 
     // Before the merge, keeping in case we broke it
     //std::string name = fmtformat("{}_{}_{}", m_library_selector,  parent->layername().c_str(),
@@ -247,7 +250,7 @@ LLVMGEN(llvm_gen_useparam)
         // initializing them lazily, now we have to do it.
         if ((sym.symtype() == SymTypeParam
              || sym.symtype() == SymTypeOutputParam)
-            && !sym.lockgeom() && !sym.typespec().is_closure()
+            && sym.interpolated() && !sym.typespec().is_closure()
             && !sym.connected() && !sym.connected_down()
             && rop.shadingsys().lazy_userdata()) {
             rop.llvm_assign_initial_value(sym, rop.ll.mask_as_int(