Skip to content

Commit

Permalink
feat(gpu)!: GPU/OptiX support of ReParameter (#1686)
Browse files Browse the repository at this point in the history
BREAKING CHANGE: to RendererServices ABI (including for CPU) and to
the renderer-side setup when using OptiX.

This overhauls the implementation of how interactively-editable
parameters work, where they live in memory, and get it all working on
GPU/OptiX so that renderers can support interactive adjustment of
those params without recompiling shaders.

The basic gist is as follows:

* We continue work to finish making a clean separation between
  "interpolated" parameters and "interactive" (editable) parameters.

* Interpolated params are collected and put into a separate memory
  area -- a separate per-group allocation on both the CPU and GPU
  (where applicable). It needs to remember the offset into this arena
  where each of the interpolated parameters resides. These allocations
  and eventual release are taken care of by the OSL shading system,
  they live in the ShaderGroup. When the group is set up, this block
  of memory is initialized with the correct initial values of the
  params and are ready to go.

* The implementation of ReParameter writes to this special memory
  area also, that's how it works now (both CPU and GPU).

* How does the OSL library know how to allocate, free, and copy to the
  device memory? It doesn't! Instead, we add new RendererServices
  methods `device_alloc()`, `device_free()`, and
  `copy_to_device()`. It's up to the renderer to provide those, so
  that the OSL library doesn't itself need to know about the Cuda
  runtime. These are trivial, there's really only one implementation
  that makes sense, and you can copy it from the ones in testshade and
  testrender.

* Interactive parameters are NOT constant folded during runtime
  optimization.

* The shader entry points themselves now take an extra parameter in
  the main call -- this will be the pointer to the beginning of the
  shader group's interactive parameter arena.

* When JITing, references to interactive parameters know to retrieve
  them from their designated offset into the interactive parameter
  area.

* This means that the renderer-side OptiX/Cuda code is responsible for
  adding this extra pointer parameter to the call to the shader entry
  points. You can see how this is done in the testshade and testrender
  Cuda code.

* It's up to the renderer to figure out how to make the OptiX hit program
  aware of the interactive parameter pointer for that particular material,
  in order to pass it to the osl shader entry point. The way I did it in
  testshade and testrender is using a field in the struct that's given
  to each entry of the shader binding table and can be retrieved on the
  OptiX side via optixGetSbtDataPointer(). In testshade/testrender, a
  data pointer already existed which wasn't used. In a real renderer, you
  may need to add a field or come up with whatever other way you want
  to somehow get this pointer, which can be retrieved via

      shadingsys->getattribute(shadergroupptr, "device_interactive_params",
                               TypeDesc::PTR, &myptr);

  you can see how I do that in optixraytracer.cpp (testrender) and in
  optixgridrender.cpp (testshade).

A number of other things you will see that's worth calling out:

I added a device_ptr utility template that is just a wrapper around a
device side pointer that makes it hard to accidentally dereference it
on the host side.

Since I was changing RendererServices anyway, I also remove unused
register_global, fetch_global, global_map which were unused. They were
leftovers from the way we handled strings in OptiX 6.x.

Encapsulate cuda global symbol name mangling into
BackendLLVM::global_unique_symname(). I did this early on, turns out
it wasn't necessary, but I still like that encapsulation, so I'm
keeping it.

I bumped the 3rd set of digits in the version to reflect that the
changes in RendererServices break ABI. This is only in main, it
obviously cannot be backported to a release branch.

All tests pass for scalar and batch and optix.

I added a new simple reparam test, and renamed the old reparem to
reparam-array. Oddly, the reparam-array test doesn't work properly on
optix (it had never been tried before), but it also failed in optix at
main -- so it's not related to this patch! Putting that on my list of
other oddities to investigate later. It may just be a quirk of
testshade, I'm not really sure yet.

Added to BackendLLVM (and batched) a llvm_ptr_type(TypeSpec) method
that returns the LLVM type of a pointer to the specified type.

Note: This patch doesn't account for the face that a parameter marked
"interactive" could prevent a shader from correctly building for the GPU
because it used the kind of construct that is fine in shader source code but
only will work on GPU if it can be resolved to be a constant by the time
we get done with the runtime optimization (as pointed out by Stephen
Friedman. We'll come back to the problem later with a more robust and
automatic fix -- and if we are lucky, Stephen will have the opportunity to
upstream the approach he already has.

Signed-off-by: Larry Gritz <[email protected]>
  • Loading branch information
lgritz committed May 23, 2023
1 parent 5c49a7c commit 71a9310
Show file tree
Hide file tree
Showing 49 changed files with 807 additions and 324 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

cmake_minimum_required (VERSION 3.12)

set (OSL_VERSION "1.13.3.0")
set (OSL_VERSION "1.13.4.0")
set (OSL_VERSION_OVERRIDE "" CACHE STRING
"Version override (use with caution)!")
mark_as_advanced (OSL_VERSION_OVERRIDE)
Expand Down
3 changes: 2 additions & 1 deletion src/cmake/testing.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -311,7 +311,8 @@ macro (osl_add_all_tests)
pragma-nowarn
printf-reg
printf-whole-array
raytype raytype-reg raytype-specialized regex-reg reparam
raytype raytype-reg raytype-specialized regex-reg
reparam reparam-arrays
render-background render-bumptest
render-cornell render-furnace-diffuse
render-mx-furnace-burley-diffuse
Expand Down
72 changes: 72 additions & 0 deletions src/include/OSL/device_ptr.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
// Copyright Contributors to the Open Shading Language project.
// SPDX-License-Identifier: BSD-3-Clause
// https://github.com/AcademySoftwareFoundation/OpenShadingLanguage

#pragma once

#include <OSL/oslconfig.h>



OSL_NAMESPACE_ENTER


/// Wrapper class for holding a "device" pointer -- GPU or whatnot. It
/// provides protections so that the pointer cannot easily be accessed on the
/// host side, where presumably it would not be valid memory.
template<class T> class device_ptr {
public:
device_ptr() = default;

// Copy ctr from another device_ptr of the same type
device_ptr(const device_ptr& other) = default;

/// On device, device_ptr can construct from a pointer.
/// On host, it must be explicitly constructed -- use with caution.
#ifdef __CUDA_ARCH__
device_ptr(T* ptr) : m_ptr(ptr) {}
#else
explicit device_ptr(T* ptr) : m_ptr(ptr) {}
#endif

#ifdef __CUDA_ARCH__
/// On device, act as a pointer. None of these things are allowed on the
/// host.
T* operator->() const
{
return m_ptr;
}
T& operator*() const
{
return *m_ptr;
}
#endif

/// Extract the raw device-side pointer. Use with caution! On the host,
/// this will not point to valid memory.
T* d_get() const
{
return m_ptr;
}

/// Evaluate as bool is a null pointer check.
operator bool() const noexcept
{
return m_ptr != nullptr;
}

/// Reset the pointer to `dptr`, which must be a device-side raw pointer,
/// or null. Since this device_ptr is non-owning, any previous value is
/// simply overwritten.
void reset(T* dptr = nullptr)
{
m_ptr = dptr;
}

private:
T* m_ptr = nullptr; // underlying pointer, initializes to null
};



OSL_NAMESPACE_EXIT
1 change: 1 addition & 0 deletions src/include/OSL/oslconfig.h.in
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,7 @@ enum class SymArena {
Heap, // Belongs to context heap
Outputs, // Belongs to output arena
UserData, // UserData arena
Interactive, // Interactively edited variables
// ShaderGlobals, // RESERVED
};

Expand Down
8 changes: 8 additions & 0 deletions src/include/OSL/oslexec.h
Original file line number Diff line number Diff line change
Expand Up @@ -509,6 +509,14 @@ class OSLEXECPUBLIC ShadingSystem {
/// string pickle Retrieves a serialized representation
/// of the shader group declaration.
/// int llvm_groupdata_size Size of the GroupData struct.
/// ptr interactive_params Pointer to the memory block containing
/// host-side interactive parameter values
/// for this shader group.
/// ptr device_interactive_params
/// Pointer to the memory block containing
/// device-side interactive parameter values
/// for this shader group.
///
/// Note: the attributes referred to as "string" are actually on the app
/// side as ustring or const char* (they have the same data layout), NOT
/// std::string!
Expand Down
38 changes: 32 additions & 6 deletions src/include/OSL/rendererservices.h
Original file line number Diff line number Diff line change
Expand Up @@ -400,18 +400,44 @@ class OSLEXECPUBLIC RendererServices {
/// Return a pointer to the texture system (if available).
virtual TextureSystem* texturesys() const;

virtual uint64_t register_global(const std::string& var_name,
uint64_t value)
/// Allocate `size` bytes of memory on the device that will execute the
/// shaders. (Equivalent to malloc() on the CPU.)
virtual void* device_alloc(size_t size)
{
return 0;
return nullptr;
// Note: for an OptiX-based renderer, this method should be overriden
// with something like:
//
// void* dptr;
// auto r = cudaMalloc(&dptr, size);
// return r == cudaSuccess ? dptr : nullptr;
}

virtual bool fetch_global(const std::string& var_name, uint64_t* value)
/// Free a previous allocation (by `device_alloc()`) on the device that
/// will execute the shaders. (Equivalent to free() on the CPU.)
virtual void device_free(void* ptr)
{
return false;
// Note: for an OptiX-based renderer, this method should be overriden
// with something like:
//
// cudaFree(ptr);
}


/// Copy `size` bytes from location `src_host` on the host/CPU (the
/// machine making this call) into location `dst_device` on the device
/// executing shaders. (Equivalent to `memcpy(dst, src, size)` on the
/// CPU.)
virtual void* copy_to_device(void* dst_device, const void* src_host,
size_t size)
{
return nullptr;
// Note: for an OptiX-based renderer, this method should be overriden
// with something like:
//
// auto r = cudaMemcpy(dst_device, src_host, size,
// cudaMemcpyHostToDevice);
// return dst_device;
}

/// Options we use for noise calls.
struct NoiseOpt {
Expand Down
1 change: 1 addition & 0 deletions src/include/optix_compat.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
# include <stdlib.h>
#endif

#include <OSL/device_ptr.h>

#if !OSL_USE_OPTIX && !defined(__CUDA_ARCH__)
using CUdeviceptr = void*;
Expand Down
13 changes: 12 additions & 1 deletion src/include/osl_pvt.h
Original file line number Diff line number Diff line change
Expand Up @@ -632,7 +632,18 @@ class Symbol {
OSL_ASSERT(arena == SymArena::Absolute);
m_arena = static_cast<unsigned int>(arena);
m_data = ptr;
// m_dataoffset = static_cast<int64_t>((char*)ptr - (char*)0);
}

/// Specify the location of the symbol's data, relative to an arena
/// (which for now must be Absolute).
void set_dataptr(SymArena arena, void* ptr, int offset)
{
// OSL_ASSERT(arena == SymArena::Absolute);
m_arena = static_cast<unsigned int>(arena);
m_data = ptr;
m_dataoffset = offset;
OSL::print("setting sym {} arena {} offset {}\n", name(), int(m_arena),
m_dataoffset);
}


Expand Down
18 changes: 10 additions & 8 deletions src/liboslexec/backendllvm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,15 @@ BackendLLVM::getLLVMSymbolBase(const Symbol& sym)
llvm_type(sym.typespec().elementtype()));
return result;
}
if (sym.symtype() == SymTypeParam && sym.interactive()) {
// Special case for interactively-edited parameters -- they live in
// the interactive data block for the group.
// Generate the pointer to this symbol by offsetting into the
// interactive data block.
int offset = group().interactive_param_offset(layer(), sym.name());
return ll.offset_ptr(m_llvm_interactive_params_ptr, offset,
llvm_ptr_type(sym.typespec().elementtype()));
}

if (sym.symtype() == SymTypeParam || sym.symtype() == SymTypeOutputParam) {
// Special case for params -- they live in the group data
Expand Down Expand Up @@ -334,14 +343,7 @@ BackendLLVM::getOrAllocateCUDAVariable(const Symbol& sym)
OSL_ASSERT(use_optix()
&& "This function is only supported when using OptiX!");

// We need to sanitize the symbol name for PTX compatibility. Also, if the
// sym name starts with a dollar sign, which are not allowed in PTX
// variable names, then prepend another underscore.
std::string sym_name = Strutil::replace(sym.name(), ".", "_", true);

std::string name
= fmtformat("{}{}_{}_{}_{}", sym_name.front() == '$' ? "_" : "",
sym_name, group().name(), inst()->layername(), sym.layer());
std::string name = global_unique_symname(sym);

// Return the Value if it has already been allocated
auto it = get_const_map().find(name);
Expand Down
22 changes: 22 additions & 0 deletions src/liboslexec/backendllvm.h
Original file line number Diff line number Diff line change
Expand Up @@ -217,6 +217,20 @@ class BackendLLVM final : public OSOProcessorBase {
llvm::Value* getOrAllocateLLVMSymbol(const Symbol& sym);

#if OSL_USE_OPTIX
/// Return a globally unique (to the JIT module) name for symbol `sym`,
/// assuming it's part of the currently examined layer of the group.
std::string global_unique_symname(const Symbol& sym)
{
// We need to sanitize the symbol name for PTX compatibility. Also, if
// the sym name starts with a dollar sign, which are not allowed in
// PTX variable names, then prepend another underscore.
auto sym_name = Strutil::replace(sym.name(), ".", "_", true);
int layer = sym.layer();
const ShaderInstance* inst_ = group()[layer];
return fmtformat("{}{}_{}_{}_{}", sym_name.front() == '$' ? "_" : "",
sym_name, group().name(), inst_->layername(), layer);
}

/// Allocate a CUDA variable for the given OSL symbol and return a pointer
/// to the corresponding LLVM GlobalVariable, or return the pointer if it
/// has already been allocated.
Expand Down Expand Up @@ -449,6 +463,13 @@ class BackendLLVM final : public OSOProcessorBase {
return ll.llvm_type(llvm_typedesc(typespec));
}

/// Generate the appropriate llvm type definition for a pointer to
/// the type specified by the TypeSpec.
llvm::Type* llvm_ptr_type(const TypeSpec& typespec)
{
return ll.type_ptr(ll.llvm_type(llvm_typedesc(typespec)));
}

/// Generate the parameter-passing llvm type definition for an OSL
/// TypeSpec.
llvm::Type* llvm_pass_type(const TypeSpec& typespec);
Expand Down Expand Up @@ -559,6 +580,7 @@ class BackendLLVM final : public OSOProcessorBase {
std::map<const Symbol*, int> m_param_order_map;
llvm::Value* m_llvm_shaderglobals_ptr;
llvm::Value* m_llvm_groupdata_ptr;
llvm::Value* m_llvm_interactive_params_ptr;
llvm::Value* m_llvm_userdata_base_ptr;
llvm::Value* m_llvm_output_base_ptr;
llvm::Value* m_llvm_shadeindex;
Expand Down
10 changes: 10 additions & 0 deletions src/liboslexec/batched_backendllvm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -448,6 +448,16 @@ BatchedBackendLLVM::getLLVMSymbolBase(const Symbol& sym)
return result;
}

if (sym.symtype() == SymTypeParam && sym.interactive()) {
// Special case for interactively-edited parameters -- they live in
// the interactive data block for the group.
// Generate the pointer to this symbol by offsetting into the
// interactive data block.
int offset = group().interactive_param_offset(layer(), sym.name());
return ll.offset_ptr(m_llvm_interactive_params_ptr, offset,
llvm_ptr_type(sym.typespec().elementtype()));
}

if (sym.symtype() == SymTypeParam || sym.symtype() == SymTypeOutputParam) {
// Special case for params -- they live in the group data
int fieldnum = m_param_order_map[&sym];
Expand Down
14 changes: 10 additions & 4 deletions src/liboslexec/batched_backendllvm.h
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,14 @@
#pragma once

#include <map>
#include <unordered_map>
#include <unordered_set>
#include <vector>

#include "oslexec_pvt.h"

using namespace OSL;
using namespace OSL::pvt;

#include "OSL/llvm_util.h"
#include <OSL/llvm_util.h>
#include "runtimeoptimize.h"

#include <llvm/ADT/SmallString.h>
Expand Down Expand Up @@ -658,6 +656,14 @@ class BatchedBackendLLVM : public OSOProcessorBase {
return ll.llvm_vector_type(llvm_typedesc(typespec));
}

/// Generate the appropriate llvm type definition for a pointer to
/// the type specified by the TypeSpec.
llvm::Type* llvm_ptr_type(const TypeSpec& typespec)
{
return reinterpret_cast<llvm::Type*>(
ll.type_ptr(ll.llvm_type(llvm_typedesc(typespec))));
}

/// Generate the parameter-passing llvm type definition for an OSL
/// TypeSpec.
llvm::Type* llvm_pass_type(const TypeSpec& typespec);
Expand Down Expand Up @@ -783,7 +789,7 @@ class BatchedBackendLLVM : public OSOProcessorBase {
std::map<const Symbol*, int> m_param_order_map;
llvm::Value* m_llvm_shaderglobals_ptr;
llvm::Value* m_llvm_groupdata_ptr;

llvm::Value* m_llvm_interactive_params_ptr;
llvm::Value* m_llvm_wide_shadeindex_ptr;
llvm::Value* m_llvm_userdata_base_ptr;
llvm::Value* m_llvm_output_base_ptr;
Expand Down
23 changes: 13 additions & 10 deletions src/liboslexec/batched_llvm_gen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -92,13 +92,17 @@ BatchedBackendLLVM::llvm_call_layer(int layer, bool unconditional)
<< " unconditional=" << unconditional << std::endl);
// Make code that looks like:
// if (! groupdata->run[parentlayer])
// parent_layer (sg, groupdata);
// parent_layer (sg, groupdata, wide_shadeindex_ptr,
// userdata_base_ptr, output_base_ptr,
// execution_mask, interactive_params_ptr);
// if it's a conditional call, or
// parent_layer (sg, groupdata);
// parent_layer (sg, groupdata, wide_shadeindex_ptr,
// userdata_base_ptr, output_base_ptr,
// execution_mask, interactive_params_ptr);
// if it's run unconditionally.
// The code in the parent layer itself will set its 'executed' flag.

llvm::Value* args[6];
llvm::Value* args[7];
args[0] = sg_ptr();
args[1] = groupdata_ptr();
args[2] = ll.void_ptr(wide_shadeindex_ptr());
Expand All @@ -120,20 +124,19 @@ BatchedBackendLLVM::llvm_call_layer(int layer, bool unconditional)
llvm::Value* execution_required
= ll.op_ne(lanes_requiring_execution_value, ll.constant(0));
then_block = ll.new_basic_block(
llvm_debug()
? std::string("then layer ").append(std::to_string(layer))
: std::string());
llvm_debug() ? Strutil::fmt::format("then layer {}", layer)
: std::string());
after_block = ll.new_basic_block(
llvm_debug()
? std::string("after layer ").append(std::to_string(layer))
: std::string());
llvm_debug() ? Strutil::fmt::format("after layer {}", layer)
: std::string());
ll.op_branch(execution_required, then_block, after_block);
// insert point is now then_block
} else {
lanes_requiring_execution_value = ll.mask_as_int(ll.shader_mask());
}

args[5] = lanes_requiring_execution_value;
args[6] = m_llvm_interactive_params_ptr;

// Before the merge, keeping in case we broke it
//std::string name = fmtformat("{}_{}_{}", m_library_selector, parent->layername().c_str(),
Expand Down Expand Up @@ -247,7 +250,7 @@ LLVMGEN(llvm_gen_useparam)
// initializing them lazily, now we have to do it.
if ((sym.symtype() == SymTypeParam
|| sym.symtype() == SymTypeOutputParam)
&& !sym.lockgeom() && !sym.typespec().is_closure()
&& sym.interpolated() && !sym.typespec().is_closure()
&& !sym.connected() && !sym.connected_down()
&& rop.shadingsys().lazy_userdata()) {
rop.llvm_assign_initial_value(sym, rop.ll.mask_as_int(
Expand Down
Loading

0 comments on commit 71a9310

Please sign in to comment.