OptiX testrender overhaul #1829

Draft
wants to merge 110 commits into
base: main
110 commits
8deb471
Crudely refactor make_optix_materials.
tgrant-nv Nov 22, 2023
4a85d18
Continue refactoring the OptiX pipeline setup.
tgrant-nv Nov 24, 2023
5475f08
Tweaks to allow including shading.h/shading.cpp in wrapper.cu.
tgrant-nv Nov 20, 2023
79d39d6
FIX: Use the correct OptiX call to retrieve t_hit.
tgrant-nv Nov 28, 2023
e7adbe5
Basic pathtracing working on the GPU.
tgrant-nv Nov 28, 2023
c97b0c0
Add PDF calculation for self-emission.
tgrant-nv Dec 5, 2023
418e916
Make the sampling match the CPU path more closely.
tgrant-nv Dec 5, 2023
4cff2d8
Add some vector casting macros.
tgrant-nv Dec 5, 2023
20d452b
Trace light rays to both quad and sphere prims.
tgrant-nv Dec 5, 2023
4b16643
Separate out subpixel_radiance function to make anti-aliasing easier.
tgrant-nv Dec 5, 2023
c6a2240
Enable anti-aliasing.
tgrant-nv Dec 5, 2023
8f4bb85
Enable show_albedo_scale. Misc. cleanup.
tgrant-nv Dec 5, 2023
4c63724
Add "precise" versions of the quad and sphere intersection programs t…
tgrant-nv Dec 6, 2023
9e54203
Don't create hitgroups for geometry types that aren't present in the …
tgrant-nv Dec 6, 2023
a34ccc9
Tweak the UV calculation for the sphere.
tgrant-nv Dec 6, 2023
a3f7e85
More BSDFs sort of working (Ward yes, Phong no).
tgrant-nv Dec 7, 2023
0c92f24
Make the surface area computation for the GPU sphere match the CPU.
tgrant-nv Dec 7, 2023
b0229ed
Record the backfacing property for quad hits.
tgrant-nv Dec 7, 2023
4f4ccce
Rework the light ray tracing a bit.
tgrant-nv Dec 7, 2023
7dcc2b2
Pass max bounces through the render params.
tgrant-nv Dec 7, 2023
e17b590
Add cases from FRESNEL_REFLECTION_ID.
tgrant-nv Dec 7, 2023
fd0d6cc
FIX: Adjust the size needed calculation in closure_component_allot.
tgrant-nv Dec 8, 2023
d976332
Use the correct case labels for REFRACTION_ID. Use the correct get_al…
tgrant-nv Dec 9, 2023
d391c73
Changes to make sphere refraction work. Pass in the last hit ID to al…
tgrant-nv Dec 9, 2023
77485d3
No need to redefine the OSL_HOSTDEVICE macro.
tgrant-nv Dec 9, 2023
14445e4
Hack in the dPdx/dPdy calculation, to allow things like calculatenorm…
tgrant-nv Dec 11, 2023
ccba6f7
Add basic support for the Microfacet BSDF.
tgrant-nv Dec 14, 2023
48adb13
Add initial support for Background sampling on the GPU. Add support f…
tgrant-nv Dec 20, 2023
a0b9477
Change license notice for upper_bound.
tgrant-nv Jun 10, 2024
ddb401d
Use the full-precision intrinsics for some calculations in Background.
tgrant-nv Dec 21, 2023
fc439e6
Don't attempt to prepare the background or evaluate it if it doesn't …
tgrant-nv Dec 21, 2023
77de801
Use a full-precision divide when normalizing the PDF.
tgrant-nv Dec 21, 2023
2bd8f7e
Partially parallelize Background::prepare().
tgrant-nv Dec 24, 2023
0a1f025
Cleanup background.h a little.
tgrant-nv Dec 24, 2023
eae4ef8
Make Background::prepare_gpu() work with dimensions that aren't a mul…
tgrant-nv Dec 24, 2023
06bea21
Cleanup background.h a bit.
tgrant-nv Dec 26, 2023
a3707d6
Shuffle some of the CUDA code around in shading.cpp.
tgrant-nv Dec 26, 2023
0bf4c56
HACK: Put RenderParams in a namespace for testshade.
tgrant-nv Dec 27, 2023
5c3a786
HACK: Make OSL_DASSERT a no-op when targeting CUDA.
tgrant-nv Dec 28, 2023
1ebcd09
Enforce a minimum background resolution of 32x32.
tgrant-nv Dec 29, 2023
107e1ef
Add partial support for MaterialX closures.
tgrant-nv Dec 28, 2023
80e3043
Add a new file shading_cuda.cpp to hold the CUDA-specific functions. …
tgrant-nv Dec 29, 2023
6ed6442
process_medium_closure is mostly working.
tgrant-nv Dec 30, 2023
a81b9b2
Clean up the closure process functions a little.
tgrant-nv Dec 30, 2023
d80bc21
Layered BSDFs partially working.
tgrant-nv Jan 2, 2024
fa5c7d0
Reshuffle and refactor some code.
tgrant-nv Jan 3, 2024
d2d519b
Support different microfacet distributions (GGX, Beckmann).
tgrant-nv Jan 3, 2024
0cf56ea
Fix evaluate_layer_opacity.
tgrant-nv Jan 3, 2024
f465410
Remove the hitpoint offset stuff.
tgrant-nv Jan 3, 2024
595663a
Cleanup.
tgrant-nv Jan 3, 2024
ba60cd0
Fixup some rebase snafus in optix_grid_renderer.cu.
tgrant-nv Jan 6, 2024
8cb6d3c
Fixup some string stuff after the rebase.
tgrant-nv Jan 6, 2024
49233be
Add some templated get_albedo/sample/eval functions to help streamlin…
tgrant-nv Jan 3, 2024
2b5d5c6
Plumb through support for TRANSPARENT_ID. Untested.
tgrant-nv Jan 3, 2024
91f6495
Destroy the shading system before destroying the renderer.
tgrant-nv Jan 4, 2024
ab3008a
Add an option to disable AA pixel jitter.
tgrant-nv Jan 4, 2024
c54fb95
Disable the unsupported string ops in testoptix.
tgrant-nv Jan 4, 2024
55b7b76
fixup. Add OSL_HOSTDEVICE to BSDF::BSDF().
tgrant-nv Jan 4, 2024
705c8ca
Add missing break statement in MX_GENERALIZED_SCHLICK_ID case.
tgrant-nv Jan 4, 2024
a388b50
Manually construct the BSDFs in evaluate_layer_opacity.
tgrant-nv Jan 4, 2024
dca249c
Refactor manual BSDF "construction" a bit.
tgrant-nv Jan 4, 2024
0066ab1
Adjust the test cases so that they work with the updated testrender.
tgrant-nv Jan 4, 2024
7e44a37
Get rid of the _gpu versions of prepare/eval/sample/get_albedo.
tgrant-nv Jan 5, 2024
1a96921
Strip out the old raygen and unused occlusion programs.
tgrant-nv Jan 5, 2024
dd1b3b6
Plumb through support for TRANSLUCENT_ID. Untested.
tgrant-nv Jan 5, 2024
a13146d
Disable some asserts in CompositeBSDF::prepare() on the CUDA path.
tgrant-nv Jan 6, 2024
8ad387c
Use mipmaps in testshade.
tgrant-nv Jan 6, 2024
31abe77
Tweak texture creation, and unify the implementations between testren…
tgrant-nv Jan 6, 2024
159c1f2
Update the reference images for test_spline and test_texture, since t…
tgrant-nv Jan 6, 2024
0f6b710
Remove unused code from sphere.cu.
tgrant-nv Jan 9, 2024
99a92b6
Add the self check to quads, and tweak the check slightly for spheres.
tgrant-nv Jan 9, 2024
4d591ce
Get rid of the "precise" intersection programs.
tgrant-nv Jan 9, 2024
e59210d
Streamline the Payload type a bit. Assorted cleanup and refactoring.
tgrant-nv Feb 7, 2024
f7d79cd
Fix some compile issues with testshade.
tgrant-nv Feb 8, 2024
a3445dd
Update the reference images for testoptix-noise
tgrant-nv Feb 8, 2024
e937b66
Simplify the guard around the OSL_DASSERT definition, and add a note …
tgrant-nv May 2, 2024
648ebfc
Use the pointer type instead of uint64_t in the Payload struct. Get r…
tgrant-nv May 2, 2024
ec7a026
Add a note about the provenance of the Sphere/Quad sample functions.
tgrant-nv May 2, 2024
221a150
Get rid of the sizeof_params lambda.
tgrant-nv May 9, 2024
b75b9eb
Rename the *_gpu functions *_cuda.
tgrant-nv May 9, 2024
6d75456
Move a bunch of functions from shading_cuda.cpp to optix_raytracer.cu.
tgrant-nv May 10, 2024
7938f5b
Use the right cast macro for a couple vector type conversions.
tgrant-nv May 11, 2024
8a69ba9
Factor out a trace_ray function.
tgrant-nv May 12, 2024
a70fe49
Use the normal BSDF constructors in add_bsdf_cuda.
tgrant-nv May 12, 2024
07e874e
Eliminate add_bsdf_cuda.
tgrant-nv May 12, 2024
64b405b
Add default CUDA implementations for BSDF::eval and BSDF::sample.
tgrant-nv May 12, 2024
169b071
Use a regular divide in the PDF calculation in CompositeBSDF::prepare().
tgrant-nv May 12, 2024
d0626af
Expand the comments about add_bsdf and get_albedo/sample/eval.
tgrant-nv May 13, 2024
a6dc8d3
Add common get_albedo, eval, and sample functions to CompositeBSDF.
tgrant-nv May 13, 2024
281074e
Get rid of the albedo/sample/eval helper functions.
tgrant-nv May 13, 2024
efa8c2c
Get rid of the add_bsdf CUDA wrapper.
tgrant-nv May 15, 2024
7d389df
Use regular stack objects instead of placement new for the BSDFs in e…
tgrant-nv Jun 10, 2024
3c70d81
Add back energy_compensation field that was lost in the shuffle.
tgrant-nv Jun 10, 2024
500a289
Use a unified version of SimpleRaytracer::subpixel_radiance, instead …
tgrant-nv Jun 11, 2024
064921d
Drop the library-style leading underscore names in upper_bound_cuda.
tgrant-nv Jun 13, 2024
88cd20c
Make process_medium_closure iterative.
tgrant-nv Jun 17, 2024
f329ba1
Make evaluate_layer_opacity iterative.
tgrant-nv Jun 17, 2024
2c60ac9
Make process_bsdf_closure iterative.
tgrant-nv Jun 17, 2024
f6d65d5
Make process_background_closure iterative.
tgrant-nv Jun 17, 2024
9b7c31c
Make the CUDA path use the host closure evaluation functions.
tgrant-nv Jun 17, 2024
7958fe1
"Fix" the closure id in the MX_CONDUCTOR_ID case in process_bsdf_clos…
tgrant-nv Jun 17, 2024
1a71f24
Get rid of the closure evaluation code in optix_raytracer.cu.
tgrant-nv Jun 17, 2024
33d3576
Switch the host path to basic ID-based dispatch.
tgrant-nv Jun 17, 2024
e73c4c3
Simplify 'tracedata' handling by using explicit object IDs. Encapsula…
tgrant-nv Jun 18, 2024
6bb82fe
Tuck TraceData into CudaScene::intersect.
tgrant-nv Jun 20, 2024
41ffaf9
Share SimpleRaytracer::eval_background between host and CUDA.
tgrant-nv Jun 20, 2024
ddcfd22
Get rid of the vestigial virtual function calls.
tgrant-nv Jul 16, 2024
02bb427
Use the Sphere and Quad intersect functions on the GPU.
tgrant-nv Jul 17, 2024
db46f09
Use the Sphere and Quad uv functions on the GPU.
tgrant-nv Jul 17, 2024
eb5a865
Use the Sphere and Quad sample and shapepdf functions on the GPU.
tgrant-nv Jul 17, 2024
6 changes: 5 additions & 1 deletion src/include/OSL/platform.h
@@ -488,7 +488,11 @@
 /// to use regular assert() for this purpose if you need to eliminate the
 /// dependency on this header from a particular place (and don't mind that
 /// assert won't format identically on all platforms).
-#ifndef NDEBUG
+///
+/// These macros are no-ops when compiling for CUDA because they were found
+/// to cause strange issues in device code (e.g., function bodies being
+/// eliminated when OSL_DASSERT is used).
+#if !defined(NDEBUG) && !defined(__CUDACC__)
 # define OSL_DASSERT OSL_ASSERT
 # define OSL_DASSERT_MSG OSL_ASSERT_MSG
 #else
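
To illustrate the guard in the platform.h hunk, here is a minimal sketch of a debug-only assertion that becomes a no-op both in release builds and when compiling with nvcc. `DEMO_DASSERT` and `checked_index` are hypothetical names used only for illustration, and the `((void)sizeof(x))` no-op form is an assumption rather than a quote of what platform.h does in its `#else` branch.

```cpp
#include <cassert>

// Hypothetical stand-in for a debug-only assert (not OSL's definition).
#if !defined(NDEBUG) && !defined(__CUDACC__)
// Debug host build: behave like a regular assert.
#    define DEMO_DASSERT(x) assert(x)
#else
// Release build or CUDA compile: the expression is still type-checked
// by sizeof, but it is never evaluated and generates no code.
#    define DEMO_DASSERT(x) ((void)sizeof(x))
#endif

// Example use: validate an index in debug host builds only.
inline int checked_index(int i, int n)
{
    DEMO_DASSERT(i >= 0 && i < n);
    return i;
}
```

Because the no-op branch never evaluates its argument, any side effects inside the asserted expression would disappear in CUDA builds, so asserted expressions should be side-effect free.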
18 changes: 14 additions & 4 deletions src/testrender/CMakeLists.txt
@@ -22,13 +22,23 @@ if (OSL_USE_OPTIX)
     )

     # We need to make sure that the PTX files are regenerated whenever these
-    # headers change.
+    # files change.
     set (testrender_cuda_headers
-        cuda/rend_lib.h)
+        cuda/rend_lib.h
+        background.h
+        optics.h
+        render_params.h
+        raytracer.h
+        sampling.h
+        shading.h
+        shading.cpp
+        simpleraytracer.cpp
+        cuda/vec_math.h
+        )

     # Generate PTX for all of the CUDA files
     foreach (cudasrc ${testrender_cuda_srcs})
-        NVCC_COMPILE ( ${cudasrc} "" ptx_generated "" )
+        NVCC_COMPILE ( ${cudasrc} "${testrender_cuda_headers}" ptx_generated "" )
         list (APPEND ptx_list ${ptx_generated})
     endforeach ()

@@ -48,7 +58,7 @@ if (OSL_USE_OPTIX)
     list (APPEND ptx_list ${rend_lib_ptx})

     add_custom_target (testrender_ptx ALL
-        DEPENDS ${ptx_list}
+        DEPENDS ${ptx_list} ${testrender_cuda_headers}
         SOURCES ${testrender_cuda_srcs} )

     # Install the PTX files in a fixed location so that they can be
158 changes: 145 additions & 13 deletions src/testrender/background.h
@@ -10,17 +10,51 @@

 OSL_NAMESPACE_ENTER

+
+#ifdef __CUDACC__
+// std::upper_bound is not supported in device code, so define a version of it here.
+// Adapted from the LLVM Project, see https://llvm.org/LICENSE.txt for license information.
+template<typename T>
+inline OSL_HOSTDEVICE const T*
+upper_bound_cuda(const T* data, int count, const T value)
Contributor:
Linear search is often faster on the GPU :)

Contributor:
In my experience, not really, though it would be worth benchmarking. Also, I believe this is used for the background PDF, which can be a few thousand elements long.

+{
+    const T* first = data;
+    const T value_ = value;
+    int len = count;
+    while (len != 0) {
+        int l2 = len / 2;
+        const T* m = first;
+        m += l2;
+        if (value_ < *m)
+            len = l2;
+        else {
+            first = ++m;
+            len -= l2 + 1;
+        }
+    }
+    return first;
+}
+#endif
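
The review thread on `upper_bound_cuda` raises the question of linear versus binary search. As a starting point for such an experiment, the sketch below puts the two variants side by side on the host and checks both against `std::upper_bound`. The names `upper_bound_binary`, `upper_bound_linear`, and `upper_bounds_agree` are hypothetical. The sketch only establishes that the two variants return the same position. It says nothing about which is faster on a GPU, which would need benchmarking.

```cpp
#include <algorithm>
#include <vector>

// Binary search: pointer to the first element strictly greater than value.
template<typename T>
const T* upper_bound_binary(const T* data, int count, T value)
{
    const T* first = data;
    int len        = count;
    while (len != 0) {
        const int half = len / 2;
        const T* mid   = first + half;
        if (value < *mid) {
            len = half;  // answer is in [first, mid]
        } else {
            first = mid + 1;  // answer is after mid
            len -= half + 1;
        }
    }
    return first;
}

// Linear scan with the same contract as the binary version.
template<typename T>
const T* upper_bound_linear(const T* data, int count, T value)
{
    int i = 0;
    while (i < count && !(value < data[i]))
        ++i;
    return data + i;
}

// True if both variants match std::upper_bound for this input.
inline bool upper_bounds_agree(const std::vector<float>& cdf, float x)
{
    const int n         = static_cast<int>(cdf.size());
    const float* expect = std::upper_bound(cdf.data(), cdf.data() + n, x);
    return upper_bound_binary(cdf.data(), n, x) == expect
           && upper_bound_linear(cdf.data(), n, x) == expect;
}
```

Having an equivalence check like this in place means a linear-search variant can be swapped in for timing runs without changing which table entry is selected.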


 struct Background {
+    OSL_HOSTDEVICE
     Background() : values(0), rows(0), cols(0) {}

+    OSL_HOSTDEVICE
     ~Background()
     {
+#ifndef __CUDACC__
         delete[] values;
         delete[] rows;
         delete[] cols;
+#endif
     }

-    template<typename F, typename T> void prepare(int resolution, F cb, T* data)
+    template<typename F, typename T>
+    void prepare(int resolution, F cb, T* data)
     {
+        // These values are set via set_variables() in CUDA
         res = resolution;
         if (res < 32)
             res = 32; // validate
@@ -29,6 +63,7 @@ struct Background {
         values = new Vec3[res * res];
         rows = new float[res];
         cols = new float[res * res];
+
         for (int y = 0, i = 0; y < res; y++) {
             for (int x = 0; x < res; x++, i++) {
                 values[i] = cb(map(x + 0.5f, y + 0.5f), data);
@@ -43,8 +78,9 @@ struct Background {
                 cols[i - res + x] /= cols[i - 1];
         }
         // normalize the pdf across all scanlines
-        for (int y = 0; y < res; y++)
+        for (int y = 0; y < res; y++) {
             rows[y] /= rows[res - 1];
+        }

         // both eval and sample below return a "weight" that is
         // value[i] / row*col_pdf, so might as well bake it into the table
@@ -65,6 +101,7 @@ struct Background {
 #endif
     }

+    OSL_HOSTDEVICE
     Vec3 eval(const Vec3& dir, float& pdf) const
     {
         // map from sphere to unit-square
@@ -90,6 +127,7 @@ struct Background {
         return values[i];
     }

+    OSL_HOSTDEVICE
     Vec3 sample(float rx, float ry, Dual2<Vec3>& dir, float& pdf) const
     {
         float row_pdf, col_pdf;
@@ -101,8 +139,96 @@ struct Background {
         return values[y * res + x];
     }

+#ifdef __CUDACC__
+    OSL_HOSTDEVICE
+    void set_variables(Vec3* values_in, float* rows_in, float* cols_in,
+                       int res_in)
+    {
+        values = values_in;
+        rows = rows_in;
+        cols = cols_in;
+        res = res_in;
+        invres = __frcp_rn(res);
+        invjacobian = __fdiv_rn(res * res, float(4 * M_PI));
+        assert(res >= 32);
+    }
+
+    template<typename F>
+    OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb)
+    {
+        prepare_cuda_01(stride, idx, cb);
+        if (idx == 0)
+            prepare_cuda_02();
+        prepare_cuda_03(stride, idx);
+    }
+
+    // Pre-compute the 'values' table in parallel
+    template<typename F>
+    OSL_HOSTDEVICE void prepare_cuda_01(int stride, int idx, F cb)
+    {
+        for (int y = 0; y < res; y++) {
+            const int row_start = y * res;
+            const int row_end = row_start + res;
+            int i = row_start + idx;
+            for (int x = idx; x < res; x += stride, i += stride) {
+                if (i >= row_end)
+                    continue;
+                values[i] = cb(map(x + 0.5f, y + 0.5f));
+            }
+        }
+    }
+
+    // Compute 'cols' and 'rows' using a single thread
+    OSL_HOSTDEVICE void prepare_cuda_02()
+    {
+        for (int y = 0, i = 0; y < res; y++) {
+            for (int x = 0; x < res; x++, i++) {
+                cols[i] = std::max(std::max(values[i].x, values[i].y),
+                                   values[i].z)
+                          + ((x > 0) ? cols[i - 1] : 0.0f);
+            }
+            rows[y] = cols[i - 1] + ((y > 0) ? rows[y - 1] : 0.0f);
+            // normalize the pdf for this scanline (if it was non-zero)
+            if (cols[i - 1] > 0) {
+                for (int x = 0; x < res; x++) {
+                    cols[i - res + x] = __fdiv_rn(cols[i - res + x],
+                                                  cols[i - 1]);
+                }
+            }
+        }
+    }
+
+    // Normalize the row PDFs and finalize the 'values' table
+    OSL_HOSTDEVICE void prepare_cuda_03(int stride, int idx)
+    {
+        // normalize the pdf across all scanlines
+        for (int y = idx; y < res; y += stride) {
+            rows[y] = __fdiv_rn(rows[y], rows[res - 1]);
+        }
+
+        // both eval and sample below return a "weight" that is
+        // value[i] / row*col_pdf, so might as well bake it into the table
+        for (int y = 0; y < res; y++) {
+            float row_pdf = rows[y] - (y > 0 ? rows[y - 1] : 0.0f);
+            const int row_start = y * res;
+            const int row_end = row_start + res;
+            int i = row_start + idx;
+            for (int x = idx; x < res; x += stride, i += stride) {
+                if (i >= row_end)
+                    continue;
+                float col_pdf = cols[i] - (x > 0 ? cols[i - 1] : 0.0f);
+                const float divisor = __fmul_rn(__fmul_rn(row_pdf, col_pdf),
+                                                invjacobian);
+                values[i].x = __fdiv_rn(values[i].x, divisor);
+                values[i].y = __fdiv_rn(values[i].y, divisor);
+                values[i].z = __fdiv_rn(values[i].z, divisor);
+            }
+        }
+    }
+#endif
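
`prepare_cuda_01` and `prepare_cuda_03` above split every row across `stride` workers, with worker `idx` handling columns `idx`, `idx + stride`, and so on. The host-side sketch below simulates that partitioning serially and counts how often each table cell is visited. `fill_strided` and `coverage_counts` are hypothetical helpers written for this illustration. They are not part of the PR, and they omit the actual table computation and any synchronization between the three phases.

```cpp
#include <vector>

// One simulated worker: visits columns idx, idx + stride, ... in every row,
// mirroring the loop structure of prepare_cuda_01.
inline void fill_strided(std::vector<int>& visits, int res, int stride,
                         int idx)
{
    for (int y = 0; y < res; y++) {
        for (int x = idx; x < res; x += stride)
            visits[y * res + x] += 1;
    }
}

// Run all 'stride' workers one after another and return how many times
// each cell of the res x res table was visited.
inline std::vector<int> coverage_counts(int res, int stride)
{
    std::vector<int> visits(res * res, 0);
    for (int idx = 0; idx < stride; idx++)
        fill_strided(visits, res, stride, idx);
    return visits;
}
```

In this sketch every cell is visited exactly once whether or not `res` is a multiple of `stride`, and workers with `idx >= res` simply do nothing.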

 private:
-    Dual2<Vec3> map(float x, float y) const
+    OSL_HOSTDEVICE Dual2<Vec3> map(float x, float y) const
     {
         // pixel coordinates of entry (x,y)
         Dual2<float> u = Dual2<float>(x, 1, 0) * invres;
@@ -115,14 +241,20 @@ struct Background {
         return make_Vec3(sin_phi * ct, sin_phi * st, cos_phi);
     }

-    static float sample_cdf(const float* data, unsigned int n, float x,
-                            unsigned int* idx, float* pdf)
+    static OSL_HOSTDEVICE float sample_cdf(const float* data, unsigned int n,
+                                           float x, unsigned int* idx,
+                                           float* pdf)
     {
-        OSL_DASSERT(x >= 0);
-        OSL_DASSERT(x < 1);
+        OSL_DASSERT(x >= 0.0f);
+        OSL_DASSERT(x < 1.0f);
+#ifndef __CUDACC__
         *idx = std::upper_bound(data, data + n, x) - data;
Contributor:
I don't see any reason to keep using the std:: version on the CPU (the function is simple enough, and I'm not aware of any C++ standard library that does anything fancier than your implementation).

Contributor (Author):
I'll leave it as CUDA-only for now so that I can experiment with linear search and see whether it is a win on the GPU.

+#else
+        *idx = upper_bound_cuda(data, n, x) - data;
+#endif
         OSL_DASSERT(*idx < n);
         OSL_DASSERT(x < data[*idx]);
+
         float scaled_sample;
         if (*idx == 0) {
             *pdf = data[0];
@@ -137,12 +269,12 @@ struct Background {
         return std::min(scaled_sample, 0.99999994f);
     }

-    Vec3* values; // actual map
-    float* rows; // probability of choosing a given row 'y'
-    float* cols; // probability of choosing a given column 'x', given that we've chosen row 'y'
-    int res; // resolution in pixels of the precomputed table
-    float invres; // 1 / resolution
-    float invjacobian;
+    Vec3* values = nullptr; // actual map
+    float* rows = nullptr; // probability of choosing a given row 'y'
+    float* cols = nullptr; // probability of choosing a given column 'x', given that we've chosen row 'y'
+    int res = -1; // resolution in pixels of the precomputed table
+    float invres = 0.0f; // 1 / resolution
+    float invjacobian = 0.0f;
 };

 OSL_NAMESPACE_EXIT