Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OptiX testrender overhaul #1829

Draft
wants to merge 110 commits into
base: main
Choose a base branch
from

Conversation

tgrant-nv
Copy link
Contributor

Description

This PR adds support for full path tracing in the OptiX mode of testrender, including full BSDF sampling and evaluation. The benefit of this change is that we're able to render most of the scenes from the testsuite in OptiX mode with results that closely match the host output. This comes at the cost of increased coupling between the host and OptiX renderers, and therefore an increased maintenance burden.

ID-based dispatch

The main difference between the host and OptiX paths is how the individual BSDFs are evaluated in the CompositeBSDF class. Virtual function calls aren't well supported in OptiX, so rather than using regular C++ polymorphism to invoke the sample(), eval(), and get_albedo() functions for each of the BSDF sub-types, we manually invoke the correct function based on the closure ID (which we have added as a member of the BSDF class).

// from shading_cuda.cpp

#define BSDF_CAST(BSDF_TYPE, bsdf) reinterpret_cast<const BSDF_TYPE*>(bsdf)

OSL_HOSTDEVICE Color3
CompositeBSDF::get_bsdf_albedo(const BSDF* bsdf, const Vec3& wo) const
{
    ...
    switch (bsdf->id) {
    case DIFFUSE_ID:
        albedo = BSDF_CAST(Diffuse<0>, bsdf)->get_albedo(wo);
        break;
    case TRANSPARENT_ID:
    case MX_TRANSPARENT_ID:
        albedo = BSDF_CAST(Transparent, bsdf)->get_albedo(wo);
        break;
    ...

Iterative closure evaluation

Another key difference from the host path is the non-recursive closure evaluation. We retain the same style of iterative tree traversal used in the previous OptiX version of process_closure(). This PR also adds evaluate_layer_opacity(), process_medium_closure(), process_background_closure(), which follow the same evaluation pattern.

subpixel_radiance()

The raytracing pipeline mirrors the host code very closely, including camera ray generation and the spawning of secondary rays. This allows a close visual match between the host and OptiX modes.

We've implemented a CUDA version of subpixel_radiance() (in optix_raytracer.cu) that closely mirrors the host version, with the main difference being in how rays are traced and how the shaders are executed. It might be possible to unify the implementations if it would ease the maintenance burden, but for now it seemed cleaner to leave them separate.

Background sampling

We've included support for background closures. This includes a CUDA implementation of the Background::prepare() function. We've broken that function into three phases, where phases 1 and 3 are parallelized across a warp and phase 2 is executed on a single thread. This offers a decent speedup over a single-threaded implementation without the complexity of a more sophisticated implementation.

    // from background.h
    
    template<typename F>
    OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb)
    {
        prepare_cuda_01(stride, idx, cb);
        if (idx == 0)
            prepare_cuda_02();
        prepare_cuda_03(stride, idx);
    }

Tests

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format v17 before submitting, I definitely will look at
    the CI test that runs clang-format and fix anything that it highlights as
    being nonconforming.

…or CUDA mipmapped textures in testrender. Update the signature of osl_tex2DLookup in testshade.
// Adapted from Sphere::sample in ../raytracer.h
static __device__ float3
sample_sphere(const Vec3& x, const SphereParams& sphere, float xi, float yi,
float& pdf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can call the original code to avoid duplication? Even if creating a temporary Sphere, the compiler should optimize out cruft.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a possibility I had considered. I'll see how well it works in practice.

Sample sample(const Vec3& /*wo*/, float rx, float ry,
float /*rz*/) const override
OSL_HOSTDEVICE Sample sample(const Vec3& /*wo*/, float rx, float ry,
float /*rz*/) const OSL_HOSTDEVICE_OVERRIDE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this OSL_HOSTDEVICE_OVERRIDE mean you are keeping the virtual calls on CPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good catch. I'll get rid of the virtual calls.

sample = BSDF_CAST(Diffuse<1>, bsdf)->eval(wo, wi);
break;
case PHONG_ID: sample = BSDF_CAST(Phong, bsdf)->eval(wo, wi); break;
case WARD_ID: sample = BSDF_CAST(Ward, bsdf)->eval(wo, wi); break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to wrap my mind around this, is this switch/case particular to CPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, The CompositeBSDF::get_albedo, CompositeBSDF::eval, and CompositeBSDF::sample functions are shared between the CPU and CUDA paths.

@tgrant-nv tgrant-nv marked this pull request as draft July 24, 2024 17:39
@tgrant-nv
Copy link
Contributor Author

I've converted this PR to a draft, pending Chris's mesh support.

@fpsunflower, I should point out that testrender does already run in OptiX mode, but it only supports simple diffuse shading. This PR is to enable full path tracing. I can definitely help out with the OptiX side of mesh support, so don't hesitate to reach out.

@fpsunflower
Copy link
Contributor

Will do! I've pushed my branch so far to https://github.com/fpsunflower/OpenShadingLanguage/tree/testrender-triangles in case you want to take a look. I was working on this on my mac, so the existing Optix path is most likely broken.

We should definitely circle back on this after Siggraph. There's a few more loose ends to fix up around derivatives (which are technically not set up right in the current implementation either).

@fpsunflower
Copy link
Contributor

Also, that branch adds a few scene to the testsuite, but github complained about the files being too large -- we'll probably want to remove those from the history before merging and find smaller models to use instead.

@tgrant-nv
Copy link
Contributor Author

Sounds good. I'll check out your branch and see what needs to be done to get OptiX up and running again. This PR includes a refactor of the OptixRaytracer class that we can fold into the mesh effort.

@lgritz
Copy link
Collaborator

lgritz commented Jul 24, 2024

Also, that branch adds a few scene to the testsuite, but github complained about the files being too large -- we'll probably want to remove those from the history before merging and find smaller models to use instead.

Much like how OIIO has oiio-images and OpenEXR has openexr-images, OSL could also have a second repo just for "big things only used for tests" so it doesn't clutter the main repo or make a big download for people who just want to inspect or build the code. The testing scripts can enable certain tests only if that other repo is found at build time in a sibling directory,

@tgrant-nv
Copy link
Contributor Author

@fpsunflower, please check out the testrender-triangles-optix branch from my fork to see what I've done to get the OptiX path working. It includes most of the host-side refactoring from this PR. Having only one primitive type is a nice simplification.

@fpsunflower
Copy link
Contributor

I haven't go through your branch in great detail, but it sounds like you got everything working already which is great :) What would you say is the best path forward in terms of sequencing the changes into the main repo?

I was thinking I should probably put forward a first PR to move testrender over to triangles. It would temporarily break the optix side, but it looks we could probably follow up quickly with your changes.

@lgritz would that be ok? Or would you prefer if Tim and I coordinate to make a single PR that submits both things in one go?

@lgritz
Copy link
Collaborator

lgritz commented Aug 21, 2024

I'm ok doing it in stages. It's ok to temporarily break as long as we are confident that the part 2 is coming right on its heels.

@fpsunflower
Copy link
Contributor

Sounds good. When I get some cycles I will polish off my end of the branch and submit a PR.

Do you want me to prepare splitting out the shaderball test into its own repo (in preparation of maybe having more big obj files and textures)?

With maps/geo, the current render-shaderball folder is 61Mb. For comparison, the rest of the testsuite folder is ~42Mb, so its more than doubling the size for just one test.

At the same time, I'm not sure its really worth the hassle to create a second repo + extra cmake hoops to jump through (same amount of data to download in the end, and a slightly clunkier workflow). We'll just have to be somewhat disciplined in what we add to the testsuite.

I can also punt and skip adding that particular test in the initial PR.

@lgritz
Copy link
Collaborator

lgritz commented Aug 22, 2024

Do you want me to prepare splitting out the shaderball test into its own repo (in preparation of maybe having more big obj files and textures)?

With maps/geo, the current render-shaderball folder is 61Mb. For comparison, the rest of the testsuite folder is ~42Mb, so its more than doubling the size for just one test.

At the same time, I'm not sure its really worth the hassle to create a second repo + extra cmake hoops to jump through (same amount of data to download in the end, and a slightly clunkier workflow). We'll just have to be somewhat disciplined in what we add to the testsuite.

I feel like with the addition of meshes and surely other features to come, it's just a matter of time before we're going to want to add a number of bigger tests. So I'm inclined to bite the bullet and just set it up now. Most tests will continue to live in the main repo, but large data files referenced by tests -- say, more than a couple MB -- should live in the test data repo.

This is analogous to the OpenEXR-images or OpenImageIO-images repos, which are used to store large things used only for tests, to keep them from cluttering the main repo or forcing people to download and store them merely to build the main project.

Maybe ask John to create a new repo for this purpose. OpenShadingLanguage-tests? OpenShadingLanguage-data? -testdata? -models? -scenes? I'm open to suggestions.

@tgrant-nv tgrant-nv mentioned this pull request Sep 21, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants