-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GafferCycles : Refactor the problematic reset code #5101
GafferCycles : Refactor the problematic reset code #5101
Conversation
4d532ee
to
b71defb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alex, it definitely seems like this is something we need to simplify. I don't really understand what we're doing here still though. Could you step back a bit and explain clearly what the problem is (what constraints Cycles is placing on us), and what our strategy is?
Implementationwise, I wonder if these are possible and would help with clarity?
- Could we construct the session only once, the first time
render()
is called, so everything is deferred till that moment, instead of trying to reset things that were done earlier? - Could we query Cycles for what is changeable and what is not, so we're not duplicating information?
Some test coverage for all this would be pretty useful too...
{ | ||
// Make sure to apply any scene/session parameters first so we only hard-reset once | ||
// before any rendering starts as we cannot set these after rendering starts. | ||
if( m_renderState != RENDERSTATE_RENDERING ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intention of this check? That we will only ever call updateSessionParams()
once? If so, how does that interact with the code in progress()
that sets m_renderState = RENDERSTATE_STOPPED;
. The comment there seems to imply that will happen whenever an interactive render converges and stops sampling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Alex, it definitely seems like this is something we need to simplify. I don't really understand what we're doing here still though. Could you step back a bit and explain clearly what the problem is (what constraints Cycles is placing on us), and what our strategy is?
Implementationwise, I wonder if these are possible and would help with clarity?
Yep, so the goals of this draft PR (keyword draft! :) ) was to tackle the windows deadlock from @ericmehl 's #5100, and the comments in there probably explain my beginning process the best (copy-pasted):
There was talks/suggestions before that we could defer Cycles' scene/session creation to the first call to render()
so we don't have the reset at all, but the current code as it stands needs the first reset so that the desired scene/session parameters can be set before rendering starts. I've not started this yet and the annoying thing is any calls to option()
needs to be caching all of these somewhere without scene/session existing yet, and maybe other things I'm not thinking of.
- Could we construct the session only once, the first time
render()
is called, so everything is deferred till that moment, instead of trying to reset things that were done earlier?
I eventually want to move it to render()
if possible but I still think it'll be a big headache before that. The biggest one is the need for scene
to exist when converting objects and shaders eg. scene->shader_manager
usage, and I know when we get VDB's to work again, scene->image_manager
is needed as well. This PR just has better intentions where we only allow all session/scene parameter changes before rendering starts, and not the possibility of those changing while interactive rendering like it was intended in the past. The warning in option()
was just a print-out and nothing more.
- Could we query Cycles for what is changeable and what is not, so we're not duplicating information?
https://github.com/blender/cycles/blob/master/src/session/session.h#L90 - this is the best I've found, the omission of some parameters in the modified() call.
Some test coverage for all this would be pretty useful too...
After draft status no worries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intention of this check? That we will only ever call
updateSessionParams()
once? If so, how does that interact with the code inprogress()
that setsm_renderState = RENDERSTATE_STOPPED;
. The comment there seems to imply that will happen whenever an interactive render converges and stops sampling.
Alright going over the enum states I've just now made a bool m_rendering
, the states were needed pre-CyclesX when a finished interactive render would cease to update, but that's not needed anymore.
@@ -2500,6 +2488,10 @@ IECore::InternedString g_textureBlurGlossyOptionName( "cycles:texture:blur_gloss | |||
IECore::InternedString g_textureUseCustomCachePathOptionName( "cycles:texture:use_custom_cache_path" ); | |||
IECore::InternedString g_textureCustomCachePathOptionName( "cycles:texture:custom_cache_path" ); | |||
|
|||
// Some of the session parameters that can be changed in interactive rendering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of? Does that mean there are others? Is it possible to query what is changeable and what is not via the Cycles API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I know of, the best place is the modified function's omitted ones to flag:
https://github.com/blender/cycles/blob/master/src/session/session.h#L90
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, could we just build a new set of session parameters, and then call modified()
to see if we can apply them? Rather than duplicating the logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I've got something in the OPTION
macro to do the error print now which simplifies things quite a bit.
OPTION(bool, m_sessionParams, g_useProfilingOptionName, use_profiling); | ||
OPTION(bool, m_sessionParams, g_useAutoTileOptionName, use_auto_tile); | ||
OPTION(int, m_sessionParams, g_tileSizeOptionName, tile_size); | ||
OPTION(bool, m_sessionParams, g_useResolutionDividerOptionName, use_resolution_divider); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing new options in this commit doesn't help with clarity.
// Check if an OSL shader exists & set the shadingsystem | ||
if( m_sessionParams.shadingsystem == ccl::SHADINGSYSTEM_SVM && m_shaderCache->hasOSLShader() ) | ||
{ | ||
IECore::msg( IECore::Msg::Warning, "CyclesRenderer", "OSL Shader detected, forcing OSL shading-system (CPU-only)" ); | ||
m_sessionParams.shadingsystem = ccl::SHADINGSYSTEM_OSL; | ||
m_sceneParams.shadingsystem = ccl::SHADINGSYSTEM_OSL; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this is useful. It doesn't stop someone starting a render in SVM and then adding an OSL shader later on. And it doesn't look like it actually switches to CPU mode either? I think we would be better off not trying to be clever here, and just rely on the warning in updateSceneObjects()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to init()
will use the fallback to CPU code, and later when I think 3.5 lands in with Optix allowing it, we can keep the device choosing code in init()
was the intention.
Not too fussed about this call to remove the cleverness, I wrote it originally when SVM was the default and I didn't want the shaderball to be useless.
IECore::msg( IECore::Msg::Error, "CyclesRenderer", "OSL Shader detected, this will cause problems in a running interactive render" ); | ||
m_shaderCache->removeOSLShaders(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What problems will this cause? This seems like something we should definitely have some test coverage for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I remember it crashed it but I forget. I can make a test for it sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No crashing now with me playing with the viewport and added a unit test for it also, so I removed the removeOSLShaders();
m_sceneParams.shadingsystem = ccl::SHADINGSYSTEM_OSL; | ||
} | ||
|
||
// If anything changes in scene or session, we reset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the description of this work, I assumed we weren't doing any resets, so I'm confused by this comment. I don't think we've ever clearly written down what constraints we're dealing with (from the Cycles and Renderer APIs), what we're trying to achieve, and what our strategy is. Can you do that?
|
||
init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're not doing resets any more, why does init()
still make a new Session, with the trick to reuse the memory address of the old one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only does it before rendering starts and not during rendering. There's still roadblocks to consider before moving the init()
to render()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the tricky to reuse the memory address is redundant now, and can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's a way to avoid these tags I think so:
https://github.com/GafferHQ/gaffer/blob/main/src/GafferCycles/IECoreCyclesPreview/Renderer.cpp#L2009
A forward declare of CyclesRender
and giving it a session or scene getter didn't work, not too sure on what other solutions to have here - is there another way to find out if an object got a transform or attribute change from the Renderer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see that nodes have a is_modified()
now which can defer to the CyclesRenderer
, that's good.
b71defb
to
fcd9dec
Compare
a255ab2
to
293d760
Compare
293d760
to
8307715
Compare
8307715
to
a984209
Compare
Hey John I've made a few changes since we last looked at it, there's 2 unit tests which prod at what this MR ultimately solves (those pesky params on Scene/Session classes before rendering).
A refresher as it's been awhile since I looked at this myself and I'll try to condense/simplify: Some scene/session parameters cannot be changed once rendering starts is the constraint, but scene/session needs to be created before those options get set via the Gaffer render interface. This code refactors the previous assumption that we could change these post-live rendering by re-creating session/scene and transplanting stuff back in aka the reset codepath. In doing so we avoid many problems, with one being that the mutex for the scene doesn't get deleted while it is locked which causes deadlocks on windows, as well as things like #5234 due to stale pointers to shaders/lights that existed in the previous scene but would be re-created and have different pointers post-reset. Some of the reset code needs to stay but only so that we can set scene/session parameters before rendering starts and before any scene-related objects can be placed into cycles itself, avoiding any dodgy transplanting or stale pointer fixing. Removing the reset code entirely and making scene/session be created in I think this would be a good one to land in soon as it solves #5234 and deadlocks on windows (can't verify that without some help). To get this over the line I think I'll need some Windows assistance from @ericmehl or whoever wants to see if those deadlocks still exist with this patch. Cheers |
Hey @boberfly, 🪟 🪟 = 👍 ! |
- Refactor the problematic reset code as we now only allow scene/session parameters to be set before rendering starts. - This will also avoid a mutex lock issue with Windows. - Remove render states and just have an m_rendering bool - Removed `removeOSLShaders()`, Cycles seems more hardy at not crashing when this happens now
a984209
to
1b2f51d
Compare
While working on some production assets, I've been putting Cycles through its paces and passed it through a debugger to discover that the reset code also needs to change ownership of all the objects to the newly created scene to prevent assertions when freeing them later. I've sometimes got it to crash in Embree+TBB with an invalid pointer, but that one is hard to narrow down, as well as a crash in OIIO. I'll keep at it... |
As the new test (courtesy of Alex via GafferHQ#5101) shows, we don't need to delete OSL shaders when in SVM mode to avoid Cycles crashes (but perhaps this was not always the case?). This avoids a linear-in-the-number-of-shaders check on every render, and simplifies some logic. The outcome is now no longer dependent on whether the SVM/OSL mismatch occurs before or after rendering has started : in both cases we now just give Cycles what the user asked for, and let it ignore shaders it can't handle. We could really do with a warning when there are OSL shaders in an SVM render, but that will be easier to achieve without expensive checks after a bit more refactoring. Look out for it in an upcoming commit.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place to construct the `ccl::Session`. Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws : - On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101. - The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect they were also the cause of the crash exposed by `InteractiveCyclesRenderTest.testShadingModes` - with the reset code removed, the test now passes. - More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes. Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning. This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
As the new test (courtesy of Alex via GafferHQ#5101) shows, we don't need to delete OSL shaders when in SVM mode to avoid Cycles crashes (but perhaps this was not always the case?). This avoids a linear-in-the-number-of-shaders check on every render, and simplifies some logic. The outcome is now no longer dependent on whether the SVM/OSL mismatch occurs before or after rendering has started : in both cases we now just give Cycles what the user asked for, and let it ignore shaders it can't handle. We could really do with a warning when there are OSL shaders in an SVM render, but that will be easier to achieve without expensive checks after a bit more refactoring. Look out for it in an upcoming commit.
As the new test (courtesy of Alex via GafferHQ#5101) shows, we don't need to delete OSL shaders when in SVM mode to avoid Cycles crashes (but perhaps this was not always the case?). This avoids a linear-in-the-number-of-shaders check on every render, and simplifies some logic. The outcome is now no longer dependent on whether the SVM/OSL mismatch occurs before or after rendering has started : in both cases we now just give Cycles what the user asked for, and let it ignore shaders it can't handle. We could really do with a warning when there are OSL shaders in an SVM render, but that will be easier to achieve without expensive checks after a bit more refactoring. Look out for it in an upcoming commit.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place to construct the `ccl::Session`. Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws : - On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101. - The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect they were also the cause of the crash exposed by `InteractiveCyclesRenderTest.testShadingModes` - with the reset code removed, the test now passes. - More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes. Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning. This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place to construct the `ccl::Session`. Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws : - On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101. - The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect they were also the cause of the crash exposed by `InteractiveCyclesRenderTest.testShadingModes` - with the reset code removed, the test now passes. - More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes. Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning. This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place to construct the `ccl::Session`. Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws : - On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101. - The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect this was also the cause of the crash exposed by `InteractiveCyclesRenderTest.testSVMRenderWithCPU`. - More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes. Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning. This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place to construct the `ccl::Session`. Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws : - On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101. - The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect this was also the cause of the crash exposed by `InteractiveCyclesRenderTest.testSVMRenderWithCPU`. - More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes. Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning. This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place to construct the `ccl::Session`. Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws : - On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101. - The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect this was also the cause of the crash exposed by `InteractiveCyclesRenderTest.testSVMRenderWithCPU`. - More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes. Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning. This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place to construct the `ccl::Session`. Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws : - On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101. - The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect this was also the cause of the crash exposed by `InteractiveCyclesRenderTest.testSVMRenderWithCPU`. - More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes. Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning. This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
Closing, since this should now be covered by the merge of #5711. |
Generally describe what this PR will do, and why it is needed
Related issues
Dependencies
Breaking changes
Checklist
This will probably need a bit of testing before I'd be confident in merging, this could break interactive scene updating.