Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant instructions from tile hash #3898

Merged

Conversation

curtisblack
Copy link
Contributor

Description

Optimises the code in the tile hash function. During a render this function is called billions of times so this small improvement adds up, in some test scenes it reduces total render time by an average of about 2%.

See https://godbolt.org/z/jMzz4PTce for the exact details. A summary would be:

  • Reduced instruction count from 75 to 61, this is due to the redundant add and shift operations that can be implicitely handled by taking advantage of the member layout in the class.
  • Halves the number of memory fetch instructions because all accesses are now 64-bit.
  • Removes branching instructions and extra memory indirection inside ustring::hash().

There is a slight change in behaviour, which is that the hash is now computed from the m_file pointer rather than from m_file->filename().hash(), so runtime reproducibility is lost. However the performance gain seems worth it.

Tests

Checklist:

  • I have read the contribution guidelines.
  • If this is more extensive than a small change to existing code, I
    have previously submitted a Contributor License Agreement
    (individual, and if there is any way my
    employers might think my programming belongs to them, then also
    corporate).
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • My code follows the prevailing code style of this project.

@@ -590,6 +594,7 @@ struct TileID {
int m_miplevel; ///< MIP-map level
short m_chbegin, m_chend; ///< Channel range
int m_colortransformid; ///< Colorspace id (0 == default)
int m_padding = 0; ///< Unused
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be evaluated separately, but maybe instead of adding 32-bits of padding, m_miplevel and m_colortransformid could maybe be changed to 16-bit to shave off an extra 8 bytes?

I'd be at least curious to know if that saves any measurable amount of time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I didn't necessarily wish to bundle that type of change into this PR but I also thought something similar.

Unfortunately I don't have a build with all the recent color space additions to test performance. The performance was measured on v2.4.6, prior to the addition of m_colortransformid.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can save this for another pass, but it seems like a good idea to me.

Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great idea.
I will merge when CI finishes.

@lgritz
Copy link
Collaborator

lgritz commented Jul 12, 2023

@curtisblack Please look at the failed CI test, which implicates this code. The one test that fails is when using gcc12, presumably triggering a new warning that didn't exist in older gcc versions. It's behaving as if there is an uninitialized field in the TileID struct, but looking at your code I don't see why that would be the case. Can you please take a close look and see if you can convince yourself whether the warning is correct (and fix) or if it's some kind of false positive (in which case you should probably use some kind of pragma to locally disable it where it is causing trouble, so that we can continue to pass CI).

@curtisblack
Copy link
Contributor Author

Thanks for checking @lgritz, as far as I can tell it does appear to be a false positive that only seems to affect GCC. Clang doesn't see this as a warning or error. I will look into locally disabling this specific warning for that part of the code.

See https://godbolt.org/z/5q7Y7ndfb for a comparison with clang.

It might also be showing in that CI build because it enables -Werror, The bug does also seem to affect earlier versions of GCC but without the -Werror flag they pass CI.

I did some additional testing, using the testtex --testhash tool (and increasing the test iterations to 1e9 for a sturdy comparison) this PR increases the hash rate from 361 to 495 Mhash/sec on my machine.

// This appears to be a false positive which only affects GCC.
// https://godbolt.org/z/5q7Y7ndfb
OIIO_PRAGMA_WARNING_PUSH
OIIO_GCC_ONLY_PRAGMA(GCC diagnostic ignored "-Wuninitialized")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still failing. Based on the error message, I think this might need to be -Wmaybe-uninitialized?

@curtisblack curtisblack changed the base branch from master to release July 23, 2023 23:57
@curtisblack curtisblack changed the base branch from release to master July 23, 2023 23:57
@lgritz
Copy link
Collaborator

lgritz commented Jul 24, 2023

I think I have the right fix, @curtisblack -- the pragma needs to be in hash.h, immediately surrounding the line where it thinks there is a problem (and the other spots were not needed, and I think also were turning off the wrong warning). I've taken the liberty of pushing my fix on top of yours. We'll see, if it passes CI then I'll merge the combined patch. Cross fingers...

@lgritz lgritz merged commit d757bff into AcademySoftwareFoundation:master Jul 25, 2023
21 of 23 checks passed
@lgritz
Copy link
Collaborator

lgritz commented Jul 25, 2023

Relevant tests passed, warning squashed, so I merged!

lgritz added a commit that referenced this pull request Oct 7, 2023
…4006)

A static_assert was added recently as part of PR #3898, which assures
that the size of a TileID is a multiple of 8 bytes. But on architectures
with 32 bit pointers, this assertion fails. Which is not quite what we
want, because the code isn't wrong if the struct isn't a multiple of 8
bytes; it's just less efficient. And we don't want to fail for 32 bit
arch.

Bottom line: only do this check to ensure we're in the most efficient
case if we're 64 bits. Nobody's expecting highest performance on a 32
bit machine anyway.

Fixes #4001

Signed-off-by: Larry Gritz <[email protected]>
lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Oct 7, 2023
…cademySoftwareFoundation#4006)

A static_assert was added recently as part of PR AcademySoftwareFoundation#3898, which assures
that the size of a TileID is a multiple of 8 bytes. But on architectures
with 32 bit pointers, this assertion fails. Which is not quite what we
want, because the code isn't wrong if the struct isn't a multiple of 8
bytes; it's just less efficient. And we don't want to fail for 32 bit
arch.

Bottom line: only do this check to ensure we're in the most efficient
case if we're 64 bits. Nobody's expecting highest performance on a 32
bit machine anyway.

Fixes AcademySoftwareFoundation#4001

Signed-off-by: Larry Gritz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants