Issue 4598 - Copy attribute syntax hashtables to each worker thread #5693

Open
mreynolds389 wants to merge 1 commit into main

Conversation

mreynolds389
Contributor

Description:

There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change we can just keep a copy in each thread and avoid locking.

Then the worker can check for changes on each loop and rebuild the hashtables as needed.

Did some code cleanup in schema.c

relates: #4598
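
As a rough illustration of the approach described above (not the actual patch), a per-thread cache with a version check could look like the following sketch. Every identifier here is hypothetical:

    /* Hypothetical sketch of the per-thread syntax cache (names invented). */
    #include <stdint.h>
    #include <stdatomic.h>
    #include <plhash.h> /* NSPR hashtables, as used by the syntax code */

    extern _Atomic uint64_t global_syntax_version; /* bumped on schema change */

    typedef struct {
        uint64_t version;     /* global version this copy was built from */
        PLHashTable *by_name; /* thread-local copy of the name hashtable */
        PLHashTable *by_oid;  /* thread-local copy of the OID hashtable */
    } thread_syntax_cache;

    extern void attr_syntax_free_tables(thread_syntax_cache *cache);
    extern void attr_syntax_copy_tables(thread_syntax_cache *cache); /* takes the rwlock */

    /* Called by a worker at the top of each loop iteration. */
    static void
    thread_syntax_refresh(thread_syntax_cache *cache)
    {
        uint64_t v = atomic_load(&global_syntax_version);

        if (cache->version != v) {
            /* Syntaxes changed: rebuild the local copies from the global
             * tables. This is the only path that still touches the lock. */
            attr_syntax_free_tables(cache);
            attr_syntax_copy_tables(cache);
            cache->version = v;
        }
        /* All lookups in this thread now go through cache->by_name and
         * cache->by_oid with no rwlock traffic on the hot path. */
    }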

@progier389
Contributor

Still reviewing but I have a general comment:

  • I would have put the code that checks for version changes and duplicates the hash table (in connection.c) in a separate function (and in libslapd rather than in ns-slapd).
    And the same thing for the cleanup code.
    The rationale is that we will likely want to reuse that code later on in other threads (import threads and maybe some plugin threads).

@mreynolds389
Contributor Author

Still reviewing but I have a general comment:

This is technically still a WIP. Need to do some performance testing with it. I just wanted to get the working code into a PR :-)

* I would have put the code that checks for version changes and duplicates the hash table (in connection.c) in a separate function (and in libslapd rather than in ns-slapd).
  And the same thing for the cleanup code.
  The rationale is that we will likely want to reuse that code later on in other threads (import threads and maybe some plugin threads).

Sounds good, I'll work on that next...
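
To make the suggested split concrete, here is a minimal sketch of what such shared helpers in libslapd might look like. The function names and loop below are hypothetical, purely to illustrate the intended reuse across worker, import, and plugin threads:

    /* Hypothetical shape of the shared helpers if moved into libslapd
     * (names invented for illustration): */
    void slapi_attr_syntax_thread_check(void); /* rebuild the per-thread copy
                                                * if the global version changed */
    void slapi_attr_syntax_thread_free(void);  /* release the per-thread copy */

    /* Example use from a thread's main loop (worker, import, or plugin): */
    static void
    example_thread_loop(int (*more_work)(void))
    {
        while (more_work()) {
            slapi_attr_syntax_thread_check(); /* cheap version check per pass */
            /* ... handle one unit of work; syntax lookups stay lock-free ... */
        }
        slapi_attr_syntax_thread_free(); /* clean up when the thread exits */
    }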

@progier389
Contributor

Except for a few minor points, it looks fine.
I hope we will see the impact in the performance tests.

P.S.: Avoid doing massive code cleanup in a PR impacting sensitive code:
it really makes the review much harder.
For example, I am not absolutely sure about what was really changed in schema.c,
because the cleanup just hides the functional changes! :-(
(I think it is all around the DSE_SCHEMA_USE_GLOBAL flag, but I am not absolutely sure.)

@mreynolds389
Copy link
Contributor Author

Except for a few minor points, it looks fine. I hope we will see the impact in the performance tests.

P.S.: Avoid doing massive code cleanup in a PR impacting sensitive code: it really makes the review much harder. For example, I am not absolutely sure about what was really changed in schema.c, because the cleanup just hides the functional changes! :-( (I think it is all around the DSE_SCHEMA_USE_GLOBAL flag, but I am not absolutely sure.)

I know, I know: the cleanup started off simple and small, and then it turned into a can of worms. But yes, in schema.c the main change was to add a flag so we always update/access the global hash tables and not the per-thread hash tables.
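
For illustration, the flag-based selection described above might look roughly like this. DSE_SCHEMA_USE_GLOBAL is the flag named in the review, while every other identifier (and the flag's value) is invented for this sketch:

    #include <stdint.h>
    #include <plhash.h>

    #define DSE_SCHEMA_USE_GLOBAL 0x1 /* placeholder value; the real flag is
                                       * defined in the 389-ds headers */

    /* Hypothetical table handles, for illustration only. */
    extern PLHashTable *global_syntax_by_name;
    extern __thread PLHashTable *local_syntax_by_name;

    static PLHashTable *
    syntax_table_select(uint32_t flags)
    {
        /* Schema updates pass DSE_SCHEMA_USE_GLOBAL so they always read and
         * modify the global tables; normal lookups fall through to the
         * per-thread copy and avoid the rwlock entirely. */
        if ((flags & DSE_SCHEMA_USE_GLOBAL) || local_syntax_by_name == NULL) {
            return global_syntax_by_name;
        }
        return local_syntax_by_name;
    }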

@mreynolds389
Contributor Author

All changes made. I hope to get performance testing done next week.

@progier389
Contributor

LGTM

@Firstyear
Contributor

There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change we can just keep a copy in each thread and avoid locking.

Wouldn't contention on an rwlock imply that there are attempts to frequently write to these values?

Rwlocks with .read() are effectively free since it becomes an atomic ref count.

How did you determine there was contention here?

@mreynolds389
Contributor Author

There is a lot of contention around the rwlocks for the attribute syntax hashtables. Since syntaxes rarely change we can just keep a copy in each thread and avoid locking.

Wouldn't contention on an rwlock imply that there are attempts to frequently write to these values?

Rwlocks with .read() are effectively free since it becomes an atomic ref count.

How did you determine there was contention here?

Thierry did this investigation in the parent Issue:

#4598

Using modrate, removing these rwlocks gave a 5% performance increase. I'm not sure what tool he used to measure the contention, but that report is also in the parent issue.

@progier389
Contributor

I do not think that we are speaking of the usual lock contention here (as the lock is never held in write mode), but rather memory cache contention.
FYI: I also noticed the same thing when I ran the performance tests on the lmdb import.
In the final tests, once the dn cache was disabled, according to '/usr/bin/perf c2c' the four top contentions were in the rwlock below attr_syntax_get_by_name_locking_optional (__pthread_rwlock_unlock and __pthread_rwlock_rdlock).
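
For context, a typical perf c2c session to capture this kind of cacheline contention looks roughly like the following (exact options vary between perf versions; -a records system-wide while the benchmark runs in parallel):

perf c2c record -a -- sleep 60
perf c2c report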

@Firstyear
Contributor

I do not think that we are speaking of the usual lock contention here (as the lock is never held in write mode), but rather memory cache contention. FYI: I also noticed the same thing when I ran the performance tests on the lmdb import. In the final tests, once the dn cache was disabled, according to '/usr/bin/perf c2c' the four top contentions were in the rwlock below attr_syntax_get_by_name_locking_optional (__pthread_rwlock_unlock and __pthread_rwlock_rdlock).

Memory cache contention doesn't make sense though, because the pages will be in the shared state in the various CPU caches. By splitting this we'll be putting more pressure on the CPU caches, because we are no longer sharing the pages.

So I'm still really curious about this, and about the assumption being made about the cause.

@progier389
Contributor

As I told you, the perf c2c tool reports these as the top cache contentions, so there is contention even if we do not understand why it happens! ;-)
That said, I can understand that you would like to know exactly why that is the case. I am a bit curious too (but not surprised, as it is not the first time I have seen that a mostly-read rwlock was surprisingly costly).
For my part, I learned to trust the "/usr/bin/perf" tool because it helped me understand why, in my first tests, the lmdb online import was ten times slower than the offline one (while the import algorithm is exactly the same).
Anyway, I guess we will see the result when we are able to run the performance tests on the new code.

@Firstyear
Contributor

I would certainly trust perf here, but I think understanding why something happens is going to give a better solution than just splitting up a structure and copying it (which might reduce the contention, but it's going to cause data duplication in the cache, meaning we get worse cache sharing and overall worse performance).

So understanding the problem seems like a better first step :)

@mreynolds389
Contributor Author

mreynolds389 commented Mar 16, 2023

Initial Performance Results


This was all done on a single system (client and server) with 16 cores

Import

1-2% performance increase (only tested bdb)

Modrate

modrate --hostname localhost --port 389 --bindDN 'cn=dm' --bindPassword XXXXXXX --entryDN "uid=user[1-10000],ou=People,dc=example,dc=com" --attribute givenname --attribute mail --valueLength 12 --numThreads 5

No improvement

Searchrate

searchrate --hostname localhost --port 389 --bindDN 'cn=dm' --bindPassword XXXXXX --baseDN dc=example,dc=com --scope sub --filter FILTER --attribute givenName --attribute sn --attribute mail --numThreads 10

Single Filter: (uid=[1-10000])

19% improvement

Compound Filter: (&(uid=[1-10000])(objectclass=posixaccount))

15% improvement

Conclusion

The original issue said "modrate" got an improvement, but I don't see that. However, the search improvement is significant!

@mreynolds389
Contributor Author

mreynolds389 commented Mar 16, 2023

Also, for the import test I tried adding hashtable copies to the bdb_import_foreman(), bdb_import_producer(), and bdb_public_bdb_import_main() threads, but it did not seem to help import performance (with bdb).

@mreynolds389
Contributor Author

Well, I was playing around with the test cases, and when I reran the same tests as above, the performance improvement was very small. Sigh. I had run those tests multiple times with each version and the results were very consistent (15-19% improvement); now they are not. I think I need to wait until we get those dedicated lab machines to run these tests correctly (separating the client from the server).

@mreynolds389
Contributor Author

OK, testing on large lab machines shows no noticeable improvement with this fix :-( That doesn't mean this should be closed, as we are most likely hitting another bottleneck, so this fix "could" improve things down the road once other issues are addressed.
