Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new multithreaded concurrency configuration #5015

Open
wants to merge 590 commits into
base: develop
Choose a base branch
from

Conversation

qkoziol
Copy link
Contributor

@qkoziol qkoziol commented Oct 26, 2024

Added infrastructure support for multithreaded concurrency by adding an optional way to switch to using a non-recursive R/W lock for the global API lock. This is enabled with a new 'concurrency' configuration flag for the autotools & CMake builds, which is disabled by default.

When the 'concurrency' build option is chosen, the global API lock will use the R/W lock and all API calls currently will acquire a write lock, ensuring exclusive access by one thread. Over time, the API routines that are converted to support multithreaded concurrency will switch to acquiring a read lock instead.

Reentering the library from application callbacks is managed by the 'disable locking for this thread' (DLFTT) threadsafety protocol. This is internally handled within the H5_API_LOCK / H5_API_UNLOCK macros in H5private.h (as before), which invoke the 'dlftt' routines in H5TSint.c.

To support this change, the threadsafety configuration macros for the library have been updated:
- --enable-threadsafe now defines the H5_HAVE_THREADSAFE macro
- --enable-concurrency defines the H5_HAVE_CONCURRENCY macro
The new H5_HAVE_THREADSAFE_API macro is set if either H5_HAVE_THREADSAFE or H5_HAVE_CONCURRENCY is enabled.

New Github actions are added to include the concurrency configuration in the CI for the develop branch.

To support the new non-recursive R/W locking for API routines, some other changes are necessary:

  • Added macro wrappers around all callback invocations that could call an
    application function, and therefore re-enter the library:
    H5_BEFORE_USER_CB* / H5_AFTER_USER_CB*

  • Added H5_user_cb_prepare / H5_user_cb_restore routines that save the
    state of the library when callback leaves the library. Includes error
    stack and threadsafe reentry state currently.

There's also some small cleanups to various places in the library:

  • Moved the H5E_mpi_error_str / H5E_mpi_error_str_len globals to be local for
    pushing MPI errors, so that multiple threads can't interfere with each
    other.

  • Added H5TS_rwlock_trywrlock() routine to R/W lock interface.

  • Emulate R/W locks on MacOS because its implementation of
    pthread_rwlock_wrlock() does not conform to the POSIX standard.

  • Don't acquire the global API lock in H5close, since it's acquired in H5_term_library, which is necessary because H5_term_library is invoked via other code paths that don't hold the global API lock.

  • Don't call H5Eget_auto2 API routine within H5_term_library.

  • Switched 'return NULL' in H5allocate_memory to HGOTO_DONE(NULL).

  • Switched H5Pget_file_space_strategy / H5Pset_file_space_strategy to use
    internal routines instead of API routines.

  • Switched H5Oopen_by_addr & H5Ovisit1 to use internal routines instead of
    API routines.

  • Fixed a few places in src/H5Odeprec.c where a major
    error ID was passed as a minor ID.

qkoziol and others added 30 commits May 24, 2024 20:18
Signed-off-by: Quincey Koziol <[email protected]>
Also added more atomic types

Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
And put it into a macro

Signed-off-by: Quincey Koziol <[email protected]>
Also switched to use H5_HAVE_THREADSAFE_API for indicating that either
the --enable-threadsafe or --enable-concurrency options are enabled.

Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
Signed-off-by: Quincey Koziol <[email protected]>
@@ -214,14 +215,14 @@ H5_init_library(void)
*/
if (!H5_dont_atexit_g) {

#if defined(H5_HAVE_THREADSAFE)
#ifdef H5_HAVE_THREADSAFE_API
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of macro name change. H5_HAVE_THREADSAFE_API if either H5_HAVE_THREADSAFE or H5_HAVE_CONCURRENCY is enabled

@@ -321,7 +322,7 @@ H5_term_library(void)
H5CX_push(&api_ctx);

/* Check if we should display error output */
(void)H5Eget_auto2(H5E_DEFAULT, &func, NULL);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove API call within the library.

@@ -332,8 +333,13 @@ H5_term_library(void)
while (curr_atclose) {
H5_atclose_node_t *tmp_atclose; /* Temporary pointer to 'atclose' node */

/* Invoke callback, providing context */
(*curr_atclose->func)(curr_atclose->ctx);
/* Prepare & restore library for user callback */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of macro wrappers around callback invocations that could call an application function, and therefore re-enter the library.

@@ -1097,13 +1103,14 @@ H5allocate_memory(size_t size, bool clear)
FUNC_ENTER_API_NOINIT

if (0 == size)
return NULL;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to leave via FUNC_LEAVE, so the API lock can be dropped.

@@ -1226,3 +1233,63 @@ H5is_library_terminating(bool *is_terminating /*out*/)

FUNC_LEAVE_API_NOINIT(ret_value)
} /* end H5is_library_terminating() */

/*-------------------------------------------------------------------------
* Function: H5_user_cb_prepare
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Save the state of the library when callback leaves the library. Includes error stack and threadsafe reentry state currently

src/H5Dfill.c Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some error cleanups along with the callback wrappers.

@@ -363,6 +355,82 @@ H5E_term_package(void)
FUNC_LEAVE_NOAPI(n)
} /* end H5E_term_package() */

/*-------------------------------------------------------------------------
* Function: H5E_user_cb_prepare
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Save and restore the default error reporting settings, to prevent an application from changing them in a callback.

@@ -1369,6 +1458,40 @@ H5E__get_auto(const H5E_stack_t *estack, H5E_auto_op_t *op, void **client_data)
FUNC_LEAVE_NOAPI(SUCCEED)
} /* end H5E__get_auto() */

/*-------------------------------------------------------------------------
* Function: H5E_get_default_auto_func
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used when terminating the library.

@@ -181,29 +186,58 @@
/*
* MPI error handling macros.
*/

extern char H5E_mpi_error_str[MPI_MAX_ERROR_STRING];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these from global to local, so that multiple threads won't stomp on the values.

src/H5FDint.c Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some style cleanups along with the callback warappers.

src/H5FO.c Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added missing H5private.h

/* Deserialize VOL object token into object address */
if (H5VLnative_token_to_addr(obj_id, oinfo2->token, &oinfo.addr) < 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't call API routines within the library.

/* This is a native-specific routine that requires serialization of the token */
if (H5VLnative_addr_to_token(loc_id, addr, &obj_token) < 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't call API routines within the library.

@@ -423,7 +441,7 @@ H5Oget_info1(hid_t loc_id, H5O_info1_t *oinfo /*out*/)

/* Must use native VOL connector for this operation */
if (!is_native_vol_obj)
HGOTO_ERROR(H5E_OHDR, H5E_VOL, FAIL,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use major error codes as minor codes.

@@ -543,9 +551,11 @@ H5Pset_file_space(hid_t plist_id, H5F_file_space_type_t strategy, hsize_t thresh
* the existing threshold is retained.
*/
if (!in_strategy)
H5Pget_file_space(plist_id, &in_strategy, NULL);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't call API routines within the library.

@@ -1165,32 +1165,21 @@ H5Pget_shared_mesg_phase_change(hid_t plist_id, unsigned *max_list /*out*/, unsi
} /* end H5Pget_shared_mesg_phase_change() */

/*-------------------------------------------------------------------------
* Function: H5Pset_file_space_strategy
* Function: H5P__set_file_space_strategy
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made package routines for these calls, to support H5Pget/set_file_space changes.

src/H5SMcache.c Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added missing H5private.h

@@ -74,6 +77,12 @@ typedef struct H5TS_tinfo_node_t {
/* Local Prototypes */
/********************/
static H5TS_tinfo_node_t *H5TS__tinfo_create(void);
#ifdef H5_HAVE_CONCURRENCY
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routines to support the "disable locking for this thread" ("DLFTT") protocol when making user callbacks.

if (H5_UNLIKELY(H5TS_mutex_init(&H5TS_api_info_p.api_mutex, H5TS_MUTEX_TYPE_RECURSIVE) < 0))
HGOTO_DONE(FAIL);
H5TS_api_info_p.lock_count = 0;
#else /* H5_HAVE_CONCURRENCY */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use non-recursive R/W lock for global API lock when concurrency configure option is enabled.

if (*acquired) {
for (unsigned u = 0; u < (lock_count - 1); u++)
if (H5_UNLIKELY(H5TS_mutex_lock(&H5TS_api_info_p.api_mutex) < 0))
HGOTO_DONE(FAIL);
H5TS_api_info_p.lock_count += lock_count;
}
#else /* H5_HAVE_CONCURRENCY */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implement the DLFTT protocol - don't reacquire global lock when locking is disabled for a thread.

/*
* Emulated pthread rwlock for MacOS
*
* Can't use pthread rwlock on MacOS due to: "The results [of calling
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh

@@ -304,6 +341,8 @@ H5_DLL herr_t H5TS_rwlock_init(H5TS_rwlock_t *lock);
static inline herr_t H5TS_rwlock_rdlock(H5TS_rwlock_t *lock);
static inline herr_t H5TS_rwlock_rdunlock(H5TS_rwlock_t *lock);
static inline herr_t H5TS_rwlock_wrlock(H5TS_rwlock_t *lock);
static inline herr_t H5TS_rwlock_trywrlock(H5TS_rwlock_t *lock, bool *acquired)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New non-recursive R/W lock routine

src/H5UC.c Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing H5private.h

H5E_user_cb_state_t h5e_state; /* State for H5E package */
} H5_user_cb_state_t;

#define H5_BEFORE_USER_CB(err) \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macros to wrap user callbacks.

/* Both the 'threadsafe' and 'concurrency' options provide threadsafely for
* API calls.
*/
#if defined(H5_HAVE_THREADSAFE) || defined(H5_HAVE_CONCURRENCY)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set H5_HAVE_THREADSAFE_API if either of the macros that indicate threadsafe APIs are set.

#else /* H5_HAVE_THREADSAFE */
#else /* H5_HAVE_CONCURRENCY */
/* Local variable for 'disable locking for this thread' (DLFTT) state */
#define H5DLFTT_DECL unsigned dlftt = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates API locking for the DLFTT protocol

test/hdfs.c Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-format

is_native = H5VL__is_native_connector_test(vol_id);
CHECK(is_native, FAIL, "H5VL__is_native_connector_test");

if (is_native) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass-through VOL connectors increment the API counter by various amounts, so just skip this portion of the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - Build CMake, Autotools Component - C Library Core C library issues (usually in the src directory) Component - Testing Code in test or testpar directories, GitHub workflows Type - Bug / Bugfix Please report security issues to [email protected] instead of creating an issue on GitHub Type - Improvement Improvements that don't add a new feature or functionality Type - New Feature Add a new API call, functionality, or tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants