diff --git a/doc/README.consistency.md b/doc/README.consistency.md
index 56a445873..d4c916afd 100644
--- a/doc/README.consistency.md
+++ b/doc/README.consistency.md
@@ -1,7 +1,25 @@
 ## Note on parallel I/O data consistency
 
-PnetCDF follows the same parallel I/O data consistency as MPI-IO standard.
-Refer the URL below for more information.
+PnetCDF follows the same parallel I/O data consistency as MPI-IO standard,
+quoted below.
+
+```
+Consistency semantics define the outcome of multiple accesses to a single file.
+All file accesses in MPI are relative to a specific file handle created from a
+collective open. MPI provides three levels of consistency:
+  * sequential consistency among all accesses using a single file handle,
+  * sequential consistency among all accesses using file handles created from a
+    single collective open with atomic mode enabled, and
+  * user-imposed consistency among accesses other than the above.
+Sequential consistency means the behavior of a set of operations will be as if
+the operations were performed in some serial order consistent with program
+order; each access appears atomic, although the exact ordering of accesses is
+unspecified. User-imposed consistency may be obtained using program order and
+calls to MPI_FILE_SYNC.
+```
+
+Users are referred to the MPI standard Chapter 14.6 Consistency and Semantics
+for more information.
 http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node296.htm#Node296
 
 Readers are also referred to the following paper.
@@ -9,19 +27,27 @@ Rajeev Thakur, William Gropp, and Ewing Lusk, On Implementing MPI-IO
 Portably and with High Performance, in the Proceedings of the 6th Workshop on
 I/O in Parallel and Distributed Systems, pp. 23-32, May 1999.
 
-If users would like PnetCDF to enforce a stronger consistency, they should add
-NC_SHARE flag when open/create the file. By doing so, PnetCDF adds
-MPI_File_sync() after each MPI I/O calls.
-  * For PnetCDF collective APIs, an MPI_Barrier() will also be called right
-    after MPI_File_sync().
-  * For independent APIs, there is no need for calling MPI_Barrier().
-
-Users are warned that the I/O performance when using NC_SHARE flag could become
-significantly slower than not using it.
-
-If NC_SHARE is not set, then users are responsible for their desired data
-consistency. To enforce a stronger consistency, users can explicitly call
-ncmpi_sync(). In ncmpi_sync(), MPI_File_sync() and MPI_Barrier() are called.
+* NC_SHARE has been deprecated in PnetCDF release of 1.13.0.
+
+  NC_SHARE is a legacy flag inherited from NetCDF-3, whose purpose is to
+  provide some degree of data consistency for multiple processes concurrently
+  accessing a shared file. To achieve a stronger consistency, user
+  applications are required to also synchronize the processes, such as
+  calling MPI_Barrier, together with nc_sync.
+
+  Because PnetCDF follows the MPI file consistency, which only addresses the
+  case when all file accesses are relative to a specific file handle created
+  from a collective open, NC_SHARE becomes invalid. Note that NetCDF-3
+  supports only sequential I/O and thus has no collective file open per se.
+
+If users would like a stronger consistency, they may consider using the code
+fragment below after each collective write API call (e.g.
+`ncmpi_put_vara_int_all`, `ncmpi_wait_all` `ncmpi_enddef`, `ncmpi_redef`,
+`ncmpio_begin_indep_data`, `ncmpio_end_indep_data`).
+```
+  ncmpi_sync(ncid);
+  MPI_Barrier(comm);
+  ncmpi_sync(ncid);
+```
+Users are warned that the I/O performance could become significantly slower.
 
 ### Note on header consistency in memory and file
 In data mode, changes to file header can happen in the following scenarios.
diff --git a/man/pnetcdf.m4 b/man/pnetcdf.m4
index 1af08100d..cc8fb653b 100644
--- a/man/pnetcdf.m4
+++ b/man/pnetcdf.m4
@@ -495,10 +495,9 @@ Creates a new netCDF dataset at ARG(path) collectively by a group of MPI
 processes specified by ARG(comm), returning a netCDF ID in ARG(ncid).
 The argument ARG(cmode) may <> the bitwise-or of the following flags:
 MACRO(NOCLOBBER) to protect existing datasets (default is MACRO(CLOBBER),
-silently blows them away), MACRO(SHARE) for stronger metadata data consistency
-control, MACRO(64BIT_OFFSET) to create a file in the 64-bit offset format
-(CDF-2), as opposed to classic format, the default, or MACRO(64BIT_DATA) to
-create a file in the 64-bit data format (CDF-5).
+silently blows them away), MACRO(64BIT_OFFSET) to create a file in the
+64-bit offset format (CDF-2), as opposed to classic format, the default, or
+MACRO(64BIT_DATA) to create a file in the 64-bit data format (CDF-5).
 Use either MACRO(64BIT_OFFSET) or MACRO(64BIT_DATA).
 The 64-bit offset format allows the creation of very large files with far fewer
 restrictions than netCDF classic format, but can only be read by the netCDF
@@ -530,7 +529,7 @@ Opens an existing netCDF dataset at ARG(path) collectively by a group of MPI
 processes specified by ARG(comm), returning a netCDF ID in ARG(ncid).
 The type of access is described by the ARG(mode) parameter,
 which may <> the bitwise-or of the following flags: MACRO(WRITE) for read-write access (default
-read-only), MACRO(SHARE) for stronger metadata data consistency control.
+read-only).
 .sp
 ifelse(DAP,TRUE, <> flushes cached data by calling MPI_File_sync.
 .HP
 FDECL(abort, (INCID()))
diff --git a/man/pnetcdf_f90.m4 b/man/pnetcdf_f90.m4
index cb88aa9a9..cf2f7489b 100644
--- a/man/pnetcdf_f90.m4
+++ b/man/pnetcdf_f90.m4
@@ -74,10 +74,9 @@ Creates a new netCDF dataset at \fIpath\fP collectively by a group of MPI
 processes specified by \fIcomm\fP, returning a netCDF ID in \fIncid\fP.
 The argument \fIcmode\fP may include the bitwise-or of the following flags:
 \fBnf90_noclobber\fR to protect existing datasets (default is \fBnf90_clobber\fR,
-silently blows them away), \fBnf90_share\fR for stronger metadata data consistency
-control, \fBnf90_64bit_offset\fR to create a file in the 64-bit offset format
-(CDF-2), as opposed to classic format, the default, or \fBnf90_64bit_data\fR to
-create a file in the 64-bit data format (CDF-5).
+silently blows them away), \fBnf90_64bit_offset\fR to create a file in the
+64-bit offset format (CDF-2), as opposed to classic format, the default, or
+\fBnf90_64bit_data\fR to create a file in the 64-bit data format (CDF-5).
 Use either \fBnf90_64bit_offset\fR or \fBnf90_64bit_data\fR.
 The 64-bit offset format allows the creation of very large files with far fewer
 restrictions than netCDF classic format, but can only be read by the netCDF
@@ -115,7 +114,7 @@ Opens an existing netCDF dataset at \fIpath\fP collectively by a group of MPI
 processes specified by \fIcomm\fP, returning a netCDF ID in \fIncid\fP.
 The type of access is described by the \fImode\fP parameter,
 which may include the bitwise-or of the following flags: \fBnf90_write\fR for read-write access (default
-read-only), \fBnf90_share\fR for stronger metadata data consistency control.
+read-only).
 .sp
 The argument \fImode\fP must be consistent among all MPI processes that
@@ -158,11 +157,7 @@ integer, intent(in) :: ncid
 integer :: nf90mpi_sync
 .fi
 .sp
-Unless the
-\fBnf90_share\fR
-bit is set in
-\fBnf90mpi_open(\|)\fR or \fBnf90mpi_create(\|)\fR,
-data written by PnetCDF APIs may be cached by local file system on each compute
+Data written by PnetCDF APIs may be cached by local file system on each compute
 node. This API flushes cached data by calling MPI_File_sync.
 .RE
 .HP
diff --git a/sneak_peek.md b/sneak_peek.md
index 228080c3e..31c073ac6 100644
--- a/sneak_peek.md
+++ b/sneak_peek.md
@@ -22,6 +22,9 @@ This is essentially a placeholder for the next release note ...
   + none
 
 * Configure options
+  + `--disable-file-sync` is now deprecated. This configure option alone does
+    not provide a sufficient data consistency. Users are suggested to call
+    `ncmpi_sync` and `MPI_Barrier` to achieve a desired consistency.
   + `--enable-install-examples` to install example programs under folder
     `${prefix}/pnetcdf_examples` along with run script files. An example is
     `${prefix}/pnetcdf_examples/C/run_c_examples.sh`. The default of this
@@ -53,10 +56,16 @@ This is essentially a placeholder for the next release note ...
   + none
 
 * API syntax changes
-  + none
+  + File open flag NC_SHARE is now deprecated. It is still defined, but takes
+    no effect.
 
 * API semantics updates
-  + none
+  + NC_SHARE alone is not sufficient to provide data consistency for accessing
+    a shared file in parallel and thus is now deprecated. Because PnetCDF
+    follows the MPI file consistency, which only addresses the case when all
+    file accesses are relative to a specific file handle created from a
+    collective open, NC_SHARE becomes invalid. See doc/README.consistency.md
+    for more information.
 
 * New error code precedence
   + none