[WIP] ENH: DAOS and DFS modules #1014

Open · wants to merge 60 commits into main
Conversation

@shanedsnyder (Contributor) commented on Oct 31, 2024

This PR adds new instrumentation of DAOS storage APIs and corresponding updates to our analysis tools to integrate this DAOS data. Specifically, 2 new Darshan modules are defined: DARSHAN_DFS_MOD for instrumenting usage of the DAOS file system (DFS) API and DARSHAN_DAOS_MOD for instrumenting native DAOS object APIs. More details on each module below.

DFS module:

  • For each DFS file, Darshan captures a fixed set of integer/FP counters (see full list in dfs-log-format.h) and the corresponding DAOS pool/container UUIDs.
  • DFS file record names are based on the full path in the DFS directory tree, similar to our other file-based modules.
  • DFS file record IDs are based on the underlying DAOS OID, not the file name.
    • This approach was used because not all DFS file open routines take a filename as input (e.g., dfs_obj_global2local()), meaning not all processes will have the filename available to generate a consistent record ID -- using the object OID allows all processes to agree on a consistent record ID value.
    • One side effect worth mentioning is that, since Darshan records are based on underlying OIDs and not file names, deleting and recreating a file results in multiple Darshan records corresponding to the same file -- this behavior can easily be observed in benchmarks like IOR that delete/recreate the output file on each iteration. It will ultimately be the responsibility of analysis tools to aggregate file records in this case.
  • The pool_uuid:cont_uuid combo is used in place of the mount pt in tools like darshan-parser.
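As a hedged illustration of why OID-based record IDs stay consistent across processes: any rank holding the object handle can derive the same ID from the OID alone, with or without the filename. The hash choice below is purely illustrative -- Darshan's actual record-ID hashing is different and lives in the C runtime.

```python
import hashlib
import struct

def record_id_from_oid(oid_hi: int, oid_lo: int) -> int:
    """Derive a 64-bit record ID from a DAOS OID (illustrative hash,
    not Darshan's actual scheme). Every process that opened the
    object -- even via dfs_obj_global2local(), with no pathname in
    hand -- computes the same value."""
    digest = hashlib.sha256(struct.pack("<QQ", oid_hi, oid_lo)).digest()
    return int.from_bytes(digest[:8], "little")

# Two "ranks" holding the same OID agree on the record ID:
rid_a = record_id_from_oid(937047793718163273, 416)
rid_b = record_id_from_oid(937047793718163273, 416)
print(rid_a == rid_b)  # True
```

A name-based hash could not offer this guarantee, since ranks that obtained the object handle without a path have no name to hash.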

Example darshan-parser output line:

#<module>       <rank>  <record id>     <counter>       <value> <file name>     <mount pt>      <fs type>
DFS     -1      13156018442998895329    DFS_OPENS       2       /testFile       f4996f65-9c9a-41c6-ac18-88059a11aeb1:b445df4d-0f29-462a-9c70-a80bf5a5a0f9       N/A
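To sketch the aggregation burden this design places on analysis tools, here is a hedged example of merging multiple OID-based records that share a file name (as happens when IOR deletes and recreates its output file). The record layout and sum-all-counters merge rule are simplified assumptions, not actual darshan-util behavior.

```python
from collections import defaultdict

def aggregate_by_name(records):
    """Merge DFS records that share a file name by summing their
    integer counters. Hypothetical record layout: each record is a
    dict with a 'name' and a 'counters' dict."""
    merged = defaultdict(lambda: defaultdict(int))
    for rec in records:
        for counter, value in rec["counters"].items():
            merged[rec["name"]][counter] += value
    return {name: dict(counters) for name, counters in merged.items()}

# Two records for the same /testFile: each recreation of the file
# yields a new OID, hence a new Darshan record.
records = [
    {"name": "/testFile", "counters": {"DFS_OPENS": 2, "DFS_WRITES": 8}},
    {"name": "/testFile", "counters": {"DFS_OPENS": 1, "DFS_WRITES": 4}},
]
print(aggregate_by_name(records))
```

In practice a real tool would also need rules for min/max and timestamp counters, which do not merge by simple summation.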

DAOS module:

  • For each DAOS object, Darshan captures a fixed set of integer/FP counters (see full list in daos-log-format.h), the corresponding DAOS pool/container UUIDs, and the full DAOS OID.
    • There are actually 3 distinct DAOS object APIs tracked in the Darshan DAOS module: object (DAOS_OBJ), array (DAOS_ARRAY), and KV (DAOS_KV).
  • DAOS object records have no name -- when printing these records in darshan-util programs, we just print the OID in string format (i.e., oid_hi.oid_lo, the same approach as DAOS's own utilities).
    • Small changes were made to the darshan-runtime and darshan-util libraries to allow for records that have no associated name.
  • DAOS object record IDs are based on the underlying DAOS OID.
    • This makes it trivial to identify which DAOS object records correspond to which DFS file records, as they will have the same Darshan record identifier.
  • The pool_uuid:cont_uuid combo is used in place of the mount pt in tools like darshan-parser.

Example darshan-parser output line:

#<module>       <rank>  <record id>     <counter>       <value> <file name>     <mount pt>      <fs type>
DAOS    -1      13156018442998895329    DAOS_OBJ_OPENS  1       937047793718163273.416  f4996f65-9c9a-41c6-ac18-88059a11aeb1:b445df4d-0f29-462a-9c70-a80bf5a5a0f9       N/A
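The nameless-record printing convention is simple to restate: join the two 64-bit OID components with a dot, as DAOS's own utilities do. A minimal sketch (the real printing code in darshan-util is C, not Python):

```python
def oid_to_str(oid_hi: int, oid_lo: int) -> str:
    """Render a DAOS OID as 'oid_hi.oid_lo', the same string format
    DAOS's own utilities use for object IDs."""
    return f"{oid_hi}.{oid_lo}"

# The OID shown in the darshan-parser example line:
print(oid_to_str(937047793718163273, 416))  # "937047793718163273.416"
```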

Both DFS and DAOS modules integrate with the Darshan heatmap module to generate histograms of I/O activity on each process. Both DFS and DAOS modules have also fully implemented darshan-util and PyDarshan functionality, including support for generating PyDarshan summary reports detailing DFS/DAOS access patterns. PyDarshan tests have been updated to ensure expected behavior when parsing logs containing DFS/DAOS data.
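To illustrate the kind of data the heatmap integration produces, here is a hedged sketch that bins per-rank I/O events into fixed-width time slots. The event layout, bin count, and bytes-per-bin semantics are illustrative assumptions, not the heatmap module's actual on-disk format.

```python
def bin_io_activity(events, runtime, nbins):
    """Bin (rank, timestamp, nbytes) I/O events into a per-rank
    histogram of bytes moved per time slot."""
    hist = {}
    width = runtime / nbins
    for rank, t, nbytes in events:
        b = min(int(t / width), nbins - 1)  # clamp t == runtime into last bin
        hist.setdefault(rank, [0] * nbins)[b] += nbytes
    return hist

# Three I/O events across two ranks over a 1-second run, 4 bins:
events = [(0, 0.1, 1024), (0, 0.9, 2048), (1, 0.5, 512)]
print(bin_io_activity(events, runtime=1.0, nbins=4))
```

Each rank's row of bins is what a heatmap plot renders as one horizontal strip of I/O intensity over time.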

There are a few outstanding items that are not addressed in this PR:

  • There is no DXT support for the DAOS modules yet. It seems like the right call to limit the scope of changes here and weigh that capability against other development priorities going forward.
  • DAOS data is integrated into most of the relevant sections in PyDarshan summary reports, but not in the "data access by category" plots. I created an issue to track this: ENH: add new DAOS module data to PyDarshan "data access by category" plots #1015

Replaces #739

Shane Snyder and others added 30 commits October 22, 2024 19:01
* add CFFI shims needed to access DFS
record data at the Python level

* adjust `test_main_all_logs_repo_files()` to handle
the new `ior` `DFS` log file from Shane--it has a single
runtime heatmap for `STDIO`

* `test_module_table()` has been updated with a regression
case for Shane's new DFS log file

* add `test_dfs_daos_posix_match()` to ensure counter
equivalence between similar `ior..` runs with DAOS vs.
POSIX (NOTE: these actually don't look that similar yet--xfailed
for now..)
* adjust `test_dfs_daos_posix_match()` to handle
the two new POSIX/DAOS "mirror files" from Shane;
the `xfail` has been removed and it now passes

* there seems to be some reasonable agreement
between the logs, which is good; see the test
proper for data columns that do not match or
required special handling for DFS-POSIX equivalence
testing

* a few other test suite shims after Shane changed
the POSIX/DAOS mirror files
* add DFS support to I/O cost graph
in summary reports, with some light
unit testing
* add a DFS per-module stats section to the Python
summary report, and some initial tests
* simplify the "time" counter handling in
`test_dfs_daos_posix_match()` based on reviewer
feedback

* `DFS_SLOWEST_RANK` is ignored in the comparisons
in `test_dfs_daos_posix_match()` based on reviewer
feedback

* the comment about `STAT` counter differences in
`test_dfs_daos_posix_match` was removed, based on
reviewer feedback
The OID backing a DFS file can change if the file is deleted and
recreated.
We don't currently have a way to generate Darshan record IDs
given only a pathname -- they are based on OIDs.
Shane Snyder and others added 10 commits October 22, 2024 19:01
* requires interception of `daos_cont_open` routines to allow
  mapping of container handles to pool/cont UUIDs
* DAOS module record ID now based on OID, cont UUID, and pool UUID
* add logic to allow name records with zero-length names to be
  updated with names in later register_record calls
  - this is useful because DAOS/DFS generate the same record IDs
    for "file objects", but the DAOS module does not register a
    name with the record and registers the record before DFS module
when reading name records from the log file, allow for updating
an existing zero-length name record
@shanedsnyder shanedsnyder added this to the 3.4.7 milestone Oct 31, 2024
@shanedsnyder shanedsnyder reopened this Nov 8, 2024
@shanedsnyder shanedsnyder changed the title WIP: DAOS and DFS modules ENH: DAOS and DFS modules Nov 12, 2024
@shanedsnyder shanedsnyder changed the title ENH: DAOS and DFS modules [WIP] ENH: DAOS and DFS modules Nov 12, 2024