[CNEUR-379] Use /dev/shm as a cache in multinode simulations #15
base: main
Conversation
neurodamus/node.py
Outdated
group_id = int(SHMUtil.node_id / 20)
node_specific_corenrn_output_in_storage = \
    Path(corenrn_output) / f"coreneuron_input/cycle_{self._cycle_i}/group_{group_id}/node_{SHMUtil.node_id}"
(Pasting from Slack)
Thanks to a reminder by @1uc, I realized I had completely forgotten that I had discussed with @iomaganaris generating the output in subfolders on GPFS, and that he already had something working recently 😅. Hence, I made some changes to improve it a bit by dividing the coreneuron_input directory into:

coreneuron_input/cycle_X/group_Y/node_Z

Here cycle_X is the current cycle in the model instantiation, node_Z is the node ID (i.e., from 0 to 799 in the 800-node simulation), and group_Y is a simple grouping of the nodes into subfolders of 20 (i.e., group_id = floor(node_id / 20)). Why 20? It is just a magic number to split the number of folders into something reasonable.

With this simple approach, inside coreneuron_input there would be at most 32 folders (i.e., one per cycle). Inside each cycle folder, there would be at most 40 subfolders (i.e., 800/20 = 40, one per subset of nodes). Inside each group folder, there would be at most 20 subfolders corresponding to the node IDs (i.e., 0 to 19 in the first group, 20 to 39 in the second, and so on). Finally, inside each specific node folder, there would be at most 120 files from CoreNEURON (i.e., 3 files_per_rank x 40 ranks_per_node = 120).
In other words, we go from a single folder with 3.1M files to a tree that is much more manageable by GPFS and IME, regardless of the target file system that we use.
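For illustration, a minimal standalone sketch of the path computation described above (hypothetical helper; the real code uses SHMUtil.node_id and self._cycle_i as shown in the diff):

```python
from pathlib import Path

NODES_PER_GROUP = 20  # "magic number" chosen only to keep the folder count reasonable

def node_output_dir(corenrn_output: str, cycle: int, node_id: int) -> Path:
    """Return the per-node CoreNEURON output directory, e.g.
    <corenrn_output>/coreneuron_input/cycle_3/group_7/node_152."""
    group_id = node_id // NODES_PER_GROUP
    return (Path(corenrn_output) / "coreneuron_input"
            / f"cycle_{cycle}" / f"group_{group_id}" / f"node_{node_id}")

# Example: node 152 in cycle 3 lands in group 152 // 20 == 7
print(node_output_dir("/gpfs/project/output", 3, 152))
# -> /gpfs/project/output/coreneuron_input/cycle_3/group_7/node_152
```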
Note that there has been a slight update in the code since this comment was written, but the reasoning is still valid.
neurodamus/node.py
Outdated
@@ -1694,17 +1695,6 @@ def cleanup(self):
    data_folder_shm = SHMUtil.get_datadir_shm(data_folder)
    logging.info("Deleting intermediate SHM data in %s", data_folder_shm)
    subprocess.call(['/bin/rm', '-rf', data_folder_shm])
    # Remove also the coreneuron_input_{node_id} folders
Shouldn't we delete the symlinks in /dev/shm and the folders in GPFS at the end? @sergiorg-hpc

If I'm not mistaken, @sergiorg-hpc has been working on an improved version of this fix, so this PR can now be closed?

We discussed this change with Sergio offline and concluded that it is not necessary for MMB simulations but might still be beneficial, so this PR can stay open until it is needed and Sergio can continue working on it if necessary.
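For illustration, a minimal sketch of what such a final cleanup could look like (a hypothetical helper, not the PR's implementation; the argument names and the use of shutil are assumptions):

```python
import logging
import shutil
from pathlib import Path

def cleanup_cache(shm_datadir: str, corenrn_output: str, node_id: int) -> None:
    """Remove the /dev/shm staging area (which also drops any symlinks
    created inside it) and the per-node coreneuron_input_{node_id}
    folder written to GPFS."""
    shm_path = Path(shm_datadir)
    if shm_path.exists():
        logging.info("Deleting intermediate SHM data in %s", shm_path)
        shutil.rmtree(shm_path, ignore_errors=True)

    gpfs_node_dir = Path(corenrn_output) / f"coreneuron_input_{node_id}"
    if gpfs_node_dir.exists():
        logging.info("Deleting per-node GPFS output in %s", gpfs_node_dir)
        shutil.rmtree(gpfs_node_dir, ignore_errors=True)
```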
Currently, in multinode simulations without /dev/shm, all nodes write the coreneuron_input files into the same folder. This hurts GPFS performance a lot for all users while it is happening.

Instead of all ranks writing to GPFS, this PR adds a CACHE mode for /dev/shm, where the coreneuron_input data are first staged to /dev/shm and then written by a single rank per node into separate GPFS folders named coreneuron_input_{node_id}. Then symlinks are created from coreneuron_input_{node_id}/*_{1,2,3}.dat to /dev/shm/.../coreneuron_input, and CoreNEURON launches the simulation using /dev/shm/.../coreneuron_input.

The only drawback of this approach is the extra memory needed to use the /dev/shm cache, since everything we dump to /dev/shm is kept in the RAM of the node.
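For illustration, a minimal sketch of the symlink step described above (hypothetical helper and paths; the link direction and file selection are assumptions based on this description, not the actual implementation in the PR):

```python
from pathlib import Path

def link_gpfs_files_into_shm(gpfs_node_dir: Path, shm_coreneuron_input: Path) -> None:
    """Expose the *_1.dat / *_2.dat / *_3.dat files written to GPFS inside
    the /dev/shm coreneuron_input directory via symlinks, so CoreNEURON can
    be launched against /dev/shm.  The direction of the links is an
    assumption here; the PR defines the actual layout."""
    shm_coreneuron_input.mkdir(parents=True, exist_ok=True)
    for dat_file in gpfs_node_dir.glob("*_[123].dat"):
        link = shm_coreneuron_input / dat_file.name
        if not link.exists():
            link.symlink_to(dat_file)

# Hypothetical usage (placeholder paths, not the real run layout):
# link_gpfs_files_into_shm(
#     Path("/gpfs/project/output/coreneuron_input_42"),
#     Path("/dev/shm/neurodamus_run/coreneuron_input"),
# )
```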