The definitions of the four times in the legends need to be documented somewhere. Actually, I now realise that "data sent" and "data received" are sizes, not times! Rename them to "amount of data sent (bytes)" and "amount of data received (bytes)", or, if only one frame of data is sent, "size of data sent (bytes)" and "size of data received (bytes)" might be better.
The meaning of execution time is not clear: does it mean the duration of the MPI_Alltoallv call as seen from the program that calls it?
The meaning of late arrival timing is not explained at all: which events is it calculated from, and how?
How is the bandwidth calculated?
Would a plot of rank (or machine name/number) vs. data size vs. execution time and/or late arrival timing be more helpful? This is a 3D plot, but one dimension could be rendered with color. Putting all the calls on the same plot would show the range of behaviour.
I have just realised that repeating the experiment with different ranks assigned to different machines might reveal which poor performance is caused by the algorithm of the code under test (it should follow the rank if the code is deterministic) and which by a poorly performing compute node (because it has dodgy hardware or because it has other stuff running on it).
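To make the plot suggestion above concrete, here is a minimal sketch that puts every call on one scatter plot, with rank on the x axis, data size on the y axis, and execution time rendered as color. The `Sample` record layout, the function names, and the output file name are all hypothetical and not taken from the tool; matplotlib is assumed to be available when `plot_all_calls` is called.

```python
from collections import namedtuple

# Hypothetical record layout: one entry per (call, rank) pair.
Sample = namedtuple("Sample", ["call_id", "rank", "bytes_sent", "exec_time_s"])


def to_scatter_columns(samples):
    """Split samples into the x, y and colour columns of a scatter plot."""
    ranks = [s.rank for s in samples]
    sizes = [s.bytes_sent for s in samples]
    times = [s.exec_time_s for s in samples]
    return ranks, sizes, times


def plot_all_calls(samples, path="alltoallv_overview.png"):
    """Draw every call on one figure: x = rank, y = data size, colour = time."""
    # Imported here so the data preparation above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    ranks, sizes, times = to_scatter_columns(samples)
    fig, ax = plt.subplots()
    points = ax.scatter(ranks, sizes, c=times, cmap="viridis")
    ax.set_xlabel("rank")
    ax.set_ylabel("amount of data sent (bytes)")
    fig.colorbar(points, ax=ax, label="execution time (s)")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_all_calls(samples)` with the samples of all calls would write a single image; the same layout could use late arrival timing as the color instead, once that quantity is defined.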
Maybe a separate issue needs to be opened about the following point: Would a plot of rank (or machine name/number) vs. data size vs. execution time and/or late arrival timing be more helpful? This is a 3D plot, but one dimension could be rendered with color. Putting all the calls on the same plot would show the range of behaviour.
I have just realised that repeating the experiment with different ranks assigned to different machines might reveal which poor performance is caused by the algorithm of the code under test (it should follow the rank if the code is deterministic) and which by a poorly performing compute node (because it has dodgy hardware or because it has other stuff running on it).
This is a great point, and the tool could certainly help, but I believe it also depends greatly on the application and what it does. So maybe we should open a separate issue and think more deeply about it.
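As a starting point for that discussion, here is a rough sketch of how two runs with different rank-to-machine assignments could be compared. Everything in it is an assumption made for illustration: the input layout (a dict mapping rank to a `(node_name, time)` pair), the function names, and the "slower than 1.5x the median" threshold. As noted above, the right criterion probably depends on the application.

```python
from statistics import median


def slow_ranks(run, factor=1.5):
    """Return the ranks whose time exceeds factor x the median time of the run."""
    cutoff = factor * median(t for _, t in run.values())
    return {rank for rank, (_, t) in run.items() if t > cutoff}


def classify(run_a, run_b, factor=1.5):
    """Compare two runs that use different rank-to-node mappings.

    Each run is a dict: rank -> (node_name, time).
    Returns (ranks slow in both runs, nodes hosting a slow rank in both runs).
    """
    slow_a = slow_ranks(run_a, factor)
    slow_b = slow_ranks(run_b, factor)
    # Slowness that follows the rank points at the code under test.
    rank_bound = slow_a & slow_b
    # Slowness that follows the node points at the machine.
    nodes_a = {run_a[r][0] for r in slow_a}
    nodes_b = {run_b[r][0] for r in slow_b}
    node_bound = nodes_a & nodes_b
    return rank_bound, node_bound


# Made-up numbers, for illustration only.
run_a = {0: ("n0", 1.0), 1: ("n1", 1.0), 2: ("n2", 1.0),
         3: ("n3", 1.0), 4: ("n4", 3.0), 5: ("n5", 2.5)}
run_b = {0: ("n5", 2.6), 1: ("n0", 1.0), 2: ("n1", 1.0),
         3: ("n2", 1.0), 4: ("n3", 3.1), 5: ("n4", 1.0)}

rank_bound, node_bound = classify(run_a, run_b)
print(rank_bound, node_bound)  # {4} {'n5'}
```

In this made-up data, rank 4 is slow on both machines it ran on, while node `n5` is slow for both ranks it hosted. A rank that stays on the same node in both runs cannot be attributed either way, so the reassignment should move every rank.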