RMA WG 09 19 2019
(Please add/delete/update if you remember differently)
Participants: Akhil, David, Manju, Jim, Naveen, Wasi
- OpenSHMEM Performance Variables and APIs proposal
- Discussion led by Wasi
- Wasi’s presentation: Distributed via openshmem list email
- The motivation for the proposal
  - Resource usage, debugging, and performance analysis
- Example collector tool demonstrating the use of the various interfaces
  - Interfaces for initializing and reading the different classes of variables
- How do you implement performance variables, such as an RMA operations counter, on one-sided networks? What is the performance degradation?
  - Implemented in libfabric; the performance degradation is negligible (demonstrated in the OpenSHMEM 2018 workshop paper).
  - Evaluation on InfiniBand/Aries? Yet to be done.
- Implementation on the RDMA networks
  - How does one implement the performance variables (such as counts of operations and completions) with low overhead? Perhaps implementations on RDMA networks should not implement those. On libfabric, implementing a byte counter adds overhead, and we might not implement it.
- Does the progress thread degrade the performance?
  - Yes, in this example, but the degradation is negligible. However, it is not required to extract the information in this way. Also, given that we might be using the performance counters in debug mode, the degradation might be tolerable.
- Would this be more useful in the future, when we have capabilities like adaptive routing?
  - It should not matter (at least on InfiniBand); the connection is still reliable.
- Performance Variable and Categories
  - Based on the feedback, the classes were revised; there are now four: Counter, Total, State, and Level.
  - Three categories of performance variables: resource, communication, and runtime.
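The four classes and three categories above could be encoded roughly as follows. This is only a sketch: the minutes do not record the proposal's actual C names, so every identifier here (`pvar_class_t`, `pvar_category_t`, `pvar_desc_t`, `pvar_count_in_category`) and every example variable is hypothetical, and the class assigned to each example variable is a guess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers; only the class and category names come from the minutes. */
typedef enum {
    PVAR_CLASS_COUNTER,
    PVAR_CLASS_TOTAL,
    PVAR_CLASS_STATE,
    PVAR_CLASS_LEVEL
} pvar_class_t;

typedef enum {
    PVAR_CAT_RESOURCE,
    PVAR_CAT_COMMUNICATION,
    PVAR_CAT_RUNTIME
} pvar_category_t;

/* One descriptor per performance variable. */
typedef struct {
    const char     *name;
    pvar_class_t    cls;
    pvar_category_t cat;
} pvar_desc_t;

/* Illustrative entries only; the class chosen for each is an assumption. */
static const pvar_desc_t example_pvars[] = {
    { "rma_ops_issued",   PVAR_CLASS_COUNTER, PVAR_CAT_COMMUNICATION },
    { "rma_ops_pending",  PVAR_CLASS_LEVEL,   PVAR_CAT_COMMUNICATION },
    { "heap_allocations", PVAR_CLASS_COUNTER, PVAR_CAT_RESOURCE },
};
static const size_t example_pvar_count =
    sizeof example_pvars / sizeof example_pvars[0];

/* Count how many descriptors belong to the given category. */
size_t pvar_count_in_category(const pvar_desc_t *vars, size_t n,
                              pvar_category_t cat)
{
    assert(vars != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (vars[i].cat == cat) {
            count++;
        }
    }
    return count;
}
```

A collector tool might use such a table to enumerate only the variables in the category it cares about before reading them.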
- Communication Info Objects and Variables
  - What is the difference between pending and issued? Pending: the hardware has not received the post. Issued: the hardware has the information about the post. Completed: remote completion, not local completion.
  - How do you leverage the various categories without more fine-grained information about the post, such as its type? If we demonstrate the need for more counters, we will add them. Discussion of potential use cases with users can drive this.
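One way to picture the pending, issued, and completed distinction is as three counters that a post moves through. The sketch below is not taken from the proposal: the names (`comm_pvars_t`, `comm_post_queued`, `comm_post_issued`, `comm_post_completed`) are hypothetical, and it assumes that pending and issued track posts currently in that state while completed is cumulative, which the minutes do not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-context communication variables. */
typedef struct {
    uint64_t pending;    /* posted by software, not yet received by the hardware */
    uint64_t issued;     /* hardware has the post, remote completion outstanding */
    uint64_t completed;  /* remotely completed (assumed cumulative)              */
} comm_pvars_t;

/* Software queues a post that the hardware has not yet received. */
void comm_post_queued(comm_pvars_t *v)
{
    v->pending++;
}

/* The hardware receives the post: pending -> issued. */
void comm_post_issued(comm_pvars_t *v)
{
    assert(v->pending > 0);
    v->pending--;
    v->issued++;
}

/* Remote completion is observed: issued -> completed. */
void comm_post_completed(comm_pvars_t *v)
{
    assert(v->issued > 0);
    v->issued--;
    v->completed++;
}
```

With this model, the sum of the three values always equals the number of posts queued so far, which a tool could use as a consistency check.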
- Resource Info Objects and Variables
- How is allocation/deallocation a performance variable?
- Does the counter in a private context require locking?
  - Yes.
- Can we do this with atomic operations?
  - It depends on whether the variables are software or hardware counters.
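For the software-counter case, the atomic alternative to locking might look like the sketch below, which uses C11 `<stdatomic.h>`. It is an assumption-laden illustration, not the proposal's implementation: the type and function names (`sw_counter_t`, `sw_counter_init`, `sw_counter_add`, `sw_counter_read`) are hypothetical, and it assumes the counter lives in ordinary memory updated by the library and read by a tool thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software counter, updated without a lock. */
typedef struct {
    atomic_uint_fast64_t ops;
} sw_counter_t;

void sw_counter_init(sw_counter_t *c)
{
    assert(c != NULL);
    atomic_init(&c->ops, 0);
}

/* Relaxed ordering is enough here: the increment must be atomic,
 * but it does not need to order any other memory accesses. */
void sw_counter_add(sw_counter_t *c, uint64_t n)
{
    assert(c != NULL);
    atomic_fetch_add_explicit(&c->ops, n, memory_order_relaxed);
}

/* A reader (for example, a tool thread) takes a snapshot of the value. */
uint64_t sw_counter_read(sw_counter_t *c)
{
    assert(c != NULL);
    return atomic_load_explicit(&c->ops, memory_order_relaxed);
}
```

A hardware counter would be read through whatever interface the network provider exposes, so this pattern would not apply to it directly.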
- For future meetings:
  - Wasi to add more information about the interfaces and implementation for the presentation at the F2F.
  - Collect more information about the usage of performance counters in MPI.