A Survey on Matrix Computation

File metadata and controls

51 lines (35 loc) · 2.31 KB

Parallel Matrix Multiplication Methods

  • Various parallel matrix multiplication techniques are available; they can be categorized based on the memory architecture
    1. Shared Memory
    2. Distributed Memory
  • Matrix multiplication can also be embedded directly in hardware

Shared Memory

  1. In a shared memory architecture, every Processing Element (PE) has the same view of memory and can access it irrespective of what the other processors do.
  2. We have to assume that the matrices fit into the shared memory or are handled by the virtual memory mechanism.
  3. We also assume there is a much faster but smaller local memory (cache) available to each PE, usually managed by hardware (though not always).
  4. This local memory is private to each PE, and there is no mechanism for accessing another PE's local memory.
  5. Every access to the shared memory is routed through the local memory and the results are cached there. The local memory favors certain memory access patterns and has policies for deciding which elements stay resident. The two typical memory access patterns are

    1. Temporal Locality: an element accessed recently is likely to be accessed again soon.
    2. Spatial Locality: elements adjacent to a recently accessed element are likely to be accessed soon.
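
As a minimal illustration (a sketch only; the array a, its length n, and the running sum are placeholders, not part of the survey), a simple reduction loop exhibits both patterns:

#include <cstddef>

// Spatial Locality: consecutive elements of `a` share a cache line, so a
// sequential scan is served mostly from the local memory after the first miss.
// Temporal Locality: `sum` is reused on every iteration and stays cached.
double sum_array(const double* a, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}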

Algorithmic Optimization for Shared Memory

  1. Since all the PEs have access to all the elements, there is no problem of data placement. Also, the local memory is hardware managed, and what it holds is governed by the data access pattern of the PE.

  2. Now the PE has to make sure that the elements it accesses fit into the local memory, so that it does not go to the shared memory frequently. The PE should also follow spatial and temporal memory access patterns.

Naive Matrix Multiplication

// C = (m x n), A = (m x k), B = (k x n); C is assumed to be zero-initialized
for(size_t i = 0; i < m; i++){
    for(size_t j = 0; j < n; j++){
        for(size_t l = 0; l < k; l++){
            C[i][j] += A[i][l] * B[l][j];
        }
    }
}
  1. The approach is to schedule one or more non-overlapping rows of the output to each PE.
  2. Each PE computes the dot products of its rows of A with the columns of B and stores the results in the corresponding rows of C, as sketched below.
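
A minimal sketch of this row-wise scheduling, assuming OpenMP as the shared memory programming model (the survey does not name one) and vector-of-vector storage for the matrices:

#include <cstddef>
#include <vector>

// Rows of C are partitioned across the PEs (threads); every PE reads the
// shared A and B directly, so no explicit data placement is needed.
void matmul_rows(const std::vector<std::vector<double>>& A,
                 const std::vector<std::vector<double>>& B,
                 std::vector<std::vector<double>>& C,
                 std::size_t m, std::size_t n, std::size_t k) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < m; i++) {          // each PE gets a disjoint set of rows
        for (std::size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (std::size_t l = 0; l < k; l++) {
                acc += A[i][l] * B[l][j];          // dot product: row i of A with column j of B
            }
            C[i][j] = acc;
        }
    }
}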

Problems with Naive Matrix Multiplication

  1. If the dimensions of the matrices exceed the size of the local memory, every fetch will be directed to the shared memory instead of being served from the local memory.
  2. Also, matrix B is not accessed in a row-major fashion, so its accesses have poor spatial locality.
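
A common remedy, sketched here rather than prescribed by the survey, is to block (tile) the loops so that the working set of each step fits in the local memory. The tile size BS is an assumed tuning parameter, and the matrices are stored as flat row-major arrays so the access pattern is explicit:

#include <algorithm>
#include <cstddef>

// Blocked (tiled) multiplication: each step works on BS x BS tiles of A, B, and C,
// with BS chosen so that roughly 3 * BS * BS elements fit in the local memory.
// C is assumed to be zero-initialized; all matrices are row-major.
void matmul_blocked(const double* A, const double* B, double* C,
                    std::size_t m, std::size_t n, std::size_t k) {
    const std::size_t BS = 64;                                 // assumed tile size
    for (std::size_t ii = 0; ii < m; ii += BS)
        for (std::size_t ll = 0; ll < k; ll += BS)
            for (std::size_t jj = 0; jj < n; jj += BS)
                for (std::size_t i = ii; i < std::min(ii + BS, m); i++)
                    for (std::size_t l = ll; l < std::min(ll + BS, k); l++) {
                        double a = A[i * k + l];               // reused across the whole j loop (temporal locality)
                        for (std::size_t j = jj; j < std::min(jj + BS, n); j++)
                            C[i * n + j] += a * B[l * n + j];  // B and C scanned row by row (spatial locality)
                    }
}

Each PE can then be assigned disjoint (ii, jj) tile pairs of C, so the row-wise scheduling above carries over to the blocked version.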