
column major vs row major matrix

Wu Jie edited this page Mar 30, 2014 · 1 revision

Whether a matrix is stored column-major or row-major is very important. Most of the time the performance bottleneck is on the GPU, so you need to know which packing method the hardware uses. On PC with D3D9, shaders default to column-major packing.

D3DXCompileShader provides two flags to configure the packing method of the compiled result: D3DXSHADER_PACKMATRIX_COLUMNMAJOR and D3DXSHADER_PACKMATRIX_ROWMAJOR.

D3DXSHADER_PACKMATRIX_COLUMNMAJOR: each column of the matrix is packed into one register, so mul(vec,mat) compiles to one dot product per register. For example:

float4x4 mat == (c0,c1,c2,c3)
float4 vec == r0 
mul(vec,mat) == ( dot(r0,c0), dot(r0,c1), dot(r0,c2), dot(r0,c3) )

D3DXSHADER_PACKMATRIX_ROWMAJOR: each row of the matrix is packed into one register, so mul(mat,vec) compiles to one dot product per register. For example:

float4x4 mat == (c0,
                 c1,
                 c2,
                 c3)
float4 vec == r0 
mul(mat,vec) == ( dot(c0,r0), dot(c1,r0), dot(c2,r0), dot(c3,r0) )
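The two cases above can be mimicked on the CPU. The following is a minimal C++ sketch, not D3D code: the names `Float4`, `Registers`, `dot4`, `dot_per_register`, `pack_columns` and `mul_vec_mat` are all hypothetical helpers introduced here for illustration. It shows that when each register holds one column, one dot product per register gives the same result as an ordinary row-vector-times-matrix product.

```cpp
#include <array>

using Float4 = std::array<float, 4>;
using Registers = std::array<Float4, 4>; // stands in for constant registers c0..c3

// CPU equivalent of one dp4 instruction: a 4-component dot product.
float dot4(const Float4& a, const Float4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// One dot product per register. If each register holds a column this is
// mul(vec, mat); if each register holds a row it is mul(mat, vec).
Float4 dot_per_register(const Float4& v, const Registers& c) {
    return { dot4(v, c[0]), dot4(v, c[1]), dot4(v, c[2]), dot4(v, c[3]) };
}

// Pack a matrix given as M[row][col] so that register i holds column i.
Registers pack_columns(const float (&M)[4][4]) {
    Registers c{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            c[col][row] = M[row][col];
    return c;
}

// Reference result: row vector times matrix, written out element by element.
Float4 mul_vec_mat(const Float4& v, const float (&M)[4][4]) {
    Float4 out{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += v[row] * M[row][col];
    return out;
}
```

Calling `dot_per_register(v, pack_columns(M))` should match `mul_vec_mat(v, M)` for any `v` and `M`; packing rows instead of columns would give the mul(mat,vec) result.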

Here is a column-major shader and its assembly output:

// ====================================================
// hlsl code
// ====================================================

uniform float4x4 ViewProjectionMatrix : register(c0);

struct SVertexInput
{
    float3 Pos      : POSITION;
}; // end struct SVertexInput 

struct SVertexToPixel
{
    float4 ProjectedPos : POSITION0;
}; // end struct SVertexToPixel 

SVertexToPixel MainVS( SVertexInput _input )
{
    SVertexToPixel output;
    output.ProjectedPos = mul( float4(_input.Pos,1.0f), ViewProjectionMatrix );
    return output;
}

// ====================================================
// asm output
// ====================================================

vs_3_0
def c0, 1, 0, 0, 0
dcl_position v0
dcl_position o0
mad r0, v0.xyzx, c0.xxxy, c0.yyyx
dp4 o0.x, r0, c0
dp4 o0.y, r0, c1
dp4 o0.z, r0, c2
dp4 o0.w, r0, c3

As the code above shows, the matrix "ViewProjectionMatrix" has to be sent from the CPU to HLSL, so we need to be sure which packing method is used on the CPU and how the data ends up laid out on the GPU. In my engine I want the same multiplication convention in both shader and C++ code, so I chose column-major matrices on both CPU and GPU. But consider uploading the matrix directly:

pD3dDevice->SetVertexShaderConstantF ( registerIndex, m_ViewProjectionMatrix.GetData(), 4 ); // the count is in float4 registers, not floats

This gives the wrong result. The problem is the layout of ViewProjectionMatrix in memory. Given:

m[16] = { 0,1,2,3, 4,5,6,7, 8,9,10,11, 12,13,14,15 }

the data is packed into GPU registers four floats at a time: {0,1,2,3} goes to c0, {4,5,6,7} goes to c1, and so on. With column-major packing each register is one column, so the matrix the shader sees is:

c0 c1 c2 c3
0  4  8  12
1  5  9  13
2  6  10 14
3  7  11 15
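The register layout above can be reproduced with a small CPU-side simulation. This is only a sketch: `set_constants` is a hypothetical stand-in that copies float4 groups the way the text describes, not the real D3D call, and `shader_element` is an illustrative helper that reads the registers as columns.

```cpp
#include <array>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Hypothetical stand-in for a constant upload: copies vector4Count groups of
// four floats from data into consecutive registers, starting at register start.
template <std::size_t N>
void set_constants(std::array<Float4, N>& regs, std::size_t start,
                   const float* data, std::size_t vector4Count) {
    for (std::size_t i = 0; i < vector4Count && start + i < N; ++i)
        for (std::size_t k = 0; k < 4; ++k)
            regs[start + i][k] = data[4 * i + k];
}

// Element (row, col) as the shader sees it with column-major packing:
// register c[col] is column col of the matrix.
float shader_element(const std::array<Float4, 4>& c, int row, int col) {
    return c[col][row];
}
```

After uploading `m[16] = { 0, 1, ..., 15 }` with a count of 4, `shader_element(regs, 0, 1)` reads 4 and `shader_element(regs, 1, 0)` reads 1, matching the table: consecutive floats in memory run down a column, not along a row.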

So the right way to make both the C++ and the shader code use the column-major (right-multiply) convention is to transpose the matrix on the CPU before sending it to the GPU:

pD3dDevice->SetVertexShaderConstantF ( registerIndex, m_ViewProjectionMatrix.GetTranspose().GetData(), 4 ); // the count is in float4 registers, not floats
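The effect of the transpose can be checked on the CPU. The sketch below assumes the 16 floats on the CPU hold the matrix row by row, which is an assumption made here and not something the engine code confirms. `transpose` is a hypothetical stand-in for `GetTranspose()`, and `dp4_per_register` imitates the four dp4 instructions from the assembly output.

```cpp
#include <array>

using Float4 = std::array<float, 4>;
using Matrix16 = std::array<float, 16>;

// Hypothetical stand-in for GetTranspose(): swaps rows and columns of a flat 4x4.
Matrix16 transpose(const Matrix16& m) {
    Matrix16 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t[4 * c + r] = m[4 * r + c];
    return t;
}

// What the four dp4 instructions compute after an upload: register i holds
// uploaded[4*i .. 4*i+3], and output component i is dot(v, register i).
Float4 dp4_per_register(const Float4& v, const Matrix16& uploaded) {
    Float4 out{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            out[i] += v[k] * uploaded[4 * i + k];
    return out;
}

// Reference on the CPU: row vector times matrix, with m stored row by row
// (assumed layout).
Float4 mul_vec_mat(const Float4& v, const Matrix16& m) {
    Float4 out{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += v[row] * m[4 * row + col];
    return out;
}
```

Under that assumption, `dp4_per_register(v, transpose(m))` agrees with `mul_vec_mat(v, m)`, while `dp4_per_register(v, m)` without the transpose generally does not.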