# Understanding the View Matrix

In this article, I will explain how to construct the view matrix correctly, and how to use the view matrix to transform a model’s vertices into clip-space. I will also explain how to compute the camera’s position in world space (also called the eye position) from the view matrix.

# Introduction

Understanding how the view matrix works in 3D space is one of the most underestimated concepts of 3D game programming, largely because of the abstract nature of this elusive matrix. The camera transformation matrix determines the position and orientation in 3D space of an object that represents the camera. The view matrix, on the other hand, is used to transform a model’s vertices from world-space to view-space. Don’t be mistaken into thinking these two matrices are the same thing!

You can think of it like this:

Imagine you are holding a video camera, taking a picture of a beautiful mountain scene. You can get a different view of the scene by moving your camera around, and it appears that the scene is moving when you view the image through your camera’s viewfinder. This is how it works in real life. In a computer program, however, things work a little differently. The camera doesn’t move at all in a 3D program; in actuality, the world moves in the opposite direction and orientation of how you would want the camera to move in reality.

In order to understand this correctly, we must think in terms of two different things:

1. The Camera Transformation Matrix: The transformation that places the camera in the correct position and orientation in world space (this is the transformation that you would apply to a 3D model of the camera if you wanted to represent it in the scene).
2. The View Matrix: This matrix will transform vertices from world-space to view-space.  This matrix is the inverse of the camera’s transformation matrix described above.

# The Camera Transformation

The camera transformation is the transformation matrix that can be used to position and orient an object or a model in the scene that represents the camera.  If you wanted to represent several cameras in the scene and you wanted to visualize where each camera was placed in the world, then this transformation would be used to transform the vertices of the model that represents the camera from object-space into world space. This is the same as a world-matrix or model-matrix that positions any model in the scene.  This transformation should not be mistaken as the view matrix.  It cannot be used directly to transform vertices from world-space into view-space.

Computing the camera’s transformation matrix is no different from computing the transformation matrix of any other object placed in the scene.

If $\mathbf{R}$ represents the orientation of the camera, and $\mathbf{T}$ represents the translation of the camera in world space, then the camera’s transform matrix $\mathbf{M}$ can be computed by multiplying the two matrices together.

$\mathbf{M}=\mathbf{R}*\mathbf{T}$

To get the position of the camera in world-space (also called the eye position) from the matrix $\mathbf{M}$, you simply take the 4th row of the resulting 4×4 matrix:

$\mathbf{eye}=(\mathbf{M}_{4,1},\mathbf{M}_{4,2},\mathbf{M}_{4,3},\mathbf{M}_{4,4})$
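As a quick sanity check, here is a minimal Python sketch of that computation. It uses the row-major layout described above (translation stored in the fourth row); the helper name `mat_mul` and the example values are my own:

```python
def mat_mul(a, b):
    # Standard 4x4 matrix product: result[i][j] = sum_k a[i][k] * b[k][j]
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# R: a 90-degree rotation about the Y axis (row-major layout).
R = [[0, 0, -1, 0],
     [0, 1,  0, 0],
     [1, 0,  0, 0],
     [0, 0,  0, 1]]

# T: a translation to (5, 2, -3), stored in the fourth row.
T = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [5, 2, -3, 1]]

# Camera transform M = R * T; the eye position is simply the fourth row.
M = mat_mul(R, T)
eye = M[3]
print(eye)  # [5, 2, -3, 1]
```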

# The View Matrix

The view matrix on the other hand is used to transform vertices from world-space to view-space.  This matrix is usually concatenated together with the object’s world matrix and the projection matrix so that vertices can be transformed from object-space directly to clip-space in the vertex program.

If $\mathbf{M}$ represents the object’s world matrix (or model matrix), $\mathbf{V}$ represents the view matrix, and $\mathbf{P}$ is the projection matrix, then the concatenated world (or model), view, projection matrix, denoted $\mathbf{MVP}$, is computed simply by multiplying the three matrices together. Using column matrices and column vectors (the OpenGL convention), the product is read from right to left:

$\mathbf{MVP}=\mathbf{P}*\mathbf{V}*\mathbf{M}$

And a vertex $\mathbf{v}$ can be transformed to clip-space by multiplying by the combined matrix $\mathbf{MVP}$:

$\mathbf{v}'=\mathbf{MVP}*\mathbf{v}$

(If you are working with row matrices and row vectors instead, as the code example later in this article does, the order of multiplication is reversed: $\mathbf{v}'=\mathbf{v}*\mathbf{M}*\mathbf{V}*\mathbf{P}$.)
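Since matrix multiplication is associative, transforming by the concatenated matrix gives the same result as applying each matrix in turn. Here is a small Python check using column vectors with the translation stored in the fourth column, matching the equations above; an identity matrix stands in for a real projection, and the helper names and example values are my own:

```python
def mat_mul(a, b):
    # 4x4 matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    # Column-vector convention: v' = M * v.
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# Model matrix M: translate the object by (1, 2, 3).
M = [[1, 0, 0, 1],
     [0, 1, 0, 2],
     [0, 0, 1, 3],
     [0, 0, 0, 1]]

# View matrix V: push the world 10 units down the -z axis.
V = [[1, 0, 0,   0],
     [0, 1, 0,   0],
     [0, 0, 1, -10],
     [0, 0, 0,   1]]

# Identity as a stand-in for the projection matrix P.
P = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

MVP = mat_mul(mat_mul(P, V), M)
v = [1, 1, 1, 1]

# One combined transform equals three successive transforms.
assert mat_vec(MVP, v) == mat_vec(P, mat_vec(V, mat_vec(M, v)))
print(mat_vec(MVP, v))  # [2, 3, -6, 1]
```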

So that’s how the view matrix is used, but how is it computed?

In OpenGL, there is a function called gluLookAt that post-multiplies the current matrix on the OpenGL matrix stack. This function takes the position of the camera in world-space (called the eye position), the point that the camera is looking at (called the camera target), and a global up axis that is used to orient the up-direction of the camera (this is usually set to $(0, 1, 0)$). Be careful not to let the camera look directly along the up axis; the cross products in the calculation then degenerate to zero-length vectors and the matrix computation fails. (This degenerate case is often, somewhat loosely, referred to as gimbal lock.)

A typical implementation of this function may look something like this:

```cpp
// Builds a view matrix (row-major layout, row-vector convention).
Matrix4 LookAt( Vector3 eye, Vector3 target, Vector3 up )
{
    Vector3 zaxis = normalize(target - eye);     // The "look-at" vector.
    Vector3 xaxis = normalize(cross(up, zaxis)); // The "right" vector.
    Vector3 yaxis = cross(zaxis, xaxis);         // The "up" vector.

    // Create a 4x4 orientation matrix from the right, up, and look-at vectors.
    // This is the transpose (inverse) of the camera's rotation.
    Matrix4 orientation = {
        xaxis.x, yaxis.x, zaxis.x, 0,
        xaxis.y, yaxis.y, zaxis.y, 0,
        xaxis.z, yaxis.z, zaxis.z, 0,
              0,       0,       0, 1
    };

    // Create a 4x4 translation matrix by negating the eye position.
    Matrix4 translation = {
             1,      0,      0, 0,
             0,      1,      0, 0,
             0,      0,      1, 0,
        -eye.x, -eye.y, -eye.z, 1
    };

    // Combine the orientation and translation to compute the view matrix.
    return ( translation * orientation );
}
```


I should point out that you may have seen a similar method in the DirectX documentation called LookAtLH. In the DirectX documentation, the final row of the resulting matrix is computed by taking the dot products of the x, y, and z axes with the negated eye position. This is a more efficient way to compute the final row than the way shown here, but the result is the same: using the dot products on the final row of the matrix is just a short-cut for doing the full matrix multiply.
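To see that the two formulations agree, here is a small Python version of the LookAt function above, comparing the fourth row produced by the full `translation * orientation` multiply against the dot-product shortcut. The helper names and example values are my own:

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def look_at(eye, target, up):
    # Same construction as the LookAt function above (row-major layout).
    zaxis = normalize([t - e for t, e in zip(target, eye)])
    xaxis = normalize(cross(up, zaxis))
    yaxis = cross(zaxis, xaxis)
    orientation = [[xaxis[0], yaxis[0], zaxis[0], 0],
                   [xaxis[1], yaxis[1], zaxis[1], 0],
                   [xaxis[2], yaxis[2], zaxis[2], 0],
                   [0, 0, 0, 1]]
    translation = [[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [-eye[0], -eye[1], -eye[2], 1]]
    return mat_mul(translation, orientation), (xaxis, yaxis, zaxis)

eye = [3.0, 4.0, 5.0]
view, (xaxis, yaxis, zaxis) = look_at(eye, [0.0, 0.0, 0.0], [0.0, 1.0, 0.0])

# The dot-product shortcut for the fourth row, as in the DirectX docs.
shortcut = [-dot(xaxis, eye), -dot(yaxis, eye), -dot(zaxis, eye), 1]

# The fourth row of the full multiply matches the shortcut.
assert all(abs(a - b) < 1e-9 for a, b in zip(view[3], shortcut))
```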

If you only have the resulting view matrix $\mathbf{V}$ but you need to know what the world-position of the camera would be (also known as the eye position), you can take the 4th row of the inverse view matrix:

$\mathbf{eye}=\left((\mathbf{V}^{-1})_{4,1},(\mathbf{V}^{-1})_{4,2},(\mathbf{V}^{-1})_{4,3},(\mathbf{V}^{-1})_{4,4}\right)$

# Converting between Camera Transformation and View Matrix

If you only have the camera transformation $\mathbf{M}$ and you want to compute the view matrix $\mathbf{V}$ that will correctly transform vertices from world-space to view-space, you only need to take the inverse of the camera transform.

$\mathbf{V}=\mathbf{M}^{-1}$

If you only have the view matrix $\mathbf{V}$ and you need to find a camera transform $\mathbf{M}$ that can be used to position a visual representation of the camera in the scene, you can simply take the inverse of the view matrix.

$\mathbf{M}=\mathbf{V}^{-1}$
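Because these matrices are rigid transforms (a rotation plus a translation), the inverse can be computed cheaply without a general-purpose matrix inverse: transpose the 3×3 rotation block and re-rotate the negated translation. A Python sketch under the article’s row-major layout; the helper names and example values are my own:

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(m):
    # Inverse rotation: transpose of the upper-left 3x3 block.
    inv = [[m[j][i] for j in range(3)] + [0] for i in range(3)]
    # Inverse translation: the negated eye position, rotated by the
    # inverse rotation, stored in the fourth row.
    eye = m[3][:3]
    row = [-sum(eye[k] * inv[k][j] for k in range(3)) for j in range(3)]
    inv.append(row + [1])
    return inv

# Camera matrix: 90-degree rotation about Y, then translation to (5, 2, -3).
M = [[0, 0, -1, 0],
     [0, 1,  0, 0],
     [1, 0,  0, 0],
     [5, 2, -3, 1]]

V = rigid_inverse(M)

# The view matrix and the camera matrix are inverses of each other.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
assert mat_mul(M, V) == identity
assert mat_mul(V, M) == identity
```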

# Conclusion

I hope that I have made clear the differences between the camera’s transform matrix and the view matrix and how you can convert between one and the other. It is also very important to be aware of which matrix you are dealing with so that you can correctly obtain the eye position of the camera. When working with the camera’s world transformation, the eye position is the 4th row of the world transform, but if you are working with the view matrix, you must first invert the matrix before you can extract the eye position in world space.

## 22 thoughts on “Understanding the View Matrix”

1. In other words: the view matrix is the absolute rotation and position of the camera, with the camera’s position inverted, i.e. (8, 8, 8) becomes (-8, -8, -8).

• The view matrix is the inverse of the camera’s transformation matrix in world-space. The rotation must also be inverted.

2. Why does the LookAt function return ( translation * orientation ) instead of ( orientation * translation ) ? I think that the second expression is correct.

• Keep in mind that this function is returning the inverse of the camera matrix that would position and orient this camera in world space. That is, the function returns the View matrix.

And since we know the orientation matrix is orthonormalized, we also know that its inverse is equivalent to its transpose (see the Matrices article for evidence that the inverse is equivalent to the transpose in the case of orthonormalized matrices).

If R is the camera’s rotation matrix and T is its translation matrix, then the camera transform is R * T and the view matrix is its inverse: (R * T)^-1 = T^-1 * R^-1 = T^-1 * transpose(R). In the LookAt function, the translation matrix is already T^-1 (it stores the negated eye position) and the orientation matrix is already transpose(R), which is why the function returns ( translation * orientation ).

Also keep in mind if you are switching from row-major (primarily used in DirectX) to column-major (primarily used in OpenGL) matrices, then you must also change the order in which matrices are multiplied.

The rule-of-thumb is: If it doesn’t look right, swap your matrix multiplies.

• Ok, so in the LookAt function the translation and orientation matrices are in row-major order, and I should pass GL_TRUE to glUniformMatrix4fv when uploading the LookAt result (translation * orientation)?

• You should only transpose a matrix if you are sure you are passing a row-major matrix when a column-major matrix is expected, or vice versa.

If you are primarily working with column matrices and OpenGL, then I would strongly suggest you use the OpenGL Mathematics library (http://glm.g-truc.net/). This library has an extensive math library, including functions to build view matrices and world transformation matrices (as well as many other features).

• This clears up why the view matrix was returned instead of its inverse (the camera transformation matrix), but one question: why does the camera not move, and why does the world move?

• Michael,

This is a matter of perspective. From the perspective of the camera, the world moves while the camera is stationary. In the view space, the camera is at the origin and everything else in the world is expressed relative to that. Does this make sense?

3. I believe the code example you provided is in row-vector format, and the math you have shown could cause confusion. For the row-vector format you would instead multiply v * Model * View * Projection = Clipspace.

• Michael,

Thank you for pointing this out. I should rewrite that part of the post to be clear how to work with row and column matrices differently.
When I get time, I will definitely fix this discrepancy.

4. Thanks for the tutorial. I do have one question though. If the view matrix is just the inverse of the world matrix, what’s the point of making a world-view-projection matrix? Matrix multiplication is associative, so you could write MVP as (MV)P. If V is the inverse of M that would make (MV) the identity matrix I, so (MV)P would be equal to IP, which would be equal to P. And yet I always see MVP used, not just P.

• Xavier,

That is ONLY true when you are talking about the camera.

Every object in your scene has its OWN world matrix. So a character in your world will have a different world matrix than the camera’s world matrix (referred to as the camera matrix in this article).

But yes, if you take the camera matrix and multiply it by the view matrix, you will absolutely get the identity matrix. That makes sense if you consider the camera to be fixed at the origin and you simply move the world around you (which is what I said in the first paragraph).

5. I’m trying out a particle system example that came out with RenderMonkey and it uses a “view inverse matrix” to billboard the quads for the particle effect.

I can see all the values, but I can’t work out how they calculate the “view inverse matrix”, it’s not the inverse of the view matrix or the inverse of the view projection.

Are you familiar with the term? If so, how is it calculated?

• The “view inverse matrix” is probably a 3×3 matrix (or a 4×4 matrix with no translation) that represents a matrix that will anti-rotate the particle to face the camera. This matrix cannot take the translation of the camera into consideration because if it did, all of your particles would be placed on the position of the camera.

In short: it is a matrix that will anti-rotate the particle to always face the camera.
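A sketch of the idea, assuming a rigid view matrix (the helper names and example values are my own): the billboard rotation is just the transpose of the view matrix’s 3×3 rotation block, so composing it with the view rotation yields the identity and a billboarded quad always ends up screen-aligned.

```python
def transpose3(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def mul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# 3x3 rotation block of a view matrix (camera turned 90 degrees about Y).
view_rot = [[0, 0, -1],
            [0, 1,  0],
            [1, 0,  0]]

# The billboard ("anti-rotation") matrix is its transpose, i.e. its inverse.
billboard = transpose3(view_rot)

# Applying the view rotation after the billboard rotation cancels out.
print(mul3(billboard, view_rot))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```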

• Thanks Michael for the kind words. I do realize that this article needs to be rewritten to be more clear and maybe include a few examples of working camera models (such as FPS camera, 3rd person camera, or orbit camera). This is on my TODO list.

• Sorry guys, but I want someone to correct my ideas if they are wrong.
First, I’m trying to develop a 3D engine, but I don’t understand with respect to which frame (the camera or the object) vectors are computed. For example, when computing the look-at vector, should I write cameraPos - objectPos or objectPos - cameraPos? And how do I compute the up and right vectors corresponding to that choice?

And when should I use row-major vs column-major matrices, and when should I right-multiply vs left-multiply matrices?

And how do I implement a free-look camera rather than looking at a particular object? And should I use a left-handed or right-handed coordinate system?

6. I’m confused. Given an orientation, R, and a translation, T of a camera, wouldn’t the camera’s transform matrix, M, be T * R? That is, you rotate it, then you translate it? Or do I have that backwards?

I’m getting frustrated trying to learn this stuff because there’s so many gaps in every resource on the web. For example, in your Matrices tutorial, you describe various matrices, and their properties, but nowhere do you describe the effects of multiplying them together. And finding those effects elsewhere on the web has been really hard.

I think I need to get a book, sorry this turned into a rant

Also, seeing your matrices confused the crap out of me because I didn’t know they were in column major order until I read the comments.

• Brian,

I’m sorry for the confusion. I do plan on improving this article to include various camera models and explaining the difference between the Right-Handed coordinate systems and Left-Handed coordinate systems and the differences between column matrices and row matrices and the effect it has on the math (the order of the matrix multiples must be reversed if you are using a different system).

I don’t mind the rant because then I know where I need to improve… It’s just a matter of finding the time to do it!

Regards,

Jeremiah

7. I’m confused: based on what I learned in college about linear transformations, the last column would be used for translations, not the last row. Why isn’t this the case? Aren’t matrices multiplied with vectors row by row?

• In linear algebra, you should have learned this:

`M = A * B ; Mij = sum( Aik * Bkj )`

In words, the elements of the rows of matrix A are multiplied by the elements of the columns of matrix B and their results are summed.
Also, if you change the order of multiplication, it may change the result (so A * B != B * A). That is, (unlike scalar multiplication) matrix multiplication is not commutative.

In computer graphics, the 4th row (for row-major matrices) or the 4th column (for column-major matrices) is used to store the translation of the local coordinate system. If you see an example using row-major matrices (as shown here) but you are using column-major matrices, then you only have to change the order of multiplication to get the same results.

For example:
If you want to transform a 4-component vector v by a 4×4 matrix M then you must perform the transformation in a specific order dependent on the matrix layout.

For Row-Major matrices, you must perform the transformation in this order:

`v' = ( v * M );`

For Column-Major matrices, you must perform the transformation in this order:

`v' = ( M * v );`

I hope this makes sense.
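As a concrete check that the two conventions give the same result (the helper names and example values are my own): a row vector times a row-major matrix equals the column-major transpose of that matrix times a column vector.

```python
def mat_vec(m, v):
    # Column-vector convention: v' = M * v.
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def vec_mat(v, m):
    # Row-vector convention: v' = v * M.
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]

# Row-major translation matrix (translation in the fourth row) ...
T_row = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [5, 2, -3, 1]]

# ... and its column-major counterpart: the transpose, with the
# translation in the fourth column.
T_col = [[row[i] for row in T_row] for i in range(4)]

v = [1, 1, 1, 1]
print(vec_mat(v, T_row))  # [6, 3, -2, 1]
print(mat_vec(T_col, v))  # [6, 3, -2, 1] -- same result, other convention
```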