Hidden surface removal

Now that we can render any scene from any point of view, let’s improve on the wireframe representation we’re getting so far.

The obvious first step is to make solid objects look solid. To do this, let’s use DrawFilledTriangle() to draw each triangle with a random color, and see what happens:

These don’t quite look like cubes, do they?

The problem here is that some triangles that should be behind others are instead drawn in front of them. Why? Because we’re blindly drawing 2D triangles on the canvas in an essentially random order - in the order they happen to be defined in the model.

There’s no “right” order for the triangles to be defined in the model, though. Suppose the model’s triangles are sorted in such a way that the back faces are drawn first and then overdrawn by the front faces; you get the result you expect. However, if you rotate the cube \(180^\circ\), you get the exact opposite - farther triangles now cover nearer triangles.

Painter’s algorithm

A first solution to this issue is known as the Painter’s Algorithm. Real-life painters first draw the backgrounds and then cover parts of them with foreground objects. We could achieve the same effect by taking every triangle in the scene, applying the model and camera transforms, sorting them back to front, and drawing them in that order.
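
As a rough sketch, the whole idea could look something like this in Python; the depth key used for sorting and the draw_filled_triangle helper are illustrative assumptions, not part of the renderer built so far:

def painters_algorithm(triangles, draw_filled_triangle):
    # 'triangles' are assumed to be already transformed into camera space;
    # 'draw_filled_triangle' stands in for the rasterizer from the
    # filled-triangles chapter.
    def depth(triangle):
        # Use the farthest vertex as the triangle's depth; the centroid is
        # another common (and equally imperfect) choice.
        return max(v.z for v in triangle.vertexes)

    # Largest depth first: the background gets painted before the foreground.
    for triangle in sorted(triangles, key=depth, reverse=True):
        draw_filled_triangle(triangle.vertexes, triangle.color)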

Although this would indeed paint the triangles in the right order, it has some drawbacks that make it impractical.

First, it doesn’t scale well. The best known sort algorithm is \(O(n \cdot log(n))\), which means the run time more than doubles if you double the number of triangles. In other words, it works for small scenes, but it quickly becomes a bottleneck when the complexity of the scene increases.

Second, it requires knowing the whole list of triangles at once. This requires a lot of memory, and it prevents a stream-like approach to rendering. If you think of rendering as a pipeline, where model triangles enter at one end and pixels come out the other, it’s not possible to start outputting pixels until every triangle has been processed. This in turn means we can’t parallelize this algorithm.

Third, even if you’re willing to live with these limitations, there are cases where the right ordering of triangles just doesn’t exist at all. Consider the following case:

No matter in which order you draw these triangles, you’ll always get an incorrect result.

Depth buffering

While solving the problem at the triangle level doesn’t work, solving it at the pixel level certainly does, and at the same time overcomes the other limitations of the Painter’s Algorithm.

Essentially, for each pixel on the canvas, we want to paint it with the “right” color. In this case, the “right” color is the color of the object that is closest to the camera (in this case, \(P_1\)):

It turns out we can do just that. Suppose we keep the value of \(Z\) of the point currently represented by each pixel in the canvas. When we’re considering whether to paint a pixel with a certain color, we will do so only if the \(Z\) coordinate of the point we’re about to paint is less than the \(Z\) coordinate of the point that is already there.

Suppose the ordering of the triangles is such that we want to paint \(P_2\) first and \(P_1\) second. The pixel is painted red, and its \(Z\) is noted as \(Z_{P_2}\). Then we paint \(P_1\), and since \(Z_{P_2} > Z_{P_1}\), the pixel is overwritten with blue; we get the correct results.

Of course, we’d have gotten the correct result regardless of the values of \(Z\). What if we wanted to paint \(P_1\) first and \(P_2\) second? The pixel is first painted blue, and \(Z_{P_1}\) is stored; but when we want to paint \(P_2\), since \(Z_{P_2} > Z_{P_1}\), we don’t paint it (because if we did, we’d be painting a distant point obscuring a closer one). We get a blue pixel again, which is the correct result.

In terms of implementation, you need a buffer to hold the \(Z\) coordinate of every pixel in the canvas; this is called the Depth Buffer, and its dimensions are naturally the same as those of the canvas.

But where do the \(Z\) values come from?

These should be the \(Z\) values of the points after being transformed but before being perspective-projected. This is why in Setting up the Scene we set up the transform matrices so that the end result would have \(1/Z\) in them.

So we could get \(Z\) values from these \(1/Z\) values. But we only have this value for vertexes; we need it on a per-pixel basis.

And this is yet another application of the attribute mapping algorithm. Why not use \(Z\) as the attribute and interpolate it across the face of the triangle? By now you know the drill: take the values of Z0, Z1 and Z2, compute Z01, Z12 and Z02, obtain z_left and z_right from them, and then compute z_segment for each pixel of each horizontal segment. And instead of blindly calling PutPixel(x, y, color), we do this:

z = z_segment[x - xl]            // Z of the point this pixel represents
if (z < depth_buffer[x][y]) {    // closer than what's already painted there?
    canvas.PutPixel(x, y, color)
    depth_buffer[x][y] = z       // remember the new closest Z for this pixel
}

For this to work, depth_buffer should be initialized to \(+\infty\) (or just “a very big value”).
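
Putting the pieces together, here’s a minimal Python sketch of the buffer and the per-pixel test; the canvas size and the canvas.put_pixel wrapper are assumptions made just for this example:

import math

# Assumed canvas size and a canvas object with a put_pixel(x, y, color)
# method; both are placeholders for whatever your renderer already uses.
canvas_width = canvas_height = 600

# One Z value per pixel, initialized to "infinitely far away".
depth_buffer = [[math.inf] * canvas_height for _ in range(canvas_width)]

def put_pixel_if_closer(canvas, x, y, z, color):
    # Paint the pixel only if this point is closer than whatever is
    # already there, and remember the new closest Z.
    if z < depth_buffer[x][y]:
        canvas.put_pixel(x, y, color)
        depth_buffer[x][y] = z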

The results we get now are much better:

Source code and live demo >>

Why 1/Z instead of Z

This isn’t the whole story, though. The values of \(Z\) at the vertexes are correct (they come from data, after all), but in most cases the linearly interpolated values of \(Z\) for the rest of the pixels are wrong. While this approximation works “well enough” for depth buffering, it will fail further on.

To see how the values are wrong, consider the simple case of a line from \(A (-1, 0, 2)\) to \(B (1, 0, 10)\). Its midpoint \(M\) is \((0, 0, 6)\):

Let’s compute the projection of these points with \(d = 1\). \(A'_x = A_x / A_z = -1 / 2 = -0.5\). Similarly, \(B'_x = 0.1\) and \(M'_x = 0\):

Now what happens if we linearly interpolate the values of \(A_z\) and \(B_z\) as described above, to get the computed value for \(M_z\)? The linear function looks like this:

From here we can conclude that

\[ { M_z - A_z \over M'_x - A'_x } = { B_z - A_z \over B'_x - A'_x } \]

\[ M_z = A_z + (M'_x - A'_x) ({B_z - A_z \over B'_x - A'_x}) \]

If we plug in the numbers and do some arithmetic, we get

\[ M_z = 2 + (0 - (-0.5))({10 - 2 \over 0.1 - (-0.5)}) = 2 + (0.5)({8 \over 0.6}) = 8.666 \]

which is clearly not \(M_z = 6\).
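
If you want to double-check that arithmetic, here’s the same computation as a few lines of Python (the coordinates are the ones from the example above):

# A = (-1, 0, 2) and B = (1, 0, 10), projected with d = 1.
ax, az = -1.0, 2.0
bx, bz = 1.0, 10.0
apx, bpx = ax / az, bx / bz     # projected x': -0.5 and 0.1
mpx = 0.0                       # projected x' of the midpoint M

# Linearly interpolating Z over screen coordinates gives the wrong depth.
mz = az + (mpx - apx) * (bz - az) / (bpx - apx)
print(mz)                       # 8.666..., not the true value 6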

So where is the problem? We’re using attribute mapping, which we know works well; we’re feeding it the right values, which come from data; so why are the outputs wrong?

The problem is with the implicit assumption we make with linear interpolation: that the function we are interpolating is linear. And in this case, it isn’t!

If \(Z = f(x', y')\) was a linear function of \(x'\) and \(y'\), we could write it as \(Z = A \cdot x' + B \cdot y' + C\) for some value of \(A\), \(B\) and \(C\). This kind of function has the property that the difference in its value between two points depends on the difference between the points, but not on the points themselves:

\[ f(x' + \Delta x, y' + \Delta y) - f(x', y') = [A (x' + \Delta x) + B (y' + \Delta y) + C] - [A \cdot x' + B \cdot y' + C] \]

\[ = A (x' + \Delta x - x') + B (y' + \Delta y - y') + C - C \]

\[ = A \Delta x + B \Delta y \]

That is, for a given difference in screen coordinates, the difference in \(Z\) would always be the same.

More formally, the equation of the plane that contains the triangle we’re studying is

\[ AX + BY + CZ + D = 0 \]

On the other hand we have the perspective projection equations:

\[ x' = {Xd \over Z} \]

\[ y' = {Yd \over Z} \]

We can get \(X\) and \(Y\) back from these:

\[ X = {Zx' \over d} \]

\[ Y = {Zy' \over d} \]

If we replace \(X\) and \(Y\) in the plane equation with these expressions, we get

\[ {Ax'Z + By'Z \over d} + CZ + D = 0 \]

Multiplying by \(d\) and then solving for \(Z\),

\[ Ax'Z + By'Z + dCZ + dD = 0 \]

\[ (Ax' + By' + dC)Z + dD = 0 \]

\[ Z = {-dD \over Ax' + By' + dC} \]

Which is clearly not a linear function of \(x'\) and \(y'\).

However, if we just compute \(1/Z\), we get

\[ 1/Z = {Ax' + By' + dC \over -dD} \]

Which clearly is a linear function of \(x'\) and \(y'\).

In order to prove this really works, here’s the example above but this time computed using linear interpolation of \(1/Z\):

\[ { M_{1 \over z} - A_{1 \over z} \over M'_x - A'_x } = { B_{1 \over z} - A_{1 \over z} \over B'_x - A'_x } \]

\[ M_{1 \over z} = A_{1 \over z} + (M'_x - A'_x) ({B_{1 \over z} - A_{1 \over z} \over B'_x - A'_x}) \]

\[ M_{1 \over z} = {1 \over 2} + (0 - (-0.5)) ({{1 \over 10} - {1 \over 2} \over 0.1 - (-0.5)}) = 0.166666 \]

And therefore

\[ M_z = {1 \over M_{1 \over z}} = {1 \over 0.166666} = 6 \]

Which is indeed the correct value.
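
The same check in Python, this time interpolating \(1/Z\):

# A = (-1, 0, 2) and B = (1, 0, 10) projected with d = 1, as before.
apx, bpx, mpx = -0.5, 0.1, 0.0             # projected x' of A, B and M
a_inv_z, b_inv_z = 1.0 / 2.0, 1.0 / 10.0   # 1/Z at A and B

# Interpolating 1/Z over screen coordinates and inverting the result
# recovers the true depth of the midpoint.
m_inv_z = a_inv_z + (mpx - apx) * (b_inv_z - a_inv_z) / (bpx - apx)
print(1.0 / m_inv_z)                       # 6.0, up to floating-point rounding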

All of this means we need to use values of \(1/Z\) instead of values of \(Z\) for depth buffering. The only practical difference in the pseudocode is that the buffer should be initialized to \(0\) (which is conceptually \(1 \over + \infty\)) and the comparison should be inverted (keep the bigger value of \(1/Z\), which corresponds to the smaller value of \(Z\)).
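
In Python terms, the earlier sketch changes only in the buffer’s initial value and in the direction of the comparison (the canvas wrapper and canvas size are still assumptions):

canvas_width = canvas_height = 600          # assumed canvas size, as before

# The buffer now holds 1/Z values; 0 plays the role of 1/+infinity.
depth_buffer = [[0.0] * canvas_height for _ in range(canvas_width)]

def put_pixel_if_closer(canvas, x, y, inv_z, color):
    # A bigger 1/Z means a smaller Z, that is, a closer point.
    if inv_z > depth_buffer[x][y]:
        canvas.put_pixel(x, y, color)
        depth_buffer[x][y] = inv_z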

Back face culling

Depth buffering produces the desired results. But can we get the same thing in a faster way?

Going back to the cube, even if each pixel ends up having the right color, some of them are painted over and over several times. For example, if a back face of the cube is rendered before a front face, many pixels will be painted twice. As the number of operations we need to perform per pixel increases (so far we’re computing \(1/Z\) for every pixel, but soon we’ll add illumination, for example), computing pixels that will never be visible becomes more and more wasteful.

Can we discard pixels earlier, before we go into all of this computation? It turns out we can discard entire triangles before even starting to draw!

So far we have been talking informally about front faces and back faces. Imagine every triangle has two distinct sides; at any time we can only see one of the sides of the triangle. In order to distinguish the two sides, we stick an arrow to each triangle, perpendicular to its surface. Then we take the cube and make sure every arrow is pointing out:

Now this arrow lets us classify every triangle as “front” or “back”, depending on whether it points towards the camera or away from it; more formally, depending on whether the view vector and this arrow (actually a normal vector of the triangle) form an angle of less than or more than \(90^\circ\), respectively:

On the other hand, having two-sided triangles oriented in the same way lets us define what’s “inside” and what’s “outside” any closed object. By definition, we’re not supposed to see the inside of a closed object. What this means is that for any closed object, no matter where the camera is, the front faces completely cover the back faces.

Which in turn means we don’t need to draw the back faces at all, because they will be overdrawn by front faces anyway!

Classifying triangles

Let’s formalize and implement this. Suppose we have the normal vector \(\vec{N}\) of the triangle, and the vector \(\vec{V}\) from a vertex of the triangle to the camera. Suppose \(\vec{N}\) points to the outside of the object. In order to classify the triangle as front or back, we compute the angle between \(\vec{N}\) and \(\vec{V}\) and see whether they’re within \(90^\circ\) of each other.

We can again use the properties of the dot product to make this even simpler. Remember that if we call \(\alpha\) the angle between \(\vec{N}\) and \(\vec{V}\),

\[ {{\langle \vec{N}, \vec{V} \rangle} \over {|\vec{N}||\vec{V}|}} = cos(\alpha) \]

Because \(cos(\alpha)\) is non-negative for \(|\alpha| \le 90^\circ\), we only need to know its sign to classify a face as front or back. Note that \(|\vec{N}|\) and \(|\vec{V}|\) are always positive, so they don’t affect the sign of the expression. Therefore

\[ sign(\langle \vec{N}, \vec{V} \rangle) = sign(cos(\alpha)) \]

So the classification is simply this:

\(\langle \vec{N}, \vec{V} \rangle \le 0\): back face
\(\langle \vec{N}, \vec{V} \rangle > 0\): front face

The edge case \(\langle \vec{N}, \vec{V} \rangle = 0\) corresponds to the case where we see the triangle exactly edge-on, that is, when the camera and the triangle are coplanar. We can classify this either way without affecting the result much, so we choose to classify it as a back face to avoid dealing with degenerate triangles.
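
As a small Python sketch, assuming we already have the triangle’s outward normal and the vector from one of its vertexes to the camera (the function name and the tuple representation are just for this illustration):

def is_back_face(normal, vertex_to_camera):
    # Dot product of the normal and the vertex-to-camera vector.
    dot = (normal[0] * vertex_to_camera[0]
           + normal[1] * vertex_to_camera[1]
           + normal[2] * vertex_to_camera[2])
    # <= 0 means the face points away from the camera (or is seen
    # edge-on), so it can be culled.
    return dot <= 0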

Where do we get the normal vector from? It turns out there’s a vector operation, the cross product \(\vec{A} \times \vec{B}\), that takes two vectors \(\vec{A}\) and \(\vec{B}\) and yields a vector perpendicular to both of them¹. We can easily get two vectors coplanar with the triangle by subtracting its vertexes from each other, so computing the direction of the normal vector of the triangle \(ABC\) is straightforward:

\[ \vec{V_1} = B - A \]

\[ \vec{V_2} = C - A \]

\[ \vec{P} = \vec{V_1} \times \vec{V_2} \]
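
A direct Python translation of these three lines might look as follows (the question of which vertex order gives the outward-pointing normal is discussed next):

def triangle_normal(a, b, c):
    # Two edges of the triangle ABC, as vectors.
    v1 = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v2 = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    # Their cross product is perpendicular to the triangle's surface.
    return (v1[1] * v2[2] - v1[2] * v2[1],
            v1[2] * v2[0] - v1[0] * v2[2],
            v1[0] * v2[1] - v1[1] * v2[0])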

Note that I said “the direction of the normal vector”, not “the normal vector”. There are two reasons for this. The first one is that \(|\vec{P}|\) isn’t necessarily equal to \(1\). This isn’t really important because normalizing \(\vec{P}\) would be trivial, and because we only care about the sign of \(\langle \vec{N}, \vec{V} \rangle\).

But the second reason is that if \(\vec{N}\) is a normal vector of \(ABC\), so is \(-\vec{N}\)!

Of course, in this case we care a lot about which direction \(\vec{N}\) points, because this is exactly what lets us classify triangles as either front or back. And the cross product happens not to be commutative; in particular, \(\vec{A} \times \vec{B} = -\vec{B} \times \vec{A}\). This means we can’t happily subtract the vertexes of the triangle in any order, because that determines whether the normal points “in” or “out”.

Although the cross product isn’t commutative, it’s also not random, of course.

The coordinate system we’ve been using so far (X right, Y up, Z forward) is called left-handed, because you can naturally point the thumb, index and middle finger of your left hand in these directions (thumb up, index forward, middle right). A right-handed coordinate system is similar, but the middle finger of the right hand points left.

(TODO: Finish this section)



  1. For a definition of this operation, see the Linear Algebra appendix.