Basic ray tracing

No Starch Press is publishing a book based on this website in early 2021. Stay tuned!

Part of the charm of Computer Graphics is drawing things on the screen (or maybe that’s all its charm). To achieve this as soon as possible, we’ll make some simplifying assumptions at first, so we can get something on the screen now. Of course, these assumptions impose some restrictions over what we can do, but we’ll lift the restrictions in the successive chapters.

First of all, we will assume a fixed viewing position. The viewing position, the place where you’d put the eye in the analogy we were using before, is commonly called the camera position; let’s call it \(O = (O_x, O_y, O_z)\). We will assume the camera is located at the origin of the coordinate system, so \(O = (0, 0, 0)\) for now.

Second, we will assume a fixed camera orientation, that is, where the camera is pointed at. We will assume it looks down the positive Z axis, with the positive Y axis up, and the positive X axis to the right:

The camera position and orientation are now fixed. Still missing from the analogy is the “frame” through which the scene was viewed. We’ll assume this frame has dimensions \(V_w\) and \(V_h\), it’s frontal to the camera orientation (that is, it’s perpendicular to \(\vec{Z_+}\)) at a distance \(d\), its sides are parallel to the X and Y axes, and it’s centered respect to \(\vec{Z_+}\). That’s a mouthful, but it’s actually quite simple:

This rectangle that will act as our window to the world is called the viewport. Essentially, we’ll draw on the canvas whatever we see through the viewport. Note that the size of the viewport and the distance to the camera determine the angle visible from the camera, called the field of view or FOV for short. Humans have an almost \(180^\circ\) horizontal FOV, although much of it is blurry peripheral vision with no sense of depth. For simplicity, we’ll set \(V_w = V_h = d = 1\); this results in a FOV of approximately \(53^\circ\), which produces reasonable-looking images (i.e. not overly distorted).

Let’s go back to the “algorithm” presented in the previous section, and number the steps:

Place the eye and the frame as desired (1)
For each pixel on the screen
    Determine the square on the grid corresponding to this pixel (2)
    Determine the color seen through that square (3)
    Paint the pixel with that color (4)

We have just done Step 1 (or, more precisely, gotten it out of the way for now). Step 4 is trivial (canvas.PutPixel(x, y, color) ). Let’s do Step 2 quickly, and then focus most of our attention in increasingly sophisticated ways of doing Step 3.

Canvas to Viewport

Step 2 asks us to “Determine the square on the grid corresponding to this pixel”. We know the canvas coordinates of the pixel (we’re painting all of them) - let’s call them \(C_x\) and \(C_y\). Notice how we conveniently placed the viewport so that its axes match the orientation of those of the canvas, and its center matches the center of the viewport. So going from canvas coordinates to space coordinates is just a change of scale!

\[ V_x = C_x {V_w \over C_w} \]

\[ V_y = C_y {V_h \over C_h} \]

There’s an extra detail. Although the viewport is 2-dimensional, it is embedded in 3-dimensional space. We defined it to be at a distance \(d\) of the camera; every point in this plane (called the projection plane) has, by definition, \(z = d\). Therefore,

\[ V_z = d \]

And we’re done with Step 2. For each pixel \((C_x, C_y)\) of the canvas we can determine the corresponding point of the viewport \((V_x, V_y, V_z)\). Now what Step 3 is asking to do is to figure out what color is the light coming through \((V_x, V_y, V_z)\) as seen from the camera’s point of view \((O_x, O_y, O_z)\).

Tracing rays

So what color is the light reaching \((O_x, O_y, O_z)\) after passing through \((V_x, V_y, V_z)\)?

In the real world, light comes from a light source (the sun, a light bulb, etc), bounces off several objects, and it finally reaches our eyes. We could try simulating the path of every photon leaving our simulated light sources, but it would be extremely time consuming1. Not only we’d have to simulate millions and millions of photons, only a tiny minority of them would happen to reach \((O_x, O_y, O_z)\) after coming through the viewport.

Instead, we’ll trace the rays “in reverse”; we’ll start with a ray originating from the camera, going through a point in the viewport, and following it until it hits some object in the scene. This object is what is “seen” from the camera through that point of the viewport. So, as a first approximation, we’ll just take the color of that object as “the color of the light coming through that point”.

Now we just need some equations.

The ray equation

The best way to represent rays for our purpose is using a parametric equation. We know the ray passes through \(O\), and we know its direction (from \(O\) to \(V\)), so we can express any point P in the ray as

\[ P = O + t(V - O) \]

where t is an arbitrary real number.

Let’s call \((V - O)\), the direction of the ray, \(\vec{D}\); then the equation becomes simply

\[ P = O + t\vec{D} \]

Read the Linear Algebra appendix for details; intuitively, if we start at the origin and advance some multiple of the direction of the ray, we are always moving along the ray:

The sphere equation

Now we need some sort of objects in the scene, so that our rays can hit something. We could choose any arbitrary geometric primitive as the building block of our scenes; for raytracing, the easiest primitive to manipulate mathematically is a sphere.

What is a sphere? A sphere is the set of points that lie at a fixed distance (called the radius of the sphere) from a fixed point (called the center of the sphere):

Note that with this definition, our spheres are hollow.

If \(C\) is the center and \(r\) is the radius of the sphere, the points \(P\) on the surface of the sphere satisfy the following equation:

\[ distance(P, C) = r \]

Let’s play a bit with this equation. The distance between \(P\) and \(C\) is the length of the vector from \(P\) to \(C\):

\[ |P - C| = r \]

The length of a vector is the square root of its dot product with itself:

\[ \sqrt{\langle P - C, P - C \rangle} = r \]

And to get rid of the square root,

\[ \langle P - C, P - C \rangle = r^2 \]

Ray meets sphere

We now have two equations, one describing the points on the sphere, and one describing the points on the ray:

\[ \langle P - C, P - C \rangle = r^2 \]

\[ P = O + t\vec{D} \]

The point \(P\) where the ray hits the sphere is both a point in the ray and a point in the surface of the sphere, so it must satisfy both equations at the same time. Note that the only variable in these equations is the parameter \(t\), since \(O\), \(\vec{D}\), \(C\) and \(r\) are given, and \(P\) is the point we’re trying to find.

Since \(P\) is the same point in both equations, we can substitute \(P\) in the first with the expression for \(P\) in the second one. This gives us

\[ \langle O + t\vec{D} - C, O + t\vec{D} - C \rangle = r^2 \]

What values of \(t\) satisfy this equation?

In its current form, the equation is somewhat unwieldy. Let’s do some algebraic manipulation to see what can we get out of it.

First of all, let \(\vec{OC} = O - C\). Then the equation can be written as

\[ \langle \vec{OC} + t\vec{D}, \vec{OC} + t\vec{D} \rangle = r^2 \]

Then we expand the dot product into its components, using its distributive properties:

\[ \langle \vec{OC} + t\vec{D}, \vec{OC} \rangle + \langle \vec{OC} + t\vec{D}, t\vec{D} \rangle = r^2 \]

\[ \langle \vec{OC}, \vec{OC} \rangle + \langle t\vec{D}, \vec{OC} \rangle + \langle \vec{OC}, t\vec{D} \rangle + \langle t\vec{D}, t\vec{D} \rangle = r^2 \]

Rearranging it a bit, we get

\[ \langle t\vec{D}, t\vec{D} \rangle + 2\langle \vec{OC}, t\vec{D} \rangle + \langle \vec{OC}, \vec{OC} \rangle = r^2 \]

Moving the parameter \(t\) out of the dot products and moving \(r^2\) to the other side of the equation gives us

\[ t^2 \langle \vec{D}, \vec{D} \rangle + t(2\langle \vec{OC}, \vec{D} \rangle) + \langle \vec{OC}, \vec{OC} \rangle - r^2 = 0 \]

Doesn’t it look less unwieldy now? Note that the dot product of two vectors is a real number, so every parenthesized term is a real number. If we give them names, we’ll get something much more familiar:

\[ k_1 = \langle \vec{D}, \vec{D} \rangle \]

\[ k_2 = 2\langle \vec{OC}, \vec{D} \rangle \]

\[ k_3 = \langle \vec{OC}, \vec{OC} \rangle - r^2 \]

\[ k_1t^2 + k_2t + k_3 = 0 \]

This is nothing more and nothing less than a good old quadratic equation. Its solution yields the values of the parameter \(t\) where the ray intersects the sphere:

\[ \{ t_1, t_2 \} = {{-k_2 \pm \sqrt{ {k_2}^2 -4k_1k_3} \over {2k_1}}} \]

Fortunately, this makes geometrical sense. As you may recall, a quadratic equation can have no solutions, one double solution, or two different solutions, depending on the value of the discriminant \({k_2}^2 -4k_1k_3\). This corresponds exactly to with the cases where the ray doesn’t intersect the sphere, the ray is tangent to the sphere, and the ray enters and exits the sphere, respectively:

If we take a value of \(t\) and plug it back into the ray equation, we finally get the intersection point \(P\) corresponding to that value of \(t\).

Rendering our first spheres

To recap, for each pixel of the canvas, we can compute the corresponding point in the viewport. Given the position of the camera, we can express the equation of a ray that starts at the camera and goes through that point of the viewport. Given a sphere, we can compute where the ray intersects that sphere.

So all we need to do is to compute the intersections of the ray and each sphere, keep the closest one to the camera, and paint the pixel in the canvas with the appropriate color. We’re almost ready to render our first spheres!

The parameter \(t\) deserves some extra attention, though. Let’s go back to the ray equation:

\[ P = O + t(V - O) \]

Since the origin and direction of the ray are fixed, varying \(t\) across all the real numbers will yield every point \(P\) in this ray. Note that for \(t = 0\) we get \(P = O\), and for \(t = 1\) we get \(P = V\). Negative numbers yield points in the opposite direction, that is, behind the camera. So we can divide the parameter space in three parts:

\(t < 0\) Behind the camera
\(0 \le t \le 1\) Between the camera and the projection plane
\(t > 1\) The scene

Here’s a diagram of the parameter space:

Note that nothing in the intersection equation says that the sphere has to be in front of the camera; the equation will happily produce solutions for intersections behind the camera. Obviously, this isn’t what we want; so we should ignore any solutions with \(t < 0\). To avoid further mathematical unpleasantness, we’ll restrict the solutions to \(t > 1\), that is, we’ll render whatever is beyond the projection plane.

On the other hand, we don’t want to put an upper limit to the value of \(t\); we want to see objects in front of the camera, no matter how far away they are. Because in later stages we will want to cut rays short, however, we’ll introduce this formalism now, and give \(t\) an upper value of \(+\infty\)2.

We can now formalize everything we’ve done so far with some pseudocode. As a general rule, I’ll assume the code has access to whatever data it needs, so I won’t bother explicitly passing parameters except the really necessary ones.

The main method now looks like this:

O = <0, 0, 0>
for x in [-Cw/2, Cw/2] {
    for y in [-Ch/2, Ch/2] {
        D = CanvasToViewport(x, y)
        color = TraceRay(O, D, 1, inf)
        canvas.PutPixel(x, y, color)

The CanvasToViewport function is very simple:

CanvasToViewport(x, y) {
    return (x*Vw/Cw, y*Vh/Ch, d)

In this snippet, d is the distance to the projection plane.

The TraceRay method computes the intersection of the ray with every sphere, and returns the color of the sphere at the nearest intersection which is inside the requested range of \(t\):

TraceRay(O, D, t_min, t_max) {
    closest_t = inf
    closest_sphere = NULL
    for sphere in scene.Spheres {
        t1, t2 = IntersectRaySphere(O, D, sphere)
        if t1 in [t_min, t_max] and t1 < closest_t
            closest_t = t1
            closest_sphere = sphere
        if t2 in [t_min, t_max] and t2 < closest_t
            closest_t = t2
            closest_sphere = sphere
    if closest_sphere == NULL
        return BACKGROUND_COLOR
    return closest_sphere.color

In this snippet, O is the Origin of the ray; although we’re shooting rays from the camera, which is placed at the origin, this won’t necessarily be the case in later stages, so it has to be a parameter. Same with t_min and t_max.

Note that when the ray doesn’t intersect any sphere, we still need to return some color - I’ve chosen white in most of these examples.

And finally, IntersectRaySphere just solves the quadratic equation:

IntersectRaySphere(O, D, sphere) {
    C =
    r = sphere.radius
    oc = O - C

    k1 = dot(D, D)
    k2 = 2*dot(OC, D)
    k3 = dot(OC, OC) - r*r

    discriminant = k2*k2 - 4*k1*k3
    if discriminant < 0:
        return inf, inf

    t1 = (-k2 + sqrt(discriminant)) / (2*k1)
    t2 = (-k2 - sqrt(discriminant)) / (2*k1)
    return t1, t2

Let’s define a very simple scene:

In pseudo-scene language, it’s something like this:

viewport_size = 1 x 1
projection_plane_d = 1
sphere {
    center = (0, -1, 3)
    radius = 1
    color = (255, 0, 0)  # Red
sphere {
    center = (2, 0, 4)
    radius = 1
    color = (0, 0, 255)  # Blue
sphere {
    center = (-2, 0, 4)
    radius = 1
    color = (0, 255, 0)  # Green

When we run our algorithm on this scene, we’re finally rewarded with an incredibly awesome ray traced scene:

Source code and live demo >>

I know, it’s a bit of a letdown, isn’t it? Where are the reflections and the shadows and the polished look? We’ll get there. This is a good start. The spheres look like circles, which is better than if they looked like cats. The reason they don’t quite look like spheres is that we’re missing a key component of how human beings determine the shape of an object - the way it interacts with light.

<< Part I: Raytracing · Light >>
Computer Graphics from scratch · Introduction · Table of contents · Common concepts
Part I: Raytracing · Basic ray tracing · Light · Shadows · Reflection · Arbitrary camera · Beyond the basics · Raytracer pseudocode
Part II: Rasterization · Lines · Filled triangles · Shaded triangles · Perspective projection · Scene setup · Clipping · Hidden surface removal · Shading · Textures
Found an error? Everything is in Github.

  1. And the results would be stunning. This technique is called Photon Tracing or Photon Mapping; unfortunately, it’s outside the scope of this work.

  2. For languages that can’t represent infinitely directly, a really really big number does the trick.