Common concepts

The Canvas

Throughout this work, we’ll be drawing things on a canvas. The canvas is a rectangular array of pixels which can be colored individually. Will this be shown on a screen, printed on paper, or used as a texture in a subsequent rendering? That’s irrelevant to our purposes; we’ll focus on rendering stuff to this abstract, rectangular array of pixels.

We will build everything in this book out of a single primitive: paint a pixel in the canvas with a given color:

canvas.PutPixel(x, y, color)

Next we’ll examine the parameters to that method - coordinates and colors.

Coordinate systems

The canvas has a specific width and height in pixels, which we’ll call \(C_w\) and \(C_h\). It’s possible to use any coordinate system to address its pixels. On most computer screens, the origin is at the top left, \(x\) increases to the right, and \(y\) increases down the screen.

This is a very natural coordinate system given the way video memory is organized, but it’s not the most natural for us humans to work with. Instead, we’ll use the coordinate system typically used to draw graphs on paper: the origin is at the center, \(x\) increases to the right, and \(y\) increases up.

Using this coordinate system, the range of the \(x\) coordinate is \([{-C_w \over 2}, {C_w \over 2}]\) and the range of the \(y\) coordinate is \([{-C_h \over 2}, {C_h \over 2}]\) [1]. For simplicity, attempting to operate on pixels outside the valid ranges will just do nothing.

In the examples, the canvas will be drawn on the screen, so it’s necessary to convert from one coordinate system to the other. Assuming the screen is the same size as the canvas, a point \((C_x, C_y)\) in canvas coordinates maps to the point \((S_x, S_y)\) in screen coordinates as follows:

\[ S_x = {C_w \over 2} + C_x \]

\[ S_y = {C_h \over 2} - C_y \]
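To make the conversion concrete, here is a minimal Python sketch of a canvas that stores pixels in screen order but is addressed with centered coordinates. The `Canvas` class, its `pixels` grid, the `to_screen` helper and the use of integer division for the half-width and half-height are all assumptions made for this illustration; they are not an API defined in this text.

```python
class Canvas:
    """A rectangular grid of pixels addressed with centered coordinates."""

    def __init__(self, width, height, background=(255, 255, 255)):
        self.width = width
        self.height = height
        # Row-major storage in screen order: pixels[sy][sx].
        self.pixels = [[background] * width for _ in range(height)]

    def to_screen(self, cx, cy):
        """Convert canvas coordinates (origin at the center, y up) to
        screen coordinates (origin at the top left, y down)."""
        sx = self.width // 2 + cx   # S_x = C_w / 2 + C_x
        sy = self.height // 2 - cy  # S_y = C_h / 2 - C_y
        return sx, sy

    def put_pixel(self, cx, cy, color):
        """Color one pixel; coordinates outside the canvas do nothing."""
        sx, sy = self.to_screen(cx, cy)
        if 0 <= sx < self.width and 0 <= sy < self.height:
            self.pixels[sy][sx] = color


canvas = Canvas(4, 4)
canvas.put_pixel(-2, 2, (255, 0, 0))  # top-left corner of the screen
canvas.put_pixel(2, 0, (255, 0, 0))   # S_x = 4 is out of range: ignored
```

Checking the bounds after the conversion also takes care of the edge of the range that falls just outside the pixel grid.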

Color models

The whole theory of how color works is fascinating, but unfortunately outside the scope of this text. The following is a simplified version of the aspects that will be relevant to us.

A color is what we call the way our brain interprets photons hitting the eyes. These photons carry energy in different frequencies; our eyes map these frequencies to colors. The lowest frequency we can perceive is around 450 THz; we see this as “red”. On the other end of the scale is 750 THz, which we see as “purple”. Between these two frequencies we see the continuous spectrum of colors (for example, green is around 575 THz).

We can’t normally see frequencies outside of these ranges. Higher frequencies carry more energy; this is why infrared (frequencies lower than 450 THz) is harmless, but ultraviolet (frequencies higher than 750 THz) can burn your skin.

Every color imaginable can be described as different combinations of these colors (in particular, “white” is the sum of all colors, and “black” is the absence of all colors). It would be impractical to describe colors by describing the exact frequencies they’re made of. Fortunately, it’s possible to create almost all colors as a linear combination of just three colors which we call “primary colors”.

Subtractive color model

Subtractive Color Model is a fancy name for that thing you did with crayons as a toddler. Take a white piece of paper and red, blue and yellow crayons. You draw a yellow circle, then a blue circle that overlaps it, and lo and behold, you get green! Yellow and red - orange! Red and blue - purple! Mix the three together - something darkish! Wasn’t childhood amazing?

Things are of different colors because they absorb and reflect light in different ways. Let’s start with white light, like sunlight [2]. White light contains every light frequency. When light hits some object, depending on what the object is made of, its surface absorbs some of the frequencies and reflects others. Some of the reflected light hits our eyes, and our brains convert that to color. What color? The sum of the frequencies that were reflected [3].

So what happens with the crayons? You start with white light reflecting off the paper. Since it’s white paper, it means it reflects most of the light it gets. When you draw with a “yellow” crayon, you’re adding a layer of a material that absorbs some frequencies but lets others pass through it. They’re reflected by the paper, pass through the yellow layer again, hit your eyes, and your brain interprets that particular combination of frequencies as “yellow”. So what the yellow layer does is subtract a bunch of frequencies from the original white light.

When you then draw a blue circle over the yellow one, you’re subtracting even more frequencies from what was left after the yellow circle subtracted its own, so what hits your eyes is whatever frequencies weren’t filtered by either the blue or yellow circles - which your brain sees as “green”.

In summary, we start with all frequencies, and subtract some amount of the primary colors, to create any other color. Because we’re subtracting frequencies, this is called the Subtractive Color Model.

This model is not quite right, though. The actual primary colors in the subtractive model aren’t Blue, Red and Yellow as taught to toddlers and art students, but Cyan, Magenta and Yellow. Furthermore, mixing the three primaries produces a somewhat darkish color which isn’t quite black, so pure black is added as a fourth “primary”. Because the B is taken by blue, black is denoted by K - and so we arrive at the CMYK color model, used for example by printers.

Additive color model

But that’s only half of the story. If you ever watched a TV or monitor from a close distance or with a magnifying glass (or, let’s be honest, accidentally sneezed at it), you probably have seen tiny colored dots - one red, one green, and one blue for each pixel.

Monitor screens are the opposite of paper. Paper doesn’t emit light; it merely reflects part of the light that hits it. Screens, on the other hand, are black, but they do emit light on their own. With paper we start with white light and subtract the frequencies we don’t want; with a screen we start with no light, and add the frequencies we want.

It turns out different primary colors are necessary for this. Most colors can be created by adding different amounts of red, green and blue to a black surface; this is the RGB color model, an Additive Color Model.

Forget the details

Now that you know all this, you can selectively forget most of the details, and focus on what’s important for our work.

Most colors can be represented in either RGB or CMYK (or indeed in any of the many other color models) and it’s possible to convert from one color space to another. Since we’ll be focusing on rendering things to a screen, the rest of this work will use the RGB color model.

As described above, objects absorb part of the light reaching them, and reflect the rest. Which frequencies are absorbed and which are reflected is what we perceive as the “color” of the surface. From now on, we’ll simply treat the color as a property of a surface, and forget about absorbed light frequencies.

Color depth and representation

As explained in the previous section, monitors can create colors out of different amounts of Red, Green and Blue. They do this by lighting the tiny Red, Green and Blue dots on their surface at different intensities.

How many different intensities? Although voltage is a continuous variable, we’ll be manipulating colors with a computer, which uses discrete values. The more different shades of red, green and blue we can represent, the more total colors we’ll be able to produce.

The most common format nowadays uses 8 bits per primary color (also called color channel). 8 bits per channel give us 24 bits per pixel, for a total of \(2^{24}\) different colors (approximately 16.7 million). This format, known as 888, is what we’ll be using throughout this work. We say this format has a color depth of 24 bits.
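As a rough illustration of the 888 format, the following Python sketch packs three 8-bit channels into a single 24-bit integer and unpacks them again. The function names, and the choice to put red in the most significant byte, are assumptions made for this example; real systems use a variety of byte orders.

```python
def pack_rgb888(r, g, b):
    """Pack three 8-bit channels into one 24-bit integer (red in the
    highest byte; this layout is an assumption of the sketch)."""
    return (r << 16) | (g << 8) | b


def unpack_rgb888(value):
    """Split a 24-bit integer back into its (R, G, B) channels."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


packed = pack_rgb888(255, 0, 128)
print(hex(packed))            # 0xff0080
print(unpack_rgb888(packed))  # (255, 0, 128)
print(2 ** 24)                # 16777216 possible colors
```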

This is by no means the only possible format, though. Not so long ago, in order to save memory, 15- and 16-bit formats were popular, assigning 5 bits per channel in the 15-bit case, and 5 bits for red, 6 for green, and 5 for blue in the 16-bit case (known as a 565 format). Why does green get the extra bit? Because our eyes are more sensitive to changes in green than to changes in red or blue.
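To see where the bits go in a 565 format, here is a small Python sketch that reduces a 24-bit color to 16 bits. It assumes the extra precision is simply truncated and that red occupies the highest bits; both the function name and that layout are illustrative choices rather than something this text specifies.

```python
def rgb888_to_rgb565(r, g, b):
    """Reduce an 8-bit-per-channel color to 16 bits by keeping the top
    5 bits of red, 6 bits of green and 5 bits of blue."""
    r5 = r >> 3  # 8 bits -> 5 bits
    g6 = g >> 2  # 8 bits -> 6 bits
    b5 = b >> 3  # 8 bits -> 5 bits
    return (r5 << 11) | (g6 << 5) | b5


print(hex(rgb888_to_rgb565(255, 255, 255)))  # 0xffff
print(hex(rgb888_to_rgb565(0, 255, 0)))      # 0x7e0
```

Pure green fills six bits (`0x7e0`), while pure red or pure blue would fill only five each.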

16 bits give you \(2^{16}\) colors (approximately 65,000). This means you get one color for every 256 colors in a 24-bit mode. Although 65,000 colors is plenty, for images where colors change very gradually you would be able to see very subtle “steps” which just aren’t visible with 16.7 million colors, because there are enough bits to represent the colors in-between. 16.7 million colors is also more colors than the eye can distinguish, so we’ll probably continue using 24-bit colors for the foreseeable future [4].
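The factor of 256 follows directly from the two bit depths:

\[ {2^{24} \over 2^{16}} = 2^{8} = 256 \]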

We’ll use three bytes to represent a color, each holding the value of an 8-bit color channel. In the text, we’ll express the colors as \((R, G, B)\) - for example, \((255, 0, 0)\) is pure red; \((255, 255, 255)\) is white; and \((255, 0, 128)\) is a reddish purple.

Color manipulation

We’ll use a handful of operations to manipulate colors [5].

We can alter the intensity of a color, by multiplying each color channel by a constant:

\[ k(R, G, B) = (kR, kG, kB) \]

We can add two colors together, by adding the color channels separately:

\[ (R_1, G_1, B_1) + (R_2, G_2, B_2) = (R_1 + R_2, G_1 + G_2, B_1 + B_2) \]

For example, if we have a reddish purple \((252, 0, 66)\) and want to have a color with the exact same hue but only one third as bright, we multiply channel-wise by \(1 \over 3\) and get \((84, 0, 22)\). If we want to combine red \((255, 0, 0)\) and green \((0, 255, 0)\) we add channel-wise and get \((255, 255, 0)\), which is yellow.
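The two operations can be sketched in a few lines of Python, with colors as plain tuples. The function names are hypothetical, and rounding the scaled channels to the nearest integer is an assumption of this sketch; the text doesn’t prescribe a rounding rule.

```python
def scale_color(k, color):
    """Multiply every channel by the constant k, rounding to the
    nearest integer (the rounding is an assumption of this sketch)."""
    r, g, b = color
    return (round(k * r), round(k * g), round(k * b))


def add_colors(c1, c2):
    """Add two colors channel by channel."""
    return tuple(a + b for a, b in zip(c1, c2))


print(scale_color(1 / 3, (252, 0, 66)))      # (84, 0, 22)
print(add_colors((255, 0, 0), (0, 255, 0)))  # (255, 255, 0)
```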

The astute reader will notice that these operations can yield invalid values; for example, doubling the intensity of \((192, 64, 32)\) produces an \(R\) value outside our color range. We’ll treat any value over 255 as 255, and any value below 0 as 0. This is more or less equivalent to what happens when you take an under- or over-exposed picture - you get either completely black or completely white areas.
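A minimal Python sketch of this clamping rule, using a hypothetical `clamp_color` helper, could look like this:

```python
def clamp_color(color):
    """Force every channel into the valid range [0, 255]."""
    return tuple(max(0, min(255, channel)) for channel in color)


# Doubling (192, 64, 32) pushes the red channel out of range.
doubled = tuple(2 * channel for channel in (192, 64, 32))
print(doubled)               # (384, 128, 64)
print(clamp_color(doubled))  # (255, 128, 64)
```

Only the channel that overflowed is changed, so a clamped color can have a slightly different hue than the unclamped one.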

The Scene

The canvas is the abstraction where you will render. Render what? Another abstraction: the scene.

The Scene is the set of objects you may be interested in rendering. It could be anything, from a single sphere floating in the empty infinity of space (we’ll start there) to an incredibly detailed model of the inside of an ogre’s nose.

We need a coordinate system to talk about objects within the scene. The choice is arbitrary, so we’ll pick something useful for our purposes. \(Y\) is up. \(X\) and \(Z\) are horizontal. So the plane \(XZ\) is the “floor”, while \(XY\) and \(YZ\) are vertical “walls”.

Since we’re talking about “physical” objects here, we need to agree on the units. This is, again, arbitrary, and heavily dependent on what the scene represents. “1” could be 1 millimeter if you’re modeling a teacup, or it could be 1 astronomical unit if you’re modeling the Solar System. Fortunately, none of what follows depends on the units, so we’ll just ignore them. As long as you’re consistent (e.g. “1” always means the same thing throughout the scene) everything will work fine.


  1. Strictly speaking, either \({-C_h \over 2}\) or \({C_h \over 2}\) is outside the range, but we’ll just ignore this.

  2. Sunlight isn’t quite white, but it’s close enough for our purposes.

  3. Because of thermodynamics, the rest of the energy isn’t lost; it’s mostly turned to heat. That’s why black things get hotter than white ones - they absorb most of the frequencies!

  4. This applies only to showing images; storing images with a wider range is an entirely different matter, which will be dealt with in the Lighting chapter.

  5. If you know Linear Algebra, think of colors as vectors in 3D color space. Here I present only the operations we’ll be using for readers not familiar with Linear Algebra.