Chapter XVI. Video Introduction
While creating the CPU was (relatively) simple conceptually but tedious to implement, the Pixel Processing Unit (PPU) is the opposite: it isn't too tricky to program once you understand it, but there is more fine detail to grasp and sift through. We'll begin with a high-level introduction. The PPU is a component of the Game Boy that works with its own sections of memory. These hold not only the color data for each pixel, but also information about larger building blocks, such as the tiles used in the backgrounds and the sprites.
Pixels
For the CPU, the byte is the foundational building block. All of the mathematical functions of the processor are broken down into managing and processing bytes. For the video display, the fundamental unit is the pixel. The goal of the screen and its accompanying circuitry is to construct a series of images 160 pixels wide by 144 pixels tall. On an original Game Boy, each of these pixels can be one of four colors: white, light gray, dark gray, or black.
The pixels in a typical modern display are each capable of millions of possible colors. To encode them, you'll often see each shade stored in terms of its red, green, and blue components. The more colors you wish to express, the larger the range of RGB values needs to be. It's common today to store colors as 24-bit values, giving one byte to each of the RGB components. For our Game Boy's screen though, where each pixel can only display a whopping four options, requiring 24 bits for every pixel is not only wasteful, but also doesn't really make sense. Instead, we can use an "indexing" system, where we simply number all of the possible colors, then assign each pixel an index. For the Game Boy, the typical enumeration is that white is 0, light gray is 1, dark gray is 2, and black is 3. With four total options, this can be stored as two bits per pixel (bpp), rather than the 24 bits needed for full color.
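To make the indexing idea concrete, here is a minimal sketch. The `PALETTE` gray values and the `pack_pixels`/`unpack_pixels` helpers are hypothetical names chosen for illustration, and the packing shown (four pixels side by side in one byte) is only meant to demonstrate the storage saving; it is not claimed to be the layout the Game Boy itself uses for its graphics data.

```python
# Hypothetical lookup table: color index -> 24-bit RGB shade of gray.
PALETTE = [
    (0xFF, 0xFF, 0xFF),  # 0: white
    (0xAA, 0xAA, 0xAA),  # 1: light gray
    (0x55, 0x55, 0x55),  # 2: dark gray
    (0x00, 0x00, 0x00),  # 3: black
]


def pack_pixels(indices):
    """Pack four 2-bit color indices into one byte, first pixel in the top bits."""
    if len(indices) != 4:
        raise ValueError("expected exactly four indices")
    byte = 0
    for index in indices:
        if not 0 <= index <= 3:
            raise ValueError("a 2 bpp index must be between 0 and 3")
        byte = (byte << 2) | index
    return byte


def unpack_pixels(byte):
    """Recover the four 2-bit color indices stored in one byte."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


packed = pack_pixels([0, 1, 2, 3])
print(hex(packed))                               # four pixels in a single byte
print([PALETTE[i] for i in unpack_pixels(packed)])  # back to full RGB for display
```

Four pixels fit in one byte here, where full 24-bit color would need twelve bytes for the same four pixels.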
At a glance, this method seems like enough to store our video data. VRAM was defined in the address space from 0x8000 to 0x9FFF, which is 8 KiB of memory. At 2 bpp, a 160x144 screen would require 5760 bytes. Plenty left over! However, keeping a complete video buffer like this would be rather inefficient for several reasons. Firstly, many games have a lot of repeated sections displayed on screen. It would be ideal if there were some mechanism to reuse the same graphic many times, almost like copying and pasting. Secondly, any time we need to update the screen, which we'll be doing 60 times per second, we'd either have to recreate the whole thing from scratch or have some convoluted method of keeping track of what has and hasn't changed. While modern hardware is capable of these strategies, the Game Boy was not. Fortunately, the designers of the system had a clever solution.
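The arithmetic behind those figures can be checked in a few lines. The constant names below are assumptions made for this sketch; the numbers themselves come from the text.

```python
SCREEN_WIDTH = 160
SCREEN_HEIGHT = 144
BITS_PER_PIXEL = 2

VRAM_START = 0x8000
VRAM_END = 0x9FFF  # inclusive

# A full-screen buffer at 2 bpp, converted from bits to bytes.
framebuffer_bytes = SCREEN_WIDTH * SCREEN_HEIGHT * BITS_PER_PIXEL // 8

# Size of the VRAM address range, counting both endpoints.
vram_bytes = VRAM_END - VRAM_START + 1

print(framebuffer_bytes)               # 5760
print(vram_bytes)                      # 8192, i.e. 8 KiB
print(vram_bytes - framebuffer_bytes)  # 2432 bytes to spare
```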
1 If you've used an original Game Boy, you'll know that the display actually had a strong green tint. I find this effect a bit nauseating, so I'll refer to the display in traditional shades of gray.
Tiles
In order to decrease RAM usage and increase asset reuse, the Game Boy uses "tiles", small images of 8 pixels by 8 pixels. At 2 bpp, each row of a tile requires two bytes, with 16 bytes required for the whole tile. You've probably seen pixel artists use this exact size when creating 8-bit inspired artwork, but a tile's true usefulness is implied by its name: they tile together. The Game Boy again utilizes the indexing technique when it's constructing the screen, using a system referred to as a "tile map". 160x144 pixels is 20 tiles wide by 18 tiles tall, so rather than maintain 23,040 pixels' worth of color information, we can instead "map" each of the 360 on-screen tile positions to the tile that should be displayed there. By keeping indices of tiles, we can also reuse them easily. For example, it's very common to play Game Boy games where the background is mostly white. Rather than storing all that white pixel data, the developer can simply define a single tile of all white pixels, then reuse that same tile in as many places as they need, with no additional overhead.
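The tile map lookup can be sketched as follows. This is a simplified model under stated assumptions: tiles are held as already-decoded 8x8 grids of color indices rather than as the 16 raw bytes stored in VRAM, the map covers only the visible 20x18 area, and the names `tiles`, `tile_map`, and `pixel_at` are hypothetical.

```python
TILE_SIZE = 8
MAP_WIDTH, MAP_HEIGHT = 20, 18  # visible screen measured in tiles

# Two example tiles, stored as rows of color indices (0 = white, 3 = black).
WHITE_TILE = [[0] * TILE_SIZE for _ in range(TILE_SIZE)]
CHECKER_TILE = [
    [3 if (x + y) % 2 else 0 for x in range(TILE_SIZE)]
    for y in range(TILE_SIZE)
]
tiles = [WHITE_TILE, CHECKER_TILE]

# One tile index per on-screen position; every entry starts as the white tile.
tile_map = [0] * (MAP_WIDTH * MAP_HEIGHT)
tile_map[1 * MAP_WIDTH + 2] = 1  # checker tile at tile column 2, tile row 1


def pixel_at(x, y):
    """Return the color index of screen pixel (x, y) by way of the tile map."""
    tile_index = tile_map[(y // TILE_SIZE) * MAP_WIDTH + (x // TILE_SIZE)]
    return tiles[tile_index][y % TILE_SIZE][x % TILE_SIZE]


print(len(tile_map))    # 360 map entries instead of 23,040 pixels
print(pixel_at(0, 0))   # inside a white tile
print(pixel_at(17, 8))  # inside the checker tile
```

The white tile is defined once, yet 359 of the 360 map entries point at it, which is the reuse the tile map is designed for.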
TODO: Graphic showing Background layer split into tiles with indices
There are some limitations to tiles. First, the tiles must precisely line up side-by-side and can't overlap. This is useful for displaying a background, but prevents us from moving a character freely. The Game Boy also has a finite limit to how many tiles can be defined at once, which results in many Game Boy games obviously reusing assets in the background. This isn't a good or bad thing, but a style imposed upon the developer by the hardware. The tile map also does not support transparency. This might sound like it isn't needed, since the tiles aren't allowed to overlap each other anyway, but it can cause some odd effects. As a child playing Pokemon Red, I noticed that the ground behind the slanted roofs of many buildings didn't quite match the surrounding ground texture. The developers only made a single slanted roof tile, which as part of its 8x8 space would also show some of the ground behind it. If they used a different ground texture adjacent to it, there would be a stark contrast.
TODO: Screenshot of Pokemon roof tiles
Layers
If Game Boy developers were limited to only using non-overlapping tiles, it would greatly limit the sorts of games they could create. Mario needs to be able to run to any position he wants. Fortunately, there are mechanisms to support this. The tile map is just one of three "layers" that the Game Boy uses to construct a frame image, much like how modern image editing software such as Photoshop supports layers. The tiles we've described are usually the bottom-most layer, called the "Background Layer". The background layer actually extends off-screen and is larger than the 20x18 tiles we mentioned before, a mechanism we'll explore in detail later.
Above it is the "Window Layer", which uses the same tiles as the background layer but has some limitations on how it can be displayed. It can almost be thought of as a second background layer that is rendered on top of the first, and it is often used to create UI boxes or menus, among other effects. Above that is the "Sprite Layer". This is the layer that allows free movement of its "sprites", which are either 8x8 or 8x16 pixel images. There is a special section in memory known as the "Object Attribute Memory" (OAM, sometimes called the "Sprite Attribute Table"), which keeps track of the metadata for each of the different sprites. This includes each sprite's position on screen, which graphic corresponds to it, and whether it is flipped vertically or horizontally. Just like the background tiles, sprites are indexed to use one of the pre-defined pixel images, but they can be placed anywhere on the screen and support transparency. There are limits on how many sprites can be displayed: a total limit due to memory size constraints, and a hard limit on how many can be shown on each horizontal line of the display.
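The stacking order of the three layers can be sketched per pixel. This is a deliberately simplified model, not the PPU's actual pipeline: `composite_pixel` is a hypothetical helper, `None` stands for "this layer does not cover the pixel", and treating sprite color index 0 as the transparent value is an assumption taken from public hardware documentation rather than from this chapter. Sprite priority flags and the per-line sprite limit are ignored.

```python
def composite_pixel(background, window=None, sprite=None):
    """Pick the visible color index for one pixel from the three layers.

    background: color index from the background layer (always present)
    window:     color index from the window layer, or None if not covering
    sprite:     color index from a sprite, or None if no sprite is here
    """
    color = background
    if window is not None:
        # The window is opaque and hides the background wherever it is drawn.
        color = window
    if sprite is not None and sprite != 0:
        # Assumed: sprite index 0 is transparent, so lower layers show through.
        color = sprite
    return color


print(composite_pixel(1))                      # background only
print(composite_pixel(1, window=2))            # window covers the background
print(composite_pixel(1, window=2, sprite=3))  # sprite drawn on top
print(composite_pixel(1, sprite=0))            # transparent sprite pixel
```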
TODO: VRAM memory map here
OAM and I/O Registers
Speaking of the OAM... where is it? All of VRAM is broken up into only two sections, the tile data and tile map, so where is the sprite data stored? It turns out that there are other blocks of RAM utilized by the PPU aside from just VRAM. The first is the OAM from 0xFE00-0xFE9F. This is 160 bytes of space, and it is broken up into four bytes of data for 40 different sprites. We'll discuss its structure in more detail later, but it includes a byte each for the sprite's X and Y position, a byte for the tile index, and a byte for miscellaneous flags.
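As a preview of that structure, here is a rough sketch of splitting the 160 bytes into 40 entries. The text above only says which four bytes exist; the byte order used here (Y, X, tile index, flags) and the flag bit positions for flipping are assumptions drawn from public hardware documentation, and `SpriteEntry` and `parse_oam` are hypothetical names.

```python
from dataclasses import dataclass

OAM_START = 0xFE00
OAM_END = 0xFE9F  # inclusive
BYTES_PER_SPRITE = 4


@dataclass
class SpriteEntry:
    y: int           # assumed to be byte 0
    x: int           # assumed to be byte 1
    tile_index: int  # assumed to be byte 2
    flags: int       # assumed to be byte 3

    @property
    def x_flip(self):
        # Assumed bit position for horizontal flipping.
        return bool(self.flags & 0x20)

    @property
    def y_flip(self):
        # Assumed bit position for vertical flipping.
        return bool(self.flags & 0x40)


def parse_oam(oam):
    """Split the raw OAM bytes into one SpriteEntry per four bytes."""
    if len(oam) != OAM_END - OAM_START + 1:
        raise ValueError("OAM must be exactly 160 bytes")
    return [
        SpriteEntry(*oam[i:i + BYTES_PER_SPRITE])
        for i in range(0, len(oam), BYTES_PER_SPRITE)
    ]


oam = bytearray(160)
oam[4:8] = bytes([16, 8, 0x2A, 0x20])  # fill in the second sprite's entry
sprites = parse_oam(oam)
print(len(sprites))
print(sprites[1])
```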
Besides the OAM, the PPU also relies on several addresses in the I/O space. While this range of memory includes information about the currently pressed buttons and audio output, it also has some of the "control registers" for the PPU. The use of the term "register" is a little confusing, given how closely we've associated it with the CPU, but here the term is just referring to an address in RAM which controls PPU behavior. There are several bytes here used for configuration information, such as whether the screen is even rendering, the positions of the different layers, the palette, etc.
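To give a feel for what reading these control registers looks like, here is a small sketch. None of the specifics below come from this chapter: the register names and addresses, the use of bit 7 of the LCD control byte as the screen-enable flag, and the packing of the background palette as four 2-bit entries are all taken from public hardware documentation and should be read as assumptions. The helper names are hypothetical.

```python
# Assumed addresses of a few PPU control registers in the I/O space.
LCDC = 0xFF40  # LCD control
SCY = 0xFF42   # background vertical scroll
SCX = 0xFF43   # background horizontal scroll
BGP = 0xFF47   # background palette


def lcd_enabled(lcdc_value):
    """Assumed: bit 7 of the LCD control byte switches the screen on or off."""
    return bool(lcdc_value & 0x80)


def decode_palette(bgp_value):
    """Assumed: the palette byte holds four 2-bit shades, lowest bits first.

    Entry i of the result is the shade shown for color index i.
    """
    return [(bgp_value >> (2 * i)) & 0b11 for i in range(4)]


print(lcd_enabled(0x91))
print(decode_palette(0xE4))  # identity mapping: index n shows shade n
print(decode_palette(0x1B))  # inverted mapping
```

A palette register like this adds one more level of indirection: the tile data stores color indices, and the palette decides which of the four shades each index currently displays.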
There's a lot of fine detail in creating a single frame of a Game Boy game, but this should be enough to get us started. The VRAM stores the tile pixel data as well as how they are arranged on the different layers. Elsewhere is the OAM, which contains additional data for the sprites. When it's time to render a frame, the PPU combines the three layers one on top of the other, in addition to information from the control registers, and sends that to the screen to be displayed to the player. This process repeats 60 times per second continuously while the game is running.