Video games are a rapidly evolving form of entertainment that leverages huge technological advancements to create new ways of interacting and to tell ever-deeper stories. It’s no wonder that graphics have been at the forefront of these advancements, pushing to blur the line between film and video games. One of the most important aspects of film, however, is audio. With every advancement in film projection, there was a proportional change in audio resolution or spatialization. This, unfortunately, has not been the case with games. While games have seen multiple jumps in visual fidelity per generation, audio has seen only slow, linear change. Let's take a stroll through a brief history of visual and audio advancements in games to illustrate the point.
Note: We will not cover the era before the 8-bit generation, which includes monochrome displays, oscilloscope graphics displays, and arcade cabinets.
3rd Generation (NES)
Graphic Advancements
The NES supported 64 8x8 or 8x16 sprites
The NES could display ~48-52 colors on-screen from specified color palettes
256x240 interlaced output
Audio Advancements
Mono output
5 simultaneous voices
4 sound generators (waveform and noise generators)
1 low-quality sample player
4th Generation (SNES and Sega Genesis)
Graphic Advancements
Resolution up to 512x448 interlaced (SNES)
128 simultaneous sprites at 8x8 or multiples thereof (SNES)
15-bit color palette capable of displaying 256 simultaneous colors (SNES)
Parallax scrolling, Mode 7 (background rotation and scaling), and additional visual effects (blending, pixelization, etc.)
Some SNES cartridges included the Super FX chip, which allowed hundreds of polygons to be rendered.
Audio Advancements
Stereo audio
8 simultaneous voices
Rudimentary DSP for effects like echo and panning
5th Generation (PlayStation, Sega Saturn, Nintendo 64)
Graphic Advancements
750x756 interlaced output resolution (Sega Saturn, N64)
16M-color, 24-bit palette
207,000 simultaneous colors displayed on screen (N64)
150,000 polygons/sec (Sega Saturn, N64)
600,000 flat textures/sec
Textures, shading, bitmap
Anti-aliasing, Z-buffering (N64)
Mipmapping, texture filtering for sprites (N64)
16k simultaneous sprites (Sega Saturn)
Audio Advancements
16-bit, 44.1 kHz PCM audio
Stereo output
32 sound channels (Sega Saturn)
Internal DSP with pitch modulation, digital reverb, and ADSR
6th Generation (PS2, Xbox, GameCube)
Graphic Advancements
480-1080 interlaced output
32-bit color palette
75M polygon fill rate (PS2)
932 megapixel/sec texture fillrate (Xbox)
FSAA, bump mapping, anisotropic filtering, alpha blending, diffuse, specular, particle, physics simulations
Audio Advancements
64-bit PCM audio, 48 kHz
256 simultaneous voices
Stereo output
DSP including Dolby Pro Logic, Dolby Digital 5.1, and DTS 5.1
Spatial sound frameworks, interactive music frameworks
7th Generation (PS3, Xbox 360, Wii)
Graphic Advancements
720-1080 progressive output
128-bit color
500M polygon fill rate (Xbox 360)
4.4 gigapixel/sec texture fill rate (PS3)
Normal mapping, dynamic tessellation, animation blending, subsurface scattering, ambient occlusion, soft body dynamics, crowd simulations, volumetric lighting, volumetric fog, resolution upscaling
Audio Advancements
LPCM audio up to 192 kHz
Up to 7.1 channel audio
DSP including Dolby TrueHD and DTS-HD
Spatial sound frameworks with dynamic EQ
Adaptive audio systems, and interactive audio
8th Generation (PS4 and Xbox One)
Graphic Advancements
Up to 4K at 60 fps (Xbox One X, PS4 Pro)
1.7B polygon fill rate (Xbox One)
Up to 187 gigapixel/sec texture fill rate (Xbox One X)
HDR output (HDR10, Dolby Vision)
Global illumination, physically based rendering, screen-space ambient occlusion and reflections, AI-assisted upscaling
Audio Advancements
DSP including Dolby Atmos, DTS:X, and Dolby
Advancements in spatial modeling
As you can tell, graphics receive multiple huge advancements every generation, bringing games closer and closer to cinema quality. Audio is a different story. The huge shifts in audio happen only in line with dramatic shifts in technology. The CD generation brought huge fidelity improvements. The DVD generation brought huge changes to audio middleware. The latest generation is finally allocating computational power to DSP that can simulate 3D sound with head-related transfer functions (HRTFs) or use ray tracing to simulate more accurate sound environments. Even with those once-a-generation shifts, game audio still lacks the immediacy and emphasis of the music scoring found in movies. This is particularly important because we’re reaching a point of diminishing returns with graphics advancements. We’ll need to start looking at new ways to improve the game experience, and the obvious area to start with is audio.
This generation will see huge improvements to the spatial rendering of sound with HRTFs and ambisonics. With these improvements, we hope to see game scores begin to gain the immediacy of movie scores. Today, interactive music is usually achieved by events in the game triggering music changes. There is an inevitable lag with this method because the game has to trigger an event, the event is sent to the audio engine, and the audio engine cues up a stinger to play at the next available beat point. To really change this, we have to think about music as a way to inform the game of when game events can trigger. Imagine a rousing musical score giving beat information to the game so that selected punches can be timed to points that make musical sense. Imagine those events, in turn, triggering changes in the audio. This is the key to truly interactive music, and it starts with music.
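To make the idea of music informing the game a little more concrete, here is a minimal Python sketch. It assumes a fixed-tempo score and a game loop that passes in the current time in seconds. The names `BeatClock` and `MusicDrivenScheduler` are hypothetical and do not come from any real engine or middleware. A production system would read beat positions from the audio engine's own clock rather than computing them from a BPM value.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BeatClock:
    """Hypothetical music clock for a fixed-tempo score (an assumption)."""
    bpm: float               # tempo of the score in beats per minute
    start_time: float = 0.0  # time in seconds at which beat 0 sounds

    @property
    def seconds_per_beat(self) -> float:
        return 60.0 / self.bpm

    def next_beat(self, now: float) -> float:
        """Return the time of the first beat at or after `now`."""
        elapsed = max(0.0, now - self.start_time)
        # Small epsilon so a time that lands exactly on a beat is not
        # pushed to the following beat by floating-point error.
        beats = math.ceil(elapsed / self.seconds_per_beat - 1e-9)
        return self.start_time + beats * self.seconds_per_beat


@dataclass
class MusicDrivenScheduler:
    """Holds game events until the music says they may fire."""
    clock: BeatClock
    pending: list = field(default_factory=list)  # (fire_time, name) pairs

    def request(self, name: str, now: float) -> float:
        """Queue a game event for the next beat and return its fire time."""
        fire_time = self.clock.next_beat(now)
        self.pending.append((fire_time, name))
        return fire_time

    def update(self, now: float) -> list:
        """Called once per game tick: return the events whose beat has arrived."""
        due = [name for t, name in self.pending if t <= now]
        self.pending = [(t, n) for t, n in self.pending if t > now]
        return due


if __name__ == "__main__":
    scheduler = MusicDrivenScheduler(BeatClock(bpm=120.0))
    print(scheduler.request("punch", now=1.2))  # scheduled for the next beat
    print(scheduler.update(now=1.4))            # beat has not arrived yet
    print(scheduler.update(now=1.5))            # punch lands on the beat
```

In this sketch the gameplay event waits for the music rather than the other way around, so the punch and any stinger attached to it land on the same beat. The trade-off is a delay of up to one beat between the player's input and the action, which a real game would need to hide with animation wind-up or a finer beat subdivision.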
This generation can truly be the generation of game audio. All of the pieces are there. There is strong middleware support. There are finally hardware resources allocated to audio. There is a strong need within the industry to find ways to advance the craft. If developers take on the challenge of putting audio on par with graphics, we might finally meet the ideal of cinema, and even exceed it.