Secret Encoder Ring
So you’ve bitten the proverbial bullet and installed a mixer or workstation software application with surround panners, set up a 5.1 monitoring system, and actually recorded some surround music mixes on a Tascam DA-88 or ADAT. (See “Mixing in the Round” in the May 2001 EM and “5.1 Mixing on a Budget” in the June 2001 issue for more about mixing techniques and inexpensive monitoring solutions, respectively.) Now, how do you get anyone else to hear your surround opus? You can’t play a DA-88 tape on a consumer system, and dragging a multitrack deck around isn’t practical.
You need to encode your six discrete tracks into a format that will play on a consumer home-theater system. The process is not really that complicated, and it’s thrilling to hear something you mixed in surround play back properly on a home system. This article describes the final part of the surround-production process.
CODEC THE HALLS
To easily distribute 5.1-channel recordings, you must encode them into a format that can be stored on common media, such as CDs and DVDs, and transmitted over common digital-audio interfaces, such as S/PDIF. In general, that requires the data to be compressed (reduced in size). Two types of 5.1 encoding are often used: Dolby Digital and DTS from Digital Theater Systems. DTS offers 5.1 and 6.1 formats, whereas Dolby Digital can be implemented in any format from mono to 5.1.
Both systems encode the six channels of a discrete surround mix into a single data file that can be transmitted through a standard AES/EBU or S/PDIF interface. Then, on the playback end, the file is decoded back into its original six channels before being sent to the speakers. (The encoder-decoder pair is often referred to as a codec, which is a contraction of COder and DECoder.) Think of encoded data as being like instant mashed potatoes. After the water is removed (encoding), the powder is compact to store and transport. Then, the end user just adds water and heats it to a boil (decoding) and voilà … instant mashed potatoes.
You don’t have to cook a DVD or CD in boiling water (kids, don’t try this at home), but all home-theater receivers do an analogous rehydration process. If you play a DVD with a Dolby Digital or DTS soundtrack, the receiver decodes the digital bitstream from the player’s S/PDIF output as the disc plays back.
DVDs have much more storage space (4.7 GB compared with a CD’s 640 MB), so they have lots of room for audio. Unfortunately, most of the data space on DVD-Video discs is hogged by the video portion, so the official DVD-Video specification mandates a Dolby Digital audio program at a transfer rate of 56 to 448 Kbps in order to fit as much audio as possible within the limited storage space. DTS files can use transfer rates of 1.5 Mbps or 768 Kbps from a DVD.
DTS and Dolby Digital files can also be burned onto CDs; in fact, DTS-encoded CDs use the same 1.4 Mbps transfer rate as stereo 16-bit, 44.1 kHz CDs, and they can be duplicated by any pressing house that makes standard stereo CDs. In addition, the file structure of a DTS track is the same as a standard CD’s, so you can fit the same 74 minutes (or as many as 99 songs) of 5.1-channel audio on a disc. However, instead of containing pulse-code modulation (PCM) digital-audio data, such CDs hold Dolby Digital or DTS data files. Those discs will play in a CD player that has an S/PDIF output connected to a home-theater system with a Dolby Digital or DTS decoder. (All digital home-theater receivers have a Dolby Digital decoder built in, and most also have a DTS decoder.)
Be sure not to play a Dolby Digital or DTS CD from the player’s analog audio outputs; the encoded datastream sounds like nothing but noise, which can damage speakers if the level is high enough. The same problem arises if you send the data from the CD player’s digital output to a sound system without a decoder, so carefully label any Dolby Digital or DTS CDs accordingly.
Dolby and DTS make hardware encoders, but they may be too expensive for small studios. For example, Dolby’s DP569 encoder retails for $5,000, and the DTS CAE-4 encoder costs $7,250. As a result, most small studios use software encoders. Minnetonka Audio manufactures the only available standalone DTS software encoders: SurCode CD Pro ($499; see Fig. 1), which allows you to encode DTS files for CDs, and SurCode DVD Pro ($1,999), which encodes DTS files for DVD or CD. In addition, Minnetonka makes SurCode Dolby Digital ($999), which encodes Dolby Digital files for DVD or CD. Minnetonka’s software is Windows-based and cannot run on a Mac. Users of Pro Tools can use the SmartCode Pro Dolby Digital ($795), DTS ($1,495), and DTS-CD ($495) encoding plug-ins from Kind of Loud (see Fig. 2), along with that company’s excellent Woofie and Tweetie surround-monitoring and panning plug-ins.
A SENSE OF LOSS
Like all dehydrating-rehydrating processes, encoding entails the loss of a little flavor. Dolby Digital and DTS use lossy encoding methods; as a result, a portion of the original audio information is lost in the compression process. The audio coming out of the decoder is not bit-for-bit identical to what went into the encoder.
Put that way, encoding sounds like a horrific thing to do to your carefully crafted surround mix. However, the final results are quite good and difficult to distinguish from the original tracks except under close scrutiny in a controlled listening environment.
A lossy codec system works by looking for redundant and masked audio information. After deciding what humans can and cannot hear, it keeps the important sonic information and throws away the things that theoretically can’t be heard anyway. Thus, you can reduce data files to a tenth of their original size, depending on the codec and the selected data rate.
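To put the "tenth of their original size" figure in perspective, here is a rough back-of-the-envelope calculation written as a short Python sketch. It assumes the uncompressed source is six full-bandwidth channels of 16-bit, 48 kHz PCM (an assumption for illustration only) and uses the nominal data rates quoted in this article. The function names are invented for the example.

```python
def pcm_bits_per_second(channels, sample_rate_hz, bit_depth):
    """Raw data rate of uncompressed PCM audio, in bits per second."""
    return channels * sample_rate_hz * bit_depth


def compression_ratio(source_bps, encoded_bps):
    """How many times smaller the encoded stream is than the source."""
    return source_bps / encoded_bps


# Assumed source: six channels, 48 kHz, 16 bits (illustrative only).
SOURCE_BPS = pcm_bits_per_second(6, 48_000, 16)

# Nominal rates as quoted in the article.
NOMINAL_RATES_BPS = {
    "Dolby Digital, 448 Kbps": 448_000,
    "DTS, 768 Kbps": 768_000,
    "DTS, 1.5 Mbps": 1_500_000,
}

for name, rate in NOMINAL_RATES_BPS.items():
    print(f"{name}: about {compression_ratio(SOURCE_BPS, rate):.1f}:1")
```

Under those assumptions, the 448 Kbps Dolby Digital stream comes out roughly ten times smaller than the source, and the two DTS rates work out to about 6:1 and 3:1.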
The new DVD-Audio discs don’t use lossy compression; instead, they use a lossless compression scheme called Meridian Lossless Packing (MLP), which was developed by British audio manufacturer Meridian and reduces the size of audio files by approximately half, depending on the material. MLP allows as much as 74 minutes of 6-channel, 24-bit, 96 kHz audio without any loss of information. It’s a beautiful thing.
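A quick capacity check shows why lossless packing matters here. The Python sketch below assumes a single-layer disc holding 4.7 GB (counted as 4.7 billion bytes) and treats "approximately half" as exactly 0.5, which is a simplification because the real saving depends on the material. The names are hypothetical.

```python
DVD_CAPACITY_BYTES = 4.7e9  # single-layer figure quoted earlier, decimal gigabytes
ASSUMED_MLP_RATIO = 0.5     # "approximately half"; actual results vary with the music


def pcm_bytes(channels, sample_rate_hz, bit_depth, minutes):
    """Size of uncompressed PCM audio in bytes for a given running time."""
    bits = channels * sample_rate_hz * bit_depth * minutes * 60
    return bits / 8


raw = pcm_bytes(channels=6, sample_rate_hz=96_000, bit_depth=24, minutes=74)
packed = raw * ASSUMED_MLP_RATIO

print(f"Raw PCM:      {raw / 1e9:.2f} GB")
print(f"MLP estimate: {packed / 1e9:.2f} GB")
print("Raw fits on disc:   ", raw <= DVD_CAPACITY_BYTES)
print("Packed fits on disc:", packed <= DVD_CAPACITY_BYTES)
```

With these numbers, 74 minutes of raw 6-channel, 24-bit, 96 kHz audio needs about 7.67 GB, which is too much for the disc, whereas the halved estimate of about 3.84 GB fits.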
Dolby Digital encoding includes a set of parameters called metadata, which is included in the datastream along with the one to six channels of encoded audio data. Metadata allows a 5.1-channel program to play back on any audio system, from mono to a full-blown home-theater system. It also lets the encoding or production engineer ensure that the intent of the audio program is conveyed through any home-theater, stereo, or even mono audio system in the consumer world without doing anything to the audio data itself.
The key metadata parameters can be called the “three Ds”: dialnorm (dialog normalization), dynamic range control (DRC), and downmix levels. (DTS has no metadata and therefore doesn’t support the three Ds.) Dialnorm was developed as a way to match dialog levels between different program material. In movies, everything is subordinate to the dialog. In the Dolby Digital encode-decode chain, the dialnorm parameter represents the average level of dialog in a given program. (For material without dialog, dialnorm can be thought of as the average program level.) It provides a reference that defines a comfortable listening level, which can be matched between different program content.
Dialnorm’s default value is -27 dB, which represents the average level with respect to the maximum level. Programs with louder-than-normal levels might get a dialnorm setting of -20 dB, whereas programs with softer-than-normal levels might be set to -31 dB, dialnorm’s lowest possible setting.
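As a rough illustration of what "average level with respect to the maximum level" means, the Python sketch below computes the RMS level of a block of samples in decibels relative to full scale and clamps the result to a dialnorm range. Several things here are assumptions: samples are floats normalized so that 1.0 is full scale, dialnorm is treated as a whole-decibel value, and the upper bound of -1 dB is not stated in this article. A plain RMS average is also only a stand-in for a proper level meter, so treat the result as a starting point rather than a definitive setting.

```python
import math

DIALNORM_MIN_DB = -31  # lowest possible setting, per the article
DIALNORM_MAX_DB = -1   # assumed upper bound; not given in the article


def rms_level_dbfs(samples):
    """Average (RMS) level in dB relative to full scale, where full scale is 1.0."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def suggest_dialnorm(samples):
    """Round the measured level to a whole dB and clamp it to the dialnorm range."""
    level = rms_level_dbfs(samples)
    if level == float("-inf"):
        return DIALNORM_MIN_DB
    return max(DIALNORM_MIN_DB, min(DIALNORM_MAX_DB, round(level)))


# A steady signal at one-tenth of full scale averages 20 dB below maximum.
print(suggest_dialnorm([0.1] * 1000))
```

Very quiet material bottoms out at -31 dB because that is as low as the parameter goes.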
DRC lets the encoding engineer specify a set of dynamic-compression options that are activated in one of several situations. For example, listeners can put the receiver into Midnight mode. All Dolby Digital receivers include Midnight mode, which reduces the dynamic range according to the DRC metadata so that the material doesn’t disturb others in the house. DRC is also applied to help preserve dynamic range and to prevent clipping when a multichannel program is downmixed to stereo.
DRC is generally adjustable in the receiver, allowing listeners to select the amount of dynamic-range compression for their specific listening requirements. In addition, DRC interacts with the dialnorm setting, which defines the “comfortable listening level” outside of which DRC becomes active according to the encoded and listener settings.
In the encoder, you can select one of six DRC Profiles: Music Light, Music Standard, Film Light, Film Standard, Speech, or None. Music Light is intended for music that only needs light processing because its dynamic range is under control to begin with. If you already compressed the tracks to limit the dynamic range, you might choose the Music Light setting. On the other hand, if your material includes large dynamic swings, perhaps Music Standard is more appropriate. The other Profiles don’t contain the word music in their names, but they may suit your material better than Music Light and Music Standard; audition them to see which one works best for you.
Downmixing is the process by which 5.1 channels of audio are reduced, typically to stereo, for listening on headphones, a TV’s mono speaker, or any system that has fewer than 5.1 channels. A downmix from 5.1 to stereo basically mixes the left-rear channel into the left front, the right-rear channel into the right front, and the center channel equally into the left and right front. The encoding engineer can specify the relative levels at which the center and rear channels are mixed into the front channels, from -6 to +3 dB in 1.5 dB increments. Dolby Digital also provides an option whereby 5.1 discrete channels of audio are downmixed to a Dolby Pro Logic-compatible stereo signal for backward compatibility with older, pre-Dolby Digital home-theater systems.
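The basic stereo downmix described above can be sketched in a few lines of Python. This is a simplified illustration, not Dolby's decoder: it leaves out the LFE channel (an assumption, since the description above doesn't say how LFE is handled), applies no overall scaling or clip protection, and uses invented function and parameter names. The default mix levels are arbitrary placeholders.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def downmix_to_stereo(left, right, center, left_sur, right_sur,
                      center_db=-3.0, surround_db=-3.0):
    """Fold five full-range channels (equal-length sample lists) into two."""
    c = db_to_gain(center_db)
    s = db_to_gain(surround_db)
    # Center goes equally to both sides; each surround goes to its own side.
    left_out = [l + c * ctr + s * ls
                for l, ctr, ls in zip(left, center, left_sur)]
    right_out = [r + c * ctr + s * rs
                 for r, ctr, rs in zip(right, center, right_sur)]
    return left_out, right_out


# Five channels all at half of full scale sum to more than full scale.
lo, ro = downmix_to_stereo([0.5], [0.5], [0.5], [0.5], [0.5])
print(lo[0] > 1.0)  # True
```

The last two lines show why a downmix needs some form of level protection: moderate levels in every channel can add up to more than the stereo outputs can carry.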
It’s important to understand how dialnorm, DRC, and downmixing interact with each other. As mentioned earlier, the dialnorm setting defines the program’s average signal level, which is used as the center of a “null band” of dynamic levels. If the listener engages DRC in the receiver and the program level stays within that null band, nothing happens to it. If the level exceeds the band’s upper limit, it is reduced according to the DRC Profile and listener setting; if the level drops below the band’s lower limit, it is raised according to the DRC Profile and listener setting. The Speech, Film Standard, and Music Standard Profiles establish a null band that extends 2.5 dB above and below the dialnorm setting. The Music Light and Film Light Profiles set a null band that extends 10 dB above and below the dialnorm setting.
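The null-band behavior can be modeled with a small Python function. Only the band widths come from the description above; the single 2:1 ratio, the 0-to-1 listener scale, and all the names are hypothetical simplifications, because the actual curves of each Profile aren't detailed here.

```python
# Distance from dialnorm to each edge of the null band, in dB (from the text).
NULL_BAND_HALF_WIDTH_DB = {
    "Speech": 2.5,
    "Film Standard": 2.5,
    "Music Standard": 2.5,
    "Film Light": 10.0,
    "Music Light": 10.0,
}


def drc_gain_db(level_db, dialnorm_db, profile, ratio=2.0, listener_scale=1.0):
    """Gain change in dB for one momentary level (negative means a cut).

    ratio and listener_scale are illustrative assumptions, not Dolby values.
    """
    half_width = NULL_BAND_HALF_WIDTH_DB[profile]
    upper = dialnorm_db + half_width
    lower = dialnorm_db - half_width
    if level_db > upper:
        excess = level_db - upper
        gain = -(excess - excess / ratio)  # pull loud passages down
    elif level_db < lower:
        deficit = lower - level_db
        gain = deficit - deficit / ratio   # bring quiet passages up
    else:
        gain = 0.0                         # inside the null band: untouched
    return gain * listener_scale


print(drc_gain_db(-20, -27, "Film Standard"))  # -2.25
print(drc_gain_db(-20, -27, "Music Light"))    # 0.0
```

The same -20 dB passage is trimmed under Film Standard but left alone under Music Light, because the wider null band still contains it.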
You can also prevent the listener from engaging Midnight mode or other optional dynamic-range controls by selecting None as the DRC Profile. In that case, if there is no downmixing (that is, the program is played on a full 5.1-channel surround system), the dialnorm setting simply adjusts the volume of the decoder to match other program content. However, if the program is downmixed, a set of DRC parameters is automatically engaged to prevent clipping, and those parameters use the dialnorm setting as a reference. The dialnorm setting is important because you never know what sort of system your music is going to be played on.
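The volume matching mentioned above comes down to simple arithmetic. The Python sketch below assumes, and this is not spelled out in the text, that the decoder attenuates every program so that its stated average level lands at -31 dB, the lowest dialnorm setting. Under that assumption the attenuation is just the difference between the dialnorm value and -31. The function name is invented.

```python
TARGET_LEVEL_DB = -31  # assumed decoder reference level


def decoder_attenuation_db(dialnorm_db):
    """Attenuation (in dB) the decoder would apply for a given dialnorm value."""
    if not -31 <= dialnorm_db <= -1:
        raise ValueError("dialnorm is expected to lie between -31 and -1 dB")
    return dialnorm_db - TARGET_LEVEL_DB


# Two programs with different average levels end up at the same playback level.
for dialnorm in (-27, -20):
    cut = decoder_attenuation_db(dialnorm)
    print(f"dialnorm {dialnorm} dB: cut {cut} dB, lands at {dialnorm - cut} dB")
```

A program tagged at the default of -27 dB would be turned down 4 dB and a louder one tagged at -20 dB would be turned down 11 dB, so both play back at a matching average level.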
EASY AS 1, 2, 5.1
Here’s a step-by-step procedure for creating and encoding a 5.1-channel music file in Dolby Digital or DTS.
Step 1. Mix multitrack program material into 5.1 surround using a mixer with surround panners or a digital audio workstation (DAW) with surround panners, such as Minnetonka MX51 or Pro Tools.
Step 2. Record the six mix tracks onto six tracks of an 8-track tape deck and label them Left, Right, Center, LFE, Left Surround, and Right Surround. If the tracks will end up on a DVD, they should be at a sampling rate of 48 kHz; if they’re going to reside on a CD, they need to be at 44.1 kHz.
Step 3. Load the surround tracks into a computer with a software-based DTS or Dolby Digital encoder.
Step 4. Assign the audio tracks to the appropriate encoder channels, which are labeled L, R, C, LFE, Ls, and Rs.
Step 5. Name the output file.
Step 6. If you’re encoding in Dolby Digital for DVD (see Figs. 3 and 4), set the sampling/encoding rate to 48 kHz and select the proper file type (AC-3, Dolby’s encoding algorithm); surround speaker levels (0 dB with respect to the front channels for music or -3 dB for movies); and the final bit rate (448 Kbps). If the Dolby Digital file is intended for a CD, the sampling/encoding rate should be 44.1 kHz, the file type should be WAV, and the bit rate can be any value, so you might as well use the highest possible rate of 640 Kbps.
Set the dialnorm value as close to the average level of your program as possible to avoid unintended processing by the receiver’s dynamic-range processor, even if only during downmixing. For example, if the average signal level is 25 dB below the maximum possible level, dialnorm should be set to -25 dB. However, it isn’t easy to determine the average signal level. You can do it by monitoring the signal through a Dolby DP570 Multichannel Audio Tool ($6,495) or a DP569/DP562 Dolby Digital hardware encoder/decoder pair ($5,000 and $3,600, respectively), but those are expensive solutions that few small studios can afford.
I talked with Minnetonka Audio about adding a scanning function to SurCode Dolby Digital that would determine an average program level and recommend a dialnorm setting consistent with the actual dynamics of the tracks prior to encoding. That would give engineers and producers a reasonable starting point that could be refined by trying a few tracks and listening to how they play back on home-theater systems.
Once dialnorm is set properly, preview the various DRC Profiles and choose the one that best suits your material. The downmix settings should also be selected according to how your content behaves in downmix situations. In general, I prefer to set the downmix levels to -6 dB for the center and -3 dB for the surrounds.
If you’re encoding in DTS for DVD (see Fig. 5), all you need to do is set the data rate (1.5 Mbps or 768 Kbps), sampling rate (48 kHz), and rear-channel attenuation (0 or -3 dB). Encoding in DTS for CD is much simpler because SurCode CD Pro presets everything: file type (WAV), sampling rate (44.1 kHz), rear-channel attenuation (none), and data rate (1.2 Mbps). (Of the 1.4 Mbps transfer rate, 1.2 Mbps are audio data.) The DTS encoding process is quite a bit easier than setting up to encode Dolby Digital, but DTS programs can’t play back on audio systems without full 5.1 capability.
Step 7. Hit the encode button. On a modern desktop computer (for example, a Pentium/400 MHz), encoding in DTS takes about the same amount of time as playing the audio file itself, and Dolby Digital takes about five times as long.
Step 8. Play the encoded file from the computer’s digital output to a digital input on your monitor receiver to make sure the encoding process worked properly.
Step 9. If you created a WAV file with a sampling rate of 44.1 kHz, you can simply load it in the CD-burner program of your choice and make a standard CD-R, just as you would with any stereo audio program. If you’ll be playing the disc from a standard CD player with an S/PDIF output, a regular CD-R disc will be fine. However, many DVD players have difficulty reading CD-R media because of an incompatibility between the wavelength of the DVD player’s pickup laser and the dye in the CD-R. If you’ll be playing back your CD on a DVD player, you can try a CD-RW, which will play back from most DVD players. Alternatively, you can get a new dual-pickup DVD player specifically designed to play back CD-Rs. The compatibility problem has nothing to do with the disc containing the encoded data; it’s a playback issue even for standard stereo programs.
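The format-specific settings from Steps 2 and 6 can be collected into a small lookup table that serves as a checklist before encoding. The Python structure and key names below are made up for illustration and don't correspond to any encoder's project file or API; the values are the ones given in the steps above.

```python
# Settings from Steps 2 and 6, keyed by (codec, destination medium).
ENCODE_PRESETS = {
    ("Dolby Digital", "DVD"): {
        "sample_rate_hz": 48_000, "file_type": "AC-3", "bit_rates_kbps": [448],
    },
    ("Dolby Digital", "CD"): {
        "sample_rate_hz": 44_100, "file_type": "WAV", "bit_rates_kbps": [640],
    },
    ("DTS", "DVD"): {
        # File type for DTS on DVD isn't specified in the steps, so it is omitted.
        "sample_rate_hz": 48_000, "bit_rates_kbps": [768, 1500],
    },
    ("DTS", "CD"): {
        "sample_rate_hz": 44_100, "file_type": "WAV", "bit_rates_kbps": [1200],
    },
}


def check_source(codec, medium, source_sample_rate_hz):
    """Return a list of problems with the source tracks (empty if none)."""
    preset = ENCODE_PRESETS[(codec, medium)]
    problems = []
    if source_sample_rate_hz != preset["sample_rate_hz"]:
        problems.append(
            f"{codec} for {medium} expects {preset['sample_rate_hz']} Hz, "
            f"but the tracks are at {source_sample_rate_hz} Hz"
        )
    return problems


print(check_source("DTS", "CD", 44_100))            # []
print(check_source("Dolby Digital", "DVD", 44_100))  # one sample-rate warning
```

A check like this catches the most common slip, which is mixing at 44.1 kHz and then deciding the project belongs on DVD.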
By the time you read this, at least two computer systems will come bundled with a Pioneer DVD-R burner. It isn’t a DVD-RAM drive (if it were, it wouldn’t be able to make discs that are playable on a consumer DVD player). It’s a DVD-R recorder, which can turn a $30 DVD blank into a disc that can be played in any consumer DVD player. The new high-end Power Mac G4 will come bundled with the Pioneer DVD-R drive for about $3,750 — amazing when you consider that the cost of a DVD-R recorder is about $6,000 by itself. In addition, Compaq has announced the same DVD-R bundling deal. That means you’ll be able to purchase the hardware to master your own DVDs for less than $4,000.
At least two software companies are taking advantage of that hardware windfall. Spruce Technologies will be offering a DVD-Video authoring system geared toward 5.1-surround audio authoring on a Windows platform. The new application will let you take Dolby Digital or DTS files and burn them onto a consumer-playable DVD-Video disc with menus, still shots, graphics, and video clips. Minnetonka Audio has already sent me the beta version of MASS 5.1, which is a DVD-Audio authoring system that integrates with its affordable MX51 surround-mixing software. The process appears to be as simple as dragging the six discrete audio files into the appropriate boxes on the screen and inserting a DVD blank in the recorder. I can hardly wait to get my DVD-R burner. Both applications are Windows-based, but the Mac versions can’t be far behind. In fact, at the recent National Association of Broadcasters convention, Apple showed a Mac-based DVD-authoring application in the $1,000 range, which certainly brings it within reach of many smaller studios.
I’ve often said that truly innovative 5.1 mixes are going to come not from the megastudios but from the vast number of smaller project studios. Now that the encoding and authoring technology for 5.1 surround is becoming affordable, I’m looking forward to some head-turning mixes. You could be the one to do the next big thing, so get on with it.
Mike Sokol is a human being with 2.0 ears, learning how to mix in a 5.1 environment. For some reason, no one takes seriously his suggestion of using gene therapy to add 3.1 more ears to surround-mixing engineers.