Cinepedia: The Technology Behind The Big Screen (Sat, 08 Sep 2018 16:58:12 +0000)

Cinema Sound Basics Wed, 01 Nov 2017 23:00:15 +0000

Digital cinema sound is unique among commercial distribution formats in that the audio signal is not compressed. Audio is delivered to the cinema at a full 24 bits per sample, typically at 48,000 samples per second. (It is possible to carry sound at a 96 kHz sampling rate, but this rate is rarely, if ever, used.)
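These figures make the uncompressed data rate straightforward to compute. As an illustrative worked example (the arithmetic below is not from the original text):

```python
# Uncompressed cinema audio: 24 bits per sample, 48,000 samples/second.
BITS_PER_SAMPLE = 24
SAMPLE_RATE_HZ = 48_000

def channel_bitrate_bps(bits=BITS_PER_SAMPLE, rate=SAMPLE_RATE_HZ):
    """Data rate of one uncompressed audio channel, in bits per second."""
    return bits * rate

one_channel = channel_bitrate_bps()   # 1,152,000 b/s per channel
full_track = 16 * one_channel         # a full 16-channel MainSound Track File

print(f"per channel: {one_channel / 1e6:.3f} Mb/s")   # 1.152 Mb/s
print(f"16 channels: {full_track / 1e6:.3f} Mb/s")    # 18.432 Mb/s
```

Even at the full 16 channels, audio is a small fraction of the overall DCP data rate, which is dominated by picture essence.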

Cinema sound has a rich history of formats, beginning with the commercial availability of monophonic sound-on-film in the 1920s. Stereo-matrix optical sound-on-film was popular in the late 1970s and throughout the 1980s. (“Matrix” refers to the phase-encoding of 4 channels of sound into 2.) Not only did this format fill out the screen channels with Left, Center, and Right, but it also added mono Surround. The format also introduced the “horseshoe” array of surround speakers in the auditorium, surrounding the audience. Bass-enhancement, as it was called, derived a low-frequency signal from the front speaker channels, which drove separate low-frequency speakers. When digital sound was introduced for film in the 1990s, the surround array was divided to deliver a stereo surround signal, and a discrete subwoofer channel was added to the distribution format. This format remains popular today, known as 5.1 sound. The 5.1 format delivers Left (L), Center (C), Right (R), Left Surround (Ls), Right Surround (Rs), and Low Frequency Effects (LFE, or subwoofer). The astute reader will note that 5.1 sound has 6 channels of sound. In the nomenclature of multi-channel sound, the “.1” indicates the LFE channel. As a result, when counting channels, 5.1 = 5 + 1 = 6.

7.1 sound is an improvement on 5.1, in that it further divides the surround speaker array into four sections:  Left Side Surround (Lss), Right Side Surround (Rss), Left Rear Surround (Lrs), and Right Rear Surround (Rrs).  This adds up to 8 channels of sound.

The channel labels L, C, R, Ls, Rs, Lss, Rss, Lrs, Rrs, LFE, and more, are defined in SMPTE ST 428-12 DCDM Common Audio Channels and Soundfield Groups.

In digital cinema, the MainSound Track File defines the carriage of 16 channels of sound. Of course, 16 channels is never enough, and emerging immersive sound formats allow a much more complex array of signals. Since 5.1/7.1 sound occupies at most 8 of the available 16 channels, there remain 8 channels of bitstream carriage for other signals that are to be synchronized to the movie. These available channels carry the Assistive Listening (HI) and Narrative (VI-N) audio tracks, motion seat data, and compressed video for assistive sign language.

Sound Formats and Soundfields Wed, 01 Nov 2017 22:30:21 +0000

Sound Formats

While the concept of surround sound has been around since the 1940s, the horseshoe-shaped surround speaker array commonly found in cinemas today was established in the 1970s, with the introduction of the “Stereo Optical” format by Dolby Laboratories. The 5.1 and 7.1 sound formats, in common use today, simply divide the surround speaker array into multiple channels. The progression of surround sound formats is illustrated below.

Figure S1. Cinema speaker formats: Stereo Optical, 5.1 Sound, and 7.1 Sound

Soundfield Groups and Immersive Sound

The concept of Soundfield Groups embodies the idea that a multiplicity of sound channels will continue to evolve.  What has actually evolved, however, is a more generalized concept of cinema sound, where a common set of sound channels, forming a sound bed, is combined with more localized sounds rendered across speakers located above and about the audience.  This is the basis of Immersive Sound formats.  A key characteristic of Immersive Sound is the 3-dimensional positioning of sound objects through the application of metadata, a rendering engine, and an appropriate loudspeaker system.  To further break from the channel concept, sound objects of limited duration can simply be carried as a chunk of audio.  In this manner, a rich soundfield may be created without the need for a large number of sound channels in distribution.  In practice, 5.1 / 7.1 speaker arrays are combined with additional speakers located above and about the audience, in combination receiving audio from the rendering engine.

Commercially available cinema immersive sound formats are available from Dolby Laboratories (Dolby Atmos®), DTS (DTS:X®), and Barco/Auro Technologies (Auro-3D®). Each of these formats consists of a unique distribution format and a proprietary rendering engine. As of this writing, a SMPTE standards committee is working towards a common distribution format for immersive sound. The common distribution format is intended to drive competitive and proprietary immersive sound rendering engines.

MainSound Track File Wed, 01 Nov 2017 22:00:19 +0000

The MainSound Track File audio characteristics of bit depth, sample rate, channel count, and reference level are defined in SMPTE ST 428-2 DCDM Audio Characteristics, which is applicable to both Interop DCP and SMPTE DCP. The maximum number of audio channels carried in the Sound Track File is 16.

Audio channels in the Composition map to the audio outputs of the media block or server in a one-to-one manner. An audio signal carried in channel 1 of the track file will output at channel 1 of the media block or server. Audio signals are typically paired in AES3-formatted streams for output. The pairing takes place in numerical order, i.e., audio channels 1 and 2 are output in the first AES3 signal, and so on. The one-to-one mapping of channel to output occurs for both Interop DCP and SMPTE DCP, although SMPTE DCP has exceptions, discussed further below.
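The numerical pairing rule can be sketched as a one-line mapping (illustrative Python, not reference code from any standard):

```python
def aes3_pair(channel):
    """Map a 1-based track-file audio channel to its AES3 output pair.

    Channels are paired in numerical order: channels 1 and 2 ride in
    AES3 pair 1, channels 3 and 4 in pair 2, and so on.
    """
    if channel < 1:
        raise ValueError("audio channels are numbered from 1")
    return (channel + 1) // 2

# Channels 1 and 2 share the first AES3 signal; 15 and 16 share the eighth.
assert aes3_pair(1) == 1 and aes3_pair(2) == 1
assert aes3_pair(15) == 8 and aes3_pair(16) == 8
```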

The ordering of audio and data in the MainSound Track File is called out in ISDCF Doc4 – 16 Channel Audio Packaging Guide. The table below summarizes the ordering of the MainSound Track File:

Channel in Package   5.1            7.1            Description
 1                   L              L              Left
 2                   R              R              Right
 3                   C              C              Center
 4                   LFE            LFE            Low Frequency Effects
 5                   Ls             Lss            Left Surround / Left Side Surround
 6                   Rs             Rss            Right Surround / Right Side Surround
 7                   HI             HI             Hearing Impaired
 8                   VI-N           VI-N           Visually Impaired Narrative
 9                   not used       not used       Reserved for SDDS 7.1
10                   not used       not used       Reserved for SDDS 7.1
11                   not used       Lrs            Left Rear Surround
12                   not used       Rrs            Right Rear Surround
13                   Motion Data    Motion Data    Motion Seats
14                   Sync           Sync           FSK Sync Signal for Immersive Sound
15                   Sign Language  Sign Language  Sign Language Video
16                   not used       not used


Table S-1. MainSound Track File Signal Order

It was noted earlier that the signal order described in the table above also describes the ordering of audio outputs from the media block or server. If Left audio is placed differently in the track file, then the playback device must route the Left signal to the output defined in the table to maintain interoperability.
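For illustration, Table S-1 can be expressed as a lookup table; the sketch below is hypothetical Python, not part of ISDCF Doc4:

```python
# Table S-1 signal order as a lookup: channel -> (5.1 label, 7.1 label,
# description). None marks channels that are unused or reserved.
MAINSOUND_CHANNELS = {
    1:  ("L",    "L",    "Left"),
    2:  ("R",    "R",    "Right"),
    3:  ("C",    "C",    "Center"),
    4:  ("LFE",  "LFE",  "Low Frequency Effects"),
    5:  ("Ls",   "Lss",  "Left Surround / Left Side Surround"),
    6:  ("Rs",   "Rss",  "Right Surround / Right Side Surround"),
    7:  ("HI",   "HI",   "Hearing Impaired"),
    8:  ("VI-N", "VI-N", "Visually Impaired Narrative"),
    9:  (None,   None,   "Reserved for SDDS 7.1"),
    10: (None,   None,   "Reserved for SDDS 7.1"),
    11: (None,   "Lrs",  "Left Rear Surround"),
    12: (None,   "Rrs",  "Right Rear Surround"),
    13: ("Motion Data",   "Motion Data",   "Motion Seats"),
    14: ("Sync",          "Sync",          "FSK Sync Signal for Immersive Sound"),
    15: ("Sign Language", "Sign Language", "Sign Language Video"),
    16: (None, None, "not used"),
}

def label(channel, fmt="7.1"):
    """Return the signal label for a channel under 5.1 or 7.1 packaging."""
    five_one, seven_one, _ = MAINSOUND_CHANNELS[channel]
    return five_one if fmt == "5.1" else seven_one

assert label(5, "5.1") == "Ls"    # mono side array in 5.1...
assert label(5, "7.1") == "Lss"   # ...split into side surround in 7.1
```

A media block performing the routing described above would consult a table like this to map track-file channels to physical outputs.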

SMPTE DCP prescribes several incompatible methods for packaging 5.1 and 7.1 audio. Of these multiple methods, it is recommended that only the channel configuration defined in SMPTE ST 429-2 DCP Operational Constraints, Annex A, Channel Configuration 4, be used in commercial distributions. In this channel configuration, the ordering of audio and data channels defined in Table S-1 above is recommended.

However, SMPTE ST 429-2 also defines alternative channel configurations for packaging 5.1 and 7.1 audio, as well as configurations for other sound formats. In addition, it is possible to label each audio channel in accordance with SMPTE ST 377-4 MXF Multichannel Audio Labeling Framework. These alternative methods for packaging multi-channel audio order the channels differently, and as discussed, require signal routing in the media block or server to output each signal to the proper speaker or device. However, the alternative methods in SMPTE DCP are not available in Interop DCP, and not all media blocks and servers are configured to properly route signals. For SMPTE DCP, it is recommended that Channel Configuration 4 and the signal ordering described in the table above be used for compatibility.

Accessibility Overview Fri, 01 Sep 2017 23:00:42 +0000

The cinema experience is made more accessible to those with hearing and visual disabilities through the use of specialized sound tracks and visual aids. These same mechanisms may also be used to aid those whose language differs from that of the production. Accessible sound and visual aids include Hearing-Impaired (HI) and Visually Impaired Narrative (VI-N) sound, as well as Open Subtitles and Captions, Closed Subtitles and Captions, and Sign Language Video. The delivery vehicles for this specialized content are the subject of this chapter.

Accessibility & The Audio Track File Fri, 01 Sep 2017 22:00:45 +0000

Hearing-Impaired (HI) Audio

Hearing-Impaired audio dates back to the 1970s, when it was first included as an output in Dolby cinema audio processors. The purpose of the HI channel is to boost dialog over music and sound effects, with the intent of making dialog easier to understand through the use of headphones. A typical listener might suffer from high-frequency hearing loss, for example, and thus benefit from the boosted dialog. The HI output signal of cinema audio processors is derived from the dialog-dominant Center channel, coupled with a softer mix of Left and Right audio (which predominantly carry music and effects) than would be heard unaided in the auditorium. Several commercial methods exist to deliver specialized audio to audience-worn headphones using infrared light or radio waves.
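The processor-derived HI feed described above can be sketched as a simple mix. The gain value below is an assumption for illustration only; real cinema processors use their own levels:

```python
# Illustrative sketch of an audio-processor-derived HI feed: the
# dialog-dominant Center channel at full level, plus Left and Right
# (predominantly music and effects) attenuated. The 0.3 gain is a
# hypothetical value, not taken from any specific processor.
def derive_hi(center, left, right, lr_gain=0.3):
    """Mix one sample of HI audio from C, L, and R samples."""
    return center + lr_gain * (left + right)

# Dialog (C) passes unattenuated; music/effects (L, R) are reduced,
# raising dialog relative to the rest of the mix.
hi_sample = derive_hi(center=0.5, left=0.2, right=0.2)
```

The digital-cinema approach described next replaces this derived mix with a dedicated HI channel created in post-production.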

In digital cinema, a different approach to HI audio has been introduced, where the HI signal can be generated in post-production and distributed as a sound channel in the 16-channel MainSound Track File. (The forthcoming Sound section will provide audio channel assignment information.) The same commercial methods as before for delivering HI audio through headphones in the auditorium can be applied.

Some manufacturers choose to support both methods, providing a center-channel-dominant, audio-processor-generated mix for HI as a fallback for when the HI signal is not present in the Composition. Notably, the Composition does not include metadata that instructs the system when to fall back to audio-processor-generated HI audio.

Visually-Impaired Narrative (VI-N) Audio

Visually-Impaired Narrative was first introduced to cinema in the 1990s with the DTS cinema sound system for film. The target audience of the VI-N channel has good hearing, but either no vision or difficulty with vision. The narrative describes the events taking place on screen to the listener. As with HI, the auditorium delivery mechanism to the headphone is typically infrared light or radio wave. The VI-N signal is distributed as an audio channel in the 16-channel MainSound Track File. (See MainSound Track File.)

Sign Language Video

Sign Language Video was introduced as a recommendation by the Motion Picture Association of America (MPAA) in 2017 to satisfy new regulations introduced in Brazil, and potential regulations forthcoming in other countries. The video channel is encoded as a VP9 bit stream for inclusion in the MainSound Track File. The VP9 video frame rate is set to match that of the Composition. In this manner, the video signal is automatically synchronized with the movie, requiring no external synchronization methods. (The forthcoming Audio section will provide audio channel assignment information.) Sign Language Video is viewed by the audience on a second screen, typically a smart phone.

Timed Text Track Files Fri, 01 Sep 2017 21:00:09 +0000

Open Subtitles, Closed Subtitles, Open Captions, and Closed Captions are each a type of Timed Text essence, where each type is targeted to a specific audience. Timed Text essence may be included in the Composition as a Timed Text Track File. While the essence is referred to as Timed Text, the actual track files may also reference and carry graphics.

The essence of Timed Text is defined in SMPTE ST 428-7 DCDM – Subtitle.  However, the purpose of the Timed Text Track File is identified in the Composition Playlist (CPL).  SMPTE ST 429-7 DCP – Composition Playlist defines the MainSubtitle element for Open Subtitle essence.  This definition is extended in SMPTE ST 429-12 DCP – Caption and Closed Subtitle to include the additional elements of ClosedSubtitle, MainCaption, and ClosedCaption.

SMPTE Timed Text essence is wrapped as an MXF track file in accordance with SMPTE ST 429-5 DCP – Timed Text Track File. By doing so, SMPTE Timed Text assets can be encrypted and managed in the same manner as Picture and Sound track files. As explained below, Timed Text essence may have fonts and graphics associated with it. For SMPTE Timed Text, the associated files are wrapped with the essence in the Timed Text Track File. In contrast, Interop Timed Text essence is delivered in the package as an XML file, with the associated font and graphics essence also delivered as separate files.

Open Subtitles and Open Captions

Open Subtitles and Open Captions refer to text superimposed on the picture displayed in the auditorium. As such, they are visible to all members of the audience. Open Subtitles are typically displayed on-screen as text in a language other than the language of the movie’s sound track. The intent of the Open Subtitle is to translate the dialog of the sound track for the audience. Open Captions are also typically displayed on-screen as text, but in the same language as the movie’s sound track, with the intent to convey the dialog and action of the movie to those who are hearing-impaired or deaf. Graphics may also be displayed with on-screen timed text. Open Subtitle track files are identified in the CPL as MainSubtitle, and Open Caption track files are identified as MainCaption. A Composition may contain only one MainSubtitle or one MainCaption file.

Timed Text is managed differently in SMPTE and Interop DCP. Both packaging methods employ XML for Timed Text essence, but each defines the XML differently. (The difference in Timed Text essence has proven to be one of the stumbling blocks in the adoption of SMPTE DCP.) In addition, Interop delivers Timed Text as an XML file in the Composition package, while SMPTE DCP wraps Timed Text XML essence in MXF. This is further explained below.

Interop Timed Text essence is based on a proprietary format introduced by Texas Instruments called CineCanvas™.  The latest CineCanvas documentation is available on the Interop DCP page. The CineCanvas file is in XML format, and included in the Composition as an XML file. Text properties and associated actions include language, font, timing of display, fade up/down properties, and the text position properties of direction, horizontal alignment and position (left, center, right), and vertical alignment and position (top, center, bottom). Image (graphics) properties include timing of display, fade up/down properties, horizontal alignment and position, and vertical alignment and position. Font and graphics files may be included with the distribution.

Timed text essence in SMPTE DCP is defined in SMPTE ST 428-7 DCDM – Subtitle. The SMPTE Subtitle format provides a more sophisticated control of display, placement, and font properties over CineCanvas, and includes 3D text. SMPTE Subtitle also includes Ruby text, small annotations used to guide pronunciation placed alongside logographic characters used in the Chinese, Japanese, and Korean languages. As with CineCanvas, horizontal alignment and position is described in terms of left, center, right, and vertical alignment and position in terms of top, center, bottom. Font and graphics files may be included with the distribution.

Closed Captions and Closed Subtitles

Closed Captions and Closed Subtitles are text-only recitations directed to a personal display, and not visible to the general audience (i.e., off-screen).

SMPTE ST 428-10 DCDM Closed Caption and Closed Subtitle sets forth a set of constraints applied to Timed Text essence for the off-screen applications of Closed Captions and Closed Subtitles. The constraints essentially limit the interpretation of positioning elements in the XML document to the vertical position. The same closed caption and closed subtitle constraints applied to SMPTE off-screen Timed Text may also be applied to CineCanvas™ off-screen applications in Interop distributions. Unlike Open Subtitles and Open Captions, a Composition may carry up to six (6) versions of off-screen Timed Text Track Files, one language per Track File. It is up to the personal display provider to offer on-demand access to the multiple track versions. The packaging of off-screen Timed Text Track Files deserves special attention, as explained in the Reel Flexibility for Timed Text section.

Interop DCP Closed Caption files are a constrained version of the CineCanvas format. The constraint documentation is available on the Interop DCP page.

Reel Flexibility for Timed Text Fri, 01 Sep 2017 20:00:53 +0000

Unlike Picture and Sound Track Files, Timed Text Track Files do not require one essence file per Composition Reel. Nor is it required that the timed text essence in a Timed Text Track File pertain only to the time frame of its associated Reel. In other words, timed text essence in a track file may carry content that plays outside of the time period of its Reel. This rule holds for both on-screen and off-screen Timed Text. The reason such behavior is called for is that Timed Text essence cannot be edited on a scene-by-scene basis as Picture and Sound can. As an example, a timed text recitation may need to overlap Reels. Based on this rule, it is possible to have only one Timed Text Track File for any one purpose in a multi-Reel Composition, with no loss of Timed Text in the movie.

The diagram below illustrates the concept. The set of available assets is shown in the left diagram. The manner in which the assets are handled by the projection system is shown in the right diagram. Languages are labeled as English (en), French (fr), German (de), and Dutch (nl). In the left diagram, Reel 1 has one Picture Track File, one Sound Track File, and one (Open) Subtitle Timed Text Track File. Reel 1 also carries four (4) off-screen Timed Text Track Files. The remaining Reels can be discerned accordingly. However, when assembling the track files for playout, as shown in the right diagram, the system discovers that there are seven (7) different types of off-screen Timed Text Track Files, one more than allowed. Only the first six (6) off-screen Timed Text Track Files will be played. In the case illustrated below, the seventh track file will not be played.

Figure TTR-1.  Example of Media Block interpretation of multiple Timed Text Track Files.
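The first-six selection rule illustrated above can be sketched as follows (illustrative Python; the language lists are hypothetical):

```python
# A Composition may carry at most six off-screen Timed Text Track Files,
# one language per file. Track file types beyond the first six discovered
# are not played.
MAX_OFFSCREEN_TRACKS = 6

def select_offscreen_tracks(languages):
    """Keep the first six distinct off-screen timed-text languages found."""
    selected = []
    for lang in languages:
        if lang not in selected:
            selected.append(lang)
        if len(selected) == MAX_OFFSCREEN_TRACKS:
            break
    return selected

# Seven distinct languages discovered while assembling the Reels for
# playout; the seventh ("pt") is dropped.
found = ["en", "fr", "de", "nl", "es", "it", "pt"]
assert select_offscreen_tracks(found) == ["en", "fr", "de", "nl", "es", "it"]
```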

Communications for Off-Screen Timed Text Fri, 01 Sep 2017 19:00:16 +0000

To be useful, off-screen Timed Text (Closed Captions and Closed Subtitles) must be communicated by the Media Block to outboard transmitters and personal display systems. In practice, the transmitter and personal display devices comprise a self-contained commercial system. Such commercial systems receive the off-screen Closed Caption and Closed Subtitle data from the Media Block using a standardized set of protocols over Ethernet.

Ethernet communications are handled using the SMPTE ST 430-10 DCO – Auxiliary Content Synchronization Protocol. Using the protocol, the Media Block establishes communications with the outboard personal captioning system, and sends a list of XML timed text assets in the format defined by SMPTE ST 430-11 DCO – Auxiliary Resource Presentation List. Following that, the personal captioning system pulls the timed text data from the Media Block as needed, based on timing information supplied by the protocol, and displays the text on the personal display. Unlike Picture and Sound data, DCI security requirements allow decrypted Timed Text Track Files to be stored in the clear in local storage. The diagram below illustrates the process.

Figure OSTT-1.  Off-Screen Timed Text Communications

System Basics Mon, 01 May 2017 10:00:25 +0000


Digital cinema was originally designed to replace 35mm film, with the goals of lowering the cost of film distribution, securing distribution, and preserving the image quality achievable with 35mm film. Digital Cinema Initiatives (DCI), a consortium of the major Hollywood studios, manages the top level specification for the core technology. Equipment is tested to a subset of the specification per the DCI Compliance Test Plan (CTP). Interoperability is managed by DCI, SMPTE standards, or industry practice.

Projection System Basics

Movies are distributed in digital form, with movie data distributed as a Composition. Compositions are packaged for distribution as a Digital Cinema Package (DCP). Movies are encrypted prior to distribution, and decrypted at the cinema. The decryption key is provided in the form of a Key Delivery Message (KDM). Forensic marking also takes place, with picture and sound marked with time-of-day and location information. All secure processes take place in real time in secure hardware, such that unencrypted picture and sound data are never stored in user-accessible local data storage. The processor where decryption, decompression, and forensic marking takes place is referred to as a Media Block. The Media Block contains a Screen Management System (SMS), which provides the user interface to the projection system (also known as a “screen system”) and manages housekeeping tasks for the Secure Processing Block.
The Projection System

Theater Management System

The majority of cinemas have more than one auditorium. In the language of cinema, an auditorium is often referred to as a “screen.” The management of multiple screens requires the means to schedule shows, manage content and keys, centrally store content and keys for later movement over a network to a screen, and the ability to monitor the status and health of the system.

DCPs arrive in the cinema by satellite or by means of portable storage, such as a hard drive, and are stored in local network storage. A typical composition is 200 GB in size, and KDMs are generally under 100 KB. It is not unusual for several versions of a title to be received, with features such as open captions and/or immersive sound tracks. Most cinemas schedule shows in the Point-of-Sale (POS) system, which is not part of the cinema system. A schedule manager will read the POS schedule, and from that generate the actual play times for advertisements, trailers, and movies at each screen. The schedule manager will also move compositions stored in local network storage to the appropriate screens. A key manager will catalog the KDMs received, and match them to each screen. In a more sophisticated system, the key manager will also facilitate the automation of KDM delivery by collecting and sending the credentials of the equipment to the appropriate KDM generation vendor. The monitoring system will indicate if everything is ready for a show, and if not, raise the appropriate flags to cinema personnel. All of the elements described are often combined into one product called a Theater Management System (TMS).

Theater Management System (TMS)

DCI and SMPTE Mon, 01 May 2017 09:00:05 +0000

Digital Cinema Initiatives (DCI) is a consortium of the 6 major Hollywood studios: Disney, Fox, Paramount, Sony, Universal Studios, and Warner Bros. Formed in 2002, DCI issued version 1.0 of its Digital Cinema System Specification (DCSS) in July 2005. Numerous changes to the specification have since been issued, known as “errata.” DCI also publishes a Compliance Test Plan (CTP), based on its DCSS. Several testing agencies are authorized to conduct DCI Compliance Testing based on the CTP and DCSS. The latest versions of the DCSS and CTP, as well as a list of authorized testing agencies, are available at the DCI website.

The DCSS is the top-level document for baseline digital cinema system design. It comprises over 150 pages, nearly half of which specify the mechanisms of digital cinema security. To meet DCI’s specification, a Media Block must pass NIST FIPS 140-2 Level 2 testing, as constrained by the DCSS, in addition to DCI’s CTP. DCI also specifies or recommends compliance to numerous SMPTE standards. However, not all SMPTE standards employed in digital cinema are identified by DCI.

DCI Recommendations target new practices in cinema where distributors and exhibitors benefit from uniformity. While enforcement of DCI Compliance is enacted through the Trusted Device List, there is no enforcement mechanism for DCI Recommendations.

SMPTE stands for the Society of Motion Picture and Television Engineers, and is the standards body where the majority of digital cinema standards work takes place. (The JPEG 2000 profiles specified by SMPTE are standardized by ISO.) Standards group activity is managed online and available to SMPTE standards committee members at the SMPTE website. The SMPTE standards effort for digital cinema was initiated in January 2000, and continues to this day.

DCP Sat, 10 Dec 2016 23:00:14 +0000

In the language of digital cinema, a Digital Cinema Package, or DCP, is the name given to the collection of files sent to a cinema. It is a “packing crate” for files, which may or may not comprise a complete motion picture. A digital motion picture, on the other hand, comprises a structured set of files referred to as a Composition. A Composition is a work product, or title, examples of which include not only motion pictures, but also trailers and advertisements.

A DCP can carry one or more Compositions, or only a partial Composition. When carrying one or more Compositions, the DCP is called a Composition Package. When carrying the partial assets of a single Composition, the DCP is called an Asset Package. A Packing List accompanies each DCP, identifying the DCP’s file assets. These concepts are illustrated below, and explained in further detail in subsequent parts of this chapter.

Figure DCP-1. A Digital Cinema Package (DCP) as a Composition Package

Figure DCP-2. A Digital Cinema Package (DCP) as an Asset Package

The Composition Sat, 10 Dec 2016 22:00:58 +0000

A Composition is a complete version of a work product, or title. The Composition consists of multiple files comprising a playlist and at least two Track Files. For flexibility and extensibility, each Track File carries only one type of essence, such as picture, sound, or subtitles. The manner in which the files are to be played is specified in a playlist, called the Composition Playlist, or CPL.

By definition, a Composition must have at least three files: a Composition Playlist (CPL), a Picture Track File, and a Sound Track File. In addition, Track Files can be divided into multiple files consisting of temporal chunks of essence called Reels. The name “reel” comes from film distribution, where a movie is shipped as temporal chunks of film, or reels of film. Reels make it easier to physically ship the movie. But it is also easier to deliver a last-minute edit when changing only one reel of a movie, which might happen due to a correction in credits, or a product placement change. Similarly, certain efficiencies are possible in digital distribution and production when organizing digital content in temporal chunks, leading to the organization of Compositions in digital Reels.

The illustration below depicts a Composition consisting of a Composition Playlist and four types of essence Track Files (Picture, Sound, Subtitle, and Closed Caption), temporally organized as two Reels.

Figure CPL-1. Composition Structure

In practice, a Composition might contain as many as 100 files, or more. The only limit to the number of track files is that imposed by the rules for encryption.

Multiple versions of a title often exist. Different versions might be necessary to accommodate 3D, subtitles, censorship cuts, or additional language sound tracks. For each version, a different Composition must be created. While this may sound inefficient, the single-essence-per-file rule encourages efficiency by allowing file assets to be shared among versions. As an example, different Compositions, representing versions of a title intended for distribution in different countries, may have different Sound Track Files, but carry the same Picture Track File. This example is illustrated below.

Figure CPL-2. Composition Versions with Shared Picture Essence Asset

Similarly, a 3D Picture Track File could be exchanged for the Picture Track File shown above, along with new CPLs, to define 3D versions of the movie. In this manner, many versions of the movie can be distributed without the need to send duplicate Track Files.
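The sharing of a Picture Track File across versions can be sketched with simple data structures. The field names and identifiers below are illustrative, not the actual CPL XML schema:

```python
import uuid

# Two hypothetical CPLs, each defining a version of the same title.
# Both reference the same Picture Track File by UUID, but each carries
# its own Sound Track File reference.
picture_id = str(uuid.uuid4())

cpl_english = {
    "annotation": "Feature (EN)",
    "picture": picture_id,          # shared picture asset
    "sound": str(uuid.uuid4()),     # English sound mix
}
cpl_french = {
    "annotation": "Feature (FR)",
    "picture": picture_id,          # same picture asset, shared
    "sound": str(uuid.uuid4()),     # French sound mix
}

# The shared picture file need only be distributed once for both versions.
assert cpl_english["picture"] == cpl_french["picture"]
assert cpl_english["sound"] != cpl_french["sound"]
```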

The Composition architecture, when first proposed in late 2001, was a distinct departure from any prior media format used in distribution. Prior distribution media formats characteristically married essence types into a physical form, such as a book, a DVD, a film print, or into a married electronic form, such as a television broadcast signal. Notably, it is a characteristic of the motion picture business to distribute motion pictures in somewhat customized versions, such as a version with a particular sound format, captions, a particular aspect ratio, or in one or more languages. This is driven by community requirements and equipment limitations, which often require cinemas to present more than one version of a movie over the course of an engagement. The Composition architecture was conceived to efficiently address the need to distribute multiple versions of a movie to a cinema by providing a mechanism for sharing essence files among versions.

The concepts behind the DCP and Composition, as described in this and the prior section, are explained in further detail in SMPTE ST 429-2 DCP Operational Constraints, the top-level standard for SMPTE DCP. These concepts were first put to work in the pre-standard Interop DCP, as explained here. Interop DCP remains in use today, although it is recommended that distributions transition to the newer and better documented SMPTE DCP.

(For those interested in the history of the Composition, the first public presentation on the concept was given by Michael Karagosian at NAB 2002, and available here.)

Title Versions Sat, 10 Dec 2016 21:30:39 +0000

The Composition section discussed how multiple versions of a Composition may be created while sharing essence carried in select Track Files. In the DCP section, the concept of a single Composition Package was presented for the carriage of multiple Compositions. A Composition Package can be used to efficiently carry multiple versions of a title.

It is often desirable, however, to distribute title versions as separate DCPs. In such cases, a parent Composition will carry the complete version of a title, and one or more child Composition versions are created, designed to share select Track Files of the parent. When all of the CPLs and essence Track Files are present in common data storage, the playback system will have everything it needs to play each version of the title. But all of the files needed to play a child Composition will not be present in the DCP that carries it. A mechanism is needed to properly manage the distribution of parent and child Compositions.

In practice, the parent Composition is given the label “Original Version,” or “OV” package, and carried in its own DCP. Each child Composition is given the label “Version File” or “VF” package, and also carried in its own DCP. When the parent OV package and associated child VF packages are loaded onto a digital cinema server, all of the files needed to play the various Composition versions are present and ready to play.
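The OV/VF loading rule can be sketched as a completeness check; the asset names below are hypothetical:

```python
# A VF package's CPL may reference assets that travel in the OV package,
# so both packages must be loaded before the VF version can play.
def missing_assets(cpl_asset_refs, loaded_asset_ids):
    """Return CPL-referenced assets not yet present in local storage."""
    return sorted(set(cpl_asset_refs) - set(loaded_asset_ids))

ov_assets = {"pic-1", "snd-en"}     # delivered in the OV DCP
vf_assets = {"snd-fr"}              # delivered in the VF DCP
vf_cpl_refs = ["pic-1", "snd-fr"]   # the VF CPL's asset references

# With only the VF loaded, the shared picture asset is missing:
assert missing_assets(vf_cpl_refs, vf_assets) == ["pic-1"]
# With OV and VF both loaded, the VF version is ready to play:
assert missing_assets(vf_cpl_refs, ov_assets | vf_assets) == []
```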

The illustrations below depict the two methods available for distributing multiple Compositions that share Track Files.

Using a single Composition Package:

Figure TV1. Composition Versions carried in a single Composition Package

Using multiple DCPs:

Figure TV2. Composition Versions carried in OV and VF DCPs

Both distribution packaging methods, when fully loaded into the digital cinema server, lead to the same result, allowing both Composition versions to be played, as illustrated below.

Figure TV3. Composition Versions having shared Assets

Composition Playlist (CPL) Sat, 10 Dec 2016 21:00:48 +0000

Every Composition is defined by a Composition Playlist, or CPL. As the name suggests, the CPL defines and orchestrates the playback of all Track Files that comprise the Composition. It does so temporally, in a reel-by-reel fashion. Each version of a title will have a unique CPL. This could be a version having a certain picture type (a particular aspect ratio, or 2D vs 3D), a different sound mix (say, 5.1 or 7.1), a sound track in a particular language, subtitles in a particular language, and so on.

While title versions require a unique CPL, the Track Files associated with each CPL do not have to be unique. A set of Picture Track Files, for example, can be shared by many CPLs, each CPL defining a different version of the work based on the same Picture Track File, but having other Track Files of different essence types that are unique to that version.

Figure CPL-1.  The Composition Playlist (CPL)

The CPL is an XML data file whose metadata defines the Composition that it represents.  A description of the CPL’s data elements is presented below:

CPL Identifier: Identifies this instance of the Composition, encoded as a Universally Unique IDentifier (UUID) per RFC 4122
Content Version Identifier: Identifies the version of content of which this Composition is an instance, encoded as a Universally Unique IDentifier (UUID)
Reel List: A list by Reel of the Track File assets to be reproduced, in the order they are to be played
Content Title Text: Title of the work
Annotation Text: Text field, typically defined by the Digital Cinema Naming Convention
Content Kind: An attribute naming the type of content in the Composition per SMPTE 429-7
Rating: An agency rating of this work. In the US, this would be an MPAA rating
Issue Date: Time and date when the CPL was issued
Issuer: Optional text field naming the issuer
Creator: Optional text field identifying the application used to create the Composition
Signer: If digitally signed, this carries the public key of the entity that signed the CPL
Digital Signature: Optional, used to authenticate the CPL

Table CPL-1.  Composition Playlist (CPL) Data Elements
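As a concrete, heavily simplified sketch of the XML form these elements take, the snippet below builds and re-reads a toy CPL. The namespace URI, UUID, and title are illustrative assumptions, and most elements required by a real ST 429-7 CPL are omitted.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace; a real CPL must use the normative ST 429-7 schema.
NS = "http://www.smpte-ra.org/schemas/429-7/2006/CPL"

def build_minimal_cpl(cpl_id, title, kind):
    """Build a toy CPL with a handful of the elements from Table CPL-1."""
    root = ET.Element(f"{{{NS}}}CompositionPlaylist")
    ET.SubElement(root, f"{{{NS}}}Id").text = cpl_id
    ET.SubElement(root, f"{{{NS}}}ContentTitleText").text = title
    ET.SubElement(root, f"{{{NS}}}ContentKind").text = kind
    ET.SubElement(root, f"{{{NS}}}ReelList")  # reels omitted in this sketch
    return ET.tostring(root, encoding="unicode")

xml_text = build_minimal_cpl(
    "urn:uuid:1d8b3f5a-0c2e-4b7d-9a6f-3e5c7d9b1a2f",  # made-up UUID
    "Example_Movie_FTR",
    "feature",
)

parsed = ET.fromstring(xml_text)
title = parsed.find(f"{{{NS}}}ContentTitleText").text
```

A playback system would perform the reverse operation: parse the CPL, resolve each Reel's asset references, and schedule the Track Files for playout.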

For a complete description of the SMPTE CPL, please refer to SMPTE ST 429-7 Composition Playlist. The top-level document that defines the DCP and Composition is SMPTE ST 429-2 DCP Operational Constraints.

The CPL carries metadata that describes the Composition, but in practice, more data is needed to benefit cinema operators. An ad-hoc nomenclature, called the Digital Cinema Naming Convention, is often used for this purpose. The Naming Convention prescribes a human-readable text field carried by the CPL Annotation Text element. It was devised as a stop-gap measure until a means to include additional metadata in the Composition became available. A standard that accomplishes this is now published, titled SMPTE ST 429-16 Additional Composition Metadata and Guidelines.

Track Files Sat, 10 Dec 2016 19:00:38 +0000

SMPTE defines content as metadata plus essence. Essence, in the digital cinema application, is the term applied to a single form of expression, such as picture, or sound, or subtitles. Essence types are singular in nature, i.e., only a 24 fps picture file, only a 48 fps picture file, only a 3D picture file, only a 5.1 sound track, only a 7.1 sound track, and so on. Using these definitions, a Track File carries a single essence type plus the necessary metadata to facilitate its use.

The independence of essence types in the Composition provides a high degree of extensibility, allowing new types of essence to be introduced in the future without breaking the structure of the Composition. For example, when the concepts of the Composition and the Digital Cinema Package were first introduced in digital cinema, stereoscopic 3D was not on the roadmap. But the extensibility of the Composition allowed the Stereoscopic Picture Track File to be quickly incorporated as digital 3D emerged.

MXF Picture & Sound

Track Files are wrapped per a constrained version of the Material Exchange Format, or MXF, specification. MXF provides a structured method for carrying a variety of essence types with metadata. While MXF is capable of carrying more than one essence type in a single file, it must be emphasized that the digital cinema application requires only one essence type per file. More can be learned about MXF on Wikipedia. The constraints applied to MXF for wrapping digital cinema Picture and Sound are defined in SMPTE ST 429-3 Sound and Picture Track File.

MXF track files consist of a header, an essence container, and a footer. The header carries metadata that describes the track file. The essence container carries, of course, the essence. The footer carries an index table of the essence.

Picture and Sound essence is frame wrapped using KLV (Key-Length-Value). The KLV Key identifies the nature of the essence present. Length refers to the length of the Value field. The Value field itself contains a frame of essence. More can be learned about KLV on Wikipedia.
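A minimal sketch of KLV framing follows, assuming BER short- and long-form length encoding. The 16-byte key is a dummy placeholder, not a registered SMPTE Universal Label.

```python
# Sketch of KLV (Key-Length-Value) packing as used to frame-wrap essence.

def klv_encode(key16, value):
    """Wrap a value in a KLV packet with a 16-byte key and BER length."""
    assert len(key16) == 16
    n = len(value)
    if n < 128:                        # BER short-form length: one byte
        length = bytes([n])
    else:                              # BER long-form: 0x80|count, then bytes
        body = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(body)]) + body
    return key16 + length + value

def klv_decode(packet):
    """Recover (key, value) from a KLV packet."""
    key, rest = packet[:16], packet[16:]
    first = rest[0]
    if first < 0x80:                   # short form
        n, off = first, 1
    else:                              # long form
        nbytes = first & 0x7F
        n = int.from_bytes(rest[1:1 + nbytes], "big")
        off = 1 + nbytes
    return key, rest[off:off + n]

dummy_key = bytes(range(16))           # stand-in for a real SMPTE UL
frame = b"\x00" * 300                  # pretend frame of picture essence
packet = klv_encode(dummy_key, frame)
key, value = klv_decode(packet)
```

Because the 300-byte value exceeds 127, the length field here uses the two-byte long form; a player reads the key to identify the essence, then the length to know how far to read.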

MXF Track File Showing KLV Packet


XML & Timed-Text

Timed Text Track Files, such as open or closed Subtitles and Captions, are defined in XML and then wrapped in MXF. The wrapping of Timed Text XML is similar to that for Picture and Sound, in a reel-by-reel manner, with the exception that font resources may also be included in the wrap. The constraints applied to MXF for wrapping digital cinema Timed Text are defined in SMPTE ST 429-5 Timed Text Track File.

Track File Encryption Sat, 10 Dec 2016 18:00:18 +0000

A Composition may be encrypted for secure distribution. When encryption is performed, only the Track Files are encrypted, in a file-by-file manner. The Composition Playlist (CPL) is not encrypted. Track Files may be selectively encrypted, where some Track Files are encrypted, and others are not. In practice, decisions concerning encryption are left to the content owner. A content owner, for example, may choose to encrypt picture but not sound or timed text files. When a Track File is encrypted, all essence in the file is encrypted. Essence in a Track File cannot be partially encrypted.

Encrypted Composition

Figure TFE-1. An Encrypted Composition

The encryption algorithm used in digital cinema is the well-known Advanced Encryption Standard (AES). AES is a symmetric encryption algorithm, a term explained in the Encryption section. In the digital cinema application, a 128-bit key is used. When encrypted, the essence within each Track File is encrypted with a unique key; no two Track Files utilize the same key. The Key Delivery Message (KDM), also discussed in the Encryption and Key Delivery Message sections, carries an encrypted version of each key used to encrypt the Track Files within the associated Composition. A KDM is required to unlock and play the Composition.
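The one-key-per-Track-File rule can be sketched as follows. The Track File names are illustrative, and the wrapping of each key for a target Media Block (the KDM's job) is out of scope here.

```python
import secrets

# Sketch: each Track File receives its own fresh random AES-128 content key.
track_files = ["picture-reel-1", "sound-reel-1", "subs-reel-1"]

# One 128-bit (16-byte) key per Track File; no key is ever reused.
content_keys = {tf: secrets.token_bytes(16) for tf in track_files}

# A KDM would carry each of these keys, individually encrypted for the
# target Media Block; here we only list the key IDs it must reference.
kdm_key_ids = sorted(content_keys)
```

Generating keys with a cryptographically secure source (here `secrets`) makes an accidental key collision between Track Files vanishingly unlikely.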

Only the essence, or the “Value” portion of the KLV packet, is encrypted. The metadata associated with the essence is exposed so it can be read when searching the file. This also allows an operator to play a Track File from any frame, regardless of encryption. The KLV packet with the encrypted essence is wrapped within another “special” KLV packet, along with associated cryptographic metadata. The “special” KLV packet simply carries an encrypted payload, without revealing the nature of its contents. The “special” KLV packet, carrying the encrypted KLV packet, is then wrapped in an MXF Track File just as it would be were it not encrypted. This arrangement is illustrated below.

Encrypted KLV Packet is Carried Within a Special KLV Packet in the MXF Track File

Figure TFE-2. Encrypted KLV Packet is Carried Within a Special KLV Packet
in the MXF Track File

More information about Track File encryption is available in SMPTE ST 429-6 MXF Track File Essence Encryption.
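The packet-in-packet arrangement can be sketched structurally as below. XOR stands in for AES-128 purely to keep the sketch self-contained, and the Universal Labels are dummies; real essence encryption follows ST 429-6.

```python
# Structural sketch of the nesting: a plaintext KLV packet is enciphered
# and carried as the Value of an outer, generic ("special") KLV packet.

def klv(key16, value):                 # short-form BER length only (< 128)
    assert len(key16) == 16 and len(value) < 128
    return key16 + bytes([len(value)]) + value

def toy_cipher(data, key):             # XOR placeholder, NOT real encryption
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

PLAIN_UL = bytes([0x06] * 16)          # dummy UL for plaintext essence
CRYPT_UL = bytes([0x07] * 16)          # dummy UL for the "special" packet

content_key = bytes(range(16))
inner = klv(PLAIN_UL, b"frame-of-essence")           # ordinary KLV packet
outer = klv(CRYPT_UL, toy_cipher(inner, content_key))  # wrapped + enciphered

# The player reverses the process with the key delivered by the KDM:
recovered = toy_cipher(outer[17:], content_key)      # skip outer key + length
```

The outer packet's key identifies it only as encrypted cargo; nothing about the inner essence is visible until the content key decrypts the payload.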

Additional Composition Metadata Sat, 10 Dec 2016 16:00:50 +0000

Cinema has a history of innovation among filmmakers, resulting in a variety of film formats that require special setups per release. Business transactions also need assistance, as the provider of the DCP is often not the entity with whom box office is shared. One of the hopes when introducing digital distribution to cinema was that of automating projection system and back-office processes. However, the metadata included in the Composition Playlist (CPL) has proven to be inadequate for this purpose.

To provide users with a richer set of metadata, SMPTE DCP allows the optional inclusion of an Additional Composition Metadata file, as described in SMPTE ST 429-16:2014 Additional Composition Metadata and Guidelines. The information that can be carried in this file is listed below.

ReleaseTerritory: The intended release territory for the Composition.
VersionNumber: The version number of the Composition.
Chain: The targeted use for the Composition, such as a theatre chain, trade show, or festival screening.
Distributor: The distributor (studio) for the Composition in the intended release territory.
Facility: The organization that created the Composition.
AlternateContentVersionList: Content identifiers in addition to that of the CPL’s ContentVersion element.
Luminance: The screen luminance at which the content was authored.
MainSoundConfiguration: The soundfield and channels present in the MainSound track file.
MainSoundSampleRate: The audio sample rate of the MainSound track file.
MainPictureStoredArea: Height and width (in pixels) of the picture essence container (for projector setup).
MainPictureActiveArea: Height and width (in pixels) of the active picture area (for masking setup).
MainSubtitleLanguageList: Languages displayed by the MainSubtitle track file.

Table ACM-1. Additional Composition Metadata Data Elements
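To illustrate how software might consume one of these fields, the sketch below parses a MainSoundConfiguration-style value. The "51/L,R,C,LFE,Ls,Rs" shape is an assumption made for illustration only; consult ST 429-16 for the normative syntax.

```python
# Sketch: splitting an assumed "soundfield/channel,list" configuration
# string into its soundfield label and ordered channel names.

def parse_sound_config(value):
    soundfield, channels = value.split("/")
    return soundfield, channels.split(",")

field, chans = parse_sound_config("51/L,R,C,LFE,Ls,Rs")
```

A projection system could use such a parse to confirm that the auditorium's channel routing matches the soundfield the Composition was authored for.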

Interop DCP and SMPTE DCP Sat, 10 Dec 2016 15:00:21 +0000

A discussion of the DCP would not be complete without mention of the two types of distribution package used in production: Interop DCP and SMPTE DCP. They are functionally similar in that the DCP definitions provided earlier in this chapter apply to both types of packaging formats. But they are substantially different in that they are not interoperable.

Interop DCP is based on an early draft proposal for SMPTE DCP.  Interop DCP was put into practice in 2004, in preparation for the rollout of digital cinema.  SMPTE DCP was not finalized until 2009, four years after the rollout began. Interop DCP was intended as a temporary format until the standardized version came into existence. However, the lack of backwards compatibility, as well as the requirement for functionality not available in legacy equipment, has hampered the transition to SMPTE DCP.

Digital cinema owes its success to Interop DCP, which continues to be the primary distribution format at the time of this writing. Despite its success, Interop DCP has not been formally standardized or published. In contrast, SMPTE DCP is well-defined, and published as a suite of SMPTE standards. Maintenance on Interop DCP continued up until 2012, when it was decided to limit new development to only SMPTE DCP.

There are several differences in SMPTE DCP that prevent backwards compatibility and interoperability. Both formats utilize a Composition Playlist (CPL), but with different, non-interoperable XML structures. There are also structural differences in Subtitle and Audio track files. The subtitle rendering engine of DLP Series 1 projectors, the largest concentration of which is in the United States, is incapable of rendering the subtitle track file standardized by SMPTE. This incompatibility can be overcome by rendering subtitles in the server, a feature which is now commonplace in newer servers and media blocks, but not present in older installations. Another difference is a reliance on audio channel routing in the server, which also is not supported in many older systems. As a result, practical SMPTE DCP distributions are constrained to bypass the new audio features, falling back to Interop-style audio packaging. This workaround requires the use of “Channel Configuration 4” described in Annex A of SMPTE ST 429-2 DCP Operational Constraints, which was originally included for test purposes. More explanation of digital cinema audio is available in the chapter on Sound.

The DCP type can be most easily recognized by the namespace root called out in the XML-based Composition Playlist (CPL). SMPTE DCP uses the SMPTE-RA.ORG namespace root, while Interop DCP uses DIGICINE.COM. The top-level document that defines SMPTE DCP is SMPTE ST 429-2 DCP Operational Constraints. Interop DCP documentation is available here.
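That namespace check can be sketched as follows. The two namespace URIs shown are the commonly seen ones, included here as illustrative assumptions rather than normative values.

```python
import xml.etree.ElementTree as ET

# Sketch: sniffing the DCP flavor from the CPL root element's namespace.
SMPTE_ROOT = "smpte-ra.org"
INTEROP_ROOT = "digicine.com"

def dcp_flavor(cpl_xml):
    """Classify a CPL as SMPTE, Interop, or unknown by its namespace root."""
    tag = ET.fromstring(cpl_xml).tag          # format: '{namespace}LocalName'
    ns = tag[1:tag.index("}")] if tag.startswith("{") else ""
    if SMPTE_ROOT in ns:
        return "SMPTE"
    if INTEROP_ROOT in ns:
        return "Interop"
    return "unknown"

smpte_cpl = ('<CompositionPlaylist '
             'xmlns="http://www.smpte-ra.org/schemas/429-7/2006/CPL"/>')
interop_cpl = ('<CompositionPlaylist '
               'xmlns="http://www.digicine.com/PROTO-ASDCP-CPL-20040511#"/>')
```

Ingest software commonly performs a check like this before deciding which validation and playback rules to apply.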

There is also an excellent presentation by Jim Whittlesey on the differences between Interop DCP and SMPTE DCP.

The Trust Model Fri, 09 Dec 2016 08:00:05 +0000

Digital Cinema Security
The goal of a trust model is to minimize the number of entities that must be trusted to preserve business interests. The digital cinema trust model enables highly valued content to be distributed worldwide without encumbrance on fulfillment and exhibition entities.

The early stages of motion picture production are managed by rights owners where trust is inherent. Motion picture distribution, however, is normally conducted outside of a rights owner’s purview. Special consideration, therefore, is needed to ensure a trusted environment in distribution. The typical high level workflow model for motion pictures is illustrated below. The area of interest is that shared by Distribution and Exhibition.


Figure TM-1. Motion Picture Workflow

Trust is ensured through the encryption of digital cinema content and the management of security keys that enable the content to be played. For practical reasons, it’s useful to encrypt once and distribute to many, in a one-to-many fashion. But it’s also desirable to restrict where encrypted content is played, enabling playout in a site-by-site or screen-by-screen manner per the business requirements of the rights owner.

Cinema is unique in the digital media world in that its relatively small footprint of approximately 150,000 screens worldwide allows it to be managed as a closed ecosystem. Content encryption is managed such that the matter of trust is narrowed further, away from exhibitors and towards an even smaller number of actors: the equipment manufacturers.

The security specification to which equipment must comply is jointly owned and managed by the six major Hollywood studios, whose content generates a substantial share of box office for cinemas. The joint specification is published by Digital Cinema Initiatives (DCI), a consortium of the six major studios. DCI maintains a website for its Digital Cinema System Specification (DCSS) and Compliance Test Plan (CTP).

Digital cinema security utilizes a combination of symmetric key encryption and private key infrastructure, incorporating the principles of openness and autonomy. Openness ensures that the system can be built by anyone skilled in the art, based on royalty-free and license-free standards and specifications. Autonomy means that securely distributed content can be played without the need for active connection of the playback equipment to an outside network.

The principle of autonomy requires that the playback equipment manage its own trustworthiness. For this reason, DCI-compliant equipment is tamperproof by design, eliminating the need for active verification. Each certified playback device has at least one Digital Cinema Certificate, recognized by DCI as a declaration that the device meets the DCI Specification.

The digital cinema trust model allows for closure of this open loop of trust by means of the Digital Cinema Security Log. The log establishes a “Circle of Trust” workflow model, as illustrated below.


Figure TM-2. The DCI Model of Control Lightly/Audit Tightly

Content owners publish the DCI Specification, which is the basis for DCI Compliance Testing of digital cinema equipment. Certified devices carry a digital public-key certificate, whose serial number and public key are recorded in a database commonly referred to as a “Trusted Device List,” or TDL. The TDL also records where the certified device has been installed. When content is encrypted, the encryption key is distributed in encrypted form as a Key Delivery Message, or KDM. The exhibitor receives both the encrypted content and the KDM, enabling the content to play. At the time of playout, a Security Log is generated by the DCI-compliant playback system, which can then be viewed by the content owner to validate that its content was played by DCI-compliant equipment. The DCI specification refers to this closed loop process of managing trust as “control lightly/audit tightly.” (See DCI, DCSS section Logging System.)

In practice, it is often sufficient for rights owners to trust the public keys made available to them by equipment manufacturers. For an exhibitor to receive the proper keys that enable content to play, it must report the equipment in its possession, completing the TDL.

To summarize, the digital cinema trust model is managed within an ecosystem that does not require active verification of playback equipment. Equipment manufacturers establish trust in their equipment through the certification process established by DCI. Content owners express trust in an exhibitor’s playback equipment when creating Key Delivery Messages (KDMs) based on Trusted Device Lists (TDLs). In addition, content owners can close the trust loop through the examination of equipment Security Logs.

Encryption Fri, 09 Dec 2016 07:00:46 +0000

Digital Cinema Security
Digital cinema makes use of two types of common cryptography: symmetric-key cryptography, and asymmetric-key cryptography, also known as public-key cryptography. This section provides a review of encryption basics, leading to the application of encryption for digital cinema content and keys.

Encryption Basics

There are two classifications of cryptographic algorithms: symmetric and asymmetric. Symmetric-key cryptography uses the same key to lock and unlock data. This concept is illustrated below, where the red key represents a symmetrical key.

Symmetrical Keys

Figure EM-1. Symmetrical Keys

Asymmetric cryptography requires two non-identical, but mathematically linked, keys, comprising a key pair. These are often referred to as public and private keys. The concept is illustrated below, where the public key is the green key, and the private key (in practice, hidden from prying eyes) is the blue key.

Asymmetrical Keys

Figure EM-2. Asymmetrical Keys

Digital cinema content protection uses a combination of symmetric-key and asymmetric-key cryptography: the content is encrypted once and distributed widely, yet only authorized equipment can play it. In the cinema application, playback devices contain a Media Block designed to securely play the content. The encryption scheme accomplishes several goals:

  • Simple in concept
  • Efficient, where large amounts of data are encrypted once (Track Files), and only small amounts of data are encrypted for each playback device (KDMs)
  • Revocable at the device level (Note: this is further discussed in the section on Trusted Device List)

Public and Private Keys in Digital Cinema

Figure EM-3. Public and Private Keys in Digital Cinema

Security Key Workflow

In the digital cinema workflow, content that the owner chooses to protect is encrypted using a symmetrical key. To secure the symmetrical key, it must also be encrypted. This is accomplished by encrypting the symmetrical key using the public key of the target playback device (a Media Block). Of course, this method requires that the content distributor knows the target device’s public key. In practice, distributors build and maintain a Trusted Device List (TDL), matching device certificates containing public keys with the location of trusted devices. (See section on Trusted Device List).

(Note: The term Trusted Device List as used here is a database maintained by or for content owners, and is not the Trusted Device List element found in the KDM.)
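A minimal sketch of the distributor-side TDL lookup described above follows, with made-up device serial numbers, sites, and key identifiers. The actual wrapping of the content key with the device's public key (RSA, per the DCI specification) is represented only by the returned key identifier.

```python
# Sketch: given a target screen, find the trusted Media Block on record
# and the public key that the content key would be wrapped with.

tdl = {
    "MB-0001": {"site": "Cinema A, Screen 3", "public_key": "pubkey-0001"},
    "MB-0002": {"site": "Cinema B, Screen 1", "public_key": "pubkey-0002"},
}

def key_target_for(site):
    """Return (device serial, public key) for the device installed at site."""
    for serial, entry in tdl.items():
        if entry["site"] == site:
            return serial, entry["public_key"]
    raise LookupError(f"no trusted device on record for {site}")

serial, pubkey = key_target_for("Cinema B, Screen 1")
```

If the exhibitor has not reported a device for the requested screen, no KDM can be targeted, which is precisely the incentive that keeps the TDL complete.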

A DCI-compliant digital cinema playback system never stores encrypted picture and sound content in unencrypted form for playback at a later time. By specification, the system always decrypts content in real-time at the time of play. Accordingly, the KDM carries a date-time window condition that dictates when the content can be decrypted and played. (See the KDM section for more details.)
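The date-time window check might be sketched as follows, assuming the commonly used KDM element names ContentKeysNotValidBefore and ContentKeysNotValidAfter; the dates are illustrative.

```python
from datetime import datetime, timezone

# Sketch: a player must release content keys only inside the KDM's window.

def kdm_window_open(not_before, not_after, now=None):
    """True if 'now' falls within the KDM validity window (inclusive)."""
    now = now or datetime.now(timezone.utc)
    return not_before <= now <= not_after

nb = datetime(2017, 3, 1, tzinfo=timezone.utc)   # ContentKeysNotValidBefore
na = datetime(2017, 3, 15, tzinfo=timezone.utc)  # ContentKeysNotValidAfter

inside = kdm_window_open(nb, na, datetime(2017, 3, 7, tzinfo=timezone.utc))
expired = kdm_window_open(nb, na, datetime(2017, 4, 1, tzinfo=timezone.utc))
```

Outside the window the keys are simply unusable, so the content cannot be decrypted even though the encrypted Track Files remain on the server.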

The illustration below ties together the elements of the content encryption model. The key for decrypting content in the cinema is carried in encrypted form by a KDM. The KDM is created by a trusted party only for an authorized player, simultaneously authorizing and expressing trust in the player. At the player, the KDM enables the player to decrypt the content in real-time as it plays.

Security Key Workflow in Digital Cinema

Figure EM-4. Digital Cinema Security Key Workflow
