The Ogg container format explained

Xiph.Org's logo

Ogg bitstreams have become increasingly prominent in recent years, yet Ogg is still a commonly misunderstood format. As the creator of ruby-ogg, I have spent some time deciphering Xiph.Org’s technical documentation, and now I have decided to pass some of that knowledge on to you, dear reader (of course, you could always read the specification yourself!).

Ogg isn’t Vorbis

The popularity of “.ogg” audio files has led to much confusion over what Ogg actually is. In such audio files, the music data is compressed according to the Vorbis specification and then stored in an Ogg bitstream. The Ogg container format itself is independent of content type – it can happily hold video, audio and even text. In fact, Vorbis-encoded music doesn’t need to be stored within an Ogg bitstream either, but the Ogg-Vorbis combination is especially common because the Xiph.Org Foundation created both of them. For the insanely small file sizes and excellent audio quality, you can thank the Vorbis guys. For the ability to stream, seek and recover from partial corruption of songs, you can thank the good folks who work on Ogg.

Let’s get technical

OK, so now that we’ve firmly established that Ogg is a container format independent of content type, we can move on to the technical details.

Ogg assumes that the data it encapsulates has been broken into smaller chunks called packets. What a packet means depends on the codec – in the case of Vorbis, a single packet may contain metadata (the song title, artist and so forth), decoding information or a snippet of audio data. To store these packets, the Ogg bitstream is divided into pages. Each page can contain part of a packet, a whole packet or even multiple packets. The diagram below may make it easier to visualise how packets can be laid out across pages:

Graphical representation of a sample Ogg file

In the example above, page #1 contains two complete packets, packet #3 spans pages #2 and #3, and packet #4 spans pages #3 and #4.
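How can a page hold "part of a packet"? Based on the Ogg specification (RFC 3533), each page header carries a segment table of "lacing values". A packet is cut into 255-byte segments. A lacing value of 255 means the packet continues into the next segment, and any smaller value (including 0) ends it. The Python sketch below is a minimal illustration under that assumption. The helper names lacing_values and split_into_pages are hypothetical and do not come from ruby-ogg or libogg. It only flushes a page when the segment table is full, and it uses a tiny 3-segment limit so that a packet is forced across a page boundary. Real pages allow up to 255 segments, and real encoders also flush pages for other reasons.

```python
def lacing_values(packet_len: int) -> list[int]:
    """Segment-table (lacing) values describing one packet of packet_len bytes.

    The packet is cut into 255-byte segments; a final value below 255
    (0 when the length is an exact multiple of 255) marks the packet's end.
    """
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def split_into_pages(packet_lens: list[int], max_segments: int = 255) -> list[list[int]]:
    """Group the lacing values of consecutive packets into pages.

    Simplification (hypothetical helper): a page is only flushed when its
    segment table holds max_segments entries, so a packet can start on
    one page and finish on the next.
    """
    pages: list[list[int]] = []
    current: list[int] = []
    for n in packet_lens:
        for value in lacing_values(n):
            current.append(value)
            if len(current) == max_segments:
                pages.append(current)
                current = []
    if current:
        pages.append(current)
    return pages


# A deliberately tiny limit of 3 segments per page forces the 600-byte
# packet to span two pages.
print(split_into_pages([100, 50, 600], max_segments=3))  # → [[100, 50, 255], [255, 90]]
```

The first page ends with a lacing value of 255, which is exactly how a reader can tell that the third packet continues on the next page.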

Reading pages

Since an Ogg bitstream has no “index”, the header of each page must provide enough information to read and validate the data it encapsulates. The advantage of this is that if you begin reading at any point in the stream, you can detect the next page header and read the data that follows. Not only does this enable recovery after encountering a corrupted page, but it also theoretically allows decoding to begin halfway through a stream. Here is a summary of the process for page detection:

  1. Read from the stream until the 4-byte sequence representing the ASCII string ‘OggS’ is encountered.
  2. Read the following data as an Ogg page header. The header ends with a segment table, whose values add up to the length of the page body.
  3. Compute a cyclic redundancy check (CRC) over the entire page (header and body) with the checksum field set to zero, and compare the result with the checksum field.
  4. If the two values match, we have found a valid, uncorrupted Ogg page. Otherwise, seek back to just after the ‘OggS’ and return to step 1.
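The four steps above can be sketched in Python as follows. This is a minimal, hypothetical implementation, not ruby-ogg's code. It assumes the page layout from RFC 3533: a fixed 27-byte little-endian header (capture pattern, version, header type, 64-bit granule position, serial number, page sequence number, CRC, segment count), followed by the segment table and the body. It also assumes Ogg's CRC-32 variant, which uses polynomial 0x04C11DB7 with an initial value of 0, no bit reflection and no final XOR. The names find_pages, ogg_crc32 and Page are invented for illustration. The CRC is computed bit by bit, which is as slow in Python as the author found it in Ruby. Real implementations use a 256-entry lookup table.

```python
import struct
from typing import Iterator, NamedTuple

# Fixed 27-byte part of an Ogg page header (all fields little-endian):
# capture pattern, version, header type, granule position, serial number,
# page sequence number, CRC checksum, number of segments.
HEADER = struct.Struct("<4sBBqIIIB")
CRC_OFFSET = 22  # byte offset of the checksum field within the page


class Page(NamedTuple):
    offset: int            # where the page starts in the input
    header_type: int
    granule_position: int
    serial: int
    sequence: int
    lacing: bytes          # segment table, one lacing value per segment
    body: bytes


def ogg_crc32(data: bytes) -> int:
    """CRC-32 as used by Ogg: polynomial 0x04C11DB7, initial value 0,
    no bit reflection, no final XOR. Bitwise, hence slow."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def find_pages(data: bytes) -> Iterator[Page]:
    """Yield every page whose checksum verifies, skipping over garbage."""
    pos = 0
    while True:
        # Step 1: look for the 'OggS' capture pattern.
        start = data.find(b"OggS", pos)
        if start < 0 or start + HEADER.size > len(data):
            return
        # Step 2: parse the header, then use the segment table to find the body.
        _, version, htype, granule, serial, seq, crc, nsegs = HEADER.unpack_from(data, start)
        table_start = start + HEADER.size
        lacing = data[table_start:table_start + nsegs]
        body_start = table_start + nsegs
        body_end = body_start + sum(lacing)
        if version == 0 and len(lacing) == nsegs and body_end <= len(data):
            # Step 3: CRC over the whole page with the checksum field zeroed.
            page = bytearray(data[start:body_end])
            page[CRC_OFFSET:CRC_OFFSET + 4] = bytes(4)
            if ogg_crc32(bytes(page)) == crc:
                yield Page(start, htype, granule, serial, seq, lacing,
                           data[body_start:body_end])
                pos = body_end
                continue
        # Step 4: not a valid page; resume searching just after 'OggS'.
        pos = start + 4
```

For example, `for page in find_pages(open("song.ogg", "rb").read()): print(page.sequence, len(page.body))` would list the pages of a (hypothetical) file. Because the search simply resumes after a failed match, stray bytes or a damaged page only cost the pages they touch.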

The view from a height

For Rubyists, ruby-ogg provides a nice abstraction over the entire concept of pages, simply allowing you to call a “read_packet” method. This method handles page detection and optional checksum validation (CRCs are slow in Ruby), and joins packet segments that span multiple pages.
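The last part of that job, joining segments that span pages, can be sketched in Python as follows. This is a hypothetical illustration, not how ruby-ogg's read_packet is actually written. The function name packets_from_pages is invented for illustration. The sketch assumes the pages have already been found and validated, arrive in order and belong to a single logical stream. Each page is given as a pair of (lacing values, page body). It relies only on the segment-table rule from the Ogg specification: a lacing value below 255 ends a packet, so a page whose last value is 255 leaves its packet unfinished. It ignores the "continued packet" flag in the header type and does not handle lost pages.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def packets_from_pages(pages: Iterable[Tuple[Sequence[int], bytes]]) -> Iterator[bytes]:
    """Reassemble packets from (lacing values, page body) pairs.

    A lacing value below 255 ends the current packet. If a page's last
    lacing value is 255, the unfinished packet is carried over and
    completed by the segments at the start of the next page.
    """
    partial = bytearray()
    for lacing, body in pages:
        pos = 0
        for value in lacing:
            partial += body[pos:pos + value]
            pos += value
            if value < 255:
                yield bytes(partial)
                partial = bytearray()
    # Bytes still in `partial` here belong to a packet whose end never
    # arrived; this sketch silently drops them.


# Three packets laid out over two pages; the third spans the page boundary.
p1, p2, p3 = b"a" * 100, b"b" * 50, b"c" * 600
pages = [([100, 50, 255], p1 + p2 + p3[:255]),
         ([255, 90], p3[255:])]
print([len(p) for p in packets_from_pages(pages)])  # → [100, 50, 600]
```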
