cd-rom technical summary
from plastic pits to "fantasia"
this summary describes how information is encoded on compact disc (cd)
beginning with the physical pits and going up through higher levels of
data encoding to the structured multimedia information that is
possible with programs like hypercard. this discussion is much
broader than any single standards document, e.g. the cd-audio red
book, while omitting much of the detail needed only by drive
manufacturers. the key characteristics of cd are:
1. high information density -- with the density achievable using
optical encoding, the cd can contain some 540 megabytes of data on a
disc less than five inches in diameter.
2. low unit cost -- because cds are manufactured by a well-developed
process similar to that used to stamp out lp records, unit cost in
large quantities is less than two dollars.
3. read only medium -- cd-rom is read only; it cannot be written on
or erased. it is an electronic publishing, distribution, and access
medium; it cannot replace magnetic disks.
4. modest random access performance -- due to optical read head mass
and data encoding methods, random access ("seek time") performance of
cd is better than that of floppies but not as good as that of magnetic
hard disks.
5. robust, removable medium -- the cd itself is composed mostly of,
and completely coated by, durable plastic. this fact and the data
encoding method allow the cd to be resistant to scratches and other
handling damage. media lifetime is expected to be long, well beyond
that of magnetic media such as tape. in addition, the optical servo
scanning mechanism allows cds to be removed from their drives.
6. multimedia storage -- because all cd data is stored digitally, it
is inherently multimedia in that it can store text, images, graphics,
sound, and any other information expressed in digital form. its only
limit in this area is the rate at which data can be read from the
disc, currently about 150 kbytes/second. this is sufficient for all
but uncompressed, full motion color video.
cd data hierarchy
storing data on a cd may be thought of as occurring through a data
encoding hierarchy with each level built upon the previous one. at
the lowest level, data is physically stored as pits on the disc. it
is actually encoded by several low-level mechanisms to provide high
storage density and reliable data recovery. at the next level, it is
organized into tracks which may be digital audio or cd-rom. the high
sierra specification then defines a file system built on cd-rom
tracks. finally, applications like hypercard specify a content format
for the data stored in those files.
the physical medium
the compact disc itself is a thin plastic disc some 12 cm in
diameter. information is encoded in a plastic-encased spiral track
contained on the top of the disc. the spiral track is read optically
by a noncontact head which scans approximately radially as the disc
spins just above it. the spiral is scanned at a constant linear
velocity, thus assuring a constant data rate. this requires the disc
to rotate at a decreasing rate as the spiral is scanned from its
beginning near the center of the disc to its end near the disc's outer
edge.
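because the linear velocity is held constant, the rotation rate is inversely proportional to the radius being read. the sketch below computes it; the scanning velocity (1.3 m/s) and the inner and outer radii of the recorded area (25 mm and 58 mm) are assumed typical values, not figures from this summary.

```python
import math

# assumed typical values, not taken from this summary
SCAN_VELOCITY_M_PER_S = 1.3   # constant linear velocity, roughly 1.2 to 1.4 m/s
INNER_RADIUS_M = 0.025        # start of the recorded spiral, about 25 mm from the center
OUTER_RADIUS_M = 0.058        # end of the recorded spiral, about 58 mm from the center

def rpm_at_radius(radius_m, velocity=SCAN_VELOCITY_M_PER_S):
    """rotation rate that moves the track past the head at a constant linear velocity."""
    circumference = 2 * math.pi * radius_m
    revolutions_per_second = velocity / circumference
    return revolutions_per_second * 60

for r in (INNER_RADIUS_M, 0.040, OUTER_RADIUS_M):
    print(f"radius {r * 1000:.0f} mm: {rpm_at_radius(r):.0f} rpm")
```

with these assumed dimensions the inner edge comes out near 500 rpm, matching the editorial figure later in this summary, while the outer edge comes out a little over 200 rpm.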
the spiral track contains shallow depressions, called pits, in a
reflective layer. binary information is encoded by the lengths of
these pits and the lengths of the areas between them, called land.
during reading, a low power laser beam from the optical head is
focused on the spiral layer and is reflected back into the head. due
to the optical characteristics of the plastic disc and the wavelength
of light used, the quantity of reflected light varies depending on
whether the beam is on land or on a pit. the modulated, reflected
light is converted to a radio frequency, raw data signal by a
photodetector in the optical head.
low-level data encoding
to ensure accurate recovery, the disc data must be encoded to optimize
the analog-to-digital conversion process that the radio frequency
signal must undergo. goals of the low level data encoding include:
1. high information density. this requires encoding that makes the
best possible use of the high, but limited, resolution of the laser
beam and read head optics.
2. minimum intersymbol interference. this requires making the
minimum run length, i.e. the minimum number of consecutive zero bits
or one bits, as large as possible.
3. self-clocking. to avoid a separate timing track, the data should
be encoded so as to allow the clock signal to be regenerated from the
data signal. this requires limiting the maximum run length of the
data so that data transitions will regenerate the clock.
4. low digital sum value (the number of one bits minus the number of
zero bits). this minimizes the low frequency and dc content of the
data signal which permits optimal servo system operation.
a straightforward encoding would be simply to encode zero bits as
land and one bits as pits. however, this does not meet goal (1) as
well as the encoding scheme actually used. the current cd scheme
encodes one bits as transitions from pit to land or land to pit and
zero bits as constant pit or constant land.
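the transition scheme just described (often called nrzi) can be sketched as follows. each list position stands for one channel-bit cell on the track; labeling land as 0 and pit as 1, and starting on land, are arbitrary choices made for illustration.

```python
def channel_bits_to_surface(bits, start_level=0):
    """map channel bits to surface levels (0 = land, 1 = pit):
    a 1 bit flips the level, a 0 bit keeps the current level."""
    levels = []
    level = start_level
    for bit in bits:
        if bit == 1:
            level ^= 1            # transition: land -> pit or pit -> land
        levels.append(level)
    return levels

def surface_to_channel_bits(levels, start_level=0):
    """inverse mapping: a change of level between successive cells reads as a 1 bit."""
    bits = []
    previous = start_level
    for level in levels:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits

bits = [1, 0, 0, 1, 0, 0, 0, 1]
surface = channel_bits_to_surface(bits)
print(surface)   # [1, 1, 1, 0, 0, 0, 0, 1]
assert surface_to_channel_bits(surface) == bits
```

note that only the transitions carry information, so the decoder gives the same bits whichever level the track happens to start on.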
to meet goals (2) to (4), it is not possible to encode arbitrary
binary data. for example, the integer 0 expressed as thirty-two bits
of zero would have too long a run length to satisfy goal (3). to
accommodate these goals, each eight-bit byte of actual data is encoded
as fourteen bits of channel data. there are many more combinations of
fourteen bits (16,384) than there are of eight bits (256). to encode
the eight-bit combinations, 256 combinations of fourteen bits are
chosen that meet the goals. this encoding is referred to as
eight-to-fourteen modulation (efm) coding.
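the claim that enough fourteen-bit patterns survive the run-length goals can be checked by brute force. the sketch below applies cd's efm limits (at least 2 and at most 10 zero bits between one bits, with no zero run in the word longer than 10) to all 16,384 patterns; the function name is illustrative.

```python
MIN_ZEROS = 2    # at least two zeros between consecutive one bits (goal 2)
MAX_ZEROS = 10   # no run of more than ten zeros (goal 3)
WORD_BITS = 14

def meets_run_length_limits(word):
    """true if the 14-bit word satisfies the minimum and maximum run lengths."""
    bits = format(word, f"0{WORD_BITS}b")
    runs = bits.split("1")        # zero runs before, between, and after the one bits
    between = runs[1:-1]          # runs bounded by one bits on both sides
    if any(len(run) < MIN_ZEROS for run in between):
        return False
    return all(len(run) <= MAX_ZEROS for run in runs)

candidates = [w for w in range(2 ** WORD_BITS) if meets_run_length_limits(w)]
print(len(candidates))   # 267
```

267 patterns qualify, comfortably more than the 256 needed, so the designers had some room to pick the 256 that best served the other goals.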
if fourteen channel bits were concatenated with another set of
fourteen channel bits, once again the above goals may not be met. to
avoid this possibility, three merging bits are included between each
set of fourteen channel bits. these merging bits carry no information
but are chosen to limit run length, keep data signal dc content low,
etc. thus, an eight-bit byte of actual data is encoded into a total
of seventeen channel bits: fourteen efm bits and three merging bits.
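one way to pick merging bits is sketched below: try each three-bit pattern with at most one 1 bit, discard those that break the run-length limits across the junction, and keep the one that brings the running digital sum closest to zero. this is a simplified illustration under stated assumptions: a real encoder also has to avoid accidentally forming the sync pattern, and the fourteen-bit strings in the example are arbitrary strings that obey the limits, not entries from the actual efm table.

```python
MERGING_CANDIDATES = ["000", "001", "010", "100"]

def run_lengths_ok(bits, min_zeros=2, max_zeros=10):
    """check the cd run-length limits on a string of channel bits."""
    runs = bits.split("1")
    if any(len(run) < min_zeros for run in runs[1:-1]):
        return False
    return all(len(run) <= max_zeros for run in runs)

def nrzi_sum(bits, level):
    """digital sum of the recorded waveform, counting +1 or -1 per channel bit,
    starting at `level` (+1 or -1); returns (sum, final_level)."""
    total = 0
    for bit in bits:
        if bit == "1":
            level = -level         # a one bit is a transition
        total += level
    return total, level

def choose_merging_bits(prev_word, next_word, level, running_sum):
    """hypothetical helper: pick merging bits that keep prev + merge + next within
    the run-length limits and leave the running sum closest to zero."""
    best = None
    for merge in MERGING_CANDIDATES:
        if not run_lengths_ok(prev_word + merge + next_word):
            continue
        added, _ = nrzi_sum(merge + next_word, level)
        score = abs(running_sum + added)
        if best is None or score < best[0]:
            best = (score, merge)
    if best is None:
        raise ValueError("no merging pattern satisfies the run-length limits")
    return best[1]

# a word ending in a one bit followed by a word starting with one: only "000" works
print(choose_merging_bits("00100100000001", "10000000000100", level=1, running_sum=0))  # 000
```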
to achieve a reliable self-clocking system, periodic synchronization
is necessary. thus, data is broken up into individual frames each
beginning with a synchronization pattern. each frame also contains
twenty-four data bytes, eight error correction bytes, a control and
display byte (carrying the subcoding channels), and merging bits
separating them all. each frame is arranged as follows:
field                         channel bits
sync pattern                  24 + 3
control and display byte      14 + 3
data bytes                    12 * (14 + 3)
error correction bytes        4 * (14 + 3)
data bytes                    12 * (14 + 3)
error correction bytes        4 * (14 + 3)
total                         588
thus, 192 actual data bits (24 bytes) are encoded as 588 channel bits.
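the frame accounting above can be checked with a few lines of arithmetic; the names are illustrative.

```python
EFM_SYMBOL_BITS = 14 + 3   # fourteen efm bits plus three merging bits per byte

frame_layout = [
    ("sync pattern", 24 + 3),
    ("control and display byte", EFM_SYMBOL_BITS),
    ("data bytes", 12 * EFM_SYMBOL_BITS),
    ("error correction bytes", 4 * EFM_SYMBOL_BITS),
    ("data bytes", 12 * EFM_SYMBOL_BITS),
    ("error correction bytes", 4 * EFM_SYMBOL_BITS),
]

channel_bits = sum(bits for _, bits in frame_layout)
data_bits = 24 * 8
print(channel_bits)                        # 588
print(f"{data_bits / channel_bits:.3f}")   # 0.327
```

roughly a third of the channel bits carry user data; the rest buy clock recovery, low dc content, synchronization, subcoding, and error correction.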
editorial: a cd physically has a single spiral track about 3 miles
long. cds spin at about 500 rpm when reading near the center down to
about 250 rpm when reading near the circumference.
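the "about 3 miles" figure can be estimated by dividing the area of the recorded ring by the spacing between adjacent turns of the spiral. the track pitch (1.6 micrometers) and the inner and outer radii below are assumed typical values, not figures from this summary.

```python
import math

# assumed typical values, not taken from this summary
TRACK_PITCH_M = 1.6e-6   # spacing between adjacent turns of the spiral
INNER_RADIUS_M = 0.025   # start of the recorded area
OUTER_RADIUS_M = 0.058   # end of the recorded area
METERS_PER_MILE = 1609.344

def spiral_length(inner, outer, pitch):
    """approximate spiral length: ring area divided by track pitch."""
    return math.pi * (outer ** 2 - inner ** 2) / pitch

length_m = spiral_length(INNER_RADIUS_M, OUTER_RADIUS_M, TRACK_PITCH_M)
print(f"{length_m:.0f} m, {length_m / METERS_PER_MILE:.1f} miles")
```

with these assumptions the spiral is roughly 5.4 km long, a little over 3 miles.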
disc with a 'c' or disk with a 'k'? a usage has emerged for these
terms: disk is used for erasable disks (e.g. magnetic disks) while
disc is used for read-only (e.g. cd-rom discs). one would presumably
call a frisbee a disc.
first level error correction
data errors can arise from production defects in the disc itself,
defects arising from subsequent damage to the disc, or jarring during
reading. a significant characteristic of these errors is that they
often occur in long bursts. this could be due, for example, to a
relatively wide mark on the disc that is opaque to the laser beam used
to read the disc. an error correction system called cross interleave
reed-solomon coding (circ), which has two logical components, is
employed. the cross interleave component breaks up the long error
bursts into many short errors; the reed-solomon component provides the
error correction itself.
as each frame is read from the disc, it is first decoded from fourteen
channel bits (the three merging bits are ignored) into eight-bit data
bytes. then, the bytes from each frame (twenty-four data bytes and
eight error correction bytes) are passed to the first reed-solomon
decoder which uses four of the error correction bytes and is able to
correct one byte in error out of the 32. if there are no
uncorrectable errors, the data is simply passed along. if there are
uncorrectable errors, the data is marked as being in error at this
stage of decoding.
the twenty-four data bytes and four remaining error correction bytes
are then passed through unequal delays before going through another
reed-solomon decoder. these unequal delays result in an interleaving
of the data that spreads long error bursts among many different passes
through the second decoder. the delays are such that error bursts up
to 450 bytes long can be completely corrected. the second
reed-solomon decoder uses the last four error correction bytes to
correct any remaining errors in the twenty-four data bytes. at this
point, the data goes through a de-interleaving process to restore the
correct byte order.
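the effect of the unequal delays can be seen in a simplified model. the sketch assumes that byte position i (of the 28 bytes entering the second decoder) is delayed by 4 * i frames; this illustrates the idea but is not the exact circ delay schedule, which adds further small delays and byte reordering. it simulates a burst that destroys every byte of several consecutive frames as read from the disc and counts how many bad bytes each original frame receives after de-interleaving.

```python
from collections import Counter

POSITIONS = 28    # 24 data bytes + 4 error correction bytes
DELAY_UNIT = 4    # assumption: byte position i is delayed by DELAY_UNIT * i frames

def errors_per_frame_after_deinterleave(burst_start, burst_frames):
    """a burst wipes out every byte of `burst_frames` consecutive frames read from
    the disc; return a Counter of bad bytes per original (de-interleaved) frame."""
    damage = Counter()
    for read_frame in range(burst_start, burst_start + burst_frames):
        for position in range(POSITIONS):
            # byte `position` of original frame f was read in frame f + DELAY_UNIT * position
            original_frame = read_frame - DELAY_UNIT * position
            damage[original_frame] += 1
    return damage

damage = errors_per_frame_after_deinterleave(burst_start=1000, burst_frames=4)
print(len(damage), max(damage.values()))   # 112 1
```

a burst covering four whole frames (112 bytes) ends up as a single bad byte in each of 112 different frames: exactly the many short errors that the reed-solomon stage is meant to handle.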
subcoding channels and blocks
the eight-bit control and display byte in each frame carries the
subcoding channels. a subcoding block consists of 98 subcoding bytes,
and thus 98 of the 588-bit frames. a block then can contain 2352
bytes of data. seventy-five blocks are read each second. with this
information, it is now straightforward to calculate that the cd data
rate is in fact correct for cd digital audio (cd-da):
required cd digital audio data rate: 44.1 k samples per second * 16
bits per sample * 2 channels = 1,411,200 bits/sec.
cd data rate: 8 bits per byte * 24 bytes per frame * 98 frames per
subcoding block * 75 subcoding blocks per second = 1,411,200 bits/sec.
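the same arithmetic in code, extended with one derived figure: multiplying the frame rate by the 588 channel bits per frame gives the raw channel bit rate on the disc.

```python
SAMPLE_RATE = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

DATA_BYTES_PER_FRAME = 24
FRAMES_PER_BLOCK = 98
BLOCKS_PER_SECOND = 75
CHANNEL_BITS_PER_FRAME = 588

audio_rate = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
disc_rate = 8 * DATA_BYTES_PER_FRAME * FRAMES_PER_BLOCK * BLOCKS_PER_SECOND
assert audio_rate == disc_rate == 1_411_200

frames_per_second = FRAMES_PER_BLOCK * BLOCKS_PER_SECOND        # 7350
channel_bit_rate = frames_per_second * CHANNEL_BITS_PER_FRAME    # 4,321,800
print(frames_per_second, channel_bit_rate)
```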
the eight subcoding channels are labeled p through w and are encoded
one bit for each channel in a control and display byte. channel p is
used as a simple music track separator. channel q is used for control
purposes and encodes information like track number, track type, and
location (minute, second, and frame number). during the lead-in track
of the disc, channel q encodes a table of contents for the disc giving
track number and starting location. standards have been proposed that
would use the remaining channels for line graphics and ascii character
strings, but these are seldom used.
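the minute, second, and frame location carried in channel q is recorded as binary-coded decimal, and its "frame" field counts 1/75-second units, i.e. subcoding blocks rather than 588-bit frames. the sketch below decodes such a location into a running block count; the helper names are illustrative.

```python
BLOCKS_PER_SECOND = 75
SECONDS_PER_MINUTE = 60

def bcd_to_int(byte):
    """decode a two-digit binary-coded-decimal byte, e.g. 0x59 -> 59."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"not a valid bcd byte: {byte:#04x}")
    return high * 10 + low

def msf_to_block(minute, second, frame):
    """convert a minute:second:frame location into a count of subcoding blocks."""
    return (minute * SECONDS_PER_MINUTE + second) * BLOCKS_PER_SECOND + frame

def block_to_msf(block):
    """inverse of msf_to_block."""
    minutes, rest = divmod(block, SECONDS_PER_MINUTE * BLOCKS_PER_SECOND)
    seconds, frame = divmod(rest, BLOCKS_PER_SECOND)
    return minutes, seconds, frame

print(msf_to_block(bcd_to_int(0x02), bcd_to_int(0x30), bcd_to_int(0x15)))  # 11265
```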
tracks can have two types as specified in the control bit field of
subchannel q. the first type is cd digital audio (cd-da) tracks. the
two-channel audio is sampled at 44.1 khz with sixteen-bit linear
sampling encoded as two's complement numbers. the sixteen-bit samples
are separated into two eight-bit bytes; the bytes from each channel
alternate on the disc. variations for audio tracks include
pre-emphasis and four track recording.
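the 24 data bytes of a frame therefore hold six stereo sample pairs. the sketch below packs them in logical order, before circ interleaving scrambles the bytes. it assumes each sample is stored least significant byte first with the left channel first; that byte order is an assumption of this sketch, not something stated in this summary.

```python
import struct

SAMPLES_PER_FRAME = 6   # 24 data bytes / (2 channels * 2 bytes per sample)

def pack_audio_frame(stereo_samples):
    """pack six (left, right) sixteen-bit two's complement samples into the 24 data
    bytes of one frame, alternating channels; little-endian byte order assumed."""
    if len(stereo_samples) != SAMPLES_PER_FRAME:
        raise ValueError("a frame holds exactly six stereo sample pairs")
    out = bytearray()
    for left, right in stereo_samples:
        out += struct.pack("<hh", left, right)   # '<h' = little-endian signed 16-bit
    return bytes(out)

frame = pack_audio_frame([(1, -2)] * 6)
print(frame[:4].hex())   # 0100feff
```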
the other type of track specified by the subchannel q control bit
field is the data track. these must conform to the cd-rom standard
described below. in general, a disc can have a mix of cd digital
audio tracks and a cd-rom track, but the cd-rom track must come first.
editorial: this first level error correction (the only type used for
cd audio data) is extremely powerful. the cd specification allows for
discs to have up to 220 raw errors per second. every one of these
errors is (almost always) perfectly corrected by the circ scheme for a
net error rate of zero. for example, our tests using apple's cd-rom
drive (which also plays audio) show that raw error rates are around
50-100 per second these days. of course, these are perfectly
corrected, meaning that the original data is perfectly recovered. we
have tested flawed discs with raw rates up to 300 per second. net
errors on all of these discs? zero! i would expect a typical audio
cd player to perform similarly. thus i expect this raw error rate to
have no audible consequences.
so why did i say "almost always" corrected above? because a
sufficiently bad flaw may produce uncorrectable errors. these very
unusual errors are "concealed" by the player rather than corrected.
note that this concealment is likely to be less noticeable than even a
single scratch on an lp. such a flaw might be a really opaque finger
smudge; cds do merit careful handling. on the two (and only two)
occasions i have found these, i simply sprayed on a little windex
glass cleaner and wiped it off using radial strokes. this restored
the cds to zero net errors.
one can argue about the quality of the process of conversion of analog
music to and from digital representation, but in the digital domain
cds are really very, very good.
cd-rom data tracks
each cd-rom data track is divided into individually addressable blocks
of 2352 data bytes, i.e. one subcoding block or 98 frames. a header
in each block contains the block address and the mode of the block.
the block address is identical to the encoding of minute, second, and
frame number in subcode channel q. the modes defined in the cd-rom
specification are:
mode 0 -- all data bytes are zero.
mode 1 -- (cd-rom data):
    sync field                12 bytes
    header field               4 bytes
    user data field         2048 bytes
    error detection code       4 bytes
    reserved                   8 bytes
    error correction         276 bytes
mode 2 -- (cd audio or other data):
    sync field                12 bytes
    header field               4 bytes
    user data field         2048 bytes
    auxiliary data field     288 bytes
thus, mode 1 defines separately addressable, physical 2k byte data
blocks making cd-rom look at this level very similar to other digital
mass storage devices.
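following the mode 1 layout listed above, a raw 2352-byte block can be split into its parts by fixed offsets. the sketch assumes the sync field is the pattern 00, ten bytes of ff, 00, and that the four header bytes are minute, second, and frame (in bcd, returned undecoded here) followed by the mode byte; the function name is hypothetical.

```python
SECTOR_SIZE = 2352
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])   # assumed 12-byte sync pattern
USER_DATA_SIZE = 2048

def parse_mode1_sector(sector):
    """split a raw mode 1 block into (address, user data, edc, ecc) by fixed offsets."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("raw cd-rom blocks are 2352 bytes")
    if sector[:12] != SYNC:
        raise ValueError("sync field not found")
    minute, second, frame, mode = sector[12:16]   # header: address (bcd) + mode
    if mode != 1:
        raise ValueError(f"expected mode 1, found mode {mode}")
    user_data = sector[16:16 + USER_DATA_SIZE]    # bytes 16..2063
    edc = sector[2064:2068]                       # 4-byte error detection code
    ecc = sector[2076:2352]                       # 276 bytes, after 8 reserved bytes
    return (minute, second, frame), user_data, edc, ecc

raw = SYNC + bytes([0x00, 0x02, 0x16, 0x01]) + bytes(USER_DATA_SIZE) + bytes(4) + bytes(8) + bytes(276)
address, data, edc, ecc = parse_mode1_sector(raw)
print([hex(b) for b in address], len(data), len(ecc))
```

because every field sits at a fixed offset, a driver can hand the 2048 user bytes to the file system without inspecting anything else.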
second level error correction
an uncorrected error in audio data typically results, at worst, in a
brief, often inaudible click during listening. an uncorrected error in
other kinds of data, for example program code, may render a cd
unusable. for this reason, cd-rom defines a second level of error
detection and error correction (edc/ecc) for mode 1 data. the
information for the edc/ecc occupies most of the auxiliary data field.
the error detection code is a cyclic redundancy check (crc) on the
sync, header, and user data. it occupies the first four bytes of the
auxiliary data field and provides a very high probability that
uncorrected errors will be detected. the error correction code is
essentially the same as the first level error correction in that
interleaving and reed-solomon coding are used. it occupies the final
276 bytes of the auxiliary data field.
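the detection half of the scheme can be illustrated with any crc. the sketch below uses python's `zlib.crc32` purely as a stand-in: it is a standard 32-bit crc, but not the polynomial the cd-rom specification defines, and the little-endian storage of the check value is likewise an assumption. it protects 2064 bytes, the size of the sync, header, and user data fields together, and shows that a single flipped bit is detected.

```python
import zlib

def add_check_bytes(protected):
    """append a 4-byte crc over `protected` (stand-in crc, not the cd-rom edc)."""
    return protected + zlib.crc32(protected).to_bytes(4, "little")

def passes_check(data_with_check):
    """recompute the crc over everything except the last four bytes and compare."""
    body, stored = data_with_check[:-4], data_with_check[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == stored

# 2064 bytes standing in for sync + header + user data of a mode 1 block
covered = bytes(i % 256 for i in range(2064))
block = add_check_bytes(covered)
assert passes_check(block)

damaged = bytearray(block)
damaged[100] ^= 0x01                   # flip one bit in the user data
print(passes_check(bytes(damaged)))    # False
```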
editorial: this extra level of error correction for cd-rom blocks is
one of the many reasons that cd-rom drives are much more expensive
than consumer audio players. to perform this error correction quickly
requires substantial extra computing power (sometimes a dedicated
microprocessor) in the drive.
this is also one reason that consumer players like the magnavoxes
which claim to be cd-rom compatible (with their digital output jack on
the back) are useless for that purpose. they have no way of dealing
with the cd-rom error correction. they also have no way for a
computer to tell them where to seek.
another reason that cd-rom drives are more expensive is that they are
built to be a computer peripheral rather than a consumer device, i.e.
like a combination race car/truck rather than a family sedan. one
story, probably apocryphal but not far from the truth, has it that a
major japanese manufacturer tested some consumer audio players to
simulate computer use: they made them seek (move the optical head)
from the inside of the cd to the outside and back again. these are
called maximum seeks. the story says they managed to do this for
about 24 hours before they broke down. a cd-rom drive needs to be
several orders of magnitude more robust. fast and strong don't come
cheap.
the high sierra file system standard
built on top of the addressable 2k blocks that the cd-rom
specification defines, the next higher level of data encoding is a
file system that permits logical organization of the data on the cd.
this can be a native file system like the macintosh hierarchical file
system (hfs). another alternative is the high sierra (also known as
the iso 9660) file standard, recently approved by the national
information standards organization (niso) and the international
organization for standardization (iso), which defines a file system carefully
tuned to cd characteristics. in particular:
1. cds have modest seek time and high capacity. as a result, the
high sierra standard makes tradeoffs that reduce the number of seeks
needed to read a file at the expense of space efficiency.
2. cds are read-only. thus, concerns like space allocation, file
deletion, and the like are not addressed in the specification.
for high sierra file systems, each individual cd is a volume. several
cds may be grouped together in a volume set and there is a mechanism
for subsequent volumes in a set to update preceding ones. volumes can
contain standard file structures, coded character set file structures
for character encoding other than ascii, or boot records. boot
records can contain either data or program code that may be needed by
systems or applications.
high sierra directories and files
the file system is a hierarchical one in which directories may contain
files or other directories. each volume has a root directory which
serves as an ancestor to all other directories or files in the volume.
this dictates an overall tree structure for the volume.
a typical disadvantage in hierarchical systems is that to read a file
(which must be a leaf of the hierarchy tree) given its full path name,
it is necessary to begin at the root directory and search through each
of its ancestral directories until the entry for the file is found.
for example, given a path name consisting of three directory names
followed by a file name, three directories (the first three
components of the path name) would
need to be searched. typically, a separate seek would be required for
each directory. this would result in relatively poor performance.
to avoid this, high sierra specifies that each volume contain a path
table in addition to its directories and files. the path table
describes the directory hierarchy in a compact form that may be cached
in computer memory for optimum performance. the path table contains
entries for the volume's directories in a breadth-first order;
directories with a common parent are listed in lexicographic order.
each entry contains only the location of the directory it describes,
its name, and the location in the path table of its parent. this
mechanism allows any directory to be accessed with only a single cd
seek.
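a simplified, in-memory view of a path table shows why one seek suffices: resolving a path only walks the cached table, and the disc is touched once, at the final directory's location. the directory names and block numbers below are hypothetical, and real path table records are binary structures rather than python tuples.

```python
from typing import NamedTuple

class PathEntry(NamedTuple):
    name: str
    parent: int      # 1-based index of the parent entry; the root is its own parent
    location: int    # block where the directory's contents begin

# hypothetical path table: breadth-first order, siblings sorted by name
PATH_TABLE = [
    PathEntry("", 1, 20),         # 1: root directory
    PathEntry("audio", 1, 21),    # 2
    PathEntry("images", 1, 22),   # 3
    PathEntry("stacks", 1, 23),   # 4
    PathEntry("maps", 3, 24),     # 5: child of images
    PathEntry("cards", 4, 25),    # 6: child of stacks
]

def find_directory(path):
    """resolve a slash-separated directory path using only the cached table;
    returns the block to seek to."""
    current = 1
    for component in filter(None, path.split("/")):
        for index, entry in enumerate(PATH_TABLE, start=1):
            if index != 1 and entry.parent == current and entry.name == component:
                current = index
                break
        else:
            raise FileNotFoundError(path)
    return PATH_TABLE[current - 1].location

print(find_directory("/images/maps"))   # 24
```

a linear scan is used for clarity; since siblings are sorted, a real implementation could binary-search within each parent's group.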
directories contain more detailed information than the path table.
each directory entry contains:
directory or file location.
date and time of creation.
name of the file.
whether the entry is for a file or a directory.
whether or not it is an associated file.
whether or not it has records.
whether or not it has read protection.
whether or not it has subsequent extents.
interleave structure of the file.
interleaving may be used, for example, to meet realtime requirements
for multiple files whose contents must be presented simultaneously.
this would happen if a file containing graphic images were interleaved
with a file containing compressed sound that describes the images.
files themselves are recorded in contiguous (or interleaved) blocks on
the disc. the read-only nature of cd permits this contiguous
recording in a straightforward manner. a file may also be recorded in
a series of noncontiguous extents with a directory entry for each
extent.
the specification does not favor any particular computer architecture.
in particular all significant, multibyte numbers are recorded twice,
once with the most significant byte first and once with the least
significant byte first.
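a 32-bit field recorded both ways can be sketched as below. the sketch assumes the least-significant-first copy comes first, as iso 9660 lays out these fields; a reader simply uses the copy that matches its own byte order, so no byte swapping is needed on either kind of machine.

```python
import struct
import sys

def encode_both_endian_32(value):
    """record an unsigned 32-bit number twice: lsb-first copy, then msb-first copy."""
    return struct.pack("<I", value) + struct.pack(">I", value)

def decode_both_endian_32(field):
    """read the copy that matches this machine's native byte order."""
    if len(field) != 8:
        raise ValueError("a both-endian 32-bit field is 8 bytes long")
    if sys.byteorder == "little":
        return struct.unpack("<I", field[:4])[0]
    return struct.unpack(">I", field[4:])[0]

field = encode_both_endian_32(0x12345678)
print(field.hex())   # 7856341212345678
```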
built on top of the file system are applications that create and present
multimedia information. while it is true that a cd can store anything
that a magnetic disk can store (and usually much more of it), cds will
be used more for storing information than for storing programs. it is
the very large storage capacity of cds coupled with their low cost
that opens up the possibilities for interactive, multimedia
information to be used in a multitude of ways.
programs like hypercard, with its ease of authoring and broad
extensibility, are very useful for this purpose. hypercard stacks,
with related information such as color images and sound, can be easily
and inexpensively stored on cds despite their possibly very large
size.
editorial: the high sierra file system gets its name from the location
of the first meeting on it: the high sierra hotel at lake tahoe. it
is much more commonly referred to as iso 9660, though the two
specifications are slightly different.
it has gotten very easy and inexpensive to make a cd-rom disc (or
audio cd). for example, you can now take a macintosh hard disk and
send it with $1500 to one of several cd pressers. they will send you
back your hard disk and 100 cds with exactly the same content as
what's on your disk. this is the easy way to make cds with capacity
up to the size of your hard disk (apple's go up to 160 megabytes).
true, this is not a full cd but cds don't need to be full. if you
have just 10 megabytes and need 100 copies, cds may be the best way to
make them.
if you are buying a cd-rom drive, there are several factors you might
consider in making your choice. two factors not to consider are
capacity and data rate. the capacity of all cd-rom drives is
determined solely by the cd they are reading. though you will see a
range of numbers in manufacturers' specs (e.g. 540, 550, 600, and 650
mbytes), any drive can read any disc and so they are all fundamentally
the same. all cd-rom drives read data at a net 150 kbytes/sec for
cd-rom data. other data rates you may see may include error
correction data (not included in the net rate) or may be a mode 2 data
rate (faster than mode 1). all drives are the same in both of these
respects.
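the quoted rates follow directly from 75 blocks per second and the block layouts given earlier. the sketch assumes "kbyte" means 1024 bytes and uses the mode 2 layout from this summary (2048 user bytes plus the 288-byte auxiliary field).

```python
BLOCKS_PER_SECOND = 75

def data_rate(bytes_per_block):
    """bytes per second delivered when each block contributes bytes_per_block bytes."""
    return BLOCKS_PER_SECOND * bytes_per_block

for label, size in (("mode 1 user data", 2048),
                    ("mode 2 user + auxiliary data", 2048 + 288),
                    ("entire 2352-byte block", 2352)):
    rate = data_rate(size)
    print(f"{label}: {rate} bytes/s ({rate / 1024:.1f} kbytes/s)")
```

mode 1 works out to exactly 150 kbytes/sec, while mode 2 and raw block figures come out near 171 and 172 kbytes/sec, which is why different specifications can quote different numbers for the same drive.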
end of article.