write file format



this topic describes the binary file format used by microsoft write. a write binary file contains information about file content, text
and pictures (including object-linking-and-embedding, or ole, objects), and formatting.

write-file header

the write-file header describes the content of the file. it contains data, pointers to subdivisions of the formatting section, and
information about the length of the file. the file header has the following form:

word    name    description

0    wident    must be 0137061 octal (or 0137062 octal if the file contains ole objects)
1    dty    must be zero
2    wtool    must be 0125400 octal
3        reserved; must be zero
4        reserved; must be zero
5        reserved; must be zero
6        reserved; must be zero
7-8    fcmac    number of bytes of actual text plus 128, the bytes in one sector (low-order word first)
9    pnpara    page number for start of paragraph information
10    pnfntb    page number of footnote table (fntb) or pnsep, if none
11    pnsep    page number of section property (sep) or pnsetb, if none
12    pnsetb    page number of section table (setb) or pnpgtb, if none
13    pnpgtb    page number of page table (pgtb) or pnffntb, if none
14    pnffntb    page number of font face-name table (ffntb) or pnmac, if none
15-47    szssht    reserved for microsoft word compatibility
48    pnmac    count of pages in whole file (last page number plus 1)

in the preceding list, a "page number" means an offset in 128-byte blocks from the start of the file. for example, if pnpara equals
10, the paragraph information is at offset 10*128 = 1280 in the file.
the starting page number of character information (pnchar) is not stored but is computable, as follows:
pnchar = (fcmac + 127) / 128, using integer division (this rounds up to the first page boundary at or after fcmac)
examining the value of word 48 of the header is a good way to distinguish write files from microsoft word files. if pnmac equals
zero, the file originated in word. any other value identifies a write file.
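the header fields can be read with python's struct module. the sketch below is a minimal illustration, not a reference implementation: it assumes the whole file is already in memory as bytes, and the names `parse_write_header` and `page_offset` are hypothetical. it decodes words 0-48 as little-endian 16-bit values, rebuilds fcmac from its two words (low-order word first), and applies the pnchar and pnmac rules above.

```python
import struct

PAGE_SIZE = 128  # one "page" (sector) of a Write file


def parse_write_header(data: bytes) -> dict:
    """Decode the header fields named in this topic (hypothetical helper)."""
    if len(data) < PAGE_SIZE:
        raise ValueError("file is shorter than the 128-byte header page")
    w = struct.unpack_from("<49H", data, 0)  # words 0-48, little-endian
    if w[0] not in (0o137061, 0o137062):
        raise ValueError("wident does not match a Write file")
    fc_mac = w[7] | (w[8] << 16)  # fcmac: low-order word first
    return {
        "has_ole": w[0] == 0o137062,
        "fcMac": fc_mac,
        "pnChar": (fc_mac + 127) // PAGE_SIZE,  # integer division
        "pnPara": w[9],
        "pnFntb": w[10],
        "pnSep": w[11],
        "pnSetb": w[12],
        "pnPgtb": w[13],
        "pnFfntb": w[14],
        "pnMac": w[48],
        "is_word_file": w[48] == 0,  # pnmac == 0 means the file came from Word
    }


def page_offset(pn: int) -> int:
    """Convert a page number into a byte offset from the start of the file."""
    return pn * PAGE_SIZE
```

for example, `page_offset(h["pnPara"])` gives the byte offset of the first paragraph-formatting page.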

text and pictures

after the header comes information about text and pictures. this information constitutes a separate section of the file.

text

the text of the write file starts at word 64 (page 1). write uses the windows character set (except for the pictures in the file) as
well as the following special characters:

    ascii character codes 13, 10 (carriage return, linefeed) for paragraph ends. no other occurrences of these two characters are
allowed.

    ascii character code 12 for explicit page breaks.

    ascii character code 9 (normal) for tab characters.

other line-break or wordwrap information is not stored.
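a minimal sketch of pulling paragraphs out of the text stream, assuming the document contains no pictures or ole objects (their binary bytes could contain 13, 10 pairs) and assuming code page 1252 stands in for the windows character set. `extract_paragraphs` is a hypothetical helper; page breaks (12) and tabs (9) are left in the returned strings.

```python
def extract_paragraphs(data: bytes, fc_mac: int, encoding: str = "cp1252") -> list[str]:
    """Split bytes 128 .. fcmac-1 on paragraph ends (text-only documents)."""
    text = data[128:fc_mac]  # text starts right after the 128-byte header
    paragraphs = text.split(b"\r\n")  # 13, 10 marks every paragraph end
    if paragraphs and paragraphs[-1] == b"":
        paragraphs.pop()  # drop the empty piece after the final paragraph end
    return [p.decode(encoding, errors="replace") for p in paragraphs]
```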

pictures

pictures (including ole objects) are stored as a sequence of bytes in the text stream. these bytes can be identified as picture
information by examining their paragraph formatting. one picture is exactly one paragraph. paragraphs that are pictures have a
special bit set in their paragraph property (pap) structure. for more information on the pap structure, see section 8.3,
"formatting."
each picture consists of a descriptive header followed by the data that makes up the picture. the header for ole objects is
different from the one used for pictures. the picture header has the following form:

byte    name    description

0-7    mfp    windows metafilepict structure (hmf member undefined)
8-9    dxaoffset    offset of picture from left margin, in twips (1/1440 inch)
10-11    dxasize    horizontal size, in twips
12-13    dyasize    vertical size, in twips
14-15    cboldsize    number of following bytes (actual metafile or bitmap bits); set to zero
16-29    bm    additional information for bitmaps only
30-31    cbheader    number of bytes in this header
32-35    cbsize    number of following bytes (actual metafile or bitmap bits), replacing cboldsize for new files
36-37    mx    scaling factor (x)
38-39    my    scaling factor (y)
cbheader-(cbheader+cbsize-1)        picture contents (the fixed header fields above end at byte 39)

the mm member (bytes 0-1) of the metafilepict structure specifies the mapping mode used to draw the picture. the last set
of bytes will be bitmap bits if the value of the mm member is 0xe3. this is a special value used only in write. otherwise, the bytes
will be metafile contents.
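the 40-byte picture header maps directly onto a struct format string. this sketch assumes the picture paragraph's bytes have already been isolated (for example, from its paragraph fod) and that the header is little-endian like the rest of the file; `parse_picture` and the constant names are hypothetical.

```python
import struct

# metafilepict (mm, xext, yext, hmf), dxaoffset, dxasize, dyasize, cboldsize,
# 14-byte bm, cbheader, cbsize (double word), mx, my: 40 bytes in total
PICTURE_HEADER = struct.Struct("<4H4H14sHI2H")

MM_BITMAP = 0xE3  # Write-specific mapping mode: contents are bitmap bits
MM_OLE = 0xE4     # this paragraph uses the ole object header instead


def parse_picture(par: bytes) -> dict:
    (mm, _x_ext, _y_ext, _hmf,
     dxa_offset, dxa_size, dya_size, _cb_old,
     _bm, cb_header, cb_size, mx, my) = PICTURE_HEADER.unpack_from(par, 0)
    if mm == MM_OLE:
        raise ValueError("ole object: use the ole header layout instead")
    return {
        "kind": "bitmap" if mm == MM_BITMAP else "metafile",
        "dxaOffset": dxa_offset,
        "dxaSize": dxa_size,
        "dyaSize": dya_size,
        "mx": mx,
        "my": my,
        "contents": par[cb_header:cb_header + cb_size],
    }
```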
if the picture has never been rescaled with the size picture command in write, the scaling factor in each direction is 1000 (decimal). if the picture has been resized, each factor gives the picture's current size as a proportion of its original size, with 1000 representing 100 percent; for example, 500 means the picture is shown at half its original size in that direction.
for information about the metafilepict structure and bitmaps, see the microsoft windows guide to programming and the
microsoft windows programmer's reference, volumes 1 and 3.

the descriptive header for ole objects is similar to the one used for pictures. the ole object header has the following form:

byte    name    description

0-1    mm    must be 0xe4
2-5        not used
6-7    objecttype    type: 1=static, 2=embedded, 3=link
8-9    dxaoffset    offset of picture from left margin, in twips (1/1440 inch)
10-11    dxasize    horizontal size, in twips
12-13    dyasize    vertical size, in twips
14-15        not used
16-19    dwdatasize    number of bytes in the object data that follows the header
20-23        not used
24-27    dwobjnum    hexadecimal number that, when converted to an 8-digit string, represents the object's unique name
28-29        not used
30-31    cbheader    number of bytes in this header
32-35        not used
36-37    mx    scaling factor (x)
38-39    my    scaling factor (y)
cbheader-(cbheader+dwdatasize-1)        object contents (the fixed header fields above end at byte 39)

the scaling factors for ole objects work the same way as they do with pictures.
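the ole header can be decoded the same way, with padding fields for the unused byte ranges. in this hypothetical sketch, dwobjnum is formatted as an 8-digit hexadecimal string; the use of upper-case digits is an assumption, since the table does not specify case.

```python
import struct

# mm, 4 unused, objecttype, dxaoffset, dxasize, dyasize, 2 unused, dwdatasize,
# 4 unused, dwobjnum, 2 unused, cbheader, 4 unused, mx, my: 40 bytes in total
OLE_HEADER = struct.Struct("<H4sH3H2sI4sI2sH4s2H")

OLE_TYPES = {1: "static", 2: "embedded", 3: "link"}


def parse_ole_object(par: bytes) -> dict:
    (mm, _u1, obj_type, dxa_offset, dxa_size, dya_size, _u2,
     dw_data_size, _u3, dw_obj_num, _u4, cb_header, _u5, mx, my) = OLE_HEADER.unpack_from(par, 0)
    if mm != 0xE4:
        raise ValueError("not an ole object header (mm must be 0xe4)")
    return {
        "type": OLE_TYPES.get(obj_type, "unknown"),
        "name": f"{dw_obj_num:08X}",  # 8-digit hex name; upper case assumed
        "dxaOffset": dxa_offset,
        "dxaSize": dxa_size,
        "dyaSize": dya_size,
        "mx": mx,
        "my": my,
        "data": par[cb_header:cb_header + dw_data_size],
    }
```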

formatting

write files contain both character and paragraph formatting information. there can be no gaps in either; each must begin with the
first text character (byte 128) and continue through the last. therefore, the format descriptors (fods) that cover the last character and the last paragraph must each have fclim equal to fcmac, as defined in the header section.
there is a difference between paragraph and character fods. a character fod may describe any number of consecutive
characters with the same formatting. however, there must be exactly one paragraph fod for each text paragraph. in either case,
it is advisable to have multiple fods point to the same formatting properties (fprops) on a given page because it saves
space in the file. no fod may point off its page.
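the no-gaps rule can be checked mechanically. this hypothetical sketch takes, for each formatting page in file order, its fcfirst and the fclim values of its fods, and reports whether together they cover every byte from 128 up to fcmac. rejecting empty or backward runs is an extra sanity check added here, not a rule stated above.

```python
def formatting_is_contiguous(pages: list[tuple[int, list[int]]], fc_mac: int) -> bool:
    """pages: (fcfirst, [fclim, ...]) per formatting page, in file order."""
    expected = 128  # formatting must start at the first text character
    for fc_first, fc_lims in pages:
        if fc_first != expected:  # each page must resume where the last stopped
            return False
        for fc_lim in fc_lims:
            if fc_lim <= expected:  # extra check: no empty or backward runs
                return False
            expected = fc_lim
    return expected == fc_mac  # the last fod must end exactly at fcmac
```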

characters and paragraphs

both the character and paragraph sections are structured as a set of pages. each page contains an array of fods and a group
of fprops, both of which are described later in this section. following is the format of a page:

byte    name    description

0-3    fcfirst    byte number of first character covered by this page of formatting information; equals 128 for first character in the text (low-order byte first)
4-n    rgfod    array of fods
n+1-126    grpfprop    group of fprops
127    cfod    number of fods on this page

an fod is fixed in size. it contains the byte offset to the corresponding fprop. following is the structure of an fod:

word    name    description

0-1    fclim    byte number after last character covered by this fod
2    bfprop    byte offset from beginning of fod array to corresponding fprop for these characters or this paragraph

an fprop is variable in size. it contains the prefix for a character property (chp) or paragraph property (pap), both of which are
described later in this section. following is the structure of an fprop:

byte    name    description

0    cch    number of bytes in this fprop
1-n    rgchprop    prefix for a chp (for characters) or a pap (for paragraphs) sufficient to include all bits that differ from the default chp or pap
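the page, fod, and fprop layouts can be decoded together. in this sketch (function name hypothetical) an fod is read as 6 bytes, fclim as a double word followed by bfprop as a word, matching the word numbering in the fod table. it also assumes cch counts only the property bytes that follow it, the same convention the sep uses for its cch; the fprop table is not explicit on this point.

```python
import struct

PAGE_SIZE = 128


def parse_fkp(page: bytes) -> tuple[int, list[tuple[int, bytes]]]:
    """Return fcfirst and (fclim, property prefix) pairs for one formatting page."""
    if len(page) != PAGE_SIZE:
        raise ValueError("formatting pages are exactly 128 bytes")
    (fc_first,) = struct.unpack_from("<I", page, 0)
    cfod = page[127]  # number of fods on this page
    fods = []
    for i in range(cfod):
        fc_lim, bfprop = struct.unpack_from("<IH", page, 4 + 6 * i)
        start = 4 + bfprop  # bfprop is relative to the fod array at byte 4
        if start >= 127:
            raise ValueError("fod points off its page")
        cch = page[start]  # assumed: count of prefix bytes after this byte
        fods.append((fc_lim, page[start + 1:start + 1 + cch]))
    return fc_first, fods
```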

following is the format of a chp:

byte    bit    name    description

0            reserved; ignored by write
1    0    fbold    bold characters
    1    fitalic    italic characters
    2-7    ftc    font code (low bits); index into the ffntb
2        hps    size of font, in half points (standard is 24)
3    0    fuline    underlined characters
    1    fstrike    reserved; ignored by write
    2    fdline    reserved; ignored by write
    3    foverset    reserved; ignored by write
    4-5    csm    reserved; ignored by write
    6    fspecial    set for "(page)" only
    7        reserved; ignored by write
4    0-2    ftcxtra    font code (high-order bits, concatenated with ftc)
    3    foutline    reserved; ignored by write
    4    fshadow    reserved; ignored by write
    5-7        reserved; ignored by write
5        hpspos    position: 0=normal, 1-127=superscript, 128-255=subscript

if the user doesn't select any special character properties, the chp is filled with the following default values:

byte    value

0    1
2    24
3-5    0

each character fprop must, therefore, have a count of characters (cch) greater than or equal to 1.
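a minimal sketch of decoding a chp, reading "prefix" literally: the stored bytes replace the first bytes of the default chp. byte 1 is not listed in the defaults table, so the sketch assumes it defaults to 0. both function names are hypothetical.

```python
DEFAULT_CHP = bytes([1, 0, 24, 0, 0, 0])  # byte 1 assumed to default to 0


def chp_from_prefix(prefix: bytes) -> bytes:
    """Overlay an fprop prefix on the default chp."""
    stored = prefix[:6]
    return stored + DEFAULT_CHP[len(stored):]


def decode_chp(chp: bytes) -> dict:
    b1, hps, b3, b4, hps_pos = chp[1], chp[2], chp[3], chp[4], chp[5]
    if hps_pos == 0:
        position = "normal"
    elif hps_pos < 128:
        position = "superscript"
    else:
        position = "subscript"
    return {
        "bold": bool(b1 & 0x01),
        "italic": bool(b1 & 0x02),
        # ftc: 6 low bits from byte 1 (bits 2-7), ftcxtra adds 3 high bits
        "ftc": (b1 >> 2) | ((b4 & 0x07) << 6),
        "size_pt": hps / 2,  # hps is in half points
        "underline": bool(b3 & 0x01),
        "page_number_field": bool(b3 & 0x40),  # fspecial
        "position": position,
    }
```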
each pap can contain up to 14 tab descriptors (tbds), which are described later in this section. following is the structure of a pap:

byte    bit    name    description

0            reserved; must be zero
1    0-1    jc    justification: 0=left, 1=center, 2=right, 3=both
    2-7        reserved; must be zero
2            reserved; must be zero
3            reserved; must be zero
4-5        dxaright    right indent, in 20ths of a point
6-7        dxaleft    left indent, in 20ths of a point
8-9        dxaleft1    first-line left indent (relative to dxaleft), in 20ths of a point
10-11        dyaline    interline spacing, in 20ths of a point (standard is 240)
12-13        dyabefore    reserved; ignored by write (standard is zero)
14-15        dyaafter    reserved; ignored by write (standard is zero)
16    0    rhcpage    0=header, 1=footer
    1-2        reserved; 0=normal paragraph, nonzero=header or footer paragraph
    3    rhcfirst    start of printing: 1=print on first page, 0=do not print on first page
    4    fgraphics    paragraph type: 1=picture, 0=text
    5-7        reserved; must be zero
17-21            reserved; must be zero
22-78            tab descriptors (up to 14)

following is the format of a tbd:

byte    bit    name    description

0-1        dxa    position of the tab stop, measured from the left margin, in 20ths of a point
2    0-2    jctab    tab type: 0=normal tabs, 3=decimal tabs
    3-5    tlc    reserved; ignored by write
    6-7        reserved; must be zero
3        chalign    reserved; ignored by write

if the user doesn't select any special paragraph properties, the pap is filled with the following default values:

byte    value

0    61
2    30
10-11    240 (word)
12-78    0

each paragraph fprop must have a count of characters (cch) greater than or equal to 1.
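the pap can be decoded the same way, overlaying the stored prefix on the 79-byte default pap. this is a hypothetical sketch with two labeled assumptions: the indents are read as signed values (so a hanging first-line indent can be negative), and unused tab slots are taken to be zero, so the first tab with dxa = 0 ends the list.

```python
import struct

JUSTIFICATION = ("left", "center", "right", "both")


def default_pap() -> bytearray:
    """Build the default pap (bytes 0-78) from the defaults table."""
    pap = bytearray(79)
    pap[0] = 61
    pap[2] = 30
    struct.pack_into("<H", pap, 10, 240)  # dyaline: standard spacing
    return pap


def decode_pap(prefix: bytes) -> dict:
    pap = default_pap()
    stored = prefix[:79]
    pap[:len(stored)] = stored  # stored bytes replace the start of the default
    # signed indents are an assumption (first-line indent may be negative)
    dxa_right, dxa_left, dxa_left1 = struct.unpack_from("<3h", pap, 4)
    (dya_line,) = struct.unpack_from("<H", pap, 10)
    rhc = pap[16]
    tabs = []
    for i in range(14):  # up to 14 four-byte tab descriptors from byte 22
        dxa, jc_tab = struct.unpack_from("<HB", pap, 22 + 4 * i)
        if dxa == 0:  # assumption: unused slots are zero
            break
        tabs.append((dxa, "decimal" if (jc_tab & 0x07) == 3 else "normal"))
    return {
        "justification": JUSTIFICATION[pap[1] & 0x03],
        "dxaRight": dxa_right,
        "dxaLeft": dxa_left,
        "dxaLeft1": dxa_left1,
        "dyaLine": dya_line,
        "is_footer": bool(rhc & 0x01),
        "is_header_or_footer": bool(rhc & 0x06),
        "print_on_first_page": bool(rhc & 0x08),
        "is_picture": bool(rhc & 0x10),  # fgraphics
        "tabs": tabs,
    }
```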

footnotes

write documents do not have footnote tables (fntbs), so pnfntb is always equal to pnsep. in addition, all header and footer paragraphs in a write document appear at the beginning of the document, before any normal paragraphs. when reading files created by word, write recognizes only those headers and footers that appear at the beginning of the document; it treats all others as normal text.

sections

a write document has only one section. if the section properties of a write document differ from the defaults, the document
contains a section property (sep) section and a section table (setb) section. if not, then neither section is present and pnsep
and pnsetb are both equal to pnpgtb.
following is the format of an sep:

byte    name    description

0    cch    count of bytes used, excluding this byte (all properties at byte positions greater than cch are set to their default values)
1-2        reserved; must be zero
3-4    yamac    page length, in 20ths of a point (default is 11*1440=15840)
5-6    xamac    page width, in 20ths of a point (default is 8.5*1440=12240)
7-8        reserved; must be 0xffff
9-10    yatop    top margin, in 20ths of a point (default is 1440)
11-12    dyatext    height of text, in 20ths of a point (default is 9*1440=12960)
13-14    xaleft    left margin, in 20ths of a point (default is 1.25*1440=1800)
15-16    dxatext    width of text area, in 20ths of a point (default is 6*1440=8640)

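the page length (yamac) is equal to yatop+dyatext+(bottom margin, not stored). the page width (xamac) is equal to xaleft+dxatext+(right margin, not stored). with the defaults, the bottom margin is 15840-1440-12960 = 1440 (1 inch) and the right margin is 12240-1800-8640 = 1800 (1.25 inches).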
if all the above properties are set to their defaults, no sep or setb is needed. otherwise, the count of characters (cch) is greater
than or equal to 1 and less than or equal to 16.
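a minimal sketch of reading an sep with the default fallback: a 2-byte field is taken from the file only if both of its bytes lie at positions less than or equal to cch; otherwise its default is used. the unstored bottom and right margins are then derived from the relationships above. `parse_sep` and `SEP_FIELDS` are hypothetical names.

```python
import struct

# (byte offset, name, default in 20ths of a point) for each sep field
SEP_FIELDS = [
    (3, "yaMac", 15840),
    (5, "xaMac", 12240),
    (9, "yaTop", 1440),
    (11, "dyaText", 12960),
    (13, "xaLeft", 1800),
    (15, "dxaText", 8640),
]


def parse_sep(sep: bytes = b"") -> dict:
    """Decode an sep; an empty argument means no sep is present."""
    cch = sep[0] if sep else 0
    props = {}
    for offset, name, default in SEP_FIELDS:
        if offset + 1 <= cch:  # both bytes of the field were stored
            (props[name],) = struct.unpack_from("<H", sep, offset)
        else:
            props[name] = default
    # margins that the sep does not store directly
    props["bottomMargin"] = props["yaMac"] - props["yaTop"] - props["dyaText"]
    props["rightMargin"] = props["xaMac"] - props["xaLeft"] - props["dxaText"]
    return props
```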
the setb section contains an array of section descriptors (seds), described later in this section. following is the structure of an
setb:

word    name    description

0    csed    number of sections (always 2 for write documents)
1    csedmax    undefined
2-n    rgsed    array of seds plus zero-padding to fill the sector

following is the structure of an sed:

word    name    description

0-1    cp    byte address of first character following section
2    fn    undefined
3-4    fcsep    byte address of associated sep

a write document always has exactly two sed entries. the cp value of the first entry indicates that it affects all the characters in the
document. the fcsep value of the first entry points to the one sep in the file. the second sed entry is a dummy with fcsep set to
0xffffffff.
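the setb and its seds can be read as fixed-size records. in this hypothetical sketch each sed is 10 bytes (cp as a double word, fn as a word, fcsep as a double word, following the word numbering in the sed table), and the dummy second entry is recognized by its fcsep of 0xffffffff.

```python
import struct


def parse_setb(page: bytes) -> list[dict]:
    """Return the seds from a setb page."""
    csed, _csed_max = struct.unpack_from("<HH", page, 0)  # csedmax is undefined
    seds = []
    for i in range(csed):
        cp, _fn, fc_sep = struct.unpack_from("<IHI", page, 4 + 10 * i)
        seds.append({"cp": cp, "fcSep": fc_sep, "dummy": fc_sep == 0xFFFFFFFF})
    return seds
```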
the pgtb section (optional) starts at page pnpgtb, which follows the setb section when one is present.

note:    the term "page" used in the rest of this section refers to printed pages of a write document, not 128-byte "pages" of a disk
file.

the page table (pgtb) contains an array of page descriptors (pgds), which are described later in this section. following is the
structure of a pgtb:

word    name    description

0    cpgd    number of pgds (1 or more)
1    cpgdmac    undefined
2-n    rgpgd    array of pgds plus zero padding to fill the sector

following is the structure of a pgd:

word    name    description

0    pgn    page number as printed in the document
1-2    cpmin    byte address of first character on printed page
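a minimal sketch of reading the page table and finding which printed page holds a given byte address. each pgd is taken as 6 bytes (pgn as a word, cpmin as a double word); the lookup assumes the pgds are sorted by cpmin, and both function names are hypothetical.

```python
import bisect
import struct


def parse_pgtb(page: bytes) -> list[tuple[int, int]]:
    """Return (pgn, cpmin) pairs from a pgtb page."""
    cpgd, _cpgd_mac = struct.unpack_from("<HH", page, 0)  # cpgdmac is undefined
    return [struct.unpack_from("<HI", page, 4 + 6 * i) for i in range(cpgd)]


def printed_page_for(pgds: list[tuple[int, int]], cp: int):
    """Return the pgn of the last page whose cpmin is <= cp, or None."""
    starts = [cp_min for _pgn, cp_min in pgds]  # assumed sorted ascending
    i = bisect.bisect_right(starts, cp) - 1
    return pgds[i][0] if i >= 0 else None
```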

font table

the font face-name table (ffntb) contains the number of font face names (ffns) and a list of ffns. following is the structure of
an ffntb:

byte    name    description

0-1    cffn    number of ffns
2-n    grpffn    list of ffns

following is the structure of an ffn:

byte    name    description

0-1    cbffn    number of bytes following in this ffn (not including these 2 bytes)
2    ffid    font family identifier (see below)
3-(cbffn+1)    szffn    font name (variable length; null-terminated)

a cbffn value of 0xffff means that the next ffn entry will be found at the start of the next 128-byte page. a cbffn value of zero
means that there are no more ffn entries in the table.
possible values for ffid are ff_dontcare, ff_roman, ff_swiss, ff_modern, ff_script, and ff_decorative.
these constants are defined in windows.h. additional values may be added to the list in future versions of windows.
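a minimal sketch of walking the font table, including the 0xffff "continue on the next page" marker and the 0 terminator. it assumes font names are code page 1252 and that ffid holds one of the windows.h family values exactly (ff_dontcare = 0x00 through ff_decorative = 0x50); `parse_ffntb` is a hypothetical name.

```python
import struct

PAGE_SIZE = 128

# font family constants as defined in windows.h
FONT_FAMILIES = {0x00: "FF_DONTCARE", 0x10: "FF_ROMAN", 0x20: "FF_SWISS",
                 0x30: "FF_MODERN", 0x40: "FF_SCRIPT", 0x50: "FF_DECORATIVE"}


def parse_ffntb(data: bytes, pn_ffntb: int) -> list[tuple[str, str]]:
    """Return (font name, family) pairs; index i corresponds to font code i."""
    pos = pn_ffntb * PAGE_SIZE
    (cffn,) = struct.unpack_from("<H", data, pos)
    pos += 2
    fonts = []
    while len(fonts) < cffn:
        (cb_ffn,) = struct.unpack_from("<H", data, pos)
        if cb_ffn == 0:  # no more ffn entries
            break
        if cb_ffn == 0xFFFF:  # next ffn starts at the next 128-byte page
            pos = (pos // PAGE_SIZE + 1) * PAGE_SIZE
            continue
        ffid = data[pos + 2]
        raw = data[pos + 3:pos + 2 + cb_ffn]  # name bytes, including the null
        name = raw.split(b"\x00", 1)[0].decode("cp1252")
        fonts.append((name, FONT_FAMILIES.get(ffid, hex(ffid))))
        pos += 2 + cb_ffn  # skip cbffn itself plus the bytes it counts
    return fonts
```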