Write File Format

This topic describes the binary file format used by Microsoft Write. A Write binary file contains information about file content, text and pictures (including object-linking-and-embedding, or OLE, objects), and formatting.

Write-File Header

The Write-file header describes the content of the file. It contains data, pointers to subdivisions of the formatting section, and information about the length of the file. The file header has the following form:

Word    Name      Description
0       wident    Must be 0137061 octal (or 0137062 octal if the file contains OLE objects)
1       dty       Must be zero
2       wtool     Must be 0125400 octal
3                 Reserved; must be zero
4                 Reserved; must be zero
5                 Reserved; must be zero
6                 Reserved; must be zero
7-8     fcmac     Number of bytes of actual text plus 128, the bytes in one sector (low-order word first)
9       pnpara    Page number for start of paragraph information
10      pnfntb    Page number of footnote table (FNTB) or pnsep, if none
11      pnsep     Page number of section property (SEP) or pnsetb, if none
12      pnsetb    Page number of section table (SETB) or pnpgtb, if none
13      pnpgtb    Page number of page table (PGTB) or pnffntb, if none
14      pnffntb   Page number of font face-name table (FFNTB) or pnmac, if none
15-47   szssht    Reserved for Microsoft Word compatibility
48      pnmac     Count of pages in whole file (last page number plus 1)

In the preceding list, a "page number" means an offset in 128-byte blocks from the start of the file. For example, if pnpara equals 10, the paragraph information is at offset 10*128 = 1280 in the file. The starting page number of character information (pnchar) is not stored but can be computed as follows, using integer division:

    pnchar = (fcmac + 127) / 128

Examining the value of word 48 of the header is a good way to distinguish Write files from Microsoft Word files. If pnmac equals zero, the file originated in Word. Any other value identifies a Write file.

Text and Pictures

After the header comes information about text and pictures. This information constitutes a separate section of the file.

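Before turning to the text itself, the header layout described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names parse_write_header and page_offset are hypothetical, and little-endian byte order is an assumption based on the "low-order word first" note and the Windows origin of the format.

```python
import struct

PAGE_SIZE = 128  # bytes per file "page" (sector), as stated in the header notes


def parse_write_header(data: bytes) -> dict:
    """Decode the named header words (0-48) from the start of a file image."""
    if len(data) < 98:
        raise ValueError("header needs at least 98 bytes (words 0-48)")
    # Assumption: 16-bit words are stored little-endian.
    words = struct.unpack_from("<49H", data, 0)
    if words[0] not in (0o137061, 0o137062):
        raise ValueError("wident does not match either documented value")
    hdr = {
        "wident": words[0],
        "dty": words[1],
        "wtool": words[2],
        # fcmac spans words 7-8, low-order word first.
        "fcmac": words[7] | (words[8] << 16),
        "pnpara": words[9],
        "pnfntb": words[10],
        "pnsep": words[11],
        "pnsetb": words[12],
        "pnpgtb": words[13],
        "pnffntb": words[14],
        "pnmac": words[48],
    }
    hdr["has_ole"] = words[0] == 0o137062
    # pnmac == 0 means the file originated in Word; anything else is Write.
    hdr["is_write"] = hdr["pnmac"] != 0
    # pnchar is not stored; it is derived from fcmac with integer division.
    hdr["pnchar"] = (hdr["fcmac"] + PAGE_SIZE - 1) // PAGE_SIZE
    return hdr


def page_offset(page_number: int) -> int:
    """Convert a page number from the header into a byte offset in the file."""
    return page_number * PAGE_SIZE
```

For a file with 300 bytes of text, fcmac is 428 and the derived pnchar is 4, so character formatting would start at byte offset 512.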
Text

The text of the Write file starts at word 64 (byte 128, the start of page 1). Write uses the Windows character set (except for the pictures in the file) as well as the following special characters:

    ASCII character codes 13, 10 (carriage return, linefeed) for paragraph ends. No other occurrences of these two characters are allowed.
    ASCII character code 12 for explicit page breaks.
    ASCII character code 9 (normal) for tab characters.

Other line-break or wordwrap information is not stored.

Pictures

Pictures (including OLE objects) are stored as a sequence of bytes in the text stream. These bytes can be identified as picture information by examining their paragraph formatting. One picture is exactly one paragraph. Paragraphs that are pictures have a special bit set in their paragraph property (PAP) structure. For more information on the PAP structure, see Section 8.3, "Formatting."

Each picture consists of a descriptive header followed by the data that makes up the picture. The header for OLE objects is different from the one used for pictures. The picture header has the following form:

Byte    Name        Description
0-7     mfp         Windows METAFILEPICT structure (hmf member undefined)
8-9     dxaoffset   Offset of picture from left margin, in twips (1/1440 inch)
10-11   dxasize     Horizontal size, in twips
12-13   dyasize     Vertical size, in twips
14-15   cboldsize   Number of following bytes (actual metafile or bitmap bits); set to zero
16-29   bm          Additional information for bitmaps only
30-31   cbheader    Number of bytes in this header
32-35   cbsize      Number of following bytes (actual metafile or bitmap bits), replacing cboldsize for new files
36-37   mx          Scaling factor (x)
38-39   my          Scaling factor (y)
40-                 Picture contents, occupying bytes cbheader through cbheader+cbsize-1

The mm member (bytes 0-1) of the METAFILEPICT structure specifies the mapping mode used to draw the picture. The last set of bytes will be bitmap bits if the value of the mm member is 0xE3. This is a special value used only in Write. Otherwise, the bytes will be metafile contents.

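The picture header above can be decoded along the following lines. This is a sketch under stated assumptions: parse_picture_header is a hypothetical name, all fields are read as unsigned little-endian integers (the text does not state signedness), and the caller is assumed to have already established that the paragraph is a non-OLE picture.

```python
import struct

MM_BITMAP = 0xE3  # Write-specific mapping mode: the contents are bitmap bits


def parse_picture_header(para: bytes) -> dict:
    """Split one picture paragraph into header fields and picture contents."""
    if len(para) < 40:
        raise ValueError("picture header needs at least 40 bytes")
    # mm is the first member (bytes 0-1) of the METAFILEPICT structure.
    (mm,) = struct.unpack_from("<H", para, 0)
    dxaoffset, dxasize, dyasize, cboldsize = struct.unpack_from("<4H", para, 8)
    # Bytes 30-39: cbheader (2), cbsize (4), mx (2), my (2).
    cbheader, cbsize, mx, my = struct.unpack_from("<HIHH", para, 30)
    return {
        "kind": "bitmap" if mm == MM_BITMAP else "metafile",
        "dxaoffset": dxaoffset,  # twips (1/1440 inch)
        "dxasize": dxasize,
        "dyasize": dyasize,
        "cbheader": cbheader,
        "cbsize": cbsize,
        # A scaling factor of 1000 corresponds to 100 percent.
        "scale_x_percent": mx / 10,
        "scale_y_percent": my / 10,
        # Contents start right after the header and run for cbsize bytes.
        "contents": para[cbheader:cbheader + cbsize],
    }
```

Slicing by cbheader rather than a fixed 40 keeps the sketch tolerant of a header whose recorded length differs from the layout in the table.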
If the picture has never been rescaled with the Size Picture command in Write, the scaling factors in each direction will be 1000 (decimal). If the picture has been resized, each scaling factor gives the current size relative to the original size, on a scale where 1000 means 100 percent.

For information about the METAFILEPICT structure and bitmaps, see the Microsoft Windows Guide to Programming and the Microsoft Windows Programmer's Reference, Volumes 1 and 3.

The descriptive header for OLE objects is similar to the one used for pictures. The OLE object header has the following form:

Byte    Name         Description
0-1     mm           Must be 0xE4
2-5                  Not used
6-7     objecttype   Type: 1=static, 2=embedded, 3=link
8-9     dxaoffset    Offset of picture from left margin, in twips (1/1440 inch)
10-11   dxasize      Horizontal size, in twips
12-13   dyasize      Vertical size, in twips
14-15                Not used
16-19   dwdatasize   Number of bytes in the object data that follows the header
20-23                Not used
24-27   dwobjnum     Hexadecimal number that, when converted to an 8-digit string, represents the object's unique name
28-29                Not used
30-31   cbheader     Number of bytes in this header
32-35                Not used
36-37   mx           Scaling factor (x)
38-39   my           Scaling factor (y)
40-                  Object contents, occupying bytes cbheader through cbheader+dwdatasize-1

The scaling factors for OLE objects work the same way as they do with pictures.

Formatting

Write files contain both character and paragraph formatting information. There can be no gaps in either; each must begin with the first text character (byte 128) and continue through the last. The last format descriptor (FOD) of the character information and the last FOD of the paragraph information must, therefore, each have the value of fclim equal to the value of fcmac, as defined in the header section.

There is a difference between paragraph and character FODs. A character FOD may describe any number of consecutive characters with the same formatting. However, there must be exactly one paragraph FOD for each text paragraph.
In either case, it is advisable to have multiple FODs point to the same formatting properties (FPROPs) on a given page because it saves space in the file. No FOD may point off its page.

Characters and Paragraphs

Both the character and paragraph sections are structured as a set of pages. Each page contains an array of FODs and a group of FPROPs, both of which are described later in this section. Following is the format of a page:

Byte        Name       Description
0-3         fcfirst    Byte number of first character covered by this page of formatting information; equals 128 for first character in the text (low-order byte first)
4-n         rgfod      Array of FODs
(n+1)-126   grpfprop   Group of FPROPs
127         cfod       Number of FODs on this page

An FOD is fixed in size. It contains the byte offset to the corresponding FPROP. Following is the structure of an FOD:

Word    Name     Description
0-1     fclim    Byte number after last character covered by this FOD
2       bfprop   Byte offset from beginning of FOD array to corresponding FPROP for these characters or this paragraph

An FPROP is variable in size. It contains the prefix for a character property (CHP) or paragraph property (PAP), both of which are described later in this section.
Following is the structure of an FPROP:

Byte    Name       Description
0       cch        Number of bytes in this FPROP
1-n     rgchprop   Prefix for a CHP (for characters) or a PAP (for paragraphs) sufficient to include all bits that differ from the default CHP or PAP

Following is the format of a CHP:

Byte   Bit    Name       Description
0                        Reserved; ignored by Write
1      0      fbold      Bold characters
       1      fitalic    Italic characters
       2-7    ftc        Font code (low bits); index into the FFNTB
2             hps        Size of font, in half points (standard is 24)
3      0      fuline     Underlined characters
       1      fstrike    Reserved; ignored by Write
       2      fdline     Reserved; ignored by Write
       3      foverset   Reserved; ignored by Write
       4-5    csm        Reserved; ignored by Write
       6      fspecial   Set for "(page)" only
       7                 Reserved; ignored by Write
4      0-2    ftcxtra    Font code (high-order bits, concatenated with ftc)
       3      foutline   Reserved; ignored by Write
       4      fshadow    Reserved; ignored by Write
       5-7               Reserved; ignored by Write
5             hpspos     Position: 0=normal, 1-127=superscript, 128-255=subscript

If the user doesn't select any special character properties, the CHP is filled with the following default values:

Byte    Value
0       1
2       24
3-5     0

Each character FPROP must, therefore, have a count of characters (cch) greater than or equal to 1.

Each PAP can contain up to 14 tab descriptors (TBDs), which are described later in this section.

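Before moving on to paragraph properties, the character-formatting structures described so far (page, FOD, FPROP, and CHP) can be tied together in a short Python sketch. Several points are assumptions rather than statements from the text: the names parse_char_page and expand_chp are hypothetical, values are read as little-endian, bit 0 is taken to be the least significant bit, default CHP byte 1 is taken to be zero because the defaults table does not list it, and a bfprop offset that falls outside the page is treated here as "use the default CHP".

```python
import struct

PAGE_SIZE = 128
# Default CHP: byte 0 = 1, byte 2 = 24, bytes 3-5 = 0 (byte 1 assumed to be 0).
DEFAULT_CHP = bytes([1, 0, 24, 0, 0, 0])


def expand_chp(prefix: bytes) -> dict:
    """Overlay a CHP prefix on the defaults and decode the documented fields."""
    chp = bytearray(DEFAULT_CHP)
    n = min(len(prefix), len(chp))
    chp[:n] = prefix[:n]
    return {
        "bold": bool(chp[1] & 0x01),
        "italic": bool(chp[1] & 0x02),
        # ftc (bits 2-7 of byte 1) holds the low bits; ftcxtra the high bits.
        "ftc": (chp[1] >> 2) | ((chp[4] & 0x07) << 6),
        "hps": chp[2],  # font size in half points
        "underline": bool(chp[3] & 0x01),
        "hpspos": chp[5],
    }


def parse_char_page(page: bytes) -> list:
    """Return (first_byte, fclim, chp) for every FOD on one formatting page."""
    if len(page) != PAGE_SIZE:
        raise ValueError("a formatting page is exactly 128 bytes")
    (fc_first,) = struct.unpack_from("<I", page, 0)
    cfod = page[127]
    if 4 + 6 * cfod > 127:
        raise ValueError("cfod is too large for one page")
    runs = []
    start = fc_first
    for i in range(cfod):
        # Each FOD is 6 bytes: fclim (4 bytes) followed by bfprop (2 bytes).
        fclim, bfprop = struct.unpack_from("<IH", page, 4 + 6 * i)
        pos = 4 + bfprop  # bfprop is relative to the start of the FOD array
        prefix = b""
        if pos < 127:  # sketch's own fallback: off-page offsets mean defaults
            cch = page[pos]
            prefix = page[pos + 1:min(pos + 1 + cch, 127)]
        runs.append((start, fclim, expand_chp(prefix)))
        start = fclim  # the next run begins where this one ends
    return runs
```

Because each run starts at the previous fclim, the output also makes the "no gaps" requirement easy to check: the first run starts at fcfirst and the last fclim on the final page should equal fcmac.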
Following is the structure of a PAP:

Byte    Bit    Name        Description
0                          Reserved; must be zero
1       0-1    jc          Justification: 0=left, 1=center, 2=right, 3=both
        2-7                Reserved; must be zero
2                          Reserved; must be zero
3                          Reserved; must be zero
4-5            dxaright    Right indent, in 20ths of a point
6-7            dxaleft     Left indent, in 20ths of a point
8-9            dxaleft1    First-line left indent (relative to dxaleft)
10-11          dyaline     Interline spacing (standard is 240)
12-13          dyabefore   Reserved; ignored by Write (standard is zero)
14-15          dyaafter    Reserved; ignored by Write (standard is zero)
16      0      rhcpage     0=header, 1=footer
        1-2                Reserved; 0=normal paragraph, nonzero=header or footer paragraph
        3      rhcfirst    Start of printing: 1=print on first page, 0=do not print on first page
        4      fgraphics   Paragraph type: 1=picture, 0=text
        5-7                Reserved; must be zero
17-21                      Reserved; must be zero
22-78                      Tab descriptors (up to 14)

Following is the format of a TBD:

Byte    Bit    Name      Description
0-1            dxa       Indent from left margin of tab stop, in 20ths of a point
2       0-2    jctab     Tab type: 0=normal tabs, 3=decimal tabs
        3-5    tlc       Reserved; ignored by Write
        6-7              Reserved; must be zero
3              chalign   Reserved; ignored by Write

If the user doesn't select any special paragraph properties, the PAP is filled with the following default values:

Byte    Value
0       61
2       30
10-11   240 (word)
12-78   0

Each paragraph FPROP must have a count of characters (cch) greater than or equal to 1.

Footnotes

Write documents do not have footnote tables (FNTBs), so pnfntb is always equal to pnsep. In fact, all their header and footer paragraphs appear at the beginning of the document before any normal paragraphs. When reading files created by Word, Write recognizes only those headers and footers that appear at the beginning of the document; it treats all others as normal text.

Sections

A Write document has only one section. If the section properties of a Write document differ from the defaults, the document contains a section property (SEP) section and a section table (SETB) section.
If not, then neither section is present and pnsep and pnsetb are both equal to pnpgtb. Following is the format of an SEP:

Byte    Name      Description
0       cch       Count of bytes used, excluding this byte (all properties at byte positions greater than cch are set to their default values)
1-2               Reserved; must be zero
3-4     yamac     Page length, in 20ths of a point (default is 11*1440=15840)
5-6     xamac     Page width, in 20ths of a point (default is 8.5*1440=12240)
7-8               Reserved; must be 0xFFFF
9-10    yatop     Top margin, in 20ths of a point (default is 1440)
11-12   dyatext   Height of text, in 20ths of a point (default is 9*1440=12960)
13-14   xaleft    Left margin, in 20ths of a point (default is 1.25*1440=1800)
15-16   dxatext   Width of text area, in 20ths of a point (default is 6*1440=8640)

The page length (yamac) is equal to yatop+dyatext+(bottom margin, not stored); with the default values, 1440+12960 leaves 1440 for the bottom margin. The page width (xamac) is equal to xaleft+dxatext+(right margin, not stored).

If all the above properties are set to their defaults, no SEP or SETB is needed. Otherwise, the count of characters (cch) is greater than or equal to 1 and less than or equal to 16.

The SETB section contains an array of section descriptors (SEDs), described later in this section. Following is the structure of an SETB:

Word    Name      Description
0       csed      Number of sections (always 2 for Write documents)
1       csedmax   Undefined
2-n     rgsed     Array of SEDs plus zero-padding to fill the sector

Following is the structure of an SED:

Word    Name     Description
0-1     cp       Byte address of first character following section
2       fn       Undefined
3-4     fcsep    Byte address of associated SEP

A Write document always has exactly two SED entries. The cp value of the first entry indicates that it affects all the characters in the document. The fcsep value of the first entry points to the one SEP in the file. The second SED entry is a dummy with fcsep set to 0xFFFFFFFF.

The PGTB section (optional) is on the page immediately after the SEP section.

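The SEP layout above, including the rule that properties beyond cch take their default values, can be sketched as follows. The name parse_sep and the derived right_margin entry are illustrative only, little-endian byte order is assumed, and a field is treated as stored only when both of its bytes lie within the cch count.

```python
import struct

# (name, byte offset in the SEP, default value in 20ths of a point)
SEP_FIELDS = [
    ("yamac", 3, 15840),
    ("xamac", 5, 12240),
    ("yatop", 9, 1440),
    ("dyatext", 11, 12960),
    ("xaleft", 13, 1800),
    ("dxatext", 15, 8640),
]


def parse_sep(raw: bytes) -> dict:
    """Decode an SEP, falling back to defaults for fields beyond cch."""
    cch = raw[0] if raw else 0
    sep = {}
    for name, offset, default in SEP_FIELDS:
        # The field occupies bytes offset and offset+1; both must be covered
        # by cch and actually present in the buffer.
        if offset + 1 <= cch and offset + 1 < len(raw):
            (sep[name],) = struct.unpack_from("<H", raw, offset)
        else:
            sep[name] = default
    # The right margin is not stored; it is what remains of the page width.
    sep["right_margin"] = sep["xamac"] - sep["xaleft"] - sep["dxatext"]
    return sep
```

With an empty SEP the defaults give a right margin of 12240 - 1800 - 8640 = 1800, that is, 1.25 inches, matching the default left margin.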
Note: The term "page" used in the rest of this section refers to printed pages of a Write document, not 128-byte "pages" of a disk file.

The page table (PGTB) contains an array of page descriptors (PGDs), which are described later in this section. Following is the structure of a PGTB:

Word    Name      Description
0       cpgd      Number of PGDs (1 or more)
1       cpgdmac   Undefined
2-n     rgpgd     Array of PGDs plus zero-padding to fill the sector

Following is the structure of a PGD:

Word    Name    Description
0       pgn     Page number in printed Word documents
1-2     cpmin   Byte address of first character on printed page

Font Table

The font face-name table (FFNTB) contains the number of font face names (FFNs) and a list of FFNs. Following is the structure of an FFNTB:

Byte    Name     Description
0-1     cffn     Number of FFNs
2-n     grpffn   List of FFNs

Following is the structure of an FFN:

Byte          Name    Description
0-1           cbffn   Number of bytes following in this FFN (not including these 2 bytes)
2             ffid    Font family identifier (see below)
3-(cbffn+2)   szffn   Font name (variable length; null-terminated)

A cbffn value of 0xFFFF means that the next FFN entry will be found at the start of the next 128-byte page. A cbffn value of zero means that there are no more FFN entries in the table.

Possible values for ffid are FF_DONTCARE, FF_ROMAN, FF_SWISS, FF_MODERN, FF_SCRIPT, and FF_DECORATIVE. These constants are defined in WINDOWS.H. Additional values may be added to the list in future versions of Windows.
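The two sentinel values of cbffn described above suggest a simple loop for walking the font table. The sketch below is illustrative: parse_ffntb is a hypothetical name, little-endian byte order is assumed, and decoding names as cp1252 is an assumption standing in for "the Windows character set".

```python
import struct

PAGE_SIZE = 128


def parse_ffntb(data: bytes, pnffntb: int):
    """Return (cffn, [(ffid, name), ...]) for the table at page pnffntb."""
    pos = pnffntb * PAGE_SIZE
    (cffn,) = struct.unpack_from("<H", data, pos)
    pos += 2
    fonts = []
    while pos + 2 <= len(data):
        (cbffn,) = struct.unpack_from("<H", data, pos)
        if cbffn == 0:
            break  # no more FFN entries in the table
        if cbffn == 0xFFFF:
            # The next entry is at the start of the next 128-byte page.
            pos = (pos // PAGE_SIZE + 1) * PAGE_SIZE
            continue
        body = data[pos + 2:pos + 2 + cbffn]
        if not body:
            break  # truncated file; stop rather than guess
        ffid = body[0]
        # The name is null-terminated; keep only the bytes before the null.
        name = body[1:].split(b"\0", 1)[0].decode("cp1252", errors="replace")
        fonts.append((ffid, name))
        pos += 2 + cbffn
    return cffn, fonts
```

The stored cffn is returned alongside the list so that a caller can compare it with the number of entries actually found.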