~lucidiot's wiki

Microsoft Compound File Binary

Compound File Binary (CFB) is a file format designed by Microsoft as part of the COM API, as an implementation of COM Structured Storage. It may also be referred to as a Composite Document File V2, OLE container, or OLE file. It is used by a lot of Microsoft software, and even by some non-Microsoft software. I commonly encounter it while working on weird things at the Morgue.

Detector script

I wrote a VBScript script to look for any file starting with the CFB file signature in a Windows 98SE virtual machine:

On Error Resume Next

header = Chr(&HD0) & Chr(&HCF) & Chr(&H11) & Chr(&HE0) & Chr(&HA1) & Chr(&HB1) & Chr(&H1A) & Chr(&HE1)

Sub CFBFinder(folder)
    For Each subfolder In folder.SubFolders
        CFBFinder subfolder 'recurse into the subfolder, not the current folder
    Next
    For Each file In folder.Files
        If file.Size > 19 Then
            Err.Clear 'reset any error left over from a previous file
            Set stream = file.OpenAsTextStream(1, 0) 'open for reading in ASCII
            'handle possible permission errors
            If Err.Number = 0 Then
                If stream.Read(Len(header)) = header Then
                    WScript.Echo file.Path
                End If
                stream.Close
            End If
        End If
    Next
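End Sub

'start the scan; the C:\ root is an assumed starting point, adjust as needed
Set fso = CreateObject("Scripting.FileSystemObject")
CFBFinder fso.GetFolder("C:\")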
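
For comparison, the same signature check can be sketched in Python, which is handy when the files have already been copied out of the virtual machine. This is only a minimal sketch using the standard library; the names has_cfb_signature and find_cfb_files are made up for this example, and unlike the VBScript version it does not apply any minimum file size.

```python
import os
from pathlib import Path
from typing import Iterator

# Same eight signature bytes as the VBScript header above
CFB_SIGNATURE = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def has_cfb_signature(path: Path) -> bool:
    """Return True if the file starts with the CFB signature."""
    try:
        with path.open("rb") as f:
            return f.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE
    except OSError:
        # Unreadable files (permissions, missing files) are skipped
        return False


def find_cfb_files(root: Path) -> Iterator[Path]:
    """Yield every regular file under root that starts with the CFB signature."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = Path(dirpath, name)
            # is_file() filters out FIFOs, sockets and broken symlinks
            if candidate.is_file() and has_cfb_signature(candidate):
                yield candidate
```

Looping over find_cfb_files(Path(".")) and printing each result gives roughly the same output as the VBScript version, one path per line.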

Extractor script

I wrote a smol Python script, built on the olefile library, to extract a CFB file into a directory structure, to make inspection easier on Linux. Storages become directories and streams become files.

#!/usr/bin/env python3
import argparse
import shutil
from dataclasses import dataclass
from pathlib import Path
from olefile import OleFileIO


@dataclass
class Args:
    cfb_file: OleFileIO
    output_dir: Path
    verbose: bool = False


def main() -> None:
    parser = argparse.ArgumentParser(description="Microsoft Compound File Binary extractor")
    parser.add_argument("cfb_file", type=OleFileIO)
    parser.add_argument("-o", "--output-dir", type=Path, default=".")
    parser.add_argument("-v", "--verbose", action="store_true", default=False)
    args = parser.parse_args(namespace=Args)
    # parents=True so that a nested output path can be created in one go
    args.output_dir.mkdir(parents=True, exist_ok=True)

    with args.cfb_file as ole:
        for storage in ole.listdir(storages=True, streams=False):
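            # named storage_dir to avoid shadowing the dir() builtin
            storage_dir = args.output_dir.joinpath(*storage)
            if args.verbose:
                print(f"Creating directory {storage_dir} for storage {storage!r}")
            # parents=True so nested storages work regardless of listing order
            storage_dir.mkdir(parents=True, exist_ok=True)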

        for stream_path in ole.listdir(storages=False, streams=True):
            output_path = args.output_dir.joinpath(*stream_path)
            if args.verbose:
                print(f"Extracting stream {stream_path!r} to {output_path}")
            with ole.openstream(stream_path) as stream, output_path.open("wb") as f:
                shutil.copyfileobj(stream, f)

if __name__ == '__main__':
    main()

Occurrences

I have observed CFB in use in the following cases:

  • Microsoft Office Word documents (.doc)
  • Microsoft Office Word templates (.dot)
  • Microsoft Office Word wizards (.wiz)
  • Microsoft Office Excel documents (.xls)
  • Microsoft Office Excel templates (.xlt)
  • Microsoft Office Excel add-ins (.xla)
  • Microsoft Office PowerPoint documents (.ppt)
  • Microsoft Office PowerPoint templates (.pot)
  • Microsoft Office Access data projects (.adp)
  • Microsoft Office Access wizard templates (.mdz)
  • Microsoft Office Outlook messages (.msg)
  • Microsoft Office Outlook item templates (.oft)
  • Microsoft Office Visio drawings (.vsd)
  • Microsoft Office Visio stencils (.vss)
  • Microsoft Office Visio templates (.vst)
  • Microsoft Office Publisher documents (.pub)
  • Microsoft Office Project projects (.mpp)
  • Microsoft Office Project templates (.mpt)
  • Microsoft Office FrontPage and Visual Studio 6 user interface preferences (.prf)
  • Microsoft PhotoDraw pictures (.mix)
  • Microsoft Common Console documents / Management Saved Console (.msc)
  • Microsoft HTML Help cache (hh.dat)
  • Microsoft Visual Studio Solution User Options (.suo)
  • Microsoft Works Word Processor documents (.wps)
  • Microsoft Works Word Processor templates (.wpt)
  • Microsoft Works Word Processor wizards (.wwp)
  • Microsoft Works Word Processor borders (.ibd)
  • Microsoft Works Spreadsheet spreadsheets (.wks, .xlr)
  • Microsoft Works Spreadsheet wizards (.wws)
  • Microsoft Works Database databases (.wdb)
  • Microsoft Works Database database backups (.bdb)
  • Microsoft Works Database wizards (.wwd)
  • Microsoft Works Portfolio (.wsb)
  • Windows Installer packages (.msi)
  • Windows Installer merge modules (.msm)
  • Windows Installer dialogs (.wid)
  • Windows Live Writer / Open Live Writer Weblog post (.wpost)
  • Windows 7 Sticky Notes (.snt)
  • SAP Crystal Reports reports (.rpt)