PDF Catalog: The Document Root Dictionary Explained

The PDF document catalog (/Type /Catalog) is the root object of every PDF file — the master directory that references pages, bookmarks, forms, names, metadata, viewer preferences, and every other top-level structure. Understanding the catalog is understanding the entire architecture of a PDF document.

Quick Answer

When a PDF viewer opens a file, the first thing it does is read the cross-reference trailer at the end of the file. The trailer contains a /Root key — an indirect reference pointing to the document catalog. The catalog is a dictionary with /Type /Catalog and a set of references to every major structure in the document: /Pages (the page tree), /Outlines (bookmarks), /AcroForm (form fields), /Names (named destinations, embedded files, JavaScript), /Metadata (XMP metadata), /ViewerPreferences (how the viewer opens the document), and /StructTreeRoot (the accessibility tag tree). Everything in a PDF flows from the catalog. It is the document's index, root, and control panel — all in one dictionary.

What Is the PDF Document Catalog?

The PDF document catalog is the mandatory root object of every PDF file. It is a PDF dictionary with /Type /Catalog referenced by the /Root entry of the cross-reference trailer. The catalog acts as the master index — every major document structure is either stored in the catalog or reachable via references from it.

The catalog's key entries include:

  • /Pages (required) — Indirect reference to the page tree root, from which all pages are accessible
  • /Outlines — The document outline (bookmark) tree root
  • /AcroForm — The interactive form dictionary, listing all form fields
  • /Names — The names dictionary containing name trees: /Dests (named destinations), /EmbeddedFiles, /JavaScript, /AP
  • /Metadata — An XMP metadata stream for document information (title, author, creation date)
  • /ViewerPreferences — Instructions for how the viewer should open and display the document
  • /OpenAction — An action to perform when the document opens (typically a GoTo destination or JavaScript)
  • /MarkInfo — Declares whether the document uses tagged content (/Marked true)
  • /StructTreeRoot — Root of the accessibility structure tree (tagged PDF)
  • /Lang — The document's natural language (e.g., "en-US"), required by PDF/UA
  • /Perms — Permission signatures that restrict modifications to the document
  • /PageMode — How to open: /UseNone (page only), /UseOutlines (show bookmarks), /FullScreen

The catalog is not the Info dictionary: The legacy document information dictionary (/Info) — referenced from the trailer, containing /Title, /Author, /Creator etc. — is separate from the catalog. PDF 2.0 deprecates /Info in favour of XMP metadata in the catalog's /Metadata stream.

Key Catalog Entries and Their Roles

Catalog Key         | Required?    | Points To           | Purpose
/Pages              | Required     | Page tree root      | Every page in the document
/Outlines           | Optional     | Outline root        | Bookmark / navigation tree
/AcroForm           | Optional     | Form dictionary     | All interactive form fields
/Names              | Optional     | Names dictionary    | Named dests, attachments, JS
/Metadata           | Optional*    | XMP stream          | Document title, author, dates
/StructTreeRoot     | PDF/UA req.  | Structure tree root | Accessibility tag hierarchy
/ViewerPreferences  | Optional     | Prefs dictionary    | Viewer open behaviour
/Lang               | PDF/UA req.  | Language string     | "en-US", "de-DE", etc.
/OpenAction         | Optional     | Action or dest      | Action on document open
/PageMode           | Optional     | Name constant       | What panel shows on open

Real-World Examples

📊 Presentation Scenario

Conference Slides: OpenAction + PageMode for Full-Screen Kiosk

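A conference organiser distributes a 60-slide presentation PDF for an unmanned kiosk display. The document catalog sets /PageMode /FullScreen, the standard entry for requesting full-screen display with no UI chrome visible. The file also carries /OpenAction << /S /Named /N /FullScreen >>, but the PDF specification defines only NextPage, PrevPage, FirstPage, and LastPage as named actions, so that entry works only in viewers that choose to support it. The catalog's /ViewerPreferences sets /HideToolbar true and /HideMenubar true; non-continuous display and page transitions are configured through /PageLayout /SinglePage in the catalog and each page's /Trans entry rather than through /ViewerPreferences. When the attendant plugs in the display laptop, double-clicking the PDF launches a full-screen presentation in viewers that honour these settings (some ask for confirmation first), all driven by catalog settings the document author set once in the file.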
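
As a rough illustration of the kiosk configuration, the sketch below models the catalog as a Python dict and prints it in PDF dictionary syntax. The `to_pdf` helper, the dict model, and the object number `3 0 R` are assumptions made for this example, not part of any PDF library; it relies on /PageMode /FullScreen, the standard catalog entry for full-screen opening.

```python
def to_pdf(value, indent=0):
    """Serialize a small Python model into PDF object syntax (illustrative only)."""
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, tuple):         # (object number, generation) -> indirect reference
        return f"{value[0]} {value[1]} R"
    if isinstance(value, str):           # strings in this model are PDF names like "/FullScreen"
        return value
    if isinstance(value, dict):
        pad = "  " * (indent + 1)
        lines = [f"{pad}{key} {to_pdf(val, indent + 1)}" for key, val in value.items()]
        return "<<\n" + "\n".join(lines) + "\n" + "  " * indent + ">>"
    raise TypeError(f"unsupported value: {value!r}")


kiosk_catalog = {
    "/Type": "/Catalog",
    "/Pages": (3, 0),                    # hypothetical object number for the page tree
    "/PageMode": "/FullScreen",
    "/ViewerPreferences": {
        "/HideToolbar": True,
        "/HideMenubar": True,
    },
}

print(to_pdf(kiosk_catalog))
```

The printed dictionary would still need to be wrapped in an indirect object (`n 0 obj ... endobj`) and referenced from the trailer's /Root before a viewer could use it.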

🛡️ Security Scenario

Legal Contract: /Perms Locking Against Modifications

A legal firm delivers a signed contract PDF. The document catalog's /Perms dictionary contains a DocMDP (document modification detection and prevention) entry that references a certification signature restricting changes to form field filling. Any other modification, such as adding pages, deleting content, or changing text, invalidates the certification. /Perms does not physically prevent edits; it tells validators and viewers which changes are permitted so that anything else is detected, independently of any password-based encryption.
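
The permitted changes are selected by the /P integer in the DocMDP transform parameters: 1 allows no changes, 2 allows form filling and signing, and 3 additionally allows annotations. The sketch below shows how a validator might reason about that level; the change labels and the function name are hypothetical and chosen only for this illustration.

```python
# Access levels for the /P entry of DocMDP transform parameters.
DOCMDP_ALLOWED = {
    1: set(),                                # no changes permitted
    2: {"fill_form", "sign"},                # form filling and signing
    3: {"fill_form", "sign", "annotate"},    # level 2 plus annotation changes
}


def change_keeps_certification(p_level, change):
    """Return True if `change` is permitted at DocMDP level `p_level`.

    The change labels ("fill_form", "add_page", ...) are hypothetical names
    used only in this sketch.
    """
    return change in DOCMDP_ALLOWED[p_level]


contract_level = 2  # the contract scenario: form filling is allowed
for change in ("fill_form", "add_page", "annotate"):
    allowed = change_keeps_certification(contract_level, change)
    print(f"{change}: {'allowed' if allowed else 'invalidates certification'}")
```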

♿ Accessibility Scenario

Government PDF: Catalog Compliance Checklist

A government agency's PDF accessibility team validates a 200-page policy document. Using veraPDF and PAC 2024, they check the catalog for PDF/UA compliance: /MarkInfo << /Marked true >> ✅ present; /Lang (en-GB) ✅ declared; /StructTreeRoot ✅ present; /ViewerPreferences /DisplayDocTitle true ✅ set so the document title shows in the viewer title bar instead of the filename. The title is declared in /Metadata as an XMP stream (not just the legacy /Info dictionary). All four catalog-level checks pass, contributing to a successful PDF/UA-1 validation report.
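
The four catalog-level checks in this scenario can be sketched against a dict model of the catalog. This is only an illustration of the checklist; the dict layout and function name are assumptions, and it is not a substitute for a validator such as veraPDF.

```python
def check_catalog_for_pdfua(catalog):
    """Run the four catalog-level checks from the scenario on a dict model."""
    prefs = catalog.get("/ViewerPreferences", {})
    return {
        "/MarkInfo /Marked true": catalog.get("/MarkInfo", {}).get("/Marked") is True,
        "/Lang declared": bool(catalog.get("/Lang")),
        "/StructTreeRoot present": "/StructTreeRoot" in catalog,
        "/DisplayDocTitle true": prefs.get("/DisplayDocTitle") is True,
    }


# Hypothetical model of the policy document's catalog.
policy_catalog = {
    "/Type": "/Catalog",
    "/MarkInfo": {"/Marked": True},
    "/Lang": "en-GB",
    "/StructTreeRoot": (8, 0),           # (object number, generation), illustrative
    "/ViewerPreferences": {"/DisplayDocTitle": True},
}

for check, passed in check_catalog_for_pdfua(policy_catalog).items():
    print(f"{'PASS' if passed else 'FAIL'}  {check}")
```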

Why Understanding the PDF Catalog Matters

Master Index

Every document structure — pages, forms, bookmarks, attachments, metadata — is discoverable from the catalog. Understanding the catalog unlocks the entire PDF architecture.

Accessibility Foundation

/MarkInfo, /StructTreeRoot, and /Lang in the catalog are the top-level accessibility declarations. PDF/UA validation checks begin with these catalog entries before descending into page content.

Permission Control

The /Perms dictionary in the catalog enforces document-level modification restrictions — distinct from encryption passwords. DocMDP and UR (Usage Rights) signatures live here.

Viewer Behaviour

ViewerPreferences, PageMode, and OpenAction in the catalog control the user's first experience: full-screen mode, bookmark panel open, document title shown, custom open destination. They are set once in the file and applied by any viewer that honours them.

Names Tree Access

The /Names tree in the catalog provides O(log n) lookup for named destinations, embedded files, and JavaScript — enabling efficient navigation and automation in large documents.
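
A name tree lookup can be sketched as a descent guided by each node's /Limits range, followed by a binary search in the leaf's /Names array. The example below models nodes as nested Python dicts with string keys; the tree contents and the function names are made up for illustration, and a real implementation would compare PDF byte strings and resolve indirect references.

```python
def _search_leaf(names, key):
    """Binary-search a flat [key1, value1, key2, value2, ...] array."""
    lo, hi = 0, len(names) // 2
    while lo < hi:
        mid = (lo + hi) // 2
        if names[2 * mid] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(names) // 2 and names[2 * lo] == key:
        return names[2 * lo + 1]
    return None


def _pick_kid(kids, key):
    """Binary-search for the kid whose /Limits [least, greatest] covers key."""
    lo, hi = 0, len(kids)
    while lo < hi:
        mid = (lo + hi) // 2
        if kids[mid]["/Limits"][1] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(kids) and kids[lo]["/Limits"][0] <= key:
        return kids[lo]
    return None


def name_tree_lookup(node, key):
    """Return the value stored under `key`, or None if the tree lacks it."""
    while node is not None:
        if "/Names" in node:                     # leaf (or a root holding names directly)
            return _search_leaf(node["/Names"], key)
        node = _pick_kid(node.get("/Kids", []), key)
    return None


# Hypothetical /Dests tree: two leaves under one root.
dests = {"/Kids": [
    {"/Limits": ["appendix", "chapter2"],
     "/Names": ["appendix", "page 40", "chapter1", "page 3", "chapter2", "page 12"]},
    {"/Limits": ["glossary", "index"],
     "/Names": ["glossary", "page 45", "index", "page 50"]},
]}

print(name_tree_lookup(dests, "chapter2"))
print(name_tree_lookup(dests, "preface"))
```

Each level discards all but one subtree, which is where the logarithmic behaviour comes from when the tree is reasonably balanced.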

Metadata Integration

The /Metadata XMP stream in the catalog provides standards-compliant document information for search engines, document management systems, and AI processing tools.

Document Catalog Dictionary Example

PDF DOCUMENT CATALOG — COMPLETE EXAMPLE
% Trailer references the catalog
trailer <<
  /Size  487
  /Root  1 0 R    % catalog object
  /Info  2 0 R    % legacy info dict
>>

% Document Catalog (1 0 R)
1 0 obj
<<
  /Type             /Catalog
  /Pages            3 0 R     % page tree root
  /Outlines         4 0 R     % bookmarks
  /AcroForm         5 0 R     % form fields
  /Names            6 0 R     % name trees
  /Metadata         7 0 R     % XMP metadata stream
  /StructTreeRoot   8 0 R     % accessibility tags
  /MarkInfo         << /Marked true >>
  /Lang             (en-US)
  /PageMode         /UseOutlines  % show bookmarks on open
  /ViewerPreferences
  <<
    /DisplayDocTitle true
    /FitWindow       false
  >>
>>
endobj
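
A reader could locate the catalog in a file like the one above roughly as follows. This is a minimal sketch that assumes a classic cross-reference table with a single section and no incremental updates (not a PDF 1.5+ cross-reference stream); it uses regular expressions instead of a real PDF parser, and the helper names are invented for this example.

```python
import re


def build_minimal_pdf():
    """Assemble a tiny PDF in memory so the xref byte offsets are real."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    body = b"%PDF-1.4\n"
    offsets = []
    for obj in objects:
        offsets.append(len(body))        # byte offset where this object starts
        body += obj
    xref_pos = len(body)
    xref = b"xref\n0 3\n0000000000 65535 f \n"
    xref += b"".join(b"%010d 00000 n \n" % off for off in offsets)
    trailer = b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    return body + xref + trailer + b"startxref\n%d\n%%%%EOF\n" % xref_pos


def find_catalog(pdf):
    """Follow startxref -> xref table -> trailer /Root -> catalog object."""
    # 1. The last "startxref" keyword is followed by the xref table's offset.
    marker = pdf.rfind(b"startxref")
    xref_pos = int(pdf[marker + len(b"startxref"):].split()[0])
    # 2. Read the table header ("first count") and its fixed-width entries.
    section = pdf[xref_pos:]
    header = re.match(rb"xref\s+(\d+)\s+(\d+)\s+", section)
    first, count = int(header.group(1)), int(header.group(2))
    entries = re.findall(rb"(\d{10}) (\d{5}) ([nf])", section[header.end():])[:count]
    offsets = {first + i: int(off)
               for i, (off, _gen, kind) in enumerate(entries) if kind == b"n"}
    # 3. The trailer dictionary follows the table; /Root names the catalog.
    root = re.search(rb"trailer.*?/Root\s+(\d+)\s+(\d+)\s+R", section, re.S)
    obj_num = int(root.group(1))
    # 4. Jump to the catalog object through its xref offset.
    start = offsets[obj_num]
    end = pdf.index(b"endobj", start)
    return obj_num, pdf[start:end]


num, catalog = find_catalog(build_minimal_pdf())
print(num, catalog.decode("ascii"))
```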

Common Mistakes to Avoid

  • Not setting /Lang in the catalog for multilingual documents. PDF/UA requires a /Lang entry in the catalog declaring the primary document language. Content in a different language should use /Lang on the marked content sequence. Omitting /Lang from the catalog is one of the most common PDF/UA failures — all major validators flag it.
  • Using /Info dictionary instead of /Metadata XMP stream. The legacy /Info dictionary (referenced from the trailer) is deprecated in PDF 2.0. PDF/A and PDF/UA require XMP metadata in the catalog's /Metadata stream, and PDF 2.0 treats XMP as the preferred mechanism. Many authoring tools still write both, which causes inconsistency failures in strict validators if the values don't match.
  • Omitting /MarkInfo when the document has tagged content. A PDF with a /StructTreeRoot but missing /MarkInfo << /Marked true >> in the catalog will fail PDF/UA validation. These two entries must both be present and consistent — /Marked true declares the intent; /StructTreeRoot contains the actual tag tree.
  • Setting /OpenAction to auto-run JavaScript on open without user warning. A catalog /OpenAction with /S /JavaScript runs a script the moment the user opens the document. This is a security risk and is blocked by most modern PDF viewers in sandboxed mode. Use OpenAction only for safe GoTo navigation actions, not for JavaScript execution.
  • Not updating /PageMode when adding bookmarks post-creation. A PDF created without bookmarks may have /PageMode /UseNone. After adding bookmarks, /PageMode should be updated to /UseOutlines so the viewer automatically opens the bookmark panel — otherwise users must manually discover it. This is a common omission when bookmarks are added as a remediation step.
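
Two of the mistakes above (JavaScript in /OpenAction, and bookmarks added without updating /PageMode) lend themselves to a simple automated check. The sketch below runs against a dict model of the catalog; the model and the `lint_catalog` name are assumptions for illustration, not an existing tool.

```python
def lint_catalog(catalog):
    """Flag two of the listed mistakes on a dict model of the catalog."""
    warnings = []

    # An /OpenAction whose action type is /JavaScript runs a script on open.
    action = catalog.get("/OpenAction")
    if isinstance(action, dict) and action.get("/S") == "/JavaScript":
        warnings.append("/OpenAction runs JavaScript on open")

    # /PageMode defaults to /UseNone, which keeps the bookmark panel closed.
    if "/Outlines" in catalog and catalog.get("/PageMode", "/UseNone") == "/UseNone":
        warnings.append("/Outlines present but /PageMode is /UseNone or absent")

    return warnings


# Hypothetical remediated file: bookmarks were added, /PageMode was not updated.
remediated = {
    "/Type": "/Catalog",
    "/Pages": (3, 0),
    "/Outlines": (4, 0),
    "/PageMode": "/UseNone",
    "/OpenAction": {"/S": "/JavaScript", "/JS": "app.alert('hi');"},
}

for warning in lint_catalog(remediated):
    print("WARNING:", warning)
```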

Frequently Asked Questions

  • The PDF document catalog is the root dictionary of every PDF file — referenced by the trailer's /Root key. It contains references to the page tree, bookmarks, form fields, names tree, XMP metadata, accessibility structure, and viewer preferences. Every major document structure is discoverable from the catalog.

  • The reader reads the file from the end: finds startxref → loads the cross-reference table → reads the trailer dictionary → follows /Root to the catalog object. The catalog is the entry point from which all other document structures are found.

  • Besides /Type /Catalog itself, /Pages is the only required entry in the catalog: an indirect reference to the page tree root (/Type /Pages). Every page in the document is accessible by traversing the page tree from this entry. Without a valid /Pages reference, the PDF has no pages and is malformed.

  • The /Names dictionary in the catalog contains name trees: sorted, B-tree-like structures for efficient string-keyed lookups. Key sub-trees: /Dests (named destinations for navigation), /EmbeddedFiles (document-level attachments), /JavaScript (document-level JS code). In a well-balanced tree, retrieval takes O(log n) rather than a linear search.

  • /ViewerPreferences instructs the PDF viewer how to display the document: hide toolbar, fit window size, show document title in title bar (/DisplayDocTitle true), control print scaling. These are viewer hints — user preferences or viewer settings may override them.

  • /MarkInfo declares the document's tagged content status. /Marked true signals the document uses marked content and has a structure tree. Required alongside /StructTreeRoot for PDF/UA conformance. Without /MarkInfo /Marked true, PDF/UA validators flag the document as non-conformant regardless of how well-tagged the content is.
