When you press `Ctrl + D` (`Cmd + D` on macOS) in Acrobat to view Document Properties, you are looking at the document's `/Info` dictionary. It is stored completely independently of the visible text on the pages. A PDF might display the large headline "Annual Final Report" on page 1, but if the author never filled out the Info dictionary, the internal `/Title` might still read 'Microsoft Word - Doc1.docx', and that is the title search engines such as Google may show in their results.
Standard Keys of the Info Dictionary
Unlike much of the PDF format, which is highly extensible, the traditional Info dictionary relies on a small set of standard keys that date back to the earliest versions of the PDF specification:
- /Title: The actual name of the document. Crucial for accessibility (screen readers can announce it as the window title) and for search engines, which may treat it much like an HTML `<title>` tag.
- /Author & /Subject: Descriptive strings naming the creator and describing the purpose of the document.
- /Keywords: A comma-separated list of terms. Historically used heavily by early corporate search engines to index documents before full-text indexing was common.
- /Creator vs /Producer: Creator is the desktop application the user was working in (e.g., Apple Pages). Producer is the underlying software that converted the content to PDF (e.g., macOS Quartz PDFContext).
- /CreationDate & /ModDate: Machine-readable timestamps recording when the document was created and last modified.
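As a rough illustration of how these keys end up in the file, here is a minimal Python sketch that serializes a mapping of keys into an uncompressed Info object. The helper names `pdf_text_string` and `build_info_object` are hypothetical, and the encoding choice (escaped literal strings for ASCII, UTF-16BE hex strings with a byte-order mark otherwise) is one simple option the PDF specification permits, not how any particular tool does it.

```python
def pdf_text_string(value: str) -> str:
    """Encode a Python str as a PDF text string (simplified)."""
    if value.isascii():
        # Literal string: escape backslashes first, then parentheses.
        escaped = (value.replace("\\", "\\\\")
                        .replace("(", "\\(")
                        .replace(")", "\\)"))
        return f"({escaped})"
    # Non-ASCII: UTF-16BE hex string prefixed with the FEFF byte-order mark.
    return "<FEFF" + value.encode("utf-16-be").hex().upper() + ">"


def build_info_object(obj_num: int, fields: dict) -> str:
    """Serialize an Info dictionary as an indirect object (hypothetical helper)."""
    lines = [f"{obj_num} 0 obj", "<<"]
    for key, value in fields.items():
        lines.append(f"  /{key} {pdf_text_string(value)}")
    lines += [">>", "endobj"]
    return "\n".join(lines)


print(build_info_object(90, {
    "Title": "Quarterly Earnings Report (Q3)",
    "Author": "Jane Doe - CFO Office",
    "CreationDate": "D:20231015093000-04'00'",
}))
```

The printed `/Title` line reads `/Title (Quarterly Earnings Report \(Q3\))`. Balanced parentheses do not strictly need escaping in PDF literal strings, but escaping them is always safe, which keeps the sketch simple.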
The Shift to XMP Metadata
| Feature | Legacy /Info Dictionary | Modern XMP Stream |
|---|---|---|
| Format | PDF-specific dictionary syntax (e.g., `/Title (Report)`) | XML (RDF) following the XMP standard, stored in a metadata stream. |
| Extensibility | Limited. Custom keys such as `/CopyrightStatus` are allowed, but values are flat strings with no schema or namespace. | Very high. Can embed custom XML schemas (e.g., Creative Commons licensing, DRM tokens). |
| Tool Agnostic | No. Finding it means following the `/Info` reference in the trailer at the end of the file, which requires a PDF parser. | Mostly. When the packet is stored uncompressed (as is common), a bash script or general web crawler can find the cleartext `<?xpacket>` XML without understanding PDF structure. |
| Adoption | Deprecated in PDF 2.0 (except `/CreationDate` and `/ModDate`). | Required by PDF/A and current PDF/X standards, and the preferred metadata mechanism in PDF 2.0. |
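The "Tool Agnostic" row can be demonstrated with the Python standard library alone. This sketch scans raw file bytes for an uncompressed `<?xpacket ...?>` packet and reads `dc:title` from it without parsing any PDF structure. It assumes the packet is stored in cleartext with a single `rdf:Alt` title; the function name `find_xmp_title` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaces that XMP uses for RDF and Dublin Core properties.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Capture the XML between the opening and closing xpacket processing instructions.
XPACKET_RE = re.compile(rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.S)


def find_xmp_title(raw: bytes) -> Optional[str]:
    """Return the first dc:title found in a cleartext XMP packet, or None."""
    match = XPACKET_RE.search(raw)
    if match is None:
        return None  # no packet, or it is compressed or encrypted
    root = ET.fromstring(match.group(1).strip())
    li = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return li.text if li is not None else None

# Usage (hypothetical path): find_xmp_title(open("report.pdf", "rb").read())
```

Because it never touches the cross-reference table, the same approach fails silently (returns `None`) when a producer compresses or encrypts the metadata stream.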
Real-World Scenarios
The Browser Tab Disaster
A marketing team spends thousands designing a beautiful "2024 Product Catalog" PDF, but the graphic designer, who cloned last year's file to save time, forgets to update the PDF properties. When thousands of customers open the link in Chrome, the browser's PDF viewer pulls the `/Title` key from the document metadata, so the tab reads "2019 Internal Rough Draft V3" and immediately undermines the brand's credibility.
Proving Document Origins
In a lawsuit, a plaintiff claims they authored a critical contract on January 5th. Digital forensics experts examine the PDF and read its `/Info` dictionary. The `/CreationDate` shows `D:20240210...` (February 10). Furthermore, the `/Creator` field reveals the document was built in 'Adobe Photoshop', an image editor that raises questions about manipulation, rather than a drafting tool such as 'Microsoft Word'. Because these fields are easy to edit, they corroborate rather than prove, but they can still shift the legal strategy completely.
Screen Reader First Pass
A government website uploads a tax form. A blind user opens the PDF with the JAWS screen reader. Accessibility audits under the ADA and Section 508 typically apply WCAG, whose success criterion 2.4.2 (Page Titled) requires a descriptive title; for PDFs this means a filled-in `/Title` plus the `/DisplayDocTitle` viewer preference, so the screen reader announces the title to give the user context. If the `/Title` is blank, the screen reader falls back to reading the file name, often a long, meaningless string generated by the download server, and the document fails compliance audits.
Best Practices for Metadata
Data Sanitization
Always scrub the Info dictionary (and the matching XMP fields) before publishing external PR documents to prevent leaking internal author names or embarrassing working-file titles.
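A minimal sketch of such a scrub, assuming the Info entries have already been read into a Python dictionary. Which keys count as sensitive is a policy decision; the `SENSITIVE_KEYS` set and the `sanitize_info` helper below are hypothetical, and the same cleanup would also need to be applied to the XMP packet.

```python
# Hypothetical policy: keys that often reveal internal people, tools, or drafts.
SENSITIVE_KEYS = {"Author", "Creator", "Producer", "Keywords"}


def sanitize_info(info: dict, public_title: str) -> dict:
    """Return a copy of the Info entries that is safe to publish externally."""
    clean = {key: value for key, value in info.items() if key not in SENSITIVE_KEYS}
    clean["Title"] = public_title  # replace leftover working-file titles
    return clean


original = {
    "Title": "Microsoft Word - Doc1.docx",
    "Author": "jdoe (Internal Strategy)",
    "Producer": "Acrobat PDFMaker 21 for Word",
    "CreationDate": "D:20231015093000-04'00'",
}
print(sanitize_info(original, "2024 Product Catalog"))
# {'Title': '2024 Product Catalog', 'CreationDate': "D:20231015093000-04'00'"}
```

Returning a new dictionary rather than mutating the original makes it easy to diff the before and after values during a review.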
Syncing Info and XMP
Although the `/Info` dictionary is deprecated, many readers still consult it, so well-behaved PDF software keeps the two in sync. If you change the Title in the XMP stream, the software silently rewrites the legacy `/Info` Title to match, ensuring both older and newer search engines see the same value.
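A simplified sketch of that mirroring step, assuming the `/Info` entries are already in a Python dictionary and the XMP packet has been parsed with `xml.etree.ElementTree`. The `set_title_everywhere` helper is hypothetical; a real tool would also re-serialize both structures back into the file.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def set_title_everywhere(info: dict, xmp_root: ET.Element, title: str) -> None:
    """Write one title to both the legacy /Info entries and XMP dc:title."""
    info["Title"] = title
    li = xmp_root.find(f".//{{{DC}}}title/{{{RDF}}}Alt/{{{RDF}}}li")
    if li is None:
        raise ValueError("XMP packet has no dc:title/rdf:Alt/rdf:li entry")
    li.text = title
    li.set(XML_LANG, "x-default")  # mark it as the default-language alternative


xmp = ET.fromstring(
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    f'<rdf:RDF xmlns:rdf="{RDF}"><rdf:Description rdf:about="" xmlns:dc="{DC}">'
    '<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Old Draft</rdf:li></rdf:Alt></dc:title>'
    '</rdf:Description></rdf:RDF></x:xmpmeta>'
)
info = {"Title": "Old Draft"}
set_title_everywhere(info, xmp, "2024 Product Catalog")
```

Routing every title change through one function is the design point: there is no code path that updates one store and forgets the other.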
Timestamp Integrity
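Never rely on your Windows or Mac operating system's file timestamps (the "Created" and "Modified" dates shown in the file properties). Copying, downloading, or unzipping a file typically resets at least one of them. The `/CreationDate` inside the PDF (and its XMP counterpart) is stored in the document itself, so it travels with the content, though anyone with an editor can still change it.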
The Info Dictionary Syntax
```
% 1. The Info dictionary exists as a standard indirect object
90 0 obj
<<
  /Title (Quarterly Earnings Report Q3)
  /Author (Jane Doe - CFO Office)
  /Subject (Financials)
  /Keywords (finance, earnings, q3, public)
  /Creator (Microsoft Word)
  /Producer (Acrobat PDFMaker 21 for Word)
  /CreationDate (D:20231015093000-04'00') % Oct 15, 2023, 9:30 AM (UTC-4)
  /ModDate (D:20231016140500-04'00')      % Oct 16, 2023, 2:05 PM
>>
endobj
...
% 2. Crucially, the trailer points to it
% so interpreters can find it instantly at the bottom of the file.
trailer
<<
  /Size 91
  /Root 1 0 R
  /Info 90 0 R % Points to Object 90 above
>>
startxref
112344
%%EOF
```
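The trailer lookup in this example can be sketched with a deliberately naive byte scanner. It assumes an uncompressed file with a classic `trailer` keyword (no cross-reference streams or object streams), ignores nested parentheses and escape decoding, and uses a hypothetical `read_info` name; a real parser would follow the cross-reference table instead of searching with regular expressions.

```python
import re

# Simplified patterns: classic "trailer" keyword, literal strings without nested parentheses.
TRAILER_INFO_RE = re.compile(rb"trailer\s*<<.*?/Info\s+(\d+)\s+(\d+)\s+R", re.S)
ENTRY_RE = re.compile(rb"/(\w+)\s*\(((?:\\.|[^\\()])*)\)", re.S)


def read_info(raw: bytes) -> dict:
    """Follow the last trailer's /Info reference and return its literal-string entries."""
    refs = TRAILER_INFO_RE.findall(raw)
    if not refs:
        return {}  # no classic trailer (e.g. an xref stream) or no /Info key
    num, gen = refs[-1]  # after incremental updates, the last trailer is current
    obj = re.search(rb"\b" + num + rb"\s+" + gen + rb"\s+obj\s*<<(.*?)>>\s*endobj", raw, re.S)
    if obj is None:
        return {}  # object missing or stored inside a compressed object stream
    # latin-1 is a rough stand-in for PDFDocEncoding; escapes are left undecoded.
    return {key.decode(): value.decode("latin-1") for key, value in ENTRY_RE.findall(obj.group(1))}


sample = b"""%PDF-1.7
90 0 obj
<< /Title (Quarterly Earnings Report Q3) /Author (Jane Doe - CFO Office)
   /CreationDate (D:20231015093000-04'00') >>
endobj
trailer
<< /Size 91 /Root 1 0 R /Info 90 0 R >>
startxref
0
%%EOF"""
print(read_info(sample)["Title"])  # Quarterly Earnings Report Q3
```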
Common Metadata Pitfalls
- PDF 2.0 Confusion. The PDF 2.0 specification deprecated the `/Info` dictionary. However, many legacy scripts and corporate search engines still rely on it. A common mistake is using a modern PDF library that writes only XMP metadata, so older recipient systems display the PDF Title as "Unknown."
- Inconsistent XMP Syncing. Overwriting `/Title` in the Info dictionary with a quick Python script but forgetting to update the XMP stream in the same file. The PDF now contains contradictory metadata, and different software (e.g., Adobe Acrobat vs. Chrome) may display different titles and dates depending on which source it reads.
- Encrypting the Metadata Stream. Protecting a PDF with an open (user) password encrypts the page content, the `/Info` strings, and by default the XMP metadata stream. When the file is hosted on a web server, Google's crawler cannot read the Title or Keywords without the password, which hurts search visibility. Modern encryption settings offer an option to leave metadata unencrypted (the `/EncryptMetadata false` entry); note that this applies to the XMP stream, not to the `/Info` strings.
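The three pitfalls above can be turned into a quick pre-publication check. This is a sketch that assumes a parser has already extracted the `/Info` title, the XMP `dc:title`, whether an open password is required, and the Encrypt dictionary's `/EncryptMetadata` value (which defaults to true when absent); the `audit_metadata` function and its messages are hypothetical.

```python
from typing import List, Optional


def audit_metadata(info_title: Optional[str], xmp_title: Optional[str],
                   has_open_password: bool, encrypt_metadata: bool = True) -> List[str]:
    """Flag common metadata pitfalls from values a parser has already extracted."""
    warnings = []
    if xmp_title and not info_title:
        warnings.append("XMP-only title: legacy readers may show 'Unknown'")
    if info_title and xmp_title and info_title != xmp_title:
        warnings.append(f"Title mismatch: /Info={info_title!r}, XMP={xmp_title!r}")
    if has_open_password:
        # /Info strings are always encrypted along with the rest of the file.
        warnings.append("/Info strings are encrypted: crawlers cannot read them")
        if encrypt_metadata:
            warnings.append("XMP stream is encrypted too: no cleartext title at all")
    return warnings


print(audit_metadata("Q3 Report", "Q3 Report (final)", has_open_password=False))
```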
Frequently Asked Questions
How do I see a PDF's Info dictionary?
Open any PDF and press `Ctrl+D` (`Cmd+D` on macOS), or go to File > Properties. The "Description" tab (Title, Author, Subject, Keywords) and the "Custom" tab (any extra keys) are a direct visual representation of the underlying `/Info` object.
Why was the Info dictionary superseded by XMP?
It was too rigid. As documents evolved, publishers needed structured, nestable data for tracking licenses, DRM, and accessibility (which XML-based XMP handles well), whereas the `/Info` dictionary is just a flat list of strings.
What is the difference between Creator and Producer?
`Creator` is the human-facing desktop software where the content was written (like Microsoft Word or Adobe Illustrator). `Producer` is the lower-level software that performed the actual PDF conversion (like Acrobat Distiller).
What format do PDF dates use?
They use a format based on ASN.1: `D:YYYYMMDDHHmmSSOHH'mm'`, where `O` is `+`, `-`, or `Z` (UTC). So `D:20241024083000-05'00'` means October 24, 2024, at 8:30:00 AM, five hours behind UTC (UTC-05:00).
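A minimal Python parser for this format, written as a sketch rather than a complete implementation. It treats every field after the year as optional (defaulting to the start of the period), tolerates a missing `D:` prefix as some older files omit it, and returns a naive `datetime` when no offset is given, leaving the caller to decide how to interpret it. The names `PDF_DATE_RE` and `parse_pdf_date` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' where every field after the year may be omitted.
PDF_DATE_RE = re.compile(
    r"(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(\d{2})?'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a datetime (aware when an offset is present)."""
    m = PDF_DATE_RE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = None  # no offset: relationship to UTC is not stated
    if sign == "Z":
        tz = timezone.utc
    elif sign:
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)


dt = parse_pdf_date("D:20241024083000-05'00'")
print(dt.isoformat())                           # 2024-10-24T08:30:00-05:00
print(dt.astimezone(timezone.utc).isoformat())  # 2024-10-24T13:30:00+00:00
```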
Should I still fill in the Info dictionary?
Yes, for backward compatibility. While XMP is the modern standard, legacy web servers, email filtering systems, and older PDF readers only know how to find the `/Info` dictionary through the trailer at the end of the file.
Clean Up Your Document Data
Make sure your PDFs show meaningful titles in search results and don't leak sensitive author names. Edit your PDF metadata directly using PDFlyst.