PDF Accessibility

PDF Artifacts: Decorative Content Screen Readers Should Ignore

In PDF accessibility, an artifact is a page content element — headers, footers, page numbers, decorative rules, background images — that is marked as non-meaningful so screen readers skip it. Without artifact marking, assistive technology reads every decorative element out loud, creating a confusing and exhausting experience for visually impaired users.

Quick Answer

Imagine a user opens a 200-page annual report with their screen reader. Without artifact marking, NVDA reads every page: "CompanyName Annual Report — Confidential — 1 of 200" (repeating header), then "line" (decorative rule), then the actual body text — then does the same thing on every single page. With correct artifact marking, the header, footer, page number, and decorative rule are all marked as Artifact — the screen reader skips them entirely and reads only the meaningful body content, moving cleanly from the end of one page to the start of the next. Artifacts don't just matter for accessibility — they also affect how PDFs export to other formats, how text is extracted by AI tools, and whether a PDF passes PDF/UA validation.

What Is a PDF Artifact?

In the PDF specification's tagged content model, every piece of page content is either tagged (given a semantic role in the logical structure tree) or marked as an artifact (flagged as non-meaningful to assistive technology). An artifact is not invisible or removed — it still renders visually on the page. It is simply labelled "not real content" so processing tools know to ignore it.

The PDF specification defines several artifact types. The three that come up most often in remediation work are:

  • Pagination Artifacts — Page-level elements generated as part of the pagination process: running page headers, running page footers, page numbers, background images set at the template level.
  • Layout Artifacts — Visual elements that exist for layout reasons rather than semantic communication: decorative horizontal rules between sections, column separator lines, decorative borders, whitespace fillers, column background shading.
  • Page Artifacts — Content outside the meaningful content area of the page: printer's registration marks, colour calibration bars, trim marks, bleed marks, crop marks — elements that exist in the PDF for print production purposes but have no meaning in the document's content.

Tagged vs. Artifact — the critical distinction: A page number in a footer is a Pagination artifact. A footnote reference number inline in the text is tagged content (it has semantic meaning). Decorative stars around a chapter title are Layout artifacts. A star rating icon that communicates a score is tagged content (Figure with alt text). When in doubt: if sighted users get information from it, it needs a tag. If it's only visual decoration, it's an artifact.

Artifact vs. Tagged Content: Decision Table

Content Element | Classification | Reason
Running page header text | 🏷️ Pagination Artifact | Repeated on every page, with no unique semantic value per page
Page number in footer | 🏷️ Pagination Artifact | Navigational aid already conveyed by PDF viewer UI
Horizontal rule between sections | 🏷️ Layout Artifact | Visual separator that communicates no semantic meaning
Background texture image | 🏷️ Layout Artifact | Purely decorative and conveys no information
Chapter heading text | ✅ Tagged H1/H2 | Semantic structure that users navigate to
Bar chart image | ✅ Tagged Figure + Alt | Conveys data, so alt text must describe the trend shown
Footnote reference [1] | ✅ Tagged Note/Reference | Semantic link between body text and footnote
Printer registration mark | 🏷️ Page Artifact | Print production mark outside the meaningful content
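
The decision rule behind the table can be sketched in a few lines of Python. This is an illustrative helper only: the `classify` function, the role names, and the `ARTIFACT_TYPE_BY_ROLE` mapping are hypothetical and not part of any PDF library. The judgment about whether an element conveys information still has to come from a person.

```python
from typing import Dict

# Hypothetical role names mapped to the artifact types used in the table above.
ARTIFACT_TYPE_BY_ROLE: Dict[str, str] = {
    "running-header": "Pagination",
    "running-footer": "Pagination",
    "page-number": "Pagination",
    "decorative-rule": "Layout",
    "background-texture": "Layout",
    "printer-mark": "Page",
}


def classify(role: str, conveys_information: bool) -> str:
    """Apply the rule: information for sighted users means a tag, otherwise an artifact."""
    if conveys_information:
        return "Tagged content"
    artifact_type = ARTIFACT_TYPE_BY_ROLE.get(role)
    if artifact_type is None:
        # Unknown decorative element: do not guess the type, ask for review.
        return "Artifact (type undetermined, review manually)"
    return f"{artifact_type} Artifact"


for role, informative in [
    ("page-number", False),
    ("decorative-rule", False),
    ("bar-chart", True),
    ("printer-mark", False),
]:
    print(f"{role}: {classify(role, informative)}")
```

The helper deliberately refuses to guess a type for roles it does not know, since a wrong classification is worse than a manual review.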

Real-World Examples

🏛️ Government Scenario

Annual Report: Artifact Remediation for EU Accessibility Directive

A government ministry's 180-page annual report must comply with the EU Web Accessibility Directive. An accessibility audit finds that every page's running header ("Ministry of Finance — 2025 Annual Report — CONFIDENTIAL"), footer copyright notice, and decorative horizontal rules between sections are not marked as artifacts. The screen reader reads all of these on every page, burying the real content in repetitive noise. The remediation team marks all headers, footers, page numbers, and decorative rules as Pagination and Layout artifacts using Acrobat's Reading Order tool. After the fix, PAC 2024 validation passes and an NVDA screen reader test confirms clean, linear reading of the content.

📊 Publishing Scenario

Technical Manual: Background Graphics and Page Borders

A technical manual PDF uses a styled page template: a blue left-border bar, a company logo watermark on every page, and a light grey star pattern background texture. None of these convey information — they are purely visual brand elements. Without artifact marking, a text extraction tool processing the PDF for an AI knowledge base extracts fragments of the border text, the logo metadata, and garbled characters from the background pattern — polluting the knowledge base with garbage data. After marking all three elements as Layout and Pagination artifacts, text extraction is clean and every extracted paragraph is genuine document content.

⚕️ Healthcare Scenario

Patient Form: Print Marks Must Not Appear in Digital Version

A hospital digitises patient intake forms as PDFs. The scanned originals contain printer registration marks, fold indicators, and a barcoded batch number in the margin, all of which are Page Artifacts. When a screen reader user opens the form on a tablet to complete it electronically and these marks are not artifact-marked, the batch barcode is read as a long string of alphanumeric characters and the registration marks are announced as graphical elements. With correct Page Artifact marking, the screen reader skips every production mark and jumps straight to "Section 1: Patient Information — Field 1: Full Name", which gives the patient a clean, professional experience.

Why Artifact Marking Matters

Clean Screen Reader Experience

Artifacts are silently skipped by screen readers — eliminating the repetitive noise of headers, footers, page numbers, and decorative graphics that otherwise interrupt every page.

Accurate Text Extraction

Text extraction tools and AI parsers that honour marked content can skip artifact-marked spans, extracting only meaningful content without repeated header/footer boilerplate or decorative characters. Not every extractor does this, so check how your tool handles artifacts before relying on it.
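
As a rough illustration of what "skipping artifacts" means for an extractor, here is a minimal Python sketch that walks a simplified content stream and collects text only from outside `/Artifact` sequences. The `extract_text` function and its tokenizer are assumptions made for this example, not the behaviour of any particular tool: it handles only the `Tj` operator, literal strings without nested parentheses, and non-nested property dictionaries, and it ignores fonts and encodings entirely.

```python
import re
from typing import List

# Simplified tokenizer: comments, literal strings without nested parentheses,
# non-nested << >> dictionaries, then names, numbers, and operators.
TOKEN = re.compile(r"%[^\n]*|\((?:\\.|[^\\()])*\)|<<.*?>>|[^\s()<>%\[\]]+", re.DOTALL)
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def is_operand(token: str) -> bool:
    """Names, strings, dictionaries, and numbers are operands; the rest are operators."""
    return token[0] in "/(<" or NUMBER.fullmatch(token) is not None


def extract_text(stream: str) -> List[str]:
    texts: List[str] = []
    operands: List[str] = []      # operands collected since the last operator
    in_artifact: List[bool] = []  # one flag per open marked-content sequence
    for match in TOKEN.finditer(stream):
        token = match.group()
        if token.startswith("%"):
            continue  # PDF comment
        if is_operand(token):
            operands.append(token)
            continue
        if token in ("BDC", "BMC"):
            tag = operands[0] if operands else ""
            in_artifact.append(tag == "/Artifact")
        elif token == "EMC" and in_artifact:
            in_artifact.pop()
        elif token == "Tj" and operands and operands[-1].startswith("("):
            if not any(in_artifact):
                # Strip the parentheses and undo the three simplest escapes.
                texts.append(re.sub(r"\\([()\\])", r"\1", operands[-1][1:-1]))
        operands.clear()
    return texts


SAMPLE = r"""
% Page number footer
/Artifact << /Type /Pagination /Subtype /Footer >> BDC
  BT /F1 9 Tf 288 28 Td (42) Tj ET
EMC
% Body paragraph
/P << /MCID 0 >> BDC
  BT /F1 11 Tf 72 460 Td (The financial results for 2025 show...) Tj ET
EMC
"""

print(extract_text(SAMPLE))
```

Run on the sample, the page number inside the artifact sequence is dropped and only the body paragraph is returned.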

Clean Reflow on Mobile

PDF Reflow mode (used on small screens) respects artifact marking — decorative elements are excluded from the reflowed text flow, making documents readable on phones without layout debris.

PDF/UA Conformance

PDF/UA (ISO 14289) requires all content to be either tagged with a semantic role or marked as an artifact. Unmarked content that is neither is a validation failure — artifact marking is mandatory for compliance.
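
The "tagged or artifact, nothing in between" rule can be approximated with a small check. The sketch below is a hypothetical Python helper, `find_unmarked_painting`, which reports painting operators that sit outside every marked-content sequence in a simplified content stream. It is far weaker than a real validator such as PAC or veraPDF: it treats any marked-content sequence as acceptable, does not confirm that MCIDs are referenced from the structure tree, and does not handle inline images.

```python
import re
from typing import List

# Simplified tokenizer: comments, literal strings without nested parentheses,
# non-nested << >> dictionaries, then names, numbers, and operators.
TOKEN = re.compile(r"%[^\n]*|\((?:\\.|[^\\()])*\)|<<.*?>>|[^\s()<>%\[\]]+", re.DOTALL)

# Operators that put marks on the page: text showing, path painting,
# XObject drawing, and shading.
PAINTING = {"Tj", "TJ", "'", '"', "S", "s", "f", "F", "f*",
            "B", "B*", "b", "b*", "Do", "sh"}


def find_unmarked_painting(stream: str) -> List[str]:
    """Return painting operators found outside every BDC/BMC ... EMC sequence."""
    depth = 0
    unmarked: List[str] = []
    for match in TOKEN.finditer(stream):
        token = match.group()
        if token.startswith("%"):
            continue  # PDF comment
        if token in ("BDC", "BMC"):
            depth += 1
        elif token == "EMC":
            depth = max(0, depth - 1)
        elif token in PAINTING and depth == 0:
            unmarked.append(token)
    return unmarked


# The decorative rule is drawn with no marker around it, so "f" is reported.
problem = "0.7 g 72 480 468 0.5 re f\n/P << /MCID 0 >> BDC BT (Hi) Tj ET EMC"
fixed = "/Artifact << /Type /Layout >> BDC 0.7 g 72 480 468 0.5 re f EMC"
print(find_unmarked_painting(problem))
print(find_unmarked_painting(fixed))
```

An empty result here is necessary but not sufficient for conformance, so a full validator and a manual screen reader test are still needed.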

Better AI Processing

LLM document processing and RAG (Retrieval Augmented Generation) pipelines that use PDF text extraction benefit from artifact marking — cleaner input text produces higher quality AI output.

Print-to-Digital Conversion

When converting scanned or print-production PDFs to accessible digital documents, correctly identifying and marking Page Artifacts (trim marks, bleeds) is the first step in the remediation workflow.

How Artifacts Are Marked in a PDF Content Stream

PDF CONTENT STREAM — ARTIFACT AND TAGGED CONTENT
% Page number footer — Pagination Artifact
/Artifact
<< /Type /Pagination  /Subtype /Footer >>
BDC
  BT /F1 9 Tf 288 28 Td (42) Tj ET
EMC

% Decorative horizontal rule — Layout Artifact
/Artifact << /Type /Layout >> BDC
  0.7 g  72 480 468 0.5 re f    % lowercase g sets the fill grey that f uses
EMC

% Body paragraph — Real tagged content
/P << /MCID 0 >> BDC
  BT /F1 11 Tf 72 460 Td
  (The financial results for 2025 show...) Tj ET
EMC
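
To show the shape of the markers without hand-typing them, here is a minimal Python sketch that builds the artifact wrapper as text. The `wrap_as_artifact` helper is hypothetical and only produces a string. Writing the result back into a real PDF needs a PDF library and is not shown here.

```python
import re
from typing import Optional


def wrap_as_artifact(operators: str, artifact_type: str,
                     subtype: Optional[str] = None) -> str:
    """Return `operators` enclosed in an /Artifact marked-content sequence."""
    for name in (artifact_type, subtype):
        # Keep the sketch simple: accept plain alphabetic PDF names only.
        if name is not None and not re.fullmatch(r"[A-Za-z]+", name):
            raise ValueError(f"not a simple PDF name: {name!r}")
    properties = f"/Type /{artifact_type}"
    if subtype is not None:
        properties += f" /Subtype /{subtype}"
    return f"/Artifact << {properties} >> BDC\n  {operators}\nEMC"


footer = wrap_as_artifact("BT /F1 9 Tf 288 28 Td (42) Tj ET", "Pagination", "Footer")
rule = wrap_as_artifact("0.7 g 72 480 468 0.5 re f", "Layout")
print(footer)
print(rule)
```

The two calls reproduce the footer and decorative-rule sequences from the listing above, which makes the pattern easy to see: a tag name, a property dictionary, `BDC`, the drawing operators, then `EMC`.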

Common Mistakes to Avoid

  • Marking informative images as artifacts. An image should be marked as an artifact only if it is purely decorative. A chart, photograph, diagram, or icon that conveys information sighted readers use must be tagged as a Figure with appropriate alt text, not silenced as an artifact. Incorrectly silencing informative visuals is one of the most damaging accessibility mistakes possible.
  • Relying on "untagged" to mean "artifact." Some tools treat untagged content as implicit artifact. PDF/UA requires explicit artifact marking — content that is neither tagged nor explicitly marked as artifact is a validation error and may be incorrectly processed by screen readers and extraction tools.
  • Forgetting to artifact-mark auto-generated page numbers after export. When Word, InDesign, or LibreOffice exports to PDF with running headers/footers and page numbers, these elements must be marked as Pagination artifacts in the page content and kept out of the tag tree. Auto-tagging in Acrobat does not always handle this correctly, so verify in the Tags and Content panels after export.
  • Treating footnote reference numbers as artifacts. A footnote reference [1] inline in body text is tagged content — it is semantically meaningful, pointing the reader to a related note. Only decorative elements with no semantic communication should be artifacts. Footnotes and endnotes require proper tagged structure.
  • Not verifying artifact marking with a screen reader. Automated validators (PAC 2024, veraPDF) check that content is tagged or artifact-marked but cannot verify that the marking is semantically correct. Always do a manual NVDA or JAWS screen reader test to confirm content flows cleanly without disruptive artifact noise between meaningful passages.

Frequently Asked Questions

  • What is a PDF artifact? A PDF artifact is page content marked as non-meaningful, such as headers, footers, page numbers, decorative rules, and background images, so that screen readers, text extraction tools, and reflow mode can ignore it. It still renders visually but carries no semantic value for assistive technology or content processing.

  • What are the types of PDF artifacts? Pagination Artifacts: Headers, footers, page numbers, background templates. Layout Artifacts: Decorative rules, column separators, border graphics. Page Artifacts: Printer's marks (registration marks, colour bars, trim lines) outside the meaningful content area.

  • What is the difference between tagged content and an artifact? Tagged content has semantic meaning: headings, paragraphs, list items, table cells, figures. Artifacts have no semantic meaning and exist for visual or layout purposes only. Screen readers navigate tagged structure and announce it; artifacts are silently skipped. Correct classification of all content as one or the other is the foundation of an accessible PDF.

  • Should images be marked as artifacts? Only if purely decorative. Texture backgrounds, watermark graphics, and design elements with no informational content should be artifacts. An image that conveys information, such as a chart, diagram, or illustrative photo, must be tagged as Figure with alt text, regardless of its position in the layout.

  • How do I mark content as an artifact? In Adobe Acrobat: Tools > Accessibility > Reading Order, select the element, then click 'Background/Artifact'. In the Tags panel: right-click the content item and choose 'Change Tag to Artifact'. Deleting the tag alone is not enough, because it leaves the content unmarked rather than marked as an artifact. Programmatically: wrap content stream operators in /Artifact << /Type /Pagination >> BDC ... EMC markers in the page content stream.

  • Is artifact marking required for PDF/UA? Yes. PDF/UA requires all content to be either tagged with a semantic role or explicitly marked as artifact. Unclassified content that is neither tagged nor artifact-marked is a PDF/UA validation error. Correct artifact marking is one of the most common accessibility remediation steps in PDF audits.
