What is MRC compression in PDF?

MRC (Mixed Raster Content) is a content-aware compression method for scanned PDFs. It segments each page into foreground text (JBIG2), background paper (heavily compressed JPEG), and image regions (JPEG/JPX), then applies the most suitable compression to each. The result is typically a file 10–50x smaller in which text stays sharp.

How is MRC different from standard PDF compression?

Standard compression applies one algorithm to the whole page. If you compress hard enough to shrink the file, text blurs. MRC distinguishes text, photos, and background, and applies the most suitable algorithm to each independently.

What are the three MRC layers?

1. Foreground: Text and line art, losslessly compressed with JBIG2 or CCITT G4.
2. Background: Paper texture, stored as an aggressively compressed low-resolution JPEG.
3. Mask: A binary map that composites foreground over background during rendering.

What documents benefit most from MRC?

Colour scanned documents containing both text and photos, such as brochures, books, records, and invoices. MRC delivers the greatest benefit when there is sharp text alongside colour backgrounds or photographs.

Can MRC compression affect OCR accuracy?

MRC generally improves OCR accuracy. Isolating text on a clean binary foreground removes background noise. Poorly calibrated segmentation on low-quality scans can occasionally misclassify pixels — validate OCR output on critical documents.

Is MRC the same as PDF/A or general PDF compression?

No. MRC is a compression technique applied to raster image streams inside a PDF. PDF/A is an archival format standard. MRC is an advanced feature found in high-quality scanners and document management software.

MRC Compression Explained: Mixed Raster Content in PDF

Quick Answer

Standard PDF compression treats the entire scanned page as one giant image. Compress it hard enough to email and the text blurs. MRC (Mixed Raster Content) solves this by analysing the page and splitting it into three layers: a foreground layer for sharp text (JBIG2: tiny files, crisp edges), a background layer for paper texture (a heavily compressed, low-resolution JPEG), and an image layer for photos (JPEG/JPX). A 10 MB scan can shrink to around 300 KB while the text stays as crisp as in the original scan.

What Is MRC Compression?

Every scanned page contains three different types of content at once: sharp high-contrast text and line art; smooth colour background and paper texture; and full-colour photographs. These types have completely different visual characteristics — and completely different compression requirements.

Standard compression treats the whole page as one image, making it impossible to optimally compress text and photos simultaneously. MRC (Mixed Raster Content), standardised in ITU-T T.44, solves this by segmenting the scanned image into constituent layers before compressing each independently:

The Foreground (Text) Layer is compressed with CCITT Group 4 or JBIG2 in its lossless mode, both designed for bi-level (black/white) content. The text pixels are reproduced exactly, so edges stay crisp.
The Background Layer is down-sampled and compressed aggressively with JPEG. Nobody needs to see individual paper fibres in high resolution.
The Image Layer (photographs) is compressed with JPEG or JPX at a quality level appropriate for photos — high enough to look good, low enough to save space.
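
To illustrate why down-sampling the background saves so much before JPEG is even applied, here is a minimal Python sketch of block-average down-sampling on a small grayscale grid. The function name `downsample` and the factor of 2 are assumptions made for this example; real MRC encoders choose their own resampling filters and ratios.

```python
def downsample(img, factor):
    """Shrink a grayscale image (rows of 0-255 ints) by averaging
    each factor x factor block into a single pixel."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            # Collect the block, clipping at the right and bottom edges.
            block = [
                img[yy][xx]
                for yy in range(y, min(y + factor, height))
                for xx in range(x, min(x + factor, width))
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


background = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
small = downsample(background, 2)
print(small)  # [[15, 35], [55, 75]]
```

Halving the resolution in each direction leaves a quarter of the pixels to encode, which is acceptable for paper texture but would be unacceptable for text.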

📌 Only for scanned (image-based) PDFs. MRC works on raster scans. PDFs created from Word, Excel, or InDesign already store text as vector glyphs, separate from any images, so MRC is neither needed nor applicable to them.

How MRC Segmentation Works

Pixel classification. Each pixel is analysed — is it a sharp high-contrast edge (text/line art)? Or a gradual colour gradient (background/photo)?
Layer separation. The engine builds three layers: binary foreground mask, background colour map, and photo regions.
Per-layer compression. JBIG2/CCITT for foreground. Heavy JPEG for the background thumbnail. JPEG/JPX for photo regions.
Mask encoding. A binary mask tells the PDF renderer which pixels come from foreground and which from background. The renderer composites layers on the fly during display.
Optional OCR layer. An invisible text layer is added above the composited image, making the PDF fully searchable without changing visual quality.
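
The classification, separation, and compositing steps above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular MRC engine works: it assumes a grayscale page, uses a single global threshold in place of real edge-aware pixel classification, and the names `segment` and `composite` are hypothetical.

```python
def segment(page, threshold=128):
    """Split a grayscale page (rows of 0-255 ints) into a binary mask
    and a background layer.

    Assumption: any pixel darker than `threshold` counts as text.
    Real engines classify pixels using local contrast and edges."""
    mask = [[1 if px < threshold else 0 for px in row] for row in page]

    # Fill the holes left by text with the mean background value so the
    # background layer is smooth and compresses well.
    bg_pixels = [
        px
        for row, mask_row in zip(page, mask)
        for px, m in zip(row, mask_row)
        if not m
    ]
    fill = round(sum(bg_pixels) / len(bg_pixels)) if bg_pixels else 255
    background = [
        [fill if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(page, mask)
    ]
    return mask, background


def composite(mask, background, ink=0):
    """Rebuild the page the way a renderer would: where the mask is 1,
    paint the foreground ink colour; elsewhere show the background."""
    return [
        [ink if m else px for px, m in zip(bg_row, mask_row)]
        for bg_row, mask_row in zip(background, mask)
    ]


page = [
    [250, 20, 245],
    [240, 30, 250],
]
mask, background = segment(page)
rebuilt = composite(mask, background)
```

In this toy page the middle column is classified as text, the background keeps only paper values, and compositing paints solid ink back where the mask is set. Each of the two layers could then be handed to its own codec.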

Real-World Examples

⚖️ Legal Scenario

Law Firm: 50,000-Page Discovery Archive

A law firm digitises 50,000 pages of colour discovery documents. Standard compression would produce 500 GB, which is slow to transfer and expensive to host. With MRC the archive comes to 30 GB. Footnote text at 7pt remains clear, so no legal detail is accidentally lost, and the smaller files open quickly on mobile, speeding document review.
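
As a quick sanity check on the figures in this scenario, the short Python sketch below works out the average per-page size and the overall reduction they imply. It assumes decimal units (1 GB = 1000 MB); the helper name `per_page_mb` is made up for the example.

```python
PAGES = 50_000
FLAT_GB = 500   # archive size with standard compression (from the scenario)
MRC_GB = 30     # archive size with MRC (from the scenario)


def per_page_mb(total_gb, pages):
    """Average size per page in MB, assuming 1 GB = 1000 MB."""
    return total_gb * 1000 / pages


flat_mb = per_page_mb(FLAT_GB, PAGES)
mrc_mb = per_page_mb(MRC_GB, PAGES)
ratio = FLAT_GB / MRC_GB

print(f"Flat: {flat_mb:.1f} MB/page")   # Flat: 10.0 MB/page
print(f"MRC:  {mrc_mb:.1f} MB/page")    # MRC:  0.6 MB/page
print(f"Reduction: {ratio:.1f}x")       # Reduction: 16.7x
```

The implied reduction of roughly 17× sits inside the 10–50× range quoted for MRC.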

📚 Library Scenario

University Library: 100-Year-Old Textbook

A university digitises a century-old textbook with yellowed paper, handwritten annotations, and complex diagrams. MRC separates the yellowed background from the black ink, so the background can be cleaned up without touching the text. Students receive a PDF where text appears sharp on a clean white background, even though the original was stained and faded. The file is small enough to stream on a smartphone.

🏥 Healthcare Scenario

Hospital: Patient Record Digitisation

A hospital scans thousands of mixed-content patient records daily — forms with handwriting, printed labels, and small photos. MRC stores each page at under 30 KB while maintaining text readability at any zoom level. Storage costs drop by 80% compared to TIFF, with no loss in diagnostic readability.

Why MRC Compression Matters

📦 Extreme Size Reduction

Typically 10–50× smaller than standard TIFF or flat-JPEG scans. A 500 MB invoice archive can shrink to 10 MB with little or no visible quality loss.

🔍 Superior OCR Accuracy

Isolating text on a clean binary layer removes background noise, which generally improves word-level OCR accuracy for searchable and indexable archives.

⚡ Instant Web Loading

Small file sizes let documents open quickly in browsers and mobile apps, even on 3G. This is essential for document portals and remote access.

🏛️ Archival-Grade Quality

The lossless text foreground layer preserves every text pixel of the scan, which helps meet preservation requirements for library and archival collections.

💰 Storage Cost Reduction

80–95% storage savings versus TIFF, a direct and significant reduction in cloud hosting costs for high-volume document workflows.

🖨️ Print-Ready Output

Because the text layer is lossless, MRC PDFs can be printed at full scan resolution without re-scanning, which makes them suitable for reproduction as well as display.
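
The size figures in this section are quoted both as ratios ("10–50× smaller") and as percentage savings ("80–95%"). The small Python sketch below converts between the two forms; the function names are illustrative only.

```python
def savings_percent(ratio):
    """Percentage of storage saved when a file becomes `ratio` times smaller."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 100 * (ratio - 1) / ratio


def ratio_from_savings(percent):
    """Compression ratio implied by a given percentage saving."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return 100 / (100 - percent)


print(savings_percent(10))      # 90.0
print(savings_percent(50))      # 98.0
print(ratio_from_savings(80))   # 5.0
print(ratio_from_savings(95))   # 20.0
```

Under this relation, a 10–50× reduction corresponds to a 90–98% saving, while an 80–95% saving corresponds to a 5–20× reduction. The figure that applies to a given archive will depend on the baseline format it is compared against.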

MRC vs. Standard Compression

Aspect	Standard Flat Compression	MRC Compression
Text quality at high compression	Blurry, pixelated	✓ Sharp — lossless JBIG2
Photo quality	Compromised	✓ Optimised per region
Typical file size per page	500 KB–5 MB	20–80 KB
OCR accuracy	Limited — background noise	✓ Excellent — clean text layer
Encoding complexity	Simple — single filter	Complex — segmentation + 3 layers
Best for	Simple grayscale text-only scans	Mixed-content colour scans

Common Mistakes to Avoid

Applying MRC to born-digital PDFs. MRC only helps when the source is a raster scan. PDFs from Word or InDesign already store text as vector glyphs, so there is nothing to gain and you only add processing overhead.
Using aggressive segmentation on poor-quality scans. A noisy 150 DPI scan can cause misclassification of text pixels as background. Always scan at 300+ DPI and visually verify output before committing to a batch workflow.
Assuming MRC recovers sharpness. MRC optimises compression of whatever you give it. A blurred source scan remains blurred — MRC cannot add resolution that was never there.
Not verifying OCR after MRC. Segmentation errors at foreground/background boundaries can affect OCR of characters near region edges. Spot-check critical documents after processing.
Deploying in old lightweight PDF viewers without testing. MRC PDFs require the viewer to composite multiple layers. Modern viewers handle this correctly; very old or minimal renderers may not. Test in your target environment first.
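
One way to act on the "verify OCR after MRC" advice is to compare the OCR text obtained before and after compression. The sketch below uses Python's standard `difflib` module for a word-level comparison. It assumes you already have both OCR outputs as plain strings from whatever engine you use; the function names and the 0.98 threshold are arbitrary choices for illustration, not recommended values.

```python
from difflib import SequenceMatcher


def ocr_word_similarity(before, after):
    """Return a 0.0-1.0 similarity score between two OCR outputs,
    compared word by word."""
    words_before = before.split()
    words_after = after.split()
    return SequenceMatcher(None, words_before, words_after).ratio()


def needs_review(before, after, threshold=0.98):
    """Flag a page for manual inspection when the OCR text changed
    more than the (assumed) threshold allows."""
    return ocr_word_similarity(before, after) < threshold


original_ocr = "Invoice total 1,250.00"
mrc_ocr = "Invoice total 1,25O.00"  # a zero misread as the letter O

print(needs_review(original_ocr, original_ocr))  # False
print(needs_review(original_ocr, mrc_ocr))       # True
```

A check like this only shows that the two OCR outputs differ. It does not say which one is correct, so flagged pages still need a visual spot-check.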


