What is JBIG2 compression in PDF?

JBIG2 is a bi-level (black-and-white) image compression standard used in PDF to achieve very small file sizes for scanned text documents. It uses symbol coding: it stores one master shape per unique character or region rather than re-encoding the pixels of every instance, and typically achieves a 10–100x reduction compared to an uncompressed TIFF of the same scan.

Is JBIG2 lossless or lossy?

JBIG2 supports both modes. Lossless JBIG2 stores character shapes exactly as they appear in the scan — safe for legal and medical documents. Lossy JBIG2 groups visually similar shapes and replaces them with a single template, achieving even smaller sizes but risking digit substitution (e.g., a '6' replaced by a '0') in degraded scans.

When should I use JBIG2 compression?

Use JBIG2 for large-volume scans of black-and-white text documents — contracts, invoices, medical records, books. It is the best choice when file size is critical and the content is strictly bi-level (no shades of gray). Never use it for photographs, color images, or grayscale content.

How does JBIG2 compare to CCITT Group 4?

Both can compress bi-level images losslessly (JBIG2 also has a lossy mode), and lossless JBIG2 is typically 2–5x more efficient than CCITT Group 4 for text-heavy pages. CCITT G4 (used in fax) is faster and more universally supported. JBIG2 achieves better compression but requires PDF 1.4+ and may not be supported by older viewers.

Can JBIG2 cause text recognition (OCR) errors?

Lossy JBIG2 can cause OCR errors if aggressive shape matching replaces similar-looking characters (e.g., replacing 'I' with 'l', or '6' with '0'). Lossless JBIG2 preserves exact pixel patterns, so it neither introduces OCR errors nor removes scan noise. Any despeckling or cleanup that helps OCR is a separate preprocessing step applied before compression.

What PDF version introduced JBIG2?

JBIG2 was introduced in PDF 1.4 (Acrobat 5, 2001). It is supported by all modern PDF viewers. Some very old PDF 1.0–1.3 processors will not decompress JBIG2 streams.

JBIG2 Compression in PDF Explained: Black & White Scan Optimization

Quick Answer

When you scan a 100-page black-and-white text document, every letter 'e' on the page is a slightly different cluster of black pixels. JBIG2 (PDF filter: JBIG2Decode) recognizes that these 'e' shapes are essentially the same, stores one master template of 'e', and then records only the coordinates where each instance appears. In a favorable case, a 50 MB uncompressed TIFF scan becomes a 500 KB JBIG2 PDF, a 100× reduction, with text that stays sharp and readable for both humans and OCR engines.

What Is JBIG2 Compression?

JBIG2 (ISO/IEC 14492, also published as ITU-T T.88) is a compression standard for bi-level images: images where every pixel is exactly black or exactly white, with no shades of gray. It was developed by the Joint Bi-level Image Experts Group and introduced into PDF with version 1.4.

JPEG works on 8×8 pixel blocks and Flate on repeated byte sequences, so neither has any notion of a character shape. JBIG2 instead uses symbolic compression: it scans the entire page, identifies recurring patterns (character shapes), builds a dictionary of unique templates, and then encodes the page as a list of "place template #47 at position (342, 891)" instructions. For a page of typewritten text, the letter 'e' might appear 300 times. Instead of storing 300 sets of black-pixel patterns, JBIG2 stores one set and 300 position coordinates.
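The dictionary-plus-positions idea can be sketched in a few lines of Python. This is a toy illustration, not the JBIG2 bitstream format: it assumes the glyphs have already been segmented from the page, it represents each glyph as a tuple of strings, and the names `encode` and `decode` are hypothetical. A real encoder also entropy-codes the dictionary and the placement list.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]            # rows of '#' (black) and '.' (white)
Placement = Tuple[int, int, int]   # (template index, x, y)


def encode(glyphs: List[Tuple[Glyph, int, int]]) -> Tuple[List[Glyph], List[Placement]]:
    """Store each distinct glyph bitmap once and record where every instance goes."""
    templates: List[Glyph] = []
    index_of: Dict[Glyph, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:           # first time this exact shape appears
            index_of[bitmap] = len(templates)
            templates.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return templates, placements


def decode(templates: List[Glyph], placements: List[Placement],
           width: int, height: int) -> List[str]:
    """Rebuild the page by stamping each template at its recorded position."""
    canvas = [["."] * width for _ in range(height)]
    for idx, x, y in placements:
        for dy, row in enumerate(templates[idx]):
            for dx, pixel in enumerate(row):
                if pixel == "#":
                    canvas[y + dy][x + dx] = "#"
    return ["".join(row) for row in canvas]


# Two made-up 3x3 shapes standing in for the letters 'E' and 'T'.
E = ("###", "##.", "###")
T = ("###", ".#.", ".#.")
page = [(E, 0, 0), (T, 4, 0), (E, 8, 0)]

templates, placements = encode(page)
rebuilt = decode(templates, placements, width=11, height=3)
print(len(templates), placements)
print("\n".join(rebuilt))
```

Three glyph instances produce only two stored templates; the second 'E' costs one placement record instead of a second bitmap.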

📌

Only for bi-level (1-bit) images. JBIG2 cannot compress gray or color images. For those, use JPEG, JPX, or Flate. PDF scanners typically use a mixed-content approach: JBIG2 for the text regions of each page, JPEG for any detected photo regions.

Lossless vs. Lossy JBIG2

JBIG2 supports two modes with very different quality/size trade-offs:

Mode	How It Works	File Size	Risk	Best For
Lossless	All unique character shapes stored exactly; identical shapes are matched precisely	Moderate	None — pixel-perfect reproduction	Legal, medical, financial records
Lossy	Visually similar shapes are merged into one template — a slightly different 'a' might be matched to the dominant 'a' template	Smallest possible	Character substitution in degraded scans	Mass digitization, email attachments

⚠️

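The "digit substitution" risk: Aggressive lossy JBIG2 can match a grainy digit to the template of a similar-looking one in a low-quality scan. This is not hypothetical: in a widely reported 2013 case, Xerox WorkCentre scanners silently replaced digits such as '6' with '8' in scanned documents. For legal or medical documents, always use lossless JBIG2 or verify your scanner's compression settings.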
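How a lossy matcher ends up merging two different digits can be shown with a small sketch. It assumes a simple matching rule (count of differing pixels at or below a threshold); real encoders use more elaborate similarity measures, and the 5×7 bitmaps, `hamming`, `build_dictionary`, and `max_diff` are all illustrative names and values.

```python
from typing import List, Tuple

Glyph = Tuple[str, ...]   # rows of '#' (black) and '.' (white)

EIGHT: Glyph = (
    ".###.",
    "#...#",
    "#...#",
    ".###.",
    "#...#",
    "#...#",
    ".###.",
)
SIX: Glyph = (
    ".###.",
    "#....",
    "#....",
    "####.",
    "#...#",
    "#...#",
    ".###.",
)


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_dictionary(glyphs: List[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Reuse an existing template when a glyph is within max_diff pixels of it."""
    templates: List[Glyph] = []
    indices: List[int] = []
    for glyph in glyphs:
        for i, tmpl in enumerate(templates):
            same_size = len(tmpl) == len(glyph) and len(tmpl[0]) == len(glyph[0])
            if same_size and hamming(tmpl, glyph) <= max_diff:
                indices.append(i)            # glyph will be rendered as tmpl
                break
        else:
            templates.append(glyph)          # no close match: store a new template
            indices.append(len(templates) - 1)
    return templates, indices


print(hamming(EIGHT, SIX))                    # the two digits differ in only 3 pixels
print(build_dictionary([EIGHT, SIX], 0)[1])   # exact matching keeps them apart
print(build_dictionary([EIGHT, SIX], 3)[1])   # loose matching renders the 6 as an 8
```

With `max_diff=0` the behavior corresponds to lossless matching: only pixel-identical shapes share a template. Raising the threshold shrinks the dictionary, and at some point it changes what the page says.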

Real-World Examples

🏛️ Archival Scenario

National Archive Digitizing 5 Million Court Records

An archive digitizes 5 million historical court records. Using JBIG2 compression, they reduce total storage from 50 terabytes down to roughly 4 terabytes — saving tens of thousands of dollars in server costs. Each document remains fully readable and searchable via OCR, and researchers can access them over the internet in seconds instead of waiting for physical retrieval.

📱 Mobile Banking Scenario

Check Deposit by Photo

A mobile banking app converts check photos to black-and-white JBIG2 PDFs before uploading. The resulting files are small enough to upload reliably even on a poor 3G connection in under 2 seconds. A full-color JPEG of the same check would be 10× larger and might fail to transmit, while also containing more noise that degrades the bank's automated text recognition system.

⚕️ Healthcare Scenario

Medical Records in EHR Systems

A hospital scans thousands of handwritten patient notes and referral letters daily. Using lossless JBIG2, each page shrinks from ~400 KB (TIFF) to ~18 KB, a 22× reduction. Because lossless mode reproduces every scanned pixel exactly, handwriting recognition and OCR work from the same image the scanner captured, supporting searchable records without risking document integrity.

Why JBIG2 Matters

📦

Unmatched Efficiency

For pure black-and-white text, JBIG2 gives the smallest files of the bi-level filters available in PDF: typically 2–5× smaller than CCITT Group 4 and well over an order of magnitude smaller than a raw bitmap.

⚡

Fast Network Delivery

Tiny files load instantly over mobile and satellite connections — critical for mobile banking, field inspection, and remote healthcare.

🔍

OCR-Ready Output

Crisp 1-bit character edges, free of the blur and ringing that JPEG adds around text, are easier for OCR engines to recognize accurately.

🌐

Universal Support

Supported by all modern PDF viewers since PDF 1.4 (Acrobat 5, 2001). Safe for document exchange without viewer compatibility concerns.

💰

Storage Cost Savings

Enterprises digitizing millions of pages achieve storage savings of 80–95% versus TIFF or uncompressed formats, directly reducing cloud storage costs.

📋

Archival Quality (Lossless Mode)

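Lossless JBIG2 preserves every pixel exactly, and the JBIG2Decode filter is permitted in PDF/A (ISO 19005), the archival profile of PDF. Whether a scan is legally admissible depends on your jurisdiction and records policy, not on the compression filter alone.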
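The 80–95% storage-savings figure quoted above and the "N× smaller" ratios used elsewhere on this page are two views of the same number. The short sketch below converts between them; the function names are illustrative.

```python
def savings_percent(ratio: float) -> float:
    """Percent of storage saved when files become `ratio` times smaller."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 100.0 * (ratio - 1.0) / ratio


def ratio_for_savings(percent: float) -> float:
    """Compression ratio needed to save `percent` of the original storage."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return 100.0 / (100.0 - percent)


print(savings_percent(5))        # 80.0
print(savings_percent(20))       # 95.0
print(savings_percent(50 / 4))   # 92.0, the 50 TB to 4 TB archive scenario
print(ratio_for_savings(95))     # 20.0
```

So 80–95% savings corresponds to files that are 5× to 20× smaller than the baseline they are measured against.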

JBIG2 vs. Other Bi-level Formats

Format	PDF Filter	Algorithm	Typical File Size (1 B&W page)	Best For
Raw bitmap	None	No compression	~4 MB	Dev/debugging only
CCITT Group 3	`CCITTFaxDecode`	Run-length (row)	~300–800 KB	Legacy fax documents
CCITT Group 4	`CCITTFaxDecode`	Run-length (2D)	~50–150 KB	Standard scanned documents
JBIG2 (lossless)	`JBIG2Decode`	Symbolic encoding	~20–60 KB	Archival text scans
JBIG2 (lossy)	`JBIG2Decode`	Symbolic + shape matching	~8–25 KB	Mass digitization
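
The raw-bitmap row in the table can be sanity-checked with simple arithmetic. The sketch below assumes a US Letter page (8.5 × 11 in) scanned at 1 bit per pixel with each row padded to a whole byte. The page size, the 600 dpi and 300 dpi resolutions, and the 40 KB compressed size are illustrative assumptions; the table itself does not state a resolution.

```python
import math


def raw_bitmap_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Size of an uncompressed 1-bit page, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px / 8)   # 8 pixels per byte
    return bytes_per_row * height_px


def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """How many times smaller the compressed page is than the raw bitmap."""
    return raw_bytes / compressed_bytes


letter_600 = raw_bitmap_bytes(8.5, 11, 600)   # 4,210,800 bytes, about 4.2 MB
letter_300 = raw_bitmap_bytes(8.5, 11, 300)   # 1,052,700 bytes, about 1.05 MB
print(letter_600, letter_300)

# Ratio against a hypothetical 40 KB lossless JBIG2 page.
print(round(compression_ratio(letter_600, 40_000)))
```

Under these assumptions the ~4 MB raw figure matches a 600 dpi scan; at 300 dpi the raw page is closer to 1 MB, and every ratio against it shrinks by the same factor of four.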

Common Mistakes to Avoid

Using JBIG2 on grayscale or color images. JBIG2 only handles true 1-bit black-and-white. Grayscale or color content must first be thresholded to pure black and white, which discards all tonal detail and usually ruins photographs. Use JPEG or JPX for those.
Using lossy JBIG2 for legal or medical documents. Character substitution in lossy mode — even if rare — can alter a number or letter in a medical dosage or legal amount. Always use lossless mode for compliance-critical documents.
Not verifying document quality after JBIG2 compression. Always visually inspect a sample of pages after compression, especially with lossy mode. Look for obviously wrong characters in numbers — dates, quantities, reference numbers.
Expecting JBIG2 to work on mixed-content pages without MRC. A page with both text and a photo cannot be JBIG2-compressed as a whole. Proper high-quality scanning uses Mixed Raster Content (MRC) to separate text and image regions and apply appropriate compression to each independently.
Assuming all PDF tools produce optimal JBIG2. Poor scanner firmware or basic PDF converters apply inefficient JBIG2 settings. Compare file sizes across tools: well-tuned lossless JBIG2 should produce files roughly 2–5× smaller than CCITT G4 for the same text page.
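
The visual spot check recommended in the list above can be backed by an automated comparison. The sketch below assumes you can already decode both the original scan and the compressed page into 1-bit bitmaps of the same size; decoding is not shown, and the list-of-strings representation and the name `diff_pixels` are illustrative.

```python
from typing import List, Tuple


def diff_pixels(original: List[str], decoded: List[str]) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates of every pixel that changed."""
    same_shape = len(original) == len(decoded) and all(
        len(a) == len(b) for a, b in zip(original, decoded)
    )
    if not same_shape:
        raise ValueError("bitmaps must have identical dimensions")
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(original, decoded))
        for x, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]


scan = ["#..#", "####", "#..#"]
lossless_out = ["#..#", "####", "#..#"]
lossy_out = ["#..#", "#.##", "#..#"]     # one pixel altered by shape matching

print(diff_pixels(scan, lossless_out))   # []
print(diff_pixels(scan, lossy_out))      # [(1, 1)]
```

An empty result is what lossless mode should always give. With lossy mode, clusters of changed pixels inside digits are the places to inspect by eye.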

Frequently Asked Questions

What is JBIG2 compression in PDF?

JBIG2 (filter: JBIG2Decode) is a bi-level image compression standard in PDF. It stores one master shape per repeated character, achieving 10–100× smaller files than uncompressed TIFF for black-and-white text scans.

Is JBIG2 lossless or lossy?

Both modes exist. Lossless is pixel-perfect and safe for legal documents. Lossy merges similar shapes, achieving smaller sizes but risking character substitution in low-quality scans.

When should I use JBIG2 compression?

Use JBIG2 for black-and-white text document scans (contracts, invoices, books, forms). Never use it for photographs, color, or grayscale content.

How does JBIG2 compare to CCITT Group 4?

Both can compress bi-level images losslessly; JBIG2 also has a lossy mode. Lossless JBIG2 is 2–5× more efficient than CCITT G4 for text-heavy pages. CCITT G4 is faster and more universally supported by older software. JBIG2 requires PDF 1.4+.

Can JBIG2 cause text recognition (OCR) errors?

Lossy JBIG2 can cause OCR errors via character substitution. Lossless JBIG2 preserves exact pixels, so it neither adds OCR errors nor removes scan noise. Always use lossless for compliance-critical documents.

What PDF version introduced JBIG2?

JBIG2 was introduced in PDF 1.4 (Acrobat 5, 2001). All modern PDF viewers support it. Very old processors targeting PDF 1.0–1.3 may not decompress JBIG2 streams.

Compress Your Scanned PDFs — Free

PDFlyst reduces scanned PDF file sizes without sacrificing text clarity or readability.
