JBIG2 Compression: Black & White Scan Optimization in PDF

JBIG2 is the specialist compression filter for bi-level scanned documents. By storing one master shape per unique character instead of repeating the pixels of every occurrence, it can produce files 10–100× smaller than uncompressed TIFF while keeping text sharp for OCR and archival use.

Quick Answer

When you scan a 100-page black-and-white text document, every letter 'e' on the page is a slightly different cluster of black pixels. JBIG2 (PDF filter: JBIG2Decode) recognizes that these 'e' shapes are essentially the same, stores one master template of 'e', then records only the coordinates where each instance appears. In a favorable case, a 50 MB uncompressed TIFF scan becomes a 500 KB JBIG2 PDF, a 100× reduction, with text that stays sharp and readable for both humans and OCR engines.

What Is JBIG2 Compression?

JBIG2 (ISO/IEC 14492, also published as ITU-T T.88) is a compression standard for bi-level images: images where every pixel is exactly black or exactly white, with no shades of gray. It was developed by the Joint Bi-level Image Experts Group and introduced into PDF with version 1.4.

Unlike JPEG or Flate, which have no notion of repeated character shapes, JBIG2 uses symbolic compression: it scans the entire page, identifies recurring patterns (character shapes), builds a dictionary of unique templates, and then encodes the page as a list of "place template #47 at position (342, 891)" instructions. For a page of typewritten text, the letter 'e' might appear 300 times. Instead of storing 300 sets of black-pixel patterns, JBIG2 stores one set and 300 position coordinates.
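The dictionary-plus-placements idea can be sketched in a few lines of Python. This is a toy illustration, not a JBIG2 encoder: it assumes the glyphs have already been cut out of the page, and it skips the entropy coding a real encoder applies. The names `encode`, `decode`, `E`, and `T` are invented for this example.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]            # one small bitmap, rows of '#' and '.'
Placed = Tuple[Glyph, int, int]    # (bitmap, x, y) as found on the page
Placement = Tuple[int, int, int]   # (template_id, x, y) as stored


def encode(glyphs: List[Placed]) -> Tuple[List[Glyph], List[Placement]]:
    """Build a dictionary of unique shapes plus a list of placements."""
    dictionary: List[Glyph] = []
    index: Dict[Glyph, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index:            # first time this exact shape is seen
            index[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index[bitmap], x, y))
    return dictionary, placements


def decode(dictionary: List[Glyph], placements: List[Placement]) -> List[Placed]:
    """Rebuild the placed glyphs from templates and coordinates."""
    return [(dictionary[t], x, y) for t, x, y in placements]


# Hypothetical 3x5 bitmaps standing in for scanned characters.
E = ("###", "#..", "###", "#..", "###")
T = ("###", ".#.", ".#.", ".#.", ".#.")
page = [(E, 0, 0), (T, 4, 0), (E, 8, 0), (E, 12, 0)]

dictionary, placements = encode(page)
print(len(dictionary), "templates for", len(page), "glyphs")
print(placements)
```

Three of the four glyphs share one template, so only two bitmaps are stored; the savings grow with every repeated character on a real page.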

📌

Only for bi-level (1-bit) images. JBIG2 cannot compress gray or color images. For those, use JPEG, JPX, or Flate. PDF scanners typically use a mixed-content approach: JBIG2 for the text regions of each page, JPEG for any detected photo regions.
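To make "bi-level" concrete, the following sketch reduces grayscale samples to 1-bit values and packs them eight pixels per byte. It assumes a fixed cutoff of 128 and the convention 1 = black; real scanners usually use adaptive thresholding, and the function names here are invented for the example.

```python
from typing import List


def threshold(gray_rows: List[List[int]], cutoff: int = 128) -> List[List[int]]:
    """Map 0-255 gray samples to bits: 1 = black (darker than cutoff), 0 = white."""
    return [[1 if value < cutoff else 0 for value in row] for row in gray_rows]


def pack_row(bits: List[int]) -> bytes:
    """Pack one row of bits into bytes, most significant bit first, zero-padded."""
    padded = bits + [0] * (-len(bits) % 8)
    return bytes(
        int("".join(str(b) for b in padded[i:i + 8]), 2)
        for i in range(0, len(padded), 8)
    )


row = threshold([[0, 200, 90, 255]])[0]
print(row, pack_row(row).hex())
```

Only data in this 1-bit form is a candidate for JBIG2; anything that still needs gray levels belongs in one of the other filters.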

Lossless vs. Lossy JBIG2

JBIG2 supports two modes with very different quality/size trade-offs:

| Mode | How It Works | File Size | Risk | Best For |
| --- | --- | --- | --- | --- |
| Lossless | All unique character shapes stored exactly; identical shapes are matched precisely | Moderate | None (pixel-perfect reproduction) | Legal, medical, financial records |
| Lossy | Visually similar shapes are merged into one template; a slightly different 'a' might be matched to the dominant 'a' template | Smallest possible | Character substitution in degraded scans | Mass digitization, email attachments |

⚠️

The "digit substitution" risk: Aggressive lossy JBIG2 can match a grainy '6' to a '0' template if the two look similar enough in a low-quality scan. A widely reported 2013 case involved Xerox WorkCentre devices silently swapping digits in scanned documents. For legal or medical documents, always use lossless JBIG2 or verify your scanner's compression settings.
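The difference between the two modes, and the substitution risk, can be illustrated with a toy matcher. This sketch assumes a plain pixel-difference (Hamming) threshold; production encoders use more elaborate similarity tests, and `build_templates`, `SIX`, and `ZERO` are names invented here.

```python
from typing import List, Tuple

Glyph = Tuple[str, ...]  # rows of '#' and '.', all glyphs the same size


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_templates(glyphs: List[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Assign each glyph to the first template within max_diff pixels.

    max_diff = 0 behaves like lossless matching; larger values merge
    similar shapes, which is where substitution can creep in.
    """
    templates: List[Glyph] = []
    assignment: List[int] = []
    for glyph in glyphs:
        for i, template in enumerate(templates):
            if hamming(glyph, template) <= max_diff:
                assignment.append(i)
                break
        else:                                  # no template was close enough
            templates.append(glyph)
            assignment.append(len(templates) - 1)
    return templates, assignment


# Hypothetical low-resolution digits that differ in only two pixels.
ZERO = ("###", "#.#", "#.#", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")
scan = [ZERO, SIX, ZERO]

lossless = build_templates(scan, max_diff=0)
lossy = build_templates(scan, max_diff=2)
print("lossless:", len(lossless[0]), "templates, assignment", lossless[1])
print("lossy:   ", len(lossy[0]), "templates, assignment", lossy[1])
```

With the loose threshold the '6' is assigned to the '0' template, so the decoded page would show a clean, confident, wrong digit. That is why the error is hard to spot by eye.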

Real-World Examples

🏛️ Archival Scenario

National Archive Digitizing 5 Million Court Records

An archive digitizes 5 million historical court records. Using JBIG2 compression, they reduce total storage from 50 terabytes down to roughly 4 terabytes — saving tens of thousands of dollars in server costs. Each document remains fully readable and searchable via OCR, and researchers can access them over the internet in seconds instead of waiting for physical retrieval.

📱 Mobile Banking Scenario

Check Deposit by Photo

A mobile banking app converts check photos to black-and-white JBIG2 PDFs before uploading. The resulting files are small enough to upload reliably even on a poor 3G connection in under 2 seconds. A full-color JPEG of the same check would be 10× larger and might fail to transmit, while also containing more noise that degrades the bank's automated text recognition system.

⚕️ Healthcare Scenario

Medical Records in EHR Systems

A hospital scans thousands of handwritten patient notes and referral letters daily. Using lossless JBIG2, each page shrinks from ~400 KB (TIFF) to ~18 KB, a 22× reduction. Because lossless mode reproduces the bi-level scan exactly, OCR and handwriting recognition see the same pixels as in the original, so the records stay searchable without risking document integrity.

Why JBIG2 Matters

📦

Unmatched Efficiency

For pure black-and-white text, no other PDF filter approaches JBIG2's compression ratio: typically 2–5× smaller than CCITT Group 4, and far smaller than a raw bitmap.

Fast Network Delivery

Tiny files load instantly over mobile and satellite connections — critical for mobile banking, field inspection, and remote healthcare.

🔍

OCR-Ready Output

Bi-level JBIG2 keeps character edges crisp, which OCR engines recognize more reliably than text blurred by JPEG artifacts.

🌐

Universal Support

Supported by all modern PDF viewers since PDF 1.4 (Acrobat 5, 2001). Safe for document exchange without viewer compatibility concerns.

💰

Storage Cost Savings

Enterprises digitizing millions of pages achieve storage savings of 80–95% versus TIFF or uncompressed formats, directly reducing cloud storage costs.

📋

Archival Quality (Lossless Mode)

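Lossless JBIG2 preserves every pixel exactly, and the JBIG2Decode filter is permitted in PDF/A (ISO 19005), the archival profile of PDF. Legal admissibility itself depends on the jurisdiction and the scanning process, not on the filter alone.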
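The storage-savings percentages and the "N× smaller" ratios quoted in this article are two views of the same number. A minimal sketch of the conversion, using figures from the scenarios above (the function names are invented for this example):

```python
def ratio_from_sizes(before: float, after: float) -> float:
    """Compression ratio, e.g. 400 KB -> 18 KB is about 22x."""
    return before / after


def saving_percent(ratio: float) -> float:
    """Share of storage saved for a given compression ratio."""
    return (1 - 1 / ratio) * 100


for ratio in (5, 10, 20):
    print(f"{ratio}x smaller -> {saving_percent(ratio):.0f}% saved")

# Healthcare scenario: ~400 KB TIFF page to ~18 KB JBIG2 page.
print(f"{ratio_from_sizes(400, 18):.1f}x")
# Archival scenario: 50 TB down to roughly 4 TB.
print(f"{saving_percent(ratio_from_sizes(50, 4)):.0f}% saved")
```

An 80–95% saving therefore corresponds to files only 5–20× smaller, so the percentage climbs slowly once the ratio is already high.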

JBIG2 vs. Other Bi-level Formats

| Format | PDF Filter | Algorithm | Typical File Size (1 B&W page) | Best For |
| --- | --- | --- | --- | --- |
| Raw bitmap | None | No compression | ~4 MB | Dev/debugging only |
| CCITT Group 3 | CCITTFaxDecode | Run-length (row) | ~300–800 KB | Legacy fax documents |
| CCITT Group 4 | CCITTFaxDecode | Run-length (2D) | ~50–150 KB | Standard scanned documents |
| JBIG2 (lossless) | JBIG2Decode | Symbolic encoding | ~20–60 KB | Archival text scans |
| JBIG2 (lossy) | JBIG2Decode | Symbolic + shape matching | ~8–25 KB | Mass digitization |
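The raw-bitmap figure follows directly from page size and resolution. The table does not state a resolution; the sketch below assumes a US Letter page (8.5 × 11 in), for which 600 dpi lands close to the ~4 MB shown and 300 dpi gives about a quarter of that.

```python
def raw_bilevel_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of a 1-bit page, with each row padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8
    return row_bytes * height_px


for dpi in (300, 600):
    size = raw_bilevel_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes ({size / 1_000_000:.1f} MB)")
```

Because pixel count grows with the square of the resolution, always compare compressed sizes at the same dpi.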

Common Mistakes to Avoid

  • Using JBIG2 on grayscale or color images. JBIG2 only handles true 1-bit black-and-white. Applying it to grayscale or color content will either fail or produce extremely poor results. Use JPEG or JPX for those.
  • Using lossy JBIG2 for legal or medical documents. Character substitution in lossy mode — even if rare — can alter a number or letter in a medical dosage or legal amount. Always use lossless mode for compliance-critical documents.
  • Not verifying document quality after JBIG2 compression. Always visually inspect a sample of pages after compression, especially with lossy mode. Look for obviously wrong characters in numbers — dates, quantities, reference numbers.
  • Expecting JBIG2 to work on mixed-content pages without MRC. A page with both text and a photo cannot be JBIG2-compressed as a whole. Proper high-quality scanning uses Mixed Raster Content (MRC) to separate text and image regions and apply appropriate compression to each independently.
  • Assuming all PDF tools produce optimal JBIG2. Poor scanner firmware or basic PDF converters may apply inefficient JBIG2 settings. Compare file sizes across tools: well-tuned JBIG2 should produce files roughly 2–5× smaller than CCITT G4 for the same text page.
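When comparing tools, it helps to confirm which filters a PDF actually uses. The sketch below is a rough heuristic, not a PDF parser: it scans the raw bytes for filter names, relying on the fact that stream dictionaries are stored uncompressed. It can miscount in unusual files, for example when names use escape sequences. The sample object and the name `count_filters` are made up for illustration.

```python
import re
from collections import Counter

# Standard PDF image and stream filter names worth counting.
FILTER_NAME = re.compile(
    rb"/(JBIG2Decode|CCITTFaxDecode|DCTDecode|JPXDecode|FlateDecode)\b"
)


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each known filter name appears in the file."""
    return Counter(
        match.group(1).decode("ascii") for match in FILTER_NAME.finditer(pdf_bytes)
    )


# A hand-written image XObject as it might appear inside a scanned PDF.
sample = (
    b"5 0 obj\n"
    b"<< /Type /XObject /Subtype /Image /Width 2550 /Height 3300\n"
    b"   /ColorSpace /DeviceGray /BitsPerComponent 1\n"
    b"   /Filter /JBIG2Decode /Length 0 >>\n"
    b"stream\n\nendstream\nendobj\n"
)
print(count_filters(sample))
```

For a real file, pass its contents read in binary mode. A scan reported as JBIG2 that shows only CCITTFaxDecode or DCTDecode entries was not compressed the way you expected.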

Frequently Asked Questions

  • What is JBIG2? JBIG2 (filter: JBIG2Decode) is a bi-level image compression standard in PDF. It stores one master shape per repeated character and can produce files 10–100× smaller than uncompressed TIFF for black-and-white text scans.

  • Is JBIG2 lossless or lossy? Both modes exist. Lossless is pixel-perfect and safe for legal documents. Lossy merges similar shapes, achieving smaller sizes but risking character substitution in low-quality scans.

  • When should I use JBIG2? Use JBIG2 for black-and-white text document scans (contracts, invoices, books, forms). Never use it for photographs, color, or grayscale content.

  • How does JBIG2 compare with CCITT G4? Both are bi-level formats; CCITT G4 is always lossless, while JBIG2 offers lossless and lossy modes. JBIG2 is 2–5× more efficient than CCITT G4 for text-heavy pages. CCITT G4 is faster and more universally supported by older software. JBIG2 requires PDF 1.4+.

  • Does JBIG2 affect OCR? Lossy JBIG2 can cause OCR errors via character substitution. Lossless JBIG2 preserves exact pixels, so OCR results match those from the original bi-level scan. Always use lossless for compliance-critical documents.

  • Which PDF versions support JBIG2? JBIG2 was introduced in PDF 1.4 (Acrobat 5, 2001). All modern PDF viewers support it. Very old processors targeting PDF 1.0–1.3 may not decompress JBIG2 streams.
