When you scan a 100-page black-and-white text document, every letter 'e' on the page is a slightly different cluster of black pixels. JBIG2 (PDF filter: JBIG2Decode) recognizes that these 'e' shapes are essentially the same, stores one master template of 'e', and then records only the coordinates where each instance appears. In a favorable case, a 50 MB TIFF scan can shrink to a 500 KB JBIG2 PDF (a 100× reduction) while the text stays sharp and readable for both humans and OCR engines.
What Is JBIG2 Compression?
JBIG2 (ISO/IEC 14492) is a compression standard for bi-level images: images where every pixel is exactly black or exactly white, with no shades of gray. It was developed by the Joint Bi-level Image Experts Group and introduced into PDF with version 1.4.
Unlike JPEG or Flate, which have no notion of recurring character shapes, JBIG2 uses symbol-based compression: the encoder scans the entire page, identifies recurring patterns (character shapes), builds a dictionary of unique templates, and then encodes the page as a list of instructions such as "place template #47 at position (342, 891)". On a page of typewritten text, the letter 'e' might appear 300 times. Instead of storing 300 separate black-pixel patterns, JBIG2 stores one pattern and 300 position coordinates.
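The dictionary-plus-placements idea can be sketched in a few lines of Python. This is a toy model, not the JBIG2 bitstream: it assumes glyphs have already been segmented, the 3×3 shapes `E` and `L` are hypothetical stand-ins for real character bitmaps, and it leaves out the entropy coding a real encoder applies to the dictionary and positions.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]            # rows of '#' (black) and '.' (white)
Placement = Tuple[int, int, int]   # (template_id, x, y)

def encode_page(glyphs: List[Tuple[Glyph, int, int]]):
    """Build a template dictionary and a placement list from pre-segmented glyphs."""
    dictionary: List[Glyph] = []
    index: Dict[Glyph, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index:            # first time this exact shape appears
            index[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index[bitmap], x, y))
    return dictionary, placements

def decode_page(dictionary, placements, width, height):
    """Redraw the page by stamping each template at its recorded position."""
    page = [["."] * width for _ in range(height)]
    for template_id, x, y in placements:
        for dy, row in enumerate(dictionary[template_id]):
            for dx, pixel in enumerate(row):
                if pixel == "#":
                    page[y + dy][x + dx] = "#"
    return ["".join(row) for row in page]

# Hypothetical 3x3 stand-ins for scanned characters.
E = ("###", "#..", "###")
L = ("#..", "#..", "###")
glyphs = [(E, 0, 0), (L, 4, 0), (E, 8, 0), (E, 0, 4)]

dictionary, placements = encode_page(glyphs)
print(len(dictionary))   # 2 templates stored for 4 glyphs
print(placements)
print("\n".join(decode_page(dictionary, placements, width=11, height=7)))
```

Four glyphs on the page cost two stored bitmaps plus four small tuples, which is where the savings come from as the same letter repeats hundreds of times.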
Only for bi-level (1-bit) images. JBIG2 cannot compress gray or color images. For those, use JPEG, JPX, or Flate. PDF scanners typically use a mixed-content approach: JBIG2 for the text regions of each page, JPEG for any detected photo regions.
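As a rough illustration of what "bi-level" means in practice, the sketch below thresholds 8-bit grayscale values into 1-bit pixels. The fixed threshold of 128 and the function names are assumptions made for the example; real scanning software typically uses adaptive binarization.

```python
def is_bilevel(gray_rows):
    """True if every 8-bit value is already pure black (0) or pure white (255)."""
    return all(value in (0, 255) for row in gray_rows for value in row)

def to_bilevel(gray_rows, threshold=128):
    """Map 8-bit grayscale (0 = black, 255 = white) to 1-bit pixels (1 = black, 0 = white)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]

scan = [[0, 127, 128, 255],
        [30, 200, 90, 250]]

print(is_bilevel(scan))   # False: the scan contains intermediate grays
print(to_bilevel(scan))   # [[1, 1, 0, 0], [1, 0, 1, 0]]
```

Only the thresholded output is a candidate for JBIG2; the grayscale input would go to JPEG, JPX, or Flate instead.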
Lossless vs. Lossy JBIG2
JBIG2 supports two modes with very different quality/size trade-offs:
| Mode | How It Works | File Size | Risk | Best For |
|---|---|---|---|---|
| Lossless | All unique character shapes stored exactly; identical shapes are matched precisely | Moderate | None — pixel-perfect reproduction | Legal, medical, financial records |
| Lossy | Visually similar shapes are merged into one template — a slightly different 'a' might be matched to the dominant 'a' template | Smallest possible | Character substitution in degraded scans | Mass digitization, email attachments |
The "digit substitution" risk: Aggressive lossy JBIG2 has been known to match a grainy '6' to a '0' template if they look similar enough in a low-quality scan. For legal or medical documents, always use lossless JBIG2 or verify your scanner's compression settings.
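The substitution risk can be shown with a deliberately simplified matcher. This sketch assumes templates are compared by counting differing pixels (Hamming distance) against a tolerance `max_diff`; both the 3×5 digit bitmaps and the matching rule are illustrative, and real encoders use more elaborate comparisons.

```python
def hamming(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def match_template(glyph, templates, max_diff):
    """Return the name of the closest template within max_diff pixels, else None."""
    best_name, best_dist = None, max_diff + 1
    for name, bitmap in templates.items():
        dist = hamming(glyph, bitmap)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Hypothetical 3x5 digit shapes.
ZERO = ("###", "#.#", "#.#", "#.#", "###")
SIX  = ("###", "#..", "###", "#.#", "###")

templates = {"0": ZERO}          # the dictionary already holds a '0'

print(hamming(SIX, ZERO))                          # 2 pixels differ
print(match_template(SIX, templates, max_diff=0))  # None: exact matching stores a new '6' template
print(match_template(SIX, templates, max_diff=2))  # 0: a loose tolerance redraws the '6' as '0'
```

With a tolerance of zero the '6' gets its own template and is reproduced exactly; once the tolerance is loose enough, the decoder draws a clean '0' where the '6' used to be, and nothing in the output looks damaged.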
Real-World Examples
National Archive Digitizing 5 Million Court Records
An archive digitizes 5 million historical court records. Using JBIG2 compression, they reduce total storage from 50 terabytes down to roughly 4 terabytes — saving tens of thousands of dollars in server costs. Each document remains fully readable and searchable via OCR, and researchers can access them over the internet in seconds instead of waiting for physical retrieval.
Check Deposit by Photo
A mobile banking app converts check photos to black-and-white JBIG2 PDFs before uploading. The resulting files are small enough to upload reliably even on a poor 3G connection in under 2 seconds. A full-color JPEG of the same check would be 10× larger and might fail to transmit, while also containing more noise that degrades the bank's automated text recognition system.
Medical Records in EHR Systems
A hospital scans thousands of handwritten patient notes and referral letters daily. Using lossless JBIG2, each page shrinks from ~400 KB (TIFF) to ~18 KB, a 22× reduction. Because lossless mode reproduces the bi-level scan exactly, OCR and handwriting recognition see the same pixels as in the original, so the notes become searchable without risking document integrity.
Why JBIG2 Matters
Unmatched Efficiency
For pure black-and-white text, no other PDF image filter approaches JBIG2's compression ratio. Reductions of 10×–30× over raw bitmaps are typical, and clean, text-heavy pages can do better.
Fast Network Delivery
Tiny files load instantly over mobile and satellite connections — critical for mobile banking, field inspection, and remote healthcare.
OCR-Ready Output
Sharp, noise-reduced character shapes are easier for OCR engines to recognize accurately than blurry JPEG text.
Universal Support
Supported by all modern PDF viewers since PDF 1.4 (Acrobat 5, 2001). Safe for document exchange without viewer compatibility concerns.
Storage Cost Savings
Enterprises digitizing millions of pages achieve storage savings of 80–95% versus TIFF or uncompressed formats, directly reducing cloud storage costs.
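Ratios ("22×") and percentage savings ("80–95%") describe the same thing in different units, and it helps to convert between them. The helpers below are a small sketch with hypothetical names; the inputs are the figures quoted in the examples above.

```python
def reduction_ratio(original, compressed):
    """How many times smaller the compressed size is."""
    return original / compressed

def percent_saved(original, compressed):
    """Share of the original size that is no longer needed, in percent."""
    return 100 * (1 - compressed / original)

def ratio_from_percent(saved_percent):
    """Reduction ratio implied by a given percentage saving."""
    return 1 / (1 - saved_percent / 100)

# Court-records example: 50 TB down to 4 TB.
print(reduction_ratio(50, 4), round(percent_saved(50, 4), 1))

# Medical-records example: 400 KB down to 18 KB per page.
print(round(reduction_ratio(400, 18), 1), round(percent_saved(400, 18), 1))

# An 80-95% saving corresponds to roughly a 5x-20x reduction.
print(round(ratio_from_percent(80), 1), round(ratio_from_percent(95), 1))
```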
Archival Quality (Lossless Mode)
Lossless JBIG2 preserves every pixel exactly, and JBIG2Decode is a permitted filter in PDF/A (ISO 19005), the ISO standard for long-term document archiving.
JBIG2 vs. Other Bi-level Formats
| Format | PDF Filter | Algorithm | Typical File Size (1 B&W page) | Best For |
|---|---|---|---|---|
| Raw bitmap | None | No compression | ~4 MB | Dev/debugging only |
| CCITT Group 3 | CCITTFaxDecode | Run-length (row) | ~300–800 KB | Legacy fax documents |
| CCITT Group 4 | CCITTFaxDecode | Run-length (2D) | ~50–150 KB | Standard scanned documents |
| JBIG2 (lossless) | JBIG2Decode | Symbolic encoding | ~20–60 KB | Archival text scans |
| JBIG2 (lossy) | JBIG2Decode | Symbolic + shape matching | ~8–25 KB | Mass digitization |
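The raw-bitmap figure in the table can be sanity-checked with simple arithmetic. The source does not state a page size or resolution, so the sketch below assumes a US Letter page (8.5 × 11 in) and shows that ~4 MB matches a 1-bit scan at 600 dpi, while 300 dpi gives about 1 MB.

```python
def raw_bilevel_bytes(width_in, height_in, dpi):
    """Size of an uncompressed 1-bit-per-pixel page, with rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8    # 8 pixels per byte, rounded up
    return row_bytes * height_px

for dpi in (300, 600):
    size = raw_bilevel_bytes(8.5, 11, dpi)   # assumed US Letter page
    print(dpi, "dpi:", size, "bytes =", round(size / 1_000_000, 2), "MB")
```

The other rows in the table would scale with the same resolution assumption, so compare formats at a fixed dpi.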
Common Mistakes to Avoid
- Using JBIG2 on grayscale or color images. JBIG2 only handles true 1-bit black-and-white. Applying it to grayscale or color content will either fail or produce extremely poor results. Use JPEG or JPX for those.
- Using lossy JBIG2 for legal or medical documents. Character substitution in lossy mode — even if rare — can alter a number or letter in a medical dosage or legal amount. Always use lossless mode for compliance-critical documents.
- Not verifying document quality after JBIG2 compression. Always visually inspect a sample of pages after compression, especially with lossy mode. Look for obviously wrong characters in numbers — dates, quantities, reference numbers.
- Expecting JBIG2 to work on mixed-content pages without MRC. A page with both text and a photo cannot be JBIG2-compressed as a whole. Proper high-quality scanning uses Mixed Raster Content (MRC) to separate text and image regions and apply appropriate compression to each independently.
- Assuming all PDF tools produce optimal JBIG2. Poor scanner firmware or basic PDF converters apply inefficient JBIG2 settings. Compare file sizes across tools: well-tuned JBIG2 should produce files roughly 2–5× smaller than CCITT G4 for the same text-heavy page, with lossy mode toward the upper end of that range.
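When comparing tools, a quick first check is which filters a PDF actually uses. The sketch below counts filter names in the raw bytes of a file. It is a heuristic rather than a PDF parser: it assumes the names appear unescaped in stream dictionaries, and it could in principle be fooled by matching bytes inside binary stream data. The sample bytes stand in for a real file.

```python
import re
from collections import Counter

FILTERS = (b"JBIG2Decode", b"CCITTFaxDecode", b"DCTDecode", b"JPXDecode", b"FlateDecode")

def count_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of each PDF filter name (written as /Name) in raw bytes."""
    counts = Counter()
    for name in FILTERS:
        pattern = rb"/" + re.escape(name) + rb"\b"
        counts[name.decode()] = len(re.findall(pattern, pdf_bytes))
    return counts

# Stand-in for the contents of a scanned PDF with two image objects.
sample = (
    b"%PDF-1.4\n"
    b"4 0 obj << /Type /XObject /Subtype /Image /BitsPerComponent 1 "
    b"/Filter /JBIG2Decode >> stream ... endstream endobj\n"
    b"5 0 obj << /Type /XObject /Subtype /Image /Filter /DCTDecode >> "
    b"stream ... endstream endobj\n"
)

for name, count in count_filters(sample).items():
    print(name, count)
```

For a real document, pass the file contents instead, for example `count_filters(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. A scanned text PDF that shows only CCITTFaxDecode or FlateDecode is a candidate for recompression.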
Frequently Asked Questions
JBIG2 (filter: JBIG2Decode) is a bi-level image compression standard in PDF. It stores one master shape per repeated character, achieving 10–100× smaller files than TIFF for black-and-white text scans.
Both modes exist. Lossless is pixel-perfect and safe for legal documents. Lossy merges similar shapes, achieving smaller sizes but risking character substitution in low-quality scans.
Use JBIG2 for black-and-white text document scans (contracts, invoices, books, forms). Never use it for photographs, color, or grayscale content.
Both are bi-level formats. CCITT G4 is always lossless, while JBIG2 offers both lossless and lossy modes. JBIG2 is typically 2–5× more efficient than CCITT G4 for text-heavy pages. CCITT G4 is faster and more widely supported by older software. JBIG2 requires PDF 1.4+.
Lossy JBIG2 can cause OCR errors via character substitution. Lossless JBIG2 preserves exact pixels, so OCR results match those from the uncompressed bi-level scan. Always use lossless for compliance-critical documents.
JBIG2 was introduced in PDF 1.4 (Acrobat 5, 2001). All modern PDF viewers support it. Very old processors targeting PDF 1.0–1.3 may not decompress JBIG2 streams.
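A file's declared version can be read from its header line, which gives a rough compatibility check. This is a minimal sketch with hypothetical function names; it looks only at the `%PDF-x.y` header, and a PDF may also declare a newer version in its document catalog, which this code does not inspect.

```python
import re

def header_version(pdf_bytes: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if it is missing."""
    match = re.search(rb"%PDF-(\d)\.(\d)", pdf_bytes[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def declares_jbig2_capable_version(pdf_bytes: bytes) -> bool:
    """True if the header declares PDF 1.4 or later, the first version with JBIG2Decode."""
    version = header_version(pdf_bytes)
    return version is not None and version >= (1, 4)

print(header_version(b"%PDF-1.3\n..."))                  # (1, 3)
print(declares_jbig2_capable_version(b"%PDF-1.3\n..."))  # False
print(declares_jbig2_capable_version(b"%PDF-1.7\n..."))  # True
```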