
CCITT Compression: Group 4 Fax Encoding in PDF

CCITT compression is the standard lossless method for black-and-white scanned documents inside PDF files. Widely used by courts, governments, and medical systems, it shrinks bitonal images to a fraction of their original size without losing a single pixel.

Quick Answer

CCITT compression is a lossless encoding algorithm that compresses purely black-and-white (bitonal) images. Inside PDF files, it appears as the CCITTFaxDecode filter. Group 4 — the most efficient variant — compresses a typical scanned page by encoding runs of identical pixels and comparing lines to eliminate redundancy. A document that weighs 500 KB as an uncompressed scan can shrink to under 25 KB with Group 4, with zero quality loss.

What Is CCITT Compression?

The name comes from the Comité Consultatif International Téléphonique et Télégraphique — the international committee that originally standardized it for fax machine transmission in the 1980s. Today, the standard is maintained by the ITU (International Telecommunication Union) and referred to as ITU-T T.4 (Group 3) and T.6 (Group 4).

The core insight behind CCITT is simple: most of a scanned black-and-white page is white. Rather than storing every individual white pixel, the algorithm records run lengths within each row, for example "the next 2,400 pixels in this row are white." This makes the compressed data far smaller than the raw image data, with no information lost at all.
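The run-length idea can be sketched in a few lines of Python. This is an illustration only: the helper name `run_lengths` and the pixel values (0 for white, 1 for black) are assumptions made for the example, and a real CCITT encoder goes on to write each run length as a variable-length code rather than a plain integer.

```python
from itertools import groupby

WHITE, BLACK = 0, 1  # pixel values assumed for this sketch


def run_lengths(row):
    """Return alternating run lengths for one row, starting with a white run.

    CCITT coding always begins a row with a white run, so a row that
    starts with black gets a leading white run of length 0.
    """
    runs = [len(list(group)) for _, group in groupby(row)]
    if row and row[0] == BLACK:
        runs.insert(0, 0)
    return runs


# One 2480-pixel row: a white margin, a thin black stroke, then white again.
row = [WHITE] * 800 + [BLACK] * 12 + [WHITE] * 1668
print(run_lengths(row))  # [800, 12, 1668]
```

Three small numbers stand in for 2,480 individual pixels, which is where the savings on mostly white pages come from.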

In the PDF specification, CCITT compression is applied through the CCITTFaxDecode filter. When a PDF viewer opens a page with scanned content, it reads this filter, decompresses the bitonal stream, and renders the image. The filter has been part of the PDF specification since its earliest versions, so virtually every PDF reader supports it, from Adobe Acrobat to browser-based viewers.

Bitonal only: CCITT works exclusively on images with exactly two colors — pure black and pure white. It cannot be applied to grayscale or color images. For those, PDF uses JPEG, JPEG 2000, or Flate compression instead.

How CCITT Group 4 Works

Group 4 uses two-dimensional coding, which is what makes it so efficient. Here is the process from scan to compressed PDF:

  1. The scanner produces a bitonal bitmap. Every pixel is either 0 (white) or 1 (black). A standard A4 page at 300 DPI is 2480 × 3508 pixels, or about 8.7 million pixels: just over 1 MB of raw data at 1 bit per pixel before any compression.
  2. Run lengths describe each row. Instead of writing every pixel, the encoder works with the length of each alternating run. A row that starts with 800 white pixels then 12 black pixels begins with two run lengths: (800, 12). In the actual bitstream, each run length is written as a short variable-length code rather than a plain number.
  3. Two-dimensional coding compares adjacent rows. Group 4 codes each scan line relative to the line above it. Because consecutive scan lines of text are often nearly identical, the encoder mostly stores small shifts in where each run starts or ends, which dramatically reduces the data.
  4. The result is stored as a CCITTFaxDecode stream in the PDF. The PDF stores the compressed bitonal data along with an image dictionary that gives the image dimensions and color space, plus decode parameters including /K -1 (indicating Group 4) and the column count.
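Step 3 is easier to see with a small example. The sketch below is a deliberate simplification and not a T.6 encoder: the function names are hypothetical, and it simply measures how far each color change in a row sits from the nearest color change in the row above. In the real standard, offsets of up to three pixels in either direction are handled by very short "vertical mode" codes, which is why similar consecutive rows compress so well.

```python
def changing_elements(row):
    """Positions where the pixel color differs from the pixel to its left.

    The imaginary pixel before the row is treated as white (0).
    """
    positions = []
    previous = 0
    for x, pixel in enumerate(row):
        if pixel != previous:
            positions.append(x)
            previous = pixel
    return positions


def vertical_offsets(reference_row, coding_row):
    """Offset of each color change from the nearest change in the row above."""
    ref = changing_elements(reference_row)
    cur = changing_elements(coding_row)
    if not ref:
        return cur  # nothing to compare against: fall back to absolute positions
    return [min((x - r for r in ref), key=abs) for x in cur]


# A vertical stroke that drifts one pixel to the right between two scan lines.
above = [0] * 10 + [1] * 5 + [0] * 10
below = [0] * 11 + [1] * 5 + [0] * 9

print(changing_elements(below))        # [11, 16]
print(vertical_offsets(above, below))  # [1, 1]
```

Instead of absolute positions, the second row is described by two tiny offsets. An identical row would produce offsets of zero throughout.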

The PDF Filter Dictionary

Here is what the CCITT filter parameters look like inside a PDF stream:

PDF STREAM (SIMPLIFIED)
<<
  /Type          /XObject
  /Subtype       /Image
  /Width         2480          % pixels wide  (A4 @ 300 DPI)
  /Height        3508          % pixels tall
  /ColorSpace    /DeviceGray
  /BitsPerComponent 1         % bitonal: 1 bit per pixel
  /Filter        /CCITTFaxDecode
  /DecodeParms   << /K -1 /Columns 2480 >>
>>
% K <  0  →  Group 4 (T.6); -1 is the conventional value
% K =  0  →  Group 3 1D (T.4)
% K >  0  →  Group 3 2D (T.4)
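For readers writing their own PDF generator, here is a minimal Python sketch of how such a dictionary and stream might be serialized. The function name `ccitt_image_stream` is hypothetical, the input is assumed to be data that has already been Group 4 encoded elsewhere, and the 2480 × 3508 size is an assumed A4 page at 300 DPI. It also writes the /Length entry that every PDF stream needs, and an explicit /Rows value.

```python
def ccitt_image_stream(width, height, g4_data):
    """Serialize a Group 4 image dictionary plus stream body as bytes.

    `g4_data` is assumed to be already T.6-encoded; nothing is compressed here.
    The caller still has to wrap the result in "N 0 obj ... endobj".
    """
    header = (
        "<<\n"
        "  /Type /XObject\n"
        "  /Subtype /Image\n"
        f"  /Width {width}\n"
        f"  /Height {height}\n"
        "  /ColorSpace /DeviceGray\n"
        "  /BitsPerComponent 1\n"
        "  /Filter /CCITTFaxDecode\n"
        f"  /DecodeParms << /K -1 /Columns {width} /Rows {height} >>\n"
        f"  /Length {len(g4_data)}\n"
        ">>\n"
    ).encode("ascii")
    return header + b"stream\n" + g4_data + b"\nendstream"


# Placeholder bytes stand in for real encoded image data.
obj = ccitt_image_stream(2480, 3508, b"\x00\x01")
print(obj.decode("latin-1"))
```

Setting /Columns explicitly matters because the filter otherwise assumes its default of 1728, the width of a standard fax line.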

Group 3 vs. Group 4 at a Glance

Feature                | CCITT Group 3                                   | CCITT Group 4
ITU standard           | T.4                                             | T.6
Coding dimension       | 1D (per row), with an optional 2D mode          | 2D (row vs. previous row)
Error resilience       | Yes (end-of-line markers for noisy fax lines)   | No (assumes clean channel)
Compression efficiency | Good                                            | Excellent (typically 2–4× better)
PDF K parameter        | K=0 (1D) or K>0 (2D)                            | K=-1
Best for               | Legacy fax compatibility                        | PDF document archiving

Real-World Examples

🏥 Medical Scenario

Digitizing a Clinic's Paper Records

A medical clinic scans 5,000 patient history files, all black ink on white paper. Saved as color JPEG, the archive would consume 2.5 GB. Saved with CCITT Group 4, the entire archive fits into roughly 120 MB. That is small enough to store on a cheap USB drive or back up to the cloud in minutes. Every character remains perfectly sharp because not a single bit was lost in compression.

⚖️ Legal Scenario

Court Filing a 1,000-Page Transcript

A court reporter scans a signed paper transcript using a document scanner set to bitonal output with CCITT Group 4 compression. The resulting PDF is a faithful copy of the paper original, with sharp, clean, high-contrast text. Because there is no gray or color noise, the file also works well with OCR software, which makes every word searchable. Many courts and e-filing systems accept or specifically require this format for long-term electronic record keeping.

🏛️ Government Scenario

Archiving Historic Documents

A national archive scans millions of typed letters and government forms from the 1950s and 1960s. Using CCITT Group 4, each page averages 15–30 KB. The same page as an uncompressed bitonal TIFF at 300 DPI would be about 1 MB. The archive saves terabytes of storage while meeting the ISO standard for long-term PDF archiving (PDF/A), which permits CCITTFaxDecode.

Benefits of CCITT Group 4

Completely Lossless

Every pixel is reproduced perfectly after decompression. No blurring, no artifacts, no character distortion — ever.

Exceptional Compression Ratios

A typical scanned text page compresses at 15:1 to 30:1. A 1 MB raw scan becomes 30–70 KB in the PDF.
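The arithmetic behind these figures is easy to check. The sketch below assumes an A4 page at 300 DPI (2480 × 3508 pixels) stored at 1 bit per pixel with each row padded to a whole byte. The function names are illustrative, and the ratios are the typical values quoted above, not measurements.

```python
def raw_bitonal_bytes(width_px, height_px):
    """Uncompressed size at 1 bit per pixel, each row padded to a whole byte."""
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


def compressed_estimate(raw_bytes, ratio):
    """Rough compressed size for an assumed ratio (15 means 15:1)."""
    return raw_bytes // ratio


raw = raw_bitonal_bytes(2480, 3508)  # assumed A4 page at 300 DPI
print(raw)                           # 1087480 bytes, about 1.09 MB
print(compressed_estimate(raw, 15))  # 72498 bytes, about 72 KB
print(compressed_estimate(raw, 30))  # 36249 bytes, about 36 KB
```

Actual results depend heavily on content: a sparse typed letter compresses far better than a dense or noisy page.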

Universal Compatibility

Virtually every PDF viewer, printer, and e-filing system can decode CCITTFaxDecode, so the compatibility risk is minimal.

Fast Encode and Decode

The algorithm is computationally simple. Even embedded systems, old hardware, and network printers can run it at full speed.

Standards Compliant

Permitted in PDF/A (ISO 19005) for archiving and supported as a TIFF compression scheme. Required by many government and legal e-filing standards.

OCR-Friendly

The clean black-to-white contrast of bitonal images suits optical character recognition well, often giving accuracy as good as or better than noisy grayscale scans.

CCITT vs. JBIG2: Which to Use?

CCITT Group 4 has dominated bitonal PDF compression for decades, but JBIG2 — introduced in the PDF 1.4 specification — can sometimes produce even smaller files. Here is how they compare:

Criteria                  | CCITT Group 4                                            | JBIG2
Compression type          | Lossless only                                            | Lossless or lossy
Typical compression ratio | 15:1 – 30:1 on text                                      | 30:1 – 100:1 on text (lossy mode)
How it works              | Run-length + 2D diff coding                              | Symbol dictionary (reuses repeated glyphs)
PDF viewer support        | Universal (100%)                                         | Modern viewers (PDF 1.4+)
Risk of quality loss      | None (always lossless)                                   | Possible in lossy mode (characters may look substituted)
Best use case             | Legal, medical, archival: anywhere accuracy is critical  | Web delivery, size optimization, modern audiences

Lossy JBIG2 controversy: In 2013, researchers found that Xerox scanners using lossy JBIG2 were silently substituting digits in scanned numbers — "6" became "8", for example. CCITT Group 4 leaves no room for this kind of error because it is always lossless. For legal and financial documents, CCITT Group 4 remains the safer choice.

Common Mistakes to Avoid

  • Applying CCITT to grayscale or color scans. CCITT only works on bitonal images. Trying to use it on a grayscale photo will produce errors or incorrect output. Use Flate or JPEG for those.
  • Scanning at too low a resolution before compression. CCITT is not a cure for a blurry scan. For readable text, scan at a minimum of 200 DPI — 300 DPI is the standard. CCITT will faithfully preserve whatever resolution you capture.
  • Confusing the K parameter values. K=0 is Group 3 one-dimensional (not Group 4). For maximum compression, you need K=-1. This is a common source of confusion in custom PDF generators and libraries.
  • Using CCITT for documents with mixed content. If a page contains both a black-and-white scan and a color photo or logo, CCITT can only handle the bitonal regions. Mixed-content PDFs typically combine multiple compression filters, one per image object.
  • Assuming all scanners output true bitonal images. Many consumer scanners dither grayscale into pseudo-bitonal images. This reduces CCITT efficiency dramatically. Use a dedicated document scanner with a hardware bitonal (1-bit) output mode.
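Two of these mistakes can be caught with simple checks before a file is written. The following is a minimal sketch with hypothetical helper names: one function spells out what a given /K value means, and the other tests whether an 8-bit grayscale buffer contains only pure black and pure white samples (assumed here to be 0 and 255).

```python
def ccitt_variant(k):
    """Describe which CCITT scheme a /K value in /DecodeParms selects."""
    if k < 0:
        return "Group 4 (T.6), pure 2D"
    if k == 0:
        return "Group 3 (T.4), 1D"
    return "Group 3 (T.4), mixed 1D/2D"


def is_bitonal(pixels, black=0, white=255):
    """True if every sample in an 8-bit grayscale buffer is pure black or white."""
    return set(pixels) <= {black, white}


print(ccitt_variant(-1))                  # Group 4 (T.6), pure 2D
print(ccitt_variant(0))                   # Group 3 (T.4), 1D
print(is_bitonal(bytes([0, 255, 255])))   # True
print(is_bitonal(bytes([0, 128, 255])))   # False
```

Note that `is_bitonal` cannot detect dithering: a dithered image still contains only two values, so it passes this check even though it will compress poorly.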

Frequently Asked Questions

  • What is CCITT compression? CCITT compression is a lossless encoding method designed for bitonal (pure black-and-white) images. Originally developed for fax machines, it is now the standard for compressing scanned documents in PDF files via the CCITTFaxDecode filter. Group 4 is the most efficient variant and produces no quality loss whatsoever.

  • What is CCITTFaxDecode? CCITTFaxDecode is the PDF filter name for CCITT compression. When a PDF reader encounters this filter, it decompresses the bitonal image data using the CCITT algorithm. The K parameter in the filter dictionary determines which variant is used: K=0 is Group 3 1D, K>0 is Group 3 2D, and K=-1 is Group 4.

  • What is the difference between Group 3 and Group 4? Group 3 was designed for fax transmission over noisy phone lines. It uses one-dimensional run-length encoding, optionally mixed with two-dimensional lines, and adds end-of-line markers so a decoder can recover from transmission errors, which costs some overhead. Group 4 codes every scan line against the previous one and drops those safeguards, since it assumes a reliable channel. Group 4 produces significantly smaller files and is the usual choice for PDF documents.

  • Is CCITT Group 4 lossless? Yes, completely. CCITT Group 4 is always lossless: every black and white pixel is reproduced identically after decompression. No information is discarded. This is critical for legal and medical documents, where text sharpness and pixel-perfect accuracy may be required by law or regulation.

  • When should I use CCITT Group 4 instead of JBIG2? Use CCITT Group 4 when maximum compatibility and guaranteed accuracy matter: legal filings, medical records, government archives, or any context where a single corrupted character is unacceptable. Use JBIG2 when file size is the top priority and you know your audience uses modern PDF software. JBIG2 can achieve better compression but carries compatibility risks with older systems.

  • Can CCITT compress grayscale or color images? No. CCITT compression only works on bitonal images, where each pixel has exactly two possible values: black or white. It cannot be applied to grayscale or color images. For those, use JPEG (DCTDecode), JPEG 2000 (JPXDecode), or Flate (FlateDecode) compression inside the PDF.
