PDF Compression: Flate, JPEG, JBIG2, & JPX — How It Works

PDF uses a set of compression filters (FlateDecode, DCTDecode for JPEG, JBIG2Decode, and JPXDecode for JPEG 2000) that can be applied to each stream in a file to reduce its size dramatically. Knowing which filter suits which content type is the key to creating PDF files that are small, fast-loading, and visually crisp, whether for print production, web delivery, or archival.

Quick Answer

A PDF without any compression would be enormous, because page layout instructions, font programs, and image data would all be stored raw. PDF compression is applied through a filter (or a chain of filters) on each stream object. FlateDecode (ZIP/Deflate) compresses general data losslessly: text instructions, fonts, vector drawings, XML metadata. DCTDecode (JPEG) compresses photographic images with controlled quality loss; a full-colour photo that would be 12 MB raw becomes about 400 KB at quality 85. JBIG2Decode exploits pattern repetition in scanned text: if the same letter appears 800 times, its shape is stored once. JPXDecode (JPEG 2000) typically gives better quality per byte than JPEG and offers a lossless mode; it is permitted in PDF/A-2 (PDF/A-1 did not allow it), which makes it a common choice for lossless archival images. Choose the right filter for each content type and a 100 MB uncompressed PDF can shrink to around 4 MB while staying visually faithful and fully printable.

How PDF Compression Works

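Every data stream in a PDF (page content, images, fonts, metadata, colour profiles) is stored as a stream object. A stream's dictionary can carry a /Filter key naming the filter or filters needed to decode it; a stream without /Filter is stored uncompressed. Multiple filters can be chained, most often to wrap compressed binary data in an ASCII-safe encoding rather than to gain extra compression.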

PDF's primary compression filters:

  • FlateDecode: Deflate/zlib (the same algorithm as ZIP/gzip). Lossless general-purpose compression. Used for: page content streams, font programs, ICC profiles, metadata, cross-reference streams. Typical ratio: 3:1 to 10:1 for text/code data.
  • DCTDecode: JPEG (Discrete Cosine Transform) lossy compression. Used for: continuous-tone colour and greyscale photographs. Quality is set by the encoder, usually on a 0-100 scale. Typical ratio: 10:1 to 50:1 for photographs at quality 70-90.
  • JBIG2Decode: JBIG2 bi-level (1-bit) compression, lossless or lossy, for scanned document text and line art. Exploits symbol repetition: identical glyphs are stored once in a symbol dictionary and referenced everywhere. Typical ratio: 5:1 to 20:1 over uncompressed bi-level data, or 2-3× better than CCITT Group 4.
  • JPXDecode: JPEG 2000 (wavelet-based) compression, lossless or lossy, for continuous-tone images. Typically better quality than JPEG at equal file size. Supports alpha channels and high bit depths (such as 16 bits per component). Permitted in PDF/A-2 for archival images; not allowed in PDF/A-1.
  • CCITTFaxDecode: CCITT Group 3 or Group 4 fax compression. Lossless bi-level compression for black-and-white scans and the standard for fax-originated documents. Increasingly replaced by JBIG2 in modern workflows.
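To make the symbol-dictionary idea concrete, here is a toy Python sketch (not a JBIG2 encoder): it stores each distinct glyph bitmap once and replaces every occurrence with an index into that dictionary. The 3×3 "glyphs" and the function names are hypothetical; real JBIG2 also records where each symbol sits on the page, can refine near-matches, and arithmetic-codes the result. Exact matching here corresponds to the lossless case.

```python
from typing import Dict, List, Tuple

# A tiny bi-level bitmap: one string of '0'/'1' per row (hypothetical format).
Glyph = Tuple[str, ...]

def build_symbol_dictionary(glyphs: List[Glyph]) -> Tuple[List[Glyph], List[int]]:
    """Store each distinct bitmap once; replace every occurrence by its index.

    Only exact matches are merged (the lossless case). A lossy encoder would
    also merge bitmaps that are merely similar, which this sketch does not do.
    """
    dictionary: List[Glyph] = []
    index_of: Dict[Glyph, int] = {}
    references: List[int] = []
    for glyph in glyphs:
        if glyph not in index_of:
            index_of[glyph] = len(dictionary)
            dictionary.append(glyph)
        references.append(index_of[glyph])
    return dictionary, references

def rebuild(dictionary: List[Glyph], references: List[int]) -> List[Glyph]:
    """Expand the references back into the full glyph sequence."""
    return [dictionary[i] for i in references]

# Hypothetical 3x3 bitmaps standing in for the letters "e" and "t".
E: Glyph = ("111", "110", "111")
T: Glyph = ("111", "010", "010")
page = [T, E, E, T, E] * 160  # 800 glyph occurrences

dictionary, refs = build_symbol_dictionary(page)
print(len(page), len(dictionary))  # 800 2
```

Because only exact matches are merged, `rebuild(dictionary, refs)` reproduces the page exactly; merging merely similar bitmaps is where lossy JBIG2 gains its extra compression and its risk of character substitution.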

Filter chaining: PDF allows multiple filters in sequence, for example /Filter [/ASCII85Decode /FlateDecode]. Filters are listed in decoding order, so this data was Flate-compressed first and then ASCII85-encoded, and a reader undoes them in the order listed. Most modern PDFs use a single FlateDecode or DCTDecode filter, because ASCII encoding filters have become largely obsolete now that binary-safe transfer is universal.
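A minimal Python sketch of how a reader undoes the /Filter [/ASCII85Decode /FlateDecode] chain, using the standard `zlib` and `base64` modules. The function names are hypothetical, only these two filters are handled, and the `~>` end-of-data marker that PDF's ASCII85 uses is stripped by hand before calling `base64.a85decode`.

```python
import base64
import zlib

def encode_ascii85_flate(data: bytes) -> bytes:
    """Encode in the order a writer applies the filters: Flate first, then ASCII85."""
    return base64.a85encode(zlib.compress(data)) + b"~>"

def decode_filters(encoded: bytes, filters: list[str]) -> bytes:
    """Apply decode filters in the order they are listed in /Filter."""
    data = encoded
    for name in filters:
        if name == "ASCII85Decode":
            data = data.strip()
            if data.endswith(b"~>"):
                data = data[:-2]  # drop PDF's ASCII85 end-of-data marker
            data = base64.a85decode(data)
        elif name == "FlateDecode":
            data = zlib.decompress(data)
        else:
            raise NotImplementedError(f"filter not handled in this sketch: {name}")
    return data

content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
stream = encode_ascii85_flate(content)
assert decode_filters(stream, ["ASCII85Decode", "FlateDecode"]) == content
```

The loop mirrors the array order: the first name in /Filter is the first filter the reader applies.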

PDF Compression Filter Comparison

Filter | Algorithm | Lossless or lossy | Best content type | Typical ratio
FlateDecode | Deflate/zlib | Lossless | Text, fonts, code, vector, metadata | 3:1 – 10:1
DCTDecode | JPEG / DCT | Lossy | Colour & grey photographs | 10:1 – 50:1
JBIG2Decode | JBIG2 | Both modes | Bi-level scanned text, line art | 5:1 – 20:1
JPXDecode | JPEG 2000 | Both modes | High-quality images, archival | 8:1 – 40:1
CCITTFaxDecode | CCITT G3/G4 | Lossless | Bi-level (fax) scans | 3:1 – 8:1
RunLengthDecode | Run-length | Lossless | Simple repeated patterns only | 1.5:1 – 3:1
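The table can be read as a simple lookup. The Python sketch below encodes it that way; the category names and `suggest_filter` are hypothetical, and a real tool would inspect the image itself (bit depth, colour count, noise) rather than trust a label.

```python
# Starting-point filter per content category, following the comparison table.
DEFAULT_FILTER = {
    "text": "FlateDecode",          # content streams, fonts, metadata
    "vector": "FlateDecode",
    "photo": "DCTDecode",           # lossy JPEG
    "bilevel_scan": "JBIG2Decode",  # use lossless mode when exact glyphs matter
    "fax": "CCITTFaxDecode",
}

def suggest_filter(content: str, lossless_required: bool = False) -> str:
    """Return a suggested /Filter name for a hypothetical content category."""
    if content == "photo" and lossless_required:
        # Lossless continuous-tone image: JPEG 2000 in lossless mode.
        return "JPXDecode"
    return DEFAULT_FILTER[content]

print(suggest_filter("photo"), suggest_filter("photo", lossless_required=True))
```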

Real-World Examples

📄 Document Scanning Scenario

Law Firm Document Digitisation: JBIG2 Reducing 80 GB to 4 GB

A law firm digitises 400,000 pages of case documents: typed and photocopied legal correspondence from the 1980s to the 2000s. Scanned at 300 DPI, each page produces a 2 MB bi-level TIFF image, so 400,000 pages × 2 MB = 800 GB raw. Initial PDF conversion using CCITT Group 4 compression gives 80 GB. Reprocessing with JBIG2 (lossless mode, with a global symbol dictionary shared across pages) gives 4 GB: a 95% reduction from the CCITT version and a 99.5% reduction from the raw TIFFs. The gain comes from JBIG2's symbol dictionary: letter shapes such as "e", "t", and "a" recur hundreds of thousands of times across the collection, yet each is stored once in the global dictionary and referenced everywhere. The resulting PDFs can be shared, backed up to cloud storage, and (once an OCR text layer is added) text-searched, which is impractical at 800 GB and easy at 4 GB.
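The percentages in this scenario can be checked with a few lines of Python. The function name is hypothetical and the sizes are the scenario's own figures, in decimal units.

```python
def percent_reduction(before: float, after: float) -> float:
    """Space saved, as a percentage of the starting size."""
    return (before - after) / before * 100

raw_gb = 400_000 * 2 / 1000   # 400,000 pages x 2 MB each, in GB
ccitt_gb, jbig2_gb = 80, 4    # sizes reported in the scenario

print(f"raw TIFF: {raw_gb:.0f} GB")
print(f"CCITT G4 -> JBIG2: {percent_reduction(ccitt_gb, jbig2_gb):.1f}% smaller")
print(f"raw TIFF -> JBIG2: {percent_reduction(raw_gb, jbig2_gb):.1f}% smaller")
```

Measuring against the CCITT version gives 95%; measuring against the raw scans gives 99.5%, so it matters which baseline a quoted saving uses.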

📷 Photography Scenario

Real Estate Listing PDF: JPEG Quality Optimisation

A real estate marketing team produces property brochure PDFs with 12 full-page high-resolution photographs for email distribution. The raw InDesign export at its default high-quality JPEG setting embeds each photo at quality 90: 12 photos × 4 MB = 48 MB, plus page content, for a 52 MB PDF, which is poor for email. Lowering JPEG quality to 75 (a loss rarely noticeable on screen) and downsampling from 300 DPI to 150 DPI for screen distribution cuts each photo to about 600 KB, so the 12 photos total about 7.2 MB; with the page content optimised too, the finished PDF is about 8 MB, suitable for email. A separate print-ready version keeps quality 90 at 300 DPI for delivery to the printer.
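Why downsampling helps so much: halving the resolution halves both pixel dimensions, so the pixel count (and the raw data the JPEG encoder starts from) drops to a quarter. A small Python sketch, assuming a hypothetical 8 × 10 inch photo stored as 8-bit RGB; the helper names are illustrative.

```python
def raw_rgb_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an 8-bit-per-channel RGB image."""
    return width_px * height_px * 3

def downsample(width_px: int, height_px: int, src_dpi: int, dst_dpi: int) -> tuple[int, int]:
    """Pixel dimensions after resampling the same placed size to a lower DPI."""
    scale = dst_dpi / src_dpi
    return int(width_px * scale), int(height_px * scale)

# Hypothetical photo placed at 8 x 10 inches.
w, h = 8 * 300, 10 * 300              # 2400 x 3000 px at 300 DPI
w2, h2 = downsample(w, h, 300, 150)   # 1200 x 1500 px at 150 DPI
print(raw_rgb_bytes(w, h), raw_rgb_bytes(w2, h2))  # 21600000 5400000
print(12 * 600 / 1000, "MB for 12 photos at ~600 KB each")
```

The lower JPEG quality then compresses these fewer pixels harder, which is why the two changes together shrink each photo from about 4 MB to about 600 KB.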

🏛️ Archival Scenario

Museum Archive: JPEG 2000 Lossless for PDF/A-2 Conformance

A national museum creates PDF/A-2b archival records of manuscript digitisations. Each manuscript page is scanned at 600 DPI in 24-bit colour: 50 MB per page uncompressed. PDF/A-2b permits JPEG 2000 (JPXDecode), which PDF/A-1 did not, and JPEG 2000 in lossless mode achieves about 3:1 compression without any data loss: 50 MB becomes roughly 17 MB per page. The lossless JPX images pass veraPDF's PDF/A-2b validation, saving about 67% of the space with no data lost. Flate-compressed pixel data would also be lossless and conformant but is typically larger for photographic scans; JPEG (DCTDecode) is allowed by PDF/A-2 but would fail the museum's lossless requirement. JPX lossless is the right fit.
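Two checks behind this scenario, sketched in Python: "lossless" means decoding reproduces the input byte for byte, and a 3:1 ratio on 50 MB leaves about 16.7 MB. Python ships no JPEG 2000 codec, so `zlib` stands in for the lossless codec here; the function names are hypothetical.

```python
import zlib

def is_lossless(original: bytes, encode, decode) -> bool:
    """A codec is lossless for this input if decode(encode(x)) == x, byte for byte."""
    return decode(encode(original)) == original

def lossless_savings(raw_mb: float, ratio: float) -> tuple[float, float]:
    """Compressed size and percentage of space saved for a given ratio."""
    compressed_mb = raw_mb / ratio
    return compressed_mb, (1 - compressed_mb / raw_mb) * 100

# zlib stands in for a lossless JPEG 2000 codec (none ships with Python).
scan = bytes(range(256)) * 1000
assert is_lossless(scan, zlib.compress, zlib.decompress)

size, saved = lossless_savings(50, 3)  # figures from the museum scenario
print(f"{size:.1f} MB per page, {saved:.0f}% saved")  # 16.7 MB per page, 67% saved
```

In an archive the same round-trip check is usually paired with stored checksums, so the comparison can be repeated later without keeping the raw scans.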

Why PDF Compression Matters

Dramatic Size Reduction

The right compression can reduce a PDF from hundreds of MB to a few MB, making documents practical for email, web delivery, cloud storage, and mobile viewing.

Faster Loading

Compressed PDFs open faster, especially in browser viewers, which must download data before they can render it (the whole file, or for linearized PDFs, the first page's objects). FlateDecode-compressed content streams decode quickly on modern CPUs.

Quality Preservation

Lossless filters (FlateDecode, JBIG2 lossless, JPX lossless) preserve every bit of data. Lossy filters at appropriate quality settings are visually indistinguishable from originals for most viewing uses.

Archival Conformance

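PDF/A-2 permits JPEG 2000 compression, including its lossless mode, for archival images. Compression choices matter for standards compliance, because a disallowed filter causes a conformance failure: for example, LZWDecode is not permitted in PDF/A, and JPXDecode is not permitted in PDF/A-1.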

Storage Cost Reduction

For document management systems holding millions of PDFs, the savings multiply across the whole collection: correct compression can cut storage costs by 80-95% while keeping full document fidelity at retrieval.

Cross-Device Compatibility

Standard PDF compression filters are decoded by conforming PDF viewers, from Acrobat on the desktop to mobile browsers, so correctly compressed PDFs display consistently across devices. JBIG2 and JPX support is worth checking in older or lightweight viewers.

PDF Stream Compression — Filter Syntax

PDF STREAM OBJECTS — COMPRESSION FILTER EXAMPLES
% FlateDecode: page content stream (lossless)
5 0 obj
<<
  /Length  2847
  /Filter  /FlateDecode
>>
stream
  % compressed page instructions (q BT Tf Td Tj ET Q ...)
endstream
endobj

% DCTDecode: colour photograph (JPEG, lossy)
6 0 obj
<<
  /Type             /XObject
  /Subtype          /Image
  /Width  1200  /Height  800
  /ColorSpace       /DeviceRGB
  /BitsPerComponent 8
  /Filter           /DCTDecode
  /Length           184320  % ~180 KB
>>
stream
  % JPEG binary data
endstream
endobj

% JBIG2Decode: scanned text (bi-level, lossless)
7 0 obj
<<
  /Type             /XObject
  /Subtype          /Image
  /Width   2480  /Height  3508     % A4 at 300 DPI
  /ColorSpace       /DeviceGray
  /BitsPerComponent 1                % JBIG2Decode requires 1-bit data
  /Filter           /JBIG2Decode
  /DecodeParms      << /JBIG2Globals 8 0 R >>  % shared symbol dictionary (object 8, not shown)
  /Length           41530            % illustrative value
>>
stream
  % JBIG2 page data (refers to symbols in the globals stream)
endstream
endobj
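A Python sketch of writing a FlateDecode stream object like object 5 above, using `zlib.compress` (its zlib-format output is what FlateDecode expects). The key detail is that /Length counts the compressed bytes between `stream` and `endstream`, not the original content. The function name and sample operators are hypothetical, and a complete file would also need a header, cross-reference data, and a trailer.

```python
import zlib

def flate_stream_object(obj_num: int, content: bytes) -> bytes:
    """Serialise `content` as an indirect FlateDecode stream object."""
    data = zlib.compress(content)
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Length {len(data)} /Filter /FlateDecode >>\n"  # length of compressed data
        "stream\n"
    ).encode("ascii")
    # The EOL before 'endstream' is not counted in /Length.
    return header + data + b"\nendstream\nendobj\n"

page_ops = b"q BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET Q"
obj = flate_stream_object(5, page_ops)
print(obj.split(b"stream\n")[0].decode("ascii"))
```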

Common Mistakes to Avoid

  • Compressing scanned text pages with JPEG instead of JBIG2. JPEG compression of black-and-white scanned text produces visible ringing artefacts and halos around characters, so text looks blurry and unprofessional. For 1-bit scanned document images, use JBIG2 (lossless mode for clean originals); the compression ratios are also much better than JPEG for this content type. Lossy JBIG2 can help with noise-heavy scans, but because it merges similar-looking symbols it can substitute one character for another (the widely reported Xerox scanner bug swapped digits this way), so avoid it where exact characters matter.
  • Re-compressing already-compressed JPEG images. A JPEG image decoded from a PDF and re-encoded as JPEG loses quality again, because each encode cycle introduces more DCT artefacts. When processing PDF images, either pass the original JPEG data through unchanged or, if the pixels must change, make all edits on the decoded image and re-encode to JPEG only once, at the final quality setting.
  • Using lossless compression (FlateDecode) for large colour photographs. Storing a 12 MP colour photograph with FlateDecode instead of DCTDecode bloats the PDF: FlateDecode often achieves only about 2:1 on photographic pixel data, because sensor noise and fine texture leave little exact repetition for Deflate to exploit, while quality-85 JPEG achieves around 15:1. Use DCTDecode for photographs, and FlateDecode for diagrams, vector art, and flat-colour images such as screenshots and charts.
  • Setting JPEG quality too low for professional print PDFs. Quality values below 70 produce visible DCT block artefacts in photographs — acceptable for web thumbnails but unacceptable for print production. For professional print workflows, use quality 85-95. For web-only PDFs, quality 75-80 offers substantial size reduction with minimal visual impact at normal viewing distances.
  • Not enabling compressed cross-reference streams in PDF 1.5+ files. A classic cross-reference table uses a fixed 20 bytes per object, so a file with 100,000 objects carries about 2 MB of uncompressed cross-reference data. PDF 1.5 and later allow this table to be stored as a FlateDecode-compressed cross-reference stream (/Type /XRef), often a small fraction of the size, and also allow non-stream objects to be packed into compressed object streams (/Type /ObjStm). Ensure your PDF generator uses both for files with large numbers of objects.
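To see why Flate struggles with photographs, this Python sketch compresses two synthetic 256 × 256 RGB images with `zlib`: a flat-colour one and a noisy gradient standing in for a photo. The data is invented for illustration and exact ratios will vary. The sketch does not use the PNG-style predictors PDF allows in /DecodeParms, which help Flate on images but typically still leave it well behind JPEG for photographs.

```python
import random
import zlib

def flate_ratio(data: bytes) -> float:
    """Original size divided by Deflate-compressed size."""
    return len(data) / len(zlib.compress(data, 9))

rng = random.Random(42)  # fixed seed so runs are repeatable
width, height = 256, 256
half = width * height // 2

# Flat-colour "diagram": two long runs of identical RGB pixels.
flat = bytes([255, 255, 255] * half + [0, 0, 128] * half)

# Noisy "photo": a smooth gradient plus random noise on every channel.
photo = bytes(
    min(255, max(0, (x + y) // 2 + rng.randint(-20, 20)))
    for y in range(height) for x in range(width) for _ in range(3)
)

print(f"flat: {flate_ratio(flat):.0f}:1, noisy: {flate_ratio(photo):.1f}:1")
```

The flat image compresses by orders of magnitude, while the noisy one barely shrinks, which is the gap the bullet describes.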
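A Python sketch comparing the two cross-reference forms for a hypothetical file with 100,000 objects. It builds only the entry data: the classic table uses fixed 20-byte text entries, while the stream version packs each entry into 7 binary bytes (/W [1 4 2]) and Flate-compresses them. The stream dictionary and trailer keys (/Size, /Root, /W) are omitted, and real writers often add a PNG predictor for even better compression.

```python
import struct
import zlib

def classic_xref(offsets: list[int]) -> bytes:
    """Classic 'xref' section: a fixed 20-byte text entry per object, plus object 0."""
    parts = [b"xref\n", f"0 {len(offsets) + 1}\n".encode("ascii")]
    parts.append(b"0000000000 65535 f\r\n")  # object 0: head of the free list
    parts += [f"{off:010d} 00000 n\r\n".encode("ascii") for off in offsets]
    return b"".join(parts)

def xref_stream_data(offsets: list[int]) -> bytes:
    """Binary rows for a /Type /XRef stream with /W [1 4 2], then Flate-compressed."""
    rows = [struct.pack(">BIH", 0, 0, 65535)]                   # type 0: free entry
    rows += [struct.pack(">BIH", 1, off, 0) for off in offsets]  # type 1: in use at byte offset
    return zlib.compress(b"".join(rows))

# Hypothetical file: 100,000 objects of ~200 bytes each, after a 15-byte header.
n = 100_000
offsets = [15 + 200 * i for i in range(n)]
classic = classic_xref(offsets)
xref_data = xref_stream_data(offsets)
print(len(classic))  # 2000034
print(len(xref_data))
```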

Frequently Asked Questions

  • What compression filters does PDF support? FlateDecode (Deflate/ZIP, lossless: text, fonts, code), DCTDecode (JPEG, lossy: photographs), JBIG2Decode (bi-level, lossless or lossy: scanned text), JPXDecode (JPEG 2000, lossless or lossy: high-quality images), CCITTFaxDecode (CCITT Group 3/4: fax-originated scans), and RunLengthDecode (simple repeating data). PDF also defines LZWDecode, largely superseded by FlateDecode, and the ASCIIHexDecode and ASCII85Decode encoding filters.

  • What is FlateDecode? FlateDecode is Deflate/zlib (the ZIP algorithm) lossless compression applied to PDF streams. It compresses page instructions, fonts, ICC profiles, metadata, and vector data with no data loss, achieving 3:1 to 10:1 on structured text data. Every modern PDF reader handles it natively.

  • How does JBIG2 compression work? JBIG2 compresses bi-level (1-bit black-and-white) images by finding repeated symbols, such as recurring letter shapes, and storing each one once in a dictionary. Every occurrence references the stored pattern, which dramatically reduces size for scanned text. It can be lossless (exact) or lossy (similar but not identical glyphs are merged, giving very high compression at the risk of slight character alteration).

  • What is JPXDecode? JPXDecode (JPEG 2000) is a wavelet-based compression filter available in PDF 1.5+. It supports lossless and lossy modes in one format, alpha channels, and high bit depths such as 16 bits per component. It typically gives better quality per byte than regular JPEG, and PDF/A-2 permits it (PDF/A-1 does not), making it a common choice for lossless images in archival documents.

  • How can I reduce PDF file size? Key approaches: (1) Compress all content streams with FlateDecode. (2) Downsample images to an appropriate resolution (300 DPI for print, 150 DPI for screen). (3) Use JPEG quality 80-85 for photographs. (4) Use JBIG2 for scanned text. (5) Embed font subsets that include only the glyphs used. (6) Remove unused objects, hidden layers, and embedded thumbnails. (7) Enable cross-reference stream compression (PDF 1.5+).

  • Does compression affect text selection or search? No. Compression is transparent to the PDF reader, which decompresses streams before processing them, so it has no effect on text selectability, copy-paste, or search. Scanned image-only PDFs (no text layer) still need OCR to become searchable, whatever compression format they use.

Compress Your PDF — Free

PDFlyst's compression tool reduces PDF file size without compromising visual quality.