What compression formats does PDF support?

PDF supports multiple compression filters: FlateDecode (Deflate/ZIP — lossless, for text and line art), DCTDecode (JPEG — lossy, for photographs), JBIG2Decode (JBIG2 — lossless or lossy bi-level compression, for scanned document images), JPXDecode (JPEG 2000 — lossy or lossless, for high-quality images, PDF 1.5+), CCITTFaxDecode (CCITT Group 3/4 fax compression, for bi-level scans), LZWDecode (Lempel-Ziv-Welch, deprecated in most modern workflows), RunLengthDecode (simple run-length encoding).

What is FlateDecode compression in PDF?

FlateDecode is PDF's lossless filter based on the Deflate algorithm, with data stored in zlib format (the same algorithm used by ZIP and gzip). It is the standard compression for PDF content streams (page instructions, fonts, vector graphics) and for PNG-style lossless image compression. Because it is lossless, decompression reproduces the original bytes exactly. Compression ratios for text and structured data typically range from 3:1 to 10:1. All modern PDF viewers and processors handle FlateDecode natively.
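The lossless round trip can be checked with Python's standard `zlib` module, which implements the same Deflate/zlib format. This is a minimal sketch: the page operators below are made-up sample data, and such repetitive input compresses better than a typical real page would.

```python
import zlib

# Hypothetical page content: 200 lines of PDF text-showing operators.
lines = [
    f"BT /F1 11 Tf 72 {750 - 3 * i} Td (Line {i} of the report) Tj ET"
    for i in range(200)
]
content = "\n".join(lines).encode("ascii")

compressed = zlib.compress(content, 9)   # the bytes a /FlateDecode stream would store
restored = zlib.decompress(compressed)   # the bytes a viewer reconstructs

ratio = len(content) / len(compressed)
print(f"{len(content)} bytes -> {len(compressed)} bytes ({ratio:.1f}:1)")
print("lossless round trip:", restored == content)
```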

What is DCTDecode compression in PDF?

DCTDecode is the JPEG (Joint Photographic Experts Group) lossy compression filter in PDF. It uses Discrete Cosine Transform (DCT) to compress photographic images with adjustable quality levels — higher quality = larger file, lower quality = smaller file with visible artefacts. JPEG is ideal for complex colour photography where some quality loss is acceptable in exchange for very high compression ratios (10:1 to 50:1 for typical photographs).
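To see where the loss comes from, the sketch below applies a one-dimensional DCT to a single row of eight samples, rounds the coefficients to multiples of a step size, and reconstructs the row. It is a simplified illustration only: real JPEG uses a two-dimensional 8×8 transform, per-frequency quantisation tables, and entropy coding, and the sample values and step sizes here are arbitrary.

```python
import math

N = 8  # JPEG works on 8x8 blocks; this sketch transforms one 8-sample row

def _scale(k: int) -> float:
    """Normalisation factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(samples: list[float]) -> list[float]:
    """Forward DCT-II: pixel values -> frequency coefficients."""
    return [
        _scale(k) * sum(
            x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n, x in enumerate(samples)
        )
        for k in range(N)
    ]

def idct(coeffs: list[float]) -> list[float]:
    """Inverse transform: frequency coefficients -> pixel values."""
    return [
        sum(
            _scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k, c in enumerate(coeffs)
        )
        for n in range(N)
    ]

def lossy_round_trip(samples: list[float], step: float) -> list[float]:
    """Quantise every coefficient to a multiple of `step`, then reconstruct."""
    quantised = [round(c / step) for c in dct(samples)]  # small integers to store
    return idct([q * step for q in quantised])

def max_error(a, b) -> float:
    return max(abs(x - y) for x, y in zip(a, b))

row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical greyscale samples
print("fine step error:  ", max_error(lossy_round_trip(row, 1), row))
print("coarse step error:", max_error(lossy_round_trip(row, 40), row))
```

A larger step plays the role of a lower quality setting: more coefficients round to zero, which is cheaper to store but reconstructs less accurately.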

What is JBIG2 compression in PDF?

JBIG2 (Joint Bi-level Image Experts Group 2) is a highly efficient compression format for bi-level (black and white, 1-bit) images — ideal for scanned text documents and line art. JBIG2 works by finding similar text symbols across a page (or across multiple pages for multi-page JBIG2) and encoding them once, then referencing the same pattern for each occurrence. A page of scanned text has many repeated letter forms — JBIG2 exploits this repetition for far better compression than other bi-level formats.
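The store-once, reference-everywhere idea can be illustrated with a toy symbol dictionary. This is not a JBIG2 encoder: real JBIG2 also records glyph positions, segments glyphs out of the page image, and entropy-codes the result. The 3×3 "glyphs" and the function names are hypothetical.

```python
def encode_symbols(glyphs):
    """Store each distinct glyph bitmap once; replace the page with index references."""
    dictionary = []   # unique bitmaps, in first-seen order
    index_of = {}     # bitmap -> position in dictionary
    references = []   # one small integer per glyph occurrence
    for bitmap in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        references.append(index_of[bitmap])
    return dictionary, references

def decode_symbols(dictionary, references):
    """Rebuild the glyph sequence from the dictionary and the references."""
    return [dictionary[i] for i in references]

# Hypothetical 3x3 one-bit glyphs, each row a string of 0/1 pixels.
E = ("111", "110", "111")
T = ("111", "010", "010")
page = [T, E, E, T, E, T, T, E] * 100   # 800 occurrences of 2 distinct shapes

dictionary, refs = encode_symbols(page)
print(len(page), "glyph occurrences stored as", len(dictionary), "bitmaps")
```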

What is JPX/JPEG 2000 compression in PDF?

JPXDecode is the JPEG 2000 image filter in PDF (PDF 1.5+). JPEG 2000 supports both lossless and lossy compression in a single format, and its wavelet-based coding generally gives better quality than JPEG at the same file size. It also supports alpha channels, higher bit depths (up to 16 bits per channel), and progressive rendering. PDF/A-2 permits JPXDecode (PDF/A-1 does not), which makes it a common choice for lossless archival images, and it is also used in medical imaging PDF workflows.

How do I reduce PDF file size without losing quality?

To reduce PDF file size while preserving quality: (1) Use FlateDecode for all content streams — ensure they are not stored uncompressed. (2) Downsample high-resolution images to appropriate DPI for the intended use (300 DPI for print, 72-150 DPI for screen). (3) Use DCT/JPEG compression for photographs with quality 85. (4) Use JBIG2 for scanned document images. (5) Embed only subset fonts (not full font programs). (6) Remove hidden layers, unused objects, embedded thumbnails, and redundant metadata. (7) Apply cross-reference stream compression (PDF 1.5+) — converts the cross-reference table to a compressed stream.

PDF Compression Explained: Flate, JPEG, JBIG2, JPX & How to Reduce PDF File Size

Quick Answer

Without compression, a PDF would be enormous: page layout instructions, font programs, and image data would all be stored in raw form. PDF applies compression through filters on each stream object. FlateDecode (ZIP/Deflate) compresses everything losslessly: text instructions, fonts, vector drawings, XML metadata. DCTDecode (JPEG) compresses photographic images with controlled quality loss, so a full-colour photo that would be 12 MB raw becomes roughly 400 KB at quality 85. JBIG2Decode exploits pattern repetition in scanned text: a letter shape that appears 800 times is stored once. JPXDecode (JPEG 2000) provides better quality per byte than JPEG and supports a lossless mode, which PDF/A-2 permits for archival images. Choose the right filter for each content type and a 100 MB uncompressed PDF can shrink to around 4 MB while remaining visually faithful and fully printable.

How PDF Compression Works

Every block of data in a PDF (page content, images, fonts, metadata, colour profiles) is stored as a stream object. A stream's dictionary can carry a /Filter key naming the algorithm or algorithms needed to decode it; a stream with no /Filter is stored uncompressed. Multiple filters can be chained.

PDF's primary compression filters:

FlateDecode: Deflate/zlib (the same algorithm as ZIP/gzip). Lossless general-purpose compression. Used for: page content streams, font programs, ICC profiles, metadata, cross-reference streams. Typical ratio: 3:1 to 10:1 for text/code data.
DCTDecode: JPEG (Discrete Cosine Transform) lossy compression. Used for: continuous-tone colour and greyscale photographs. Quality is set in the encoder, commonly on a 0-100 scale. Typical ratio: 10:1 to 50:1 for photographs at quality 70-90.
JBIG2Decode: JBIG2 bi-level (1-bit) compression. Lossless or lossy, for scanned document text and line art. Exploits symbol dictionary repetition: identical glyphs are stored once and referenced everywhere. Typical ratio: 5:1 to 20:1 over uncompressed bi-level, or 2-3× better than CCITT Group 4.
JPXDecode: JPEG 2000 (wavelet-based) compression. Lossless or lossy, for continuous-tone images. Better quality at equal file size than JPEG. Supports alpha channel and high bit depths (up to 16-bit). Permitted by PDF/A-2 (but not PDF/A-1), so it is a common choice for lossless archival images.
CCITTFaxDecode: CCITT Group 3 or Group 4 fax compression. Lossless bi-level compression for black-and-white scans. Standard for fax-originated documents. Being replaced by JBIG2 in modern workflows.

Filter chaining: PDF allows multiple filters in sequence, for example /Filter [/ASCII85Decode /FlateDecode]. The array lists filters in decoding order, so this data was first Flate-compressed and then ASCII85-encoded; a reader undoes the ASCII85 encoding first and then inflates the result. Most modern PDFs use a single FlateDecode or DCTDecode filter, because ASCII encoding filters are largely obsolete now that binary-safe data transfer is universal.
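The decoding order can be sketched with the Python standard library. The `decode_stream` helper and `DECODERS` table are hypothetical names for illustration, only two filters are handled, and the sample content is made up.

```python
import base64
import zlib

def ascii85_decode(data: bytes) -> bytes:
    """Decode ASCII85 data, dropping the "~>" end-of-data marker if present."""
    data = data.strip()
    if data.endswith(b"~>"):
        data = data[:-2]
    return base64.a85decode(data)

DECODERS = {
    "ASCII85Decode": ascii85_decode,
    "FlateDecode": zlib.decompress,
}

def decode_stream(raw: bytes, filters: list[str]) -> bytes:
    """Apply the /Filter array left to right, which is the decoding order."""
    for name in filters:
        raw = DECODERS[name](raw)
    return raw

original = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
# Encoding runs in the opposite order: Flate first, then ASCII85.
stored = base64.a85encode(zlib.compress(original)) + b"~>"

decoded = decode_stream(stored, ["ASCII85Decode", "FlateDecode"])
print(decoded == original)
```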

PDF Compression Filter Comparison

Filter	Algorithm	Lossless or Lossy	Best Content Type	Typical Ratio
FlateDecode	Deflate/zlib	Lossless	Text, fonts, code, vector, metadata	3:1 – 10:1
DCTDecode	JPEG / DCT	Lossy	Colour & grey photographs	10:1 – 50:1
JBIG2Decode	JBIG2	Both modes	Bi-level scanned text, line art	5:1 – 20:1
JPXDecode	JPEG 2000	Both modes	High-quality images, archival	8:1 – 40:1
CCITTFaxDecode	CCITT G3/G4	Lossless	Bi-level (fax) scans	3:1 – 8:1
RunLengthDecode	Run-length	Lossless	Simple repeated patterns only	1.5:1 – 3:1
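RunLengthDecode, the simplest filter in the table, is small enough to sketch in full. A length byte from 0 to 127 means "copy the next length + 1 bytes literally", a length byte from 129 to 255 means "repeat the next byte 257 − length times", and 128 marks end of data. The function name and sample bytes are illustrative, and the sketch does no error handling for truncated input.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode PDF RunLengthDecode data up to the end-of-data marker."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:        # end-of-data marker
            break
        if length < 128:         # literal run: copy the next length + 1 bytes
            out += data[i:i + length + 1]
            i += length + 1
        else:                    # repeated run: next byte, 257 - length times
            out += bytes([data[i]]) * (257 - length)
            i += 1
    return bytes(out)

encoded = bytes([2]) + b"abc" + bytes([253, 0x20]) + bytes([128])
print(run_length_decode(encoded))
```

The format only pays off when long runs of identical bytes dominate, which is why the table lists such a low typical ratio.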

Real-World Examples

📄 Document Scanning Scenario

Law Firm Document Digitisation: JBIG2 Reducing 80 GB to 4 GB

A law firm digitises 400,000 pages of case documents: typed and photocopied legal correspondence from the 1980s to the 2000s. Scanned at 300 DPI, each page produces a 2 MB bi-level TIFF image, so 400,000 pages × 2 MB = 800 GB raw. Initial PDF conversion using CCITT Group 4 compression brings this to 80 GB. Reprocessing with JBIG2 (lossless mode, global symbol dictionary across all pages) brings it to 4 GB, which is 95% smaller than the CCITT Group 4 output and 99.5% smaller than the raw TIFFs. The reduction comes from JBIG2's symbol dictionary: the same "e", "t", "a", and "the" shapes appear hundreds of thousands of times across 400,000 pages, and each is stored once in the global dictionary and referenced everywhere. At 4 GB the archive can be backed up to cloud storage, shared, and text-searched via OCR, all of which would be impractical at 800 GB.
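The percentages depend on which baseline is used, so it helps to work them out explicitly. The sketch below uses only the figures from the scenario and assumes 1 GB = 1000 MB.

```python
pages = 400_000
raw_gb = pages * 2 / 1000   # 2 MB per page, with 1 GB = 1000 MB
ccitt_gb = 80
jbig2_gb = 4

def reduction(before: float, after: float) -> float:
    """Percentage saved going from `before` to `after`."""
    return 100 * (1 - after / before)

print(f"raw TIFF total:    {raw_gb:.0f} GB")
print(f"JBIG2 vs CCITT G4: {reduction(ccitt_gb, jbig2_gb):.1f}% smaller")
print(f"JBIG2 vs raw TIFF: {reduction(raw_gb, jbig2_gb):.1f}% smaller")
```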

📷 Photography Scenario

Real Estate Listing PDF: JPEG Quality Optimisation

A real estate marketing team produces property brochure PDFs with 12 full-page high-resolution photographs for email distribution. The raw InDesign export embeds each photo as a high-quality JPEG at quality 90: 12 photos × 4 MB = 48 MB, plus page content, gives a 52 MB PDF, which is too large for most email. Lowering JPEG quality to 75 (a loss that is hard to see on screen) and downsampling from 300 DPI to 150 DPI brings each photo down to about 600 KB and the total PDF to about 8 MB, suitable for email. A separate print-ready version keeps quality 90 at 300 DPI for printer delivery.
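Halving the resolution quarters the pixel count, because both dimensions shrink. The sketch below shows the arithmetic for a hypothetical photo placed at 8 × 10 inches; the placement size is an assumption, not a figure from the scenario.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels needed to cover the placed area at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)

# Hypothetical full-page photo placed at 8 x 10 inches.
at_300 = pixel_count(8, 10, 300)   # 2400 x 3000 pixels
at_150 = pixel_count(8, 10, 150)   # 1200 x 1500 pixels

raw_mb_300 = at_300 * 3 / 1_000_000   # 3 bytes per RGB pixel, before compression
raw_mb_150 = at_150 * 3 / 1_000_000

print(f"300 DPI: {at_300:,} px, {raw_mb_300:.1f} MB raw")
print(f"150 DPI: {at_150:,} px, {raw_mb_150:.1f} MB raw")
print(f"pixel reduction: {at_300 / at_150:.0f}x")
```

Downsampling alone accounts for roughly a 4× reduction in image data; the rest of the drop from 4 MB to about 600 KB would come from the lower JPEG quality setting.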

🏛️ Archival Scenario

Museum Archive: JPEG 2000 Lossless for PDF/A-2 Conformance

A national museum creates PDF/A-2b archival records of manuscript digitisations. Each manuscript page is scanned at 600 DPI in 24-bit colour: 50 MB per page uncompressed. PDF/A-2b permits JPEG 2000 (JPXDecode), and in lossless mode it achieves about 3:1 compression without any data loss: 50 MB → roughly 16 MB per page. The lossless JPX images pass veraPDF's PDF/A-2b validation, with no quality compromised and about 67% of the space saved. Keeping the scans as TIFF files would take more space, and TIFF is not one of PDF's image filters. JPEG (DCTDecode) is lossy, so it would not meet the museum's lossless requirement. JPX lossless satisfies both constraints.

Why PDF Compression Matters

Dramatic Size Reduction

The right compression reduces a PDF from hundreds of MB to a few MB — making documents practical for email, web delivery, cloud storage, and mobile viewing.

Faster Loading

Compressed PDFs open faster, especially in browser viewers, where the data must be downloaded before it can be rendered. FlateDecode-compressed content streams also decode quickly on modern CPUs.

Quality Preservation

Lossless filters (FlateDecode, JBIG2 lossless, JPX lossless) preserve every bit of data. Lossy filters at appropriate quality settings are visually indistinguishable from originals for most viewing uses.

Archival Conformance

PDF/A-2 permits JPEG 2000 lossless compression for archival images. Correct compression choices are required for standards compliance — incorrect filters cause conformance failures.

Storage Cost Reduction

For document management systems with millions of PDFs, compression savings add up: the right filter choices can cut storage costs by 80-95% while maintaining document fidelity at retrieval.

Cross-Device Compatibility

Standard PDF compression filters are decompressed by every PDF viewer — from Acrobat on desktop to mobile browsers. Correctly compressed PDFs display identically everywhere, on any device.

PDF Stream Compression — Filter Syntax

PDF STREAM OBJECTS — COMPRESSION FILTER EXAMPLES

% FlateDecode: page content stream (lossless)
5 0 obj
<<
  /Length  2847
  /Filter  /FlateDecode
>>
stream
  ... 2847 bytes of Flate-compressed page instructions (q BT Tf Td Tj ET Q ...) ...
endstream
endobj

% DCTDecode: colour photograph (JPEG lossy)
6 0 obj
<<
  /Type             /XObject
  /Subtype          /Image
  /Width  1200  /Height  800
  /ColorSpace       /DeviceRGB
  /BitsPerComponent 8
  /Filter           /DCTDecode
  /Length           184320  % ~180 KB
>>
stream
  ... JPEG binary data ...
endstream
endobj

% JBIG2Decode: scanned text (bi-level lossless)
% 8 0 R is a separate stream object holding the shared symbol dictionary
7 0 obj
<<
  /Type             /XObject
  /Subtype          /Image
  /Width   2480  /Height  3508
  /ColorSpace       /DeviceGray
  /BitsPerComponent 1
  /Filter           /JBIG2Decode
  /DecodeParms      << /JBIG2Globals 8 0 R >>
  /Length           41500  % illustrative value
>>
stream
  ... JBIG2 binary data ...
endstream
endobj
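A generator has to compress the payload first and then write the compressed byte count as /Length. The sketch below builds a FlateDecode stream object in that order; `flate_stream_object` is a hypothetical helper, and a real writer would also track byte offsets for the cross-reference table.

```python
import zlib

def flate_stream_object(obj_num: int, payload: bytes) -> bytes:
    """Serialise `payload` as an indirect stream object with /Filter /FlateDecode."""
    body = zlib.compress(payload)
    # /Length counts the compressed bytes between "stream" and "endstream".
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Length {len(body)} /Filter /FlateDecode >>\n"
        "stream\n"
    )
    return header.encode("ascii") + body + b"\nendstream\nendobj\n"

page_ops = b"q BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET Q"
obj = flate_stream_object(5, page_ops)
print(obj[:60])
```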

Common Mistakes to Avoid

Compressing scanned text with JPEG instead of JBIG2. JPEG compression of bi-level (black and white) scanned text produces visible ringing artefacts and halos around characters, so text looks blurry and unprofessional. For 1-bit scanned document images, prefer JBIG2: lossless mode for clean originals, and lossy mode only where slight changes to character shapes are acceptable, such as noise-heavy scans. The compression ratios are also much better than JPEG for this content type.
Re-compressing already-compressed JPEG images. A JPEG image decoded from a PDF and then re-encoded as JPEG loses quality twice — each encode/decode cycle introduces more DCT artefacts. When processing or modifying PDF images, either preserve the original compressed JPEG data unchanged or decode to lossless format before re-encoding once at the final quality setting.
Using lossless compression (FlateDecode) for large colour photographs. Storing a 12 MP colour photograph with FlateDecode instead of DCTDecode bloats the PDF unnecessarily: FlateDecode typically achieves only about 2:1 on photographic data, because sensor noise and fine texture leave few exactly repeating byte sequences for Deflate to exploit, while quality-85 JPEG achieves around 15:1. Use DCTDecode for photographs, and FlateDecode for diagrams, vector art, and text-only PNG-style images.
Setting JPEG quality too low for professional print PDFs. Quality values below 70 produce visible DCT block artefacts in photographs — acceptable for web thumbnails but unacceptable for print production. For professional print workflows, use quality 85-95. For web-only PDFs, quality 75-80 offers substantial size reduction with minimal visual impact at normal viewing distances.
Not enabling cross-reference stream compression in PDF 1.5+ files. The cross-reference table in large PDFs can itself be significant in size. PDF 1.5+ allows the cross-reference table to be stored as a compressed stream (FlateDecode) — reducing it from a large plaintext table to a fraction of the size. Ensure your PDF generator uses cross-reference streams (/Type /XRef) for files with large numbers of objects.
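For the cross-reference point above, a rough size comparison shows why the stream form is smaller. A classic table entry is always 20 bytes of text, while a cross-reference stream packs each entry into a few binary bytes and then Flate-compresses the result. The sketch assumes a /W [1 4 2] field layout and made-up object offsets, and it ignores the table's subsection header and trailer.

```python
import struct
import zlib

def classic_xref_size(n: int) -> int:
    """Each classic entry is 20 bytes: 10-digit offset, 5-digit generation, n/f flag, EOL."""
    return 20 * n

def xref_stream_size(offsets: list[int]) -> int:
    """Pack entries as /W [1 4 2] (type, offset, generation), then Flate-compress."""
    rows = b"".join(struct.pack(">BIH", 1, off, 0) for off in offsets)
    return len(zlib.compress(rows, 9))

# Hypothetical file: 10,000 in-use objects spaced roughly 1 KB apart.
offsets = [15 + 1024 * i + (i * 37) % 200 for i in range(10_000)]

print("classic table:", classic_xref_size(len(offsets)), "bytes")
print("xref stream:  ", xref_stream_size(offsets), "bytes")
```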

Frequently Asked Questions

What compression formats does PDF support?

PDF supports: FlateDecode (Deflate/ZIP lossless, for text, fonts, code), DCTDecode (JPEG lossy, for photographs), JBIG2Decode (bi-level lossless/lossy, for scanned text), JPXDecode (JPEG 2000 lossless or lossy, for high-quality images), CCITTFaxDecode (CCITT Group 3/4, for fax-originated scans), RunLengthDecode (simple repeating data).

What is FlateDecode compression in PDF?

FlateDecode is Deflate/zlib (the ZIP algorithm) lossless compression applied to PDF content streams. It compresses page instructions, fonts, ICC profiles, metadata, and vector data with no data loss, and achieves 3:1 to 10:1 ratios on structured text data. Every modern PDF reader handles it natively.

What is JBIG2 compression in PDF?

JBIG2 compresses bi-level (1-bit black and white) images by finding similar symbols, such as repeated letter forms, and storing them once in a dictionary. All occurrences reference the stored pattern, which dramatically reduces size for scanned text. It can be lossless (exact) or lossy (similar but not identical glyphs are merged, giving very high compression at the cost of slight character alteration).

What is JPX/JPEG 2000 compression in PDF?

JPXDecode (JPEG 2000) is a wavelet-based compression filter available in PDF 1.5+. It supports lossless and lossy modes in one format, alpha channels, and up to 16 bits per channel. It gives better quality per byte than regular JPEG. PDF/A-2 permits it (PDF/A-1 does not), so it is a common choice for lossless image compression in archival documents.

How do I reduce PDF file size without losing quality?

Key approaches: (1) Compress all content streams with FlateDecode. (2) Downsample images to appropriate DPI (300 for print, 150 for screen). (3) Use JPEG quality 80-85 for photographs. (4) Use JBIG2 for scanned text. (5) Subset-embed fonts so that only used glyphs are included. (6) Remove unused objects, hidden layers, embedded thumbnails. (7) Enable cross-reference stream compression (PDF 1.5+).

Does PDF compression affect text search or selection?

No. Compression is transparent to the PDF reader, which decompresses streams before processing them. Text content is always fully decompressed before rendering or searching, so compression has zero effect on text selectability, copy-paste, or search. However, scanned image-based PDFs (no text layer) require OCR for searchability, regardless of compression format.
