A PDF without any compression would be enormous: page layout instructions, font programs, and image data would all be stored in raw form. PDF compression is applied through filters on each stream object. FlateDecode (ZIP/Deflate) is the general-purpose lossless filter for text instructions, fonts, vector drawings, and XML metadata. DCTDecode (JPEG) compresses photographic images with controlled quality loss; a full-colour photo that would be 12 MB raw becomes about 400 KB at quality 85. JBIG2Decode exploits pattern repetition in scanned text: a letter shape that appears 800 times is stored once. JPXDecode (JPEG 2000) provides better quality-per-byte than JPEG and supports a lossless mode; PDF/A-2 permits it, which makes it an option for lossless image archival. Choose the right filter for each content type and a 100 MB uncompressed PDF can shrink to around 4 MB while remaining visually faithful and fully printable.
How PDF Compression Works
Every data stream in a PDF (page content, images, fonts, metadata, colour profiles) is stored as a stream object. A compressed stream's dictionary carries a /Filter entry naming the filter or filters a reader must apply to decode it. Multiple filters can be chained, in which case the array lists them in decoding order.
PDF's primary compression filters:
- FlateDecode: Deflate/zlib (the same algorithm as ZIP/gzip). Lossless general-purpose compression. Used for: page content streams, font programs, ICC profiles, metadata, cross-reference streams. Typical ratio: 3:1 to 10:1 for text/code data.
- DCTDecode: JPEG (Discrete Cosine Transform) lossy compression. Used for: continuous-tone colour and greyscale photographs. Quality is set in the encoder (commonly on a 0-100 scale). Typical ratio: 10:1 to 50:1 for photographs at quality 70-90.
- JBIG2Decode: JBIG2 bi-level (1-bit) compression. Lossless or lossy for scanned document text and line art. Exploits symbol dictionary repetition: identical glyphs are stored once and referenced everywhere. Typical ratio: 5:1 to 20:1 over uncompressed bi-level, or 2-3× better than CCITT Group 4.
- JPXDecode: JPEG 2000 (wavelet-based) compression. Lossless or lossy for continuous-tone images. Better quality at equal file size vs. JPEG. Supports alpha channel and high bit depths (up to 16-bit). Permitted by PDF/A-2 (PDF/A-1 does not allow it), where its lossless mode suits archival images.
- CCITTFaxDecode: CCITT Group 3 or Group 4 fax compression. Lossless bi-level compression for black-and-white scans. Standard for fax-originated documents. Being replaced by JBIG2 in modern workflows.
Filter chaining: PDF allows multiple filters in sequence, for example /Filter [/ASCII85Decode /FlateDecode]. The array lists filters in decoding order, so a reader undoes ASCII85 first and then inflates; the writer therefore Flate-compressed the data first and ASCII85-encoded the result. Most modern PDFs use a single filter such as FlateDecode or DCTDecode. ASCII encoding filters are largely obsolete now that binary data transfer is universal.
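The ordering can be made concrete with a minimal Python sketch using only the standard library. The helper names `encode_chain` and `decode_chain` are illustrative, not part of any PDF library, and the sample content stream is made up.

```python
import base64
import zlib

def encode_chain(raw: bytes) -> bytes:
    """Encode for /Filter [/ASCII85Decode /FlateDecode]: Flate first, then ASCII85."""
    deflated = zlib.compress(raw)
    # PDF's ASCII85 data ends with the '~>' end-of-data marker (no '<~' prefix).
    return base64.a85encode(deflated) + b"~>"

def decode_chain(encoded: bytes) -> bytes:
    """Decode in the order the /Filter array lists: ASCII85Decode, then FlateDecode."""
    body = encoded.rstrip()
    if body.endswith(b"~>"):
        body = body[:-2]
    return zlib.decompress(base64.a85decode(body))

# Made-up, repetitive page content used only as sample input.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 50
encoded = encode_chain(content)

assert decode_chain(encoded) == content
print(len(content), len(zlib.compress(content)), len(encoded))
```

The ASCII85 layer makes the output larger than the Flate-compressed bytes (five characters for every four bytes), which is why it is only worth using when the transport cannot carry binary data.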
PDF Compression Filter Comparison
| Filter | Algorithm | Lossless? | Best Content Type | Typical Ratio |
|---|---|---|---|---|
| FlateDecode | Deflate/zlib | ✅ Lossless | Text, fonts, code, vector, metadata | 3:1 – 10:1 |
| DCTDecode | JPEG / DCT | ❌ Lossy | Colour & grey photographs | 10:1 – 50:1 |
| JBIG2Decode | JBIG2 | Both modes | Bi-level scanned text, line art | 5:1 – 20:1 |
| JPXDecode | JPEG 2000 | Both modes | High-quality images, archival | 8:1 – 40:1 |
| CCITTFaxDecode | CCITT G3/G4 | ✅ Lossless | Bi-level (fax) scans | 3:1 – 8:1 |
| RunLengthDecode | Run-length | ✅ Lossless | Simple repeated patterns only | 1.5:1 – 3:1 |
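To see which of these filters a given file actually uses, a rough audit can scan the raw bytes for /Filter entries. The sketch below is an approximation under stated assumptions: it uses a regular expression rather than a real PDF parser, so it will miss dictionaries stored inside compressed object streams and could miscount if binary stream data happens to contain the text "/Filter". The function name `count_filters` and the sample bytes are illustrative.

```python
import re
from collections import Counter

# Matches "/Filter /Name" and "/Filter [/A /B]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")

def count_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each filter name appears in /Filter entries."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts

# Schematic sample; a real file would be read with open(path, "rb").read().
sample = (
    b"5 0 obj << /Length 2847 /Filter /FlateDecode >> stream ... endstream endobj\n"
    b"6 0 obj << /Subtype /Image /Filter /DCTDecode /Length 184320 >> stream ... endstream endobj\n"
    b"7 0 obj << /Filter [/ASCII85Decode /FlateDecode] /Length 120 >> stream ... endstream endobj\n"
)
print(count_filters(sample))
```

A file whose scanned pages report CCITTFaxDecode, or whose photographs report FlateDecode, is a likely candidate for recompression with a filter better matched to its content.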
Real-World Examples
Law Firm Document Digitisation: JBIG2 Reducing 80 GB to 4 GB
A law firm digitises 400,000 pages of case documents: typed and photocopied legal correspondence from the 1980s–2000s. Scanned at 300 DPI, each page produces a 2 MB bi-level TIFF image, so 400,000 pages × 2 MB = 800 GB raw. Initial PDF conversion using CCITT Group 4 compression: 80 GB. Reprocessing with JBIG2 (lossless mode, global symbol dictionary across all pages): 4 GB, which is 95% smaller than the CCITT Group 4 output and 99.5% smaller than the raw TIFFs. The reduction comes from JBIG2's symbol dictionary: shapes such as "e", "t", "a", and "the " appear hundreds of thousands of times across 400,000 pages, and each is stored once in the global dictionary and referenced everywhere. The resulting archive can be shared, backed up on cloud storage, and text-searched via OCR, which is impractical at 800 GB and manageable at 4 GB.
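The symbol-dictionary idea behind that saving can be illustrated with a toy model. This is not JBIG2 itself: a real encoder also records glyph positions, entropy-codes its output, and has to cope with scanned letters that are similar but not pixel-identical. The 8×8 bitmaps and the one-byte-per-reference cost below are assumptions chosen to keep the arithmetic simple.

```python
def build_symbol_dictionary(glyphs):
    """Store each distinct glyph bitmap once; replace every occurrence by an index."""
    dictionary = []   # unique bitmaps, in first-seen order
    index_of = {}     # bitmap -> position in dictionary
    references = []   # one index per glyph occurrence on the page
    for bitmap in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        references.append(index_of[bitmap])
    return dictionary, references

# Hypothetical 8x8 1-bit glyph bitmaps (8 bytes each), stand-ins for scanned letters.
E = bytes([0x7E, 0x40, 0x40, 0x7C, 0x40, 0x40, 0x7E, 0x00])
T = bytes([0x7F, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x00])
A = bytes([0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00])

page = [T, E, A, T, E, E, T, A] * 100   # 800 glyph occurrences
dictionary, refs = build_symbol_dictionary(page)

raw_bytes = sum(len(g) for g in page)                      # every occurrence stored in full
dict_bytes = sum(len(g) for g in dictionary) + len(refs)   # assume 1 byte per reference
print(len(dictionary), raw_bytes, dict_bytes)

# The page can be rebuilt exactly from the dictionary and the references.
assert [dictionary[i] for i in refs] == page
```

The more often a shape recurs, the smaller the dictionary's share of the total, which is why sharing one dictionary across many pages pays off on large, uniform archives.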
Real Estate Listing PDF: JPEG Quality Optimisation
A real estate marketing team produces property brochure PDFs with 12 full-page high-resolution photographs for email distribution. The raw InDesign export at the default high-quality JPEG setting embeds each photo at quality 90: 12 photos × 4 MB = 48 MB, plus page content, for a 52 MB PDF. That is too large for most email delivery. Lowering JPEG quality to 75 (a loss that is hard to perceive on screen) and downsampling from 300 DPI to 150 DPI for screen distribution brings each photo down to about 600 KB and the total PDF to about 8 MB, which is suitable for email. A separate print-ready version keeps quality 90 at 300 DPI for printer delivery.
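Part of that saving comes from downsampling alone: halving the resolution in both directions quarters the pixel count before JPEG quality is touched. A small sketch of the arithmetic, assuming an 8 × 10 inch RGB photo at 8 bits per channel (the page size is an assumption, not a figure from the example):

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int, channels: int = 3) -> int:
    """Uncompressed size of an 8-bit-per-channel image covering the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels

# Assumed full-page photo, 8 x 10 inches, RGB.
print_size = raw_image_bytes(8, 10, 300)    # 2400 x 3000 pixels
screen_size = raw_image_bytes(8, 10, 150)   # 1200 x 1500 pixels
print(print_size, screen_size, print_size / screen_size)
```

The remaining reduction in the brochure example comes from the lower JPEG quality setting, which is why the two adjustments are usually applied together for screen-only distribution.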
Museum Archive: JPEG 2000 Lossless for PDF/A-2 Conformance
A national museum creates PDF/A-2b archival records of manuscript digitisations. Each manuscript page is scanned at 600 DPI in 24-bit colour: 50 MB per page uncompressed. The museum requires lossless storage, and PDF/A-2b permits JPEG 2000 (JPXDecode), which in lossless mode achieves about 3:1 compression without any data loss: 50 MB → 16 MB per page. The lossless JPX images pass veraPDF's PDF/A-2b validation, with no quality compromised and roughly 67% of the space saved. Storing the pixels uncompressed, as in a raw TIFF, would be about three times larger. JPEG (DCTDecode) is permitted by PDF/A but is lossy, so it would not meet the lossless requirement. JPX lossless is the appropriate choice here.
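The percentages quoted in these three examples can be checked with a few lines of arithmetic. The helper name `reduction_percent` is illustrative; the before and after sizes are the figures stated in the examples above.

```python
def reduction_percent(before: float, after: float) -> float:
    """Percentage of the original size that was removed."""
    return (before - after) / before * 100

examples = {
    "Law firm: raw TIFF -> JBIG2 (GB)": (800, 4),
    "Law firm: CCITT G4 -> JBIG2 (GB)": (80, 4),
    "Brochure: quality 90 -> 75 plus downsampling (MB)": (52, 8),
    "Museum: uncompressed -> JPX lossless (MB)": (50, 16),
}
for label, (before, after) in examples.items():
    print(f"{label}: {reduction_percent(before, after):.1f}% smaller, "
          f"{before / after:.1f}:1")
```

Stating both the percentage and the ratio helps, because a percentage is only meaningful relative to a named baseline: the JBIG2 result is 95% smaller than the CCITT Group 4 file but 99.5% smaller than the raw scans.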
Why PDF Compression Matters
Dramatic Size Reduction
The right compression reduces a PDF from hundreds of MB to a few MB — making documents practical for email, web delivery, cloud storage, and mobile viewing.
Faster Loading
Compressed PDFs open faster, especially in browser viewers where the data must be downloaded before it can be rendered. FlateDecode-compressed content streams decode quickly on modern CPUs, so the time saved on transfer outweighs the cost of decompression.
Quality Preservation
Lossless filters (FlateDecode, JBIG2 lossless, JPX lossless) preserve every bit of data. Lossy filters at appropriate quality settings are visually indistinguishable from originals for most viewing uses.
Archival Conformance
PDF/A-2 permits JPEG 2000 lossless compression for archival images. Correct compression choices are required for standards compliance — incorrect filters cause conformance failures.
Storage Cost Reduction
For document management systems with millions of PDFs, correct compression multiplies cost savings — reducing storage costs by 80-95% while maintaining full document fidelity at retrieval.
Cross-Device Compatibility
Standard PDF compression filters are decompressed by every PDF viewer — from Acrobat on desktop to mobile browsers. Correctly compressed PDFs display identically everywhere, on any device.
PDF Stream Compression — Filter Syntax
    % FlateDecode: page content stream (lossless)
    5 0 obj
    << /Length 2847 /Filter /FlateDecode >>
    stream
    % compressed page instructions (q BT Tf Td Tj ET Q ...)
    endstream
    endobj

    % DCTDecode: colour photograph (JPEG lossy)
    6 0 obj
    << /Type /XObject /Subtype /Image
       /Width 1200 /Height 800
       /ColorSpace /DeviceRGB /BitsPerComponent 8
       /Filter /DCTDecode
       /Length 184320   % ~180 KB
    >>
    stream
    % JPEG binary data
    endstream
    endobj

    % JBIG2Decode: scanned text (bi-level lossless)
    7 0 obj
    << /Type /XObject /Subtype /Image
       /Width 2480 /Height 3508
       /ColorSpace /DeviceGray /BitsPerComponent 1
       /Filter /JBIG2Decode
       /DecodeParms << /JBIG2Globals 8 0 R >>
       /Length ...      % byte length of the JBIG2 data (elided)
    >>
    stream
    % JBIG2 binary data
    endstream
    endobj
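As a minimal sketch of how a writer might produce the first of these objects, the Python below compresses a content stream with zlib and wraps it in the stream syntax, setting /Length to the compressed byte count. The function name `flate_stream_object` and the sample content are illustrative, and the output is a single object rather than a complete PDF file (no header, cross-reference table, or trailer).

```python
import zlib

def flate_stream_object(obj_num: int, data: bytes) -> bytes:
    """Serialise `data` as an indirect PDF stream object compressed with FlateDecode."""
    compressed = zlib.compress(data, 9)
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Length {len(compressed)} /Filter /FlateDecode >>\n"
        "stream\n"
    ).encode("ascii")
    # /Length counts only the bytes between "stream\n" and the EOL before "endstream".
    return header + compressed + b"\nendstream\nendobj\n"

# Made-up page instructions used only as sample input.
content = b"q BT /F1 12 Tf 72 720 Td (Hello) Tj ET Q\n" * 40
obj = flate_stream_object(5, content)
print(obj[:60])
```

Getting /Length right matters because a reader uses it to find the end of the binary data; the compressed bytes themselves may contain sequences that look like keywords.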
Common Mistakes to Avoid
- Compressing scanned text with JPEG instead of JBIG2. JPEG compression applied to bi-level (black and white) scanned text produces visible ringing artefacts and halos around characters, so text looks blurry and unprofessional. For 1-bit scanned document images, use JBIG2 (lossless mode for clean originals, lossy mode for noise-heavy scans). The compression ratios are also much better than JPEG for this content type.
- Re-compressing already-compressed JPEG images. A JPEG image decoded from a PDF and then re-encoded as JPEG loses quality twice — each encode/decode cycle introduces more DCT artefacts. When processing or modifying PDF images, either preserve the original compressed JPEG data unchanged or decode to lossless format before re-encoding once at the final quality setting.
- Using lossless compression (FlateDecode) for large colour photographs. Storing a 12 MP colour photograph with FlateDecode instead of DCTDecode bloats the PDF unnecessarily. FlateDecode achieves only about 2:1 on photographic data, because sensor noise and continuous tonal variation leave few exact byte repeats for Deflate to exploit, while quality-85 JPEG achieves about 15:1. Use DCTDecode for photographs; use FlateDecode for rasterised diagrams, line art, and flat-colour PNG-style images.
- Setting JPEG quality too low for professional print PDFs. Quality values below 70 produce visible DCT block artefacts in photographs — acceptable for web thumbnails but unacceptable for print production. For professional print workflows, use quality 85-95. For web-only PDFs, quality 75-80 offers substantial size reduction with minimal visual impact at normal viewing distances.
- Not enabling cross-reference stream compression in PDF 1.5+ files. The cross-reference table in large PDFs can itself be significant in size. PDF 1.5+ allows the cross-reference table to be stored as a compressed stream (FlateDecode) — reducing it from a large plaintext table to a fraction of the size. Ensure your PDF generator uses cross-reference streams (/Type /XRef) for files with large numbers of objects.
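The FlateDecode-versus-DCTDecode point in the list above can be checked with a quick experiment. In this sketch, seeded pseudo-random bytes are a crude stand-in for photographic pixel data and a repeated text line stands in for a content stream; both are assumptions that exaggerate the gap. Real photographs typically compress somewhat better than pure noise, and real content streams are less repetitive than this sample.

```python
import random
import zlib

def flate_ratio(data: bytes) -> float:
    """Original size divided by Deflate-compressed size."""
    return len(data) / len(zlib.compress(data, 9))

# Structured, repetitive data: stands in for a page content stream.
text_like = b"BT /F1 11 Tf 72 700 Td (Invoice line item) Tj 0 -14 Td ET\n" * 2000

# Noise-like bytes: a crude stand-in for photographic pixel data (assumption).
rng = random.Random(0)
noise_like = bytes(rng.getrandbits(8) for _ in range(len(text_like)))

print(f"text-like:  {flate_ratio(text_like):.1f}:1")
print(f"noise-like: {flate_ratio(noise_like):.2f}:1")
```

Deflate only removes repeated byte sequences and skewed byte frequencies, so data without either stays roughly the same size; that is the case a lossy image filter is designed for.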
Frequently Asked Questions
What compression filters does PDF support?
PDF supports: FlateDecode (Deflate/ZIP lossless, for text, fonts, code), DCTDecode (JPEG lossy, for photographs), JBIG2Decode (bi-level lossless/lossy, for scanned text), JPXDecode (JPEG 2000 lossless or lossy, for high-quality images), CCITTFaxDecode (CCITT Group 3/4, for fax-originated scans), RunLengthDecode (simple repeating data).
What is FlateDecode?
FlateDecode is Deflate/zlib (ZIP algorithm) lossless compression applied to PDF content streams. It compresses page instructions, fonts, ICC profiles, metadata, and vector data with no data loss. It achieves 3:1 to 10:1 ratios on structured text data. Every modern PDF reader handles it natively.
How does JBIG2 compression work?
JBIG2 compresses bi-level (1-bit black and white) images by finding similar symbols, such as repeated letter forms, and storing them once in a dictionary. All occurrences reference the stored pattern, which dramatically reduces size for scanned text. It can be lossless (exact) or lossy (similar but not identical glyphs are merged, giving very high compression at the cost of slight character alteration).
What is JPXDecode?
JPXDecode (JPEG 2000) is a wavelet-based compression filter available in PDF 1.5+. It supports lossless and lossy modes in one format, alpha channels, and up to 16-bit colour depth. It offers better quality-per-byte than regular JPEG. PDF/A-2 permits it, so its lossless mode is a common choice for image compression in archival documents.
How can I reduce the size of a PDF?
Key approaches: (1) Compress all content streams with FlateDecode. (2) Downsample images to appropriate DPI (300 for print, 150 for screen). (3) Use JPEG quality 80-85 for photographs. (4) Use JBIG2 for scanned text. (5) Subset-embed fonts so that only used glyphs are included. (6) Remove unused objects, hidden layers, embedded thumbnails. (7) Enable cross-reference stream compression (PDF 1.5+).
Does compression affect text search or copy-paste?
No. Compression is transparent to the PDF reader, which decompresses streams before processing. Text content is always fully decompressed before rendering or searching, so compression has zero effect on text selectability, copy-paste, or search. However, scanned image-based PDFs (no text layer) require OCR for searchability, regardless of compression format.
Compress Your PDF — Free
PDFlyst's compression tool reduces PDF file size without compromising visual quality.