I need to make good PDF files for printing, but they are huge – even bigger than the original document. I use Distiller and switch off compression because I don’t want quality to suffer. How can I fix this? — Annie
I’ve discovered a new condition that affects PDF users. I call it ‘fear of compression.’ Compression is your friend, but there are a lot of choices. Most of them are good, but some of them can, indeed, reduce the quality of your PDF files. So if you don’t mind I’ll take some time in this answer to explain what you need to know about compression and Distiller.
A price to pay?
It may seem that compression is magic. How does it make things smaller? The answer is usually to look for repetition. If you took a book and replaced every word ‘the’ with ‘Z’ it would get shorter, right? Well, most books would. The book is a lot harder to read, but a computer can put it back to rights in no time. All the different ways that Distiller can compress, except one, work on the principle of replacing long, repetitive information with shorter information. And this can make major savings on space, so your PDF files are smaller.
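If you'd like to see the ‘the’ and ‘Z’ trick in action, here is a minimal Python sketch. The function names shrink and restore are my own inventions for illustration, and real compressors are far cleverer about choosing what to replace, but the principle is the same: swap something long for something short, and swap it back later.

```python
def shrink(text, word="the", token="Z"):
    """Replace every stand-alone `word` with a shorter `token`."""
    # The trick is only reversible if the token is not already in the text.
    if token in text:
        raise ValueError("token already appears in the text; pick another")
    return " ".join(token if w == word else w for w in text.split(" "))


def restore(shrunk, word="the", token="Z"):
    """Undo shrink(): put the original word back wherever the token stands."""
    return " ".join(word if w == token else w for w in shrunk.split(" "))


sample = "the cat sat on the mat by the door"
small = shrink(sample)
print(small)                          # harder to read, but shorter
print(len(sample), "->", len(small))  # characters before and after
print(restore(small) == sample)       # nothing was lost
```

Notice the check at the top of shrink: if the book already contained a ‘Z’, the computer could not tell which ones to turn back into ‘the’. Real lossless methods have to solve the same problem.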
Images might compress especially well, if you have large areas of the same colour. But images are also often huge and that’s where most of the space savings have to come from, if you want really small files. So Distiller offers a number of ways to save even more space, by throwing information away. This isn’t necessarily a disaster, but you have to understand the dangers too.
JPEG, CCITT, ZIP – an alphabet soup is waiting if you turn to the compression page in Distiller’s job options. And it doesn’t help knowing what they stand for, so I won’t bother with that. But here is a summary of the compression types available in Distiller.
ZIP is the Swiss Army Knife of compression. Good for just about anything, but sometimes special purpose tools beat it on particular tasks. It is the same idea used in the ever-popular PKZIP and WinZip utilities, which compress files. ZIP can be used to compress the images in a PDF file, and all sorts of other things including text and line art. Using ZIP never changes the contents or quality of anything.
It’s worth noting that using ZIP when making a PDF file is not the same as compressing a whole PDF file with WinZip. WinZip compresses the whole file, and Acrobat wouldn’t understand the result until you un-zip it. Distiller just compresses particular pictures and other pieces inside the file, and Acrobat knows what to do with this.
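To convince yourself that ZIP really is lossless, you can try the same family of compression from Python, whose standard zlib module implements the deflate method that ZIP-style compression is built on. This is only a sketch: the ‘image’ is a made-up block of identical samples, and flate_round_trip is a name I chose for illustration, not anything from Distiller.

```python
import zlib


def flate_round_trip(data):
    """Compress the data, then decompress it again."""
    packed = zlib.compress(data, 9)    # 9 = try hardest to make it small
    restored = zlib.decompress(packed)
    return packed, restored


# A pretend 900 x 1500 greyscale image that is one solid colour.
flat_image = bytes([255]) * (900 * 1500)

packed, restored = flate_round_trip(flat_image)
print(len(flat_image), "bytes before,", len(packed), "bytes after")
print("identical after unpacking:", restored == flat_image)
```

A solid block of colour is the best possible case, so don't expect real pictures to shrink this much. The point is the last line: what comes out is byte for byte what went in.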
JPEG is really good at making photographs smaller. That’s what it’s designed for – in fact the ‘P’ stands for ‘Photographic’. An important word used about JPEG is ‘lossy’. That is, it loses information. It is very ingenious and uses a detailed knowledge of how the eye and brain ‘see’ pictures, so the information removed is often impossible to detect. JPEG can throw away a little information or a lot (it always throws away some). The result is a range of settings, where the best quality files are also the largest, while the smallest files have the most obvious loss of quality.
The opposite of lossy is ‘lossless’ – that is, you get out exactly what you put in.
Automatic isn’t a compression method at all – or rather, it is two methods. It’s a fact that JPEG is a bad choice for pictures containing large solid areas of colour with sudden changes. Real photos rarely do this, but things like screen shots almost always do. When you choose Auto in Distiller, Distiller will analyse each picture to see if it is a good candidate for JPEG. Then, JPEG or ZIP is used, whichever seems best.
CCITT was invented for faxes. What do faxes have to do with PDF? Well, both need compression. Without compression, a one-page fax might take five minutes to send! CCITT is specially designed for the mixtures of text and a few lines and solid black blocks typically found in a fax. It can only be used for black and white images – no shades of grey – also known as monochrome.
CCITT is usually better than ZIP for this kind of original. Distiller offers the choice of ‘group 3’ and ‘group 4’ but I don’t know why – group 4 is always smaller. Similarly, Distiller offers run length for monochrome, but this isn’t worth using.
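For the curious, here is the basic idea behind run length compression, sketched in Python. It is an illustration only: the function names are mine, and the way the runs are stored here is not the layout a PDF file actually uses. A row of black and white samples is stored as ‘so many of this, then so many of that’.

```python
from itertools import groupby


def rle_encode(pixels):
    """Turn a row of samples into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (value, run length) pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


# One row of a monochrome image: 0 = white, 1 = black.
row = [0] * 10 + [1] * 3 + [0] * 7
runs = rle_encode(row)
print(runs)
print(rle_decode(runs) == row)
```

Twenty samples become three pairs, and decoding gives back the same row. The methods designed for faxes go further than this, which fits with CCITT usually giving the smaller result.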
Distiller has one other trick for making files even smaller. This is called either subsampling or downsampling. Here’s what it means: throwing away some of the image.
Let’s suppose you want an image to be 3 inches by 5 inches (an inch is 25.4 mm in most parts of the world). Suppose this was a picture scanned at 300 dpi. This figure – dpi – stands for dots per inch. Another word for dots is pixels, or samples. Our 3 x 5 inch image is 900 x 1500 samples (3 inches x 300 dpi = 900, and 5 inches x 300 dpi = 1500). That 900 x 1500 is 1,350,000 samples in all – a whole lot of information for a small picture.
It turns out that 300 dpi is too much for most purposes, and you can use less without harming the quality. If this is a regular photo, 300 dpi is actually enough for the average glossy magazine or coffee table book. 150 dpi is probably fine for the ordinary laser printer or inkjet printer, while 72 dpi is plenty for viewing on screen. Your 3 x 5 inch picture at 72 dpi is only 77,760 samples – less than 6 percent of the original. No wonder it makes a smaller file!
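If you want to check the arithmetic for your own pictures, a few lines of Python will do it. The function name sample_count is just my label for this sketch.

```python
def sample_count(width_inches, height_inches, dpi):
    """Total samples in an image of the given printed size and resolution."""
    across = round(width_inches * dpi)
    down = round(height_inches * dpi)
    return across * down


at_300 = sample_count(3, 5, 300)   # 900 x 1500
at_72 = sample_count(3, 5, 72)     # 216 x 360
print(at_300, at_72)
print(f"{100 * at_72 / at_300:.2f} percent of the original")
```

Halving the resolution cuts the number of samples to a quarter, because it applies in both directions. That is why a modest drop in dpi makes such a difference to file size.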
The dpi value is often called ‘resolution’, though unfortunately, like a lot of words in computers, it has a number of different meanings. But I’m going to use it in this way.
If your pictures start out with a well chosen resolution, Distiller can’t help you much. But, if they were a higher resolution than necessary, Distiller can throw away samples, which reduces the resolution and makes the file smaller. There are good reasons why pictures might have a high resolution. They might have been scanned for high quality print, but your PDF is only for screen use. Or, the picture might have been reduced.
(What do I mean by reduced? Suppose you have a 3 x 5 inch picture scanned at 72 dpi. But you decide to shrink it down. If you run it at half size, 1.5 x 2.5 inches, there are more samples in each inch, and the resolution is really 144 dpi – twice 72 dpi.)
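The same sum works for any picture: divide the number of samples across by the width it actually occupies on the page. Here is a small Python sketch, with effective_dpi as a name I made up for the purpose.

```python
def effective_dpi(samples_across, placed_width_inches):
    """Resolution of a picture at the size it is actually placed on the page."""
    return samples_across / placed_width_inches


samples_across = 3 * 72   # a 3 inch wide picture scanned at 72 dpi
print(effective_dpi(samples_across, 3))     # placed at full size
print(effective_dpi(samples_across, 1.5))   # placed at half size
```

The picture itself hasn't changed, only the space it is squeezed into, so shrinking it on the page raises the resolution and enlarging it lowers the resolution.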
So, Distiller can reduce the number of samples. There are two or three choices, and they differ in how much care Distiller takes to avoid losing detail or shifting colours. There is a trade-off: the better the quality, the longer Distiller takes to prepare the file. The best quality will be ‘Bicubic downsampling’, while the fastest will be just ‘subsampling’.
Note that some things, like scanned text, need higher resolutions to print clearly. Text should be scanned as monochrome, and you can set the monochrome options to have a higher resolution.
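To show why the choice matters, here is a rough Python sketch comparing the quick way with a more careful way. It is not Distiller's code: I assume a greyscale image held as a list of rows, with a width and height that divide evenly by the reduction factor. Simple block averaging stands in for the careful methods, since bicubic takes rather more arithmetic than fits here.

```python
def subsample(image, factor):
    """Fast: keep one sample from each block (the top-left one here)."""
    return [row[::factor] for row in image[::factor]]


def average_downsample(image, factor):
    """Slower: replace each factor x factor block with its average."""
    result = []
    for top in range(0, len(image), factor):
        new_row = []
        for left in range(0, len(image[0]), factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            new_row.append(sum(block) // len(block))
        result.append(new_row)
    return result


# A fine checkerboard: 0 = black, 255 = white.
checks = [[0, 255, 0, 255],
          [255, 0, 255, 0],
          [0, 255, 0, 255],
          [255, 0, 255, 0]]

print(subsample(checks, 2))           # only the black samples survive
print(average_downsample(checks, 2))  # an even mid grey
```

The quick method happens to pick only black samples, so a pattern that was half white turns solid black. Averaging takes more work but keeps the overall tone, which is the sort of care the slower choices are buying you.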
Putting it all together
Here is a screen shot from Distiller’s job options.
This shows the very least compression you should ever use. This is completely lossless, and quality is never reduced. Use less than this and you are just throwing away disk space. Notice that you should always use Compress Text and Line Art. This always uses ZIP, and never affects the quality.
Now, I’m not actually recommending this for most purposes, but if you must have lossless, use this. If you really want small PDF files the best approach is to experiment. I can’t tell you what will work best with your pictures, or what final quality you need. Do always remember that on the Web, most users will thank you more for small files that download fast than they will for really sharp, high quality pictures.
Acrobat 4.0 has some suggested settings called ScreenOptimized, PrintOptimized (for an ordinary printer), or PressOptimized (for printing on a printing press). Don’t take these as the only options – use them as your starting point. You can save new options, but try to give them names which help you remember what they do.
Just when you thought it was safe…
Distiller offers a ‘4 bit’ option for ZIP. Take care, this isn’t usually what you want. It reduces a colour image to 4096 colours, or a greyscale image to 16 greys. For most work, this won’t look nice.
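Where do 16 and 4096 come from? Four bits can hold 16 different values, and a colour sample has three components (red, green and blue), so 16 x 16 x 16 = 4096. Here is a Python sketch of the idea. I am assuming ordinary 8 bit samples (0 to 255) and the simplest possible reduction, which is to keep the top four bits; I can't say that this is exactly how Distiller does it.

```python
def to_4_bit(sample):
    """Keep only the top 4 bits of an 8 bit sample (0-255), giving 0-15."""
    return sample >> 4


def back_to_8_bit(level):
    """Stretch a 4 bit level (0-15) back over the 0-255 range for display."""
    return level * 17


greys = {to_4_bit(s) for s in range(256)}
print(len(greys), "greys")        # levels left in a greyscale image
print(len(greys) ** 3, "colours") # three components per colour sample
print(back_to_8_bit(to_4_bit(200)))  # 200 does not come back as 200
```

Unlike ordinary ZIP, this step cannot be undone: sixteen neighbouring shades all collapse into one, which is why smooth gradients turn into visible bands.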
I mentioned that JPEG has quality settings, but the way Distiller describes them changed between 3.0 and 4.0. In 3.0, high means high compression, which is a lower quality. In 4.0 this is reversed so high means high quality and hence low compression!
One more point: there are some bugs in compression in Acrobat 4.0. It just does not compress so well as Acrobat 3.0. Adobe is starting to ship Acrobat 4.05 to registered users of 4.0, and this is a great improvement. Acrobat 4.0 does not lose quality, but files may be larger than required.
Investigating in detail
I don’t use this column to advertise my products, but I think this is worth mentioning. We have a tool, Quite A Box Of Tricks, which is an Acrobat plug-in. The important thing in this case is that you can download a free demo and make unlimited use of one of the features, without ever having to buy it. The feature you can use is ‘image info’. You can click on any picture in a PDF file and it will tell you its original size, how well it compressed, the resolution, the size in pixels (samples), and what kind of compression was used.