Editor’s Note: This is the first in what will evolve into a series of short, focused articles designed to help users create optimal PDF files — and to understand the various factors involved in doing so.
There’s a persistent myth that PDF files are by their very nature large. In fact, in most cases where file size is a legitimate issue, the cause is poor authoring technique or a lack of user knowledge about the pitfalls and the steps that can be taken to avoid them. Shlomo offers some valuable pointers. Watch for future articles in this new Best Practices series, and also read some of his previous articles published on Planet PDF.
Author’s Note: Over the last few years I have examined countless PDFs available publicly over the Web and from other sources, and I would like to use the Best Practices series to share my insights and recommendations for producing better online/interactive, screen-oriented PDFs. Each article will touch on a specific aspect such as PDF file size, better bookmarks, search function, navigation and layout.
The examples used for this series are taken from the Acrobat 5.0 Help file (Acrohelp.pdf) and the PDF document collection (additional Help files and the SDK documentation) on the Adobe Acrobat 5 CD, which makes it easier for readers to open the examples mentioned.
PDF architecture limits PDF file size to approximately 10GB, but reality imposes much more severe limits.
When producing PDF files for high-end press purposes, the highest quality possible is sought, and file size is not an issue. In this context, it usually does not matter whether the PDF for an entire book with some high-resolution images reaches a mammoth 65MB.
But when PDFs are to be viewed online or used interactively, file size is crucial.
Excessively large files will slow down response time, take more time to download, and may cause some Acrobat operations to ‘take forever’ or even to fail. Many different factors can contribute to a PDF’s file size, depending on the specific content, authoring applications and PDF creators involved. In most cases, it is possible to reduce the file size, sometimes even dramatically, without any loss of functionality.
Following are several possible reasons why a PDF may become much larger than it should.
Save vs. Save As
Acrobat’s ‘Save’ function is a ‘fast save’ which does not remove deleted objects from the file being saved (so that deleting 50 pages out of 100 and saving will actually result in a slightly larger file compared to the original). Only ‘Save As’ rewrites the entire file so that items no longer used are not stored in the file.
During the rewriting, Acrobat can also optimize the file, storing identical items only once and referencing them from different pages. Thus, when finalizing a PDF it is beneficial to do a ‘Save As’ to get rid of extra baggage or deleted items, even if no change was actually made in that last session.
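As a rough way to spot a file that was only ever ‘fast saved’, the following minimal Python sketch counts end-of-file markers. It assumes that Acrobat’s ‘Save’ corresponds to the PDF incremental-update mechanism, in which each update is appended to the file and ends with its own `%%EOF` marker. The function names are illustrative only. Treat the result as a hint rather than proof: a file saved for Fast Web View, or a binary stream that happens to contain the marker, can also yield more than one.

```python
def count_save_generations(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each appended (incremental) update adds one."""
    return pdf_bytes.count(b"%%EOF")


def looks_incrementally_saved(pdf_bytes: bytes) -> bool:
    """Heuristic only: more than one marker suggests appended updates
    that a full rewrite ('Save As') would fold into a single body."""
    return count_save_generations(pdf_bytes) > 1


if __name__ == "__main__":
    # Synthetic stand-ins for file contents, not real PDFs.
    rewritten = b"%PDF-1.4\n...objects...\n%%EOF\n"
    fast_saved = rewritten + b"...changed objects...\n%%EOF\n"
    print(looks_incrementally_saved(rewritten))
    print(looks_incrementally_saved(fast_saved))
```

To try it on a real document, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in.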
Cropped items still present
When pages are cropped in Acrobat, items included in the cropped area are still stored in the file. Based on the print/PDF setup, authoring applications may place items in a cropped area (file info, color bars, registration or crop marks) — so that your PDF may include items in the cropped areas even though you never applied cropping. When ‘innocent’ registration marks are created through high-resolution bitmaps, these may add noticeably to the file size.
Named destinations that are never used
Named destinations identify locations/views in a PDF file, and are used as the target of links or bookmarks. Some authoring applications write numerous named destinations, most of which are not actually used by any link. Since every ten destinations take up about 1K, a PDF with many paragraphs per page (such as indexes with multiple columns and small type, or complex tables with many cells) may have 50% or more of its size taken up by named destinations that are never used.
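The rule of thumb above (about 1K per ten destinations) can be turned into a quick estimate. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the destination count and file size in the example are hypothetical, not measurements of any real file.

```python
def destination_overhead_bytes(count: int, bytes_per_ten: int = 1024) -> int:
    """Estimate the space taken by `count` named destinations,
    using the article's rule of thumb of roughly 1K per ten."""
    return count * bytes_per_ten // 10


def destination_share(count: int, file_size_bytes: int,
                      bytes_per_ten: int = 1024) -> float:
    """Estimated fraction of the file occupied by named destinations."""
    return destination_overhead_bytes(count, bytes_per_ten) / file_size_bytes


if __name__ == "__main__":
    # Hypothetical index-heavy document: 5,000 destinations in a 1,024,000-byte file.
    print(destination_overhead_bytes(5000))
    print(f"{destination_share(5000, 1_024_000):.0%}")
```

With these made-up figures the destinations alone account for about half the file, which is the kind of situation described above.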
Structure, introduced in Acrobat 4.05, has very limited support in Acrobat’s user interface, and the way Structure information is written by an authoring application may render it practically useless. However useless it is, it still takes up space. Together with other items, structure and article threads may be present in a PDF as the result of using the authoring application’s defaults rather than the result of an informed decision as to the specific required features.
(Note: Tagged PDF, introduced with Acrobat 5, is different from Structured PDF, and does offer benefits related to accessibility and reflow, depending on the anticipated use.)
Due to bugs or user mistakes, items such as links may be duplicated – multiple identical items, one on top of the other, or an ‘extra’ duplicate on the next page.
When duplicate items are included in master pages or running headers and footers, they may be present on most pages, again adding to the file size without any benefit. Links (together with bookmarks and notes) are not compressed in Acrobat, regardless of compression settings.
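To make the idea of stacked duplicates concrete, here is a small sketch that finds links sharing the same page, rectangle and target. It does not read a PDF: the `Link` record is a hypothetical structure that you would fill from whatever tool you use to list a document's links. It covers only exact copies on the same page, not the ‘extra’ duplicate that lands on the next page.

```python
from typing import List, NamedTuple, Tuple


class Link(NamedTuple):
    """Hypothetical link record; not taken from any real PDF library."""
    page: int
    rect: Tuple[float, float, float, float]  # left, bottom, right, top
    target: str


def find_stacked_duplicates(links: List[Link]) -> List[Link]:
    """Return the surplus copies: every link identical to one already seen."""
    seen = set()
    extras = []
    for link in links:
        if link in seen:
            extras.append(link)
        else:
            seen.add(link)
    return extras


if __name__ == "__main__":
    header_next = Link(page=3, rect=(500.0, 750.0, 560.0, 770.0), target="page 4")
    links = [header_next, header_next,
             Link(page=3, rect=(40.0, 750.0, 100.0, 770.0), target="page 2")]
    print(len(find_stacked_duplicates(links)))
```

Because each link costs space and, as noted above, links are not compressed, the count of surplus copies gives a direct feel for how much could be trimmed.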
Note: Fonts and graphics and related settings may affect file size significantly.
Distilling parameters related to these will be discussed in detail separately in later articles.
When embedding fonts, subsetting is recommended to reduce the font information. Type 3 fonts, whether you intended these to be in your PDFs or not, will increase your file size significantly.
All types of objects should be compressed internally (through Distiller job options). Text and Line Art can only use lossless zip compression; bitmaps can be compressed using lossless (zip) or lossy (JPEG) compression. Depending on the original resolutions, downsampling may be essential. Also, whenever applicable, vector graphics are recommended over bitmaps. Text blocks should generally be based on text and not bitmaps.
Inefficient PDF Creators
Most PDFs are probably produced with Acrobat Distiller and PDFWriter, but several graphics applications offer PDF export which relies on custom PDF output mechanisms. Some graphics applications produce PDF files that are very inefficient in terms of internal storage.
Objects in PDF files can use ASCII or binary representation. ASCII PDFs are larger than binary PDF files; depending on the content and number of items, the difference in size may be in the range of 10-20%, or higher.
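One way to get a feel for where the ASCII penalty comes from is to encode some binary data the way an ASCII PDF might. The sketch below assumes the ASCII representation uses ASCII85 encoding for stream data, which is one of the text encodings PDF supports; the article does not say which encoding a given creator applies, and the function name is illustrative. ASCII85 writes every four bytes as five printable characters, so an encoded stream grows by about 25%. The growth of a whole file is lower because not everything in it is stream data.

```python
import base64


def ascii85_growth(binary_stream: bytes) -> float:
    """Ratio of ASCII85-encoded length to the original binary length."""
    encoded = base64.a85encode(binary_stream)
    return len(encoded) / len(binary_stream)


if __name__ == "__main__":
    # 200 bytes of sample data standing in for a compressed stream.
    sample = bytes(range(1, 201))
    print(ascii85_growth(sample))
```

For this sample the ratio is 1.25: 200 bytes become 250 characters.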
Version-specific Bugs & Issues
Version-specific bugs and issues may cause the same source PostScript files, distilled with the same distilling parameters, to create larger files, when using different versions of the product.
Depending on the specifics of PDF production and editing, there may be additional aspects which affect the size of your PDFs, including Text TouchUp operations, duplication of form fields, and merging pages/files.
Checking the Relative Weight of Items
Before distributing a PDF, check its size. In the course of doing this with different files, you will be able to establish a sense of correlation between the content, distilling options and the resulting file size.
It may be useful to keep track of previous releases of the same document; a sudden decrease in size may indicate an inadvertent change in distilling parameters, for example.
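Tracking sizes across releases is easy to automate. The following sketch compares two sizes and flags a jump in either direction. The 25% threshold and the function names are arbitrary assumptions for illustration; pick a threshold that suits how much your documents normally change between releases.

```python
def size_change(previous_bytes: int, current_bytes: int) -> float:
    """Relative change from the previous release (negative means smaller)."""
    return (current_bytes - previous_bytes) / previous_bytes


def is_suspicious(previous_bytes: int, current_bytes: int,
                  threshold: float = 0.25) -> bool:
    """Flag a release whose size moved by at least `threshold` either way."""
    return abs(size_change(previous_bytes, current_bytes)) >= threshold


if __name__ == "__main__":
    # Hypothetical sizes of two successive releases of the same manual.
    print(is_suspicious(4_000_000, 4_100_000))
    print(is_suspicious(4_000_000, 2_000_000))
```

In practice the sizes would come from `os.path.getsize()` on the archived release files.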
With Acrobat 5, you can use the new Tools > PDF Consultant > Audit Space Usage function to see the relative weight of different items, including images, content, fonts, links and annotations (now collectively called comments), form fields and more.
In addition, you can isolate specific pages, or write images to external files, to find out if there is a specific item which has a special effect on the overall size.
When you create a PDF with text, line art and bitmaps compressed in the distilling process, compressing the resulting PDF usually won’t reduce the file size significantly. However, Acrobat does not compress some items that are not part of page contents, such as links, bookmarks, form fields and destinations — so with some files you may still see a noticeable reduction when compressing the PDF file. If an optimal file size is important, research techniques and use add-on tools to deal with the specific aspects of your PDFs.
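A quick way to check how much uncompressed material a finished PDF still carries is to compress the whole file and compare sizes. This sketch uses Python’s standard `zlib` module as a stand-in for a zip utility; the function name is illustrative, and the sample data is synthetic rather than taken from a real PDF.

```python
import zlib


def residual_compressibility(file_bytes: bytes) -> float:
    """Compressed size divided by original size. Values near 1.0 mean
    there is little left to squeeze; low values point to uncompressed items."""
    return len(zlib.compress(file_bytes, 9)) / len(file_bytes)


if __name__ == "__main__":
    # Repetitive, uncompressed text similar in spirit to many link entries.
    uncompressed_links = b"/Type /Annot /Subtype /Link " * 1000
    print(f"{residual_compressibility(uncompressed_links):.3f}")
```

A file whose ratio is well below 1.0 is a good candidate for a closer look at its links, bookmarks, form fields and destinations.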
The help file for Acrobat 5 (Acrohelp.pdf) is probably (or will be, once all Acrobat users move to version 5) one of the more commonly used PDF files. Yet its producers do not seem to have made much of an effort to reduce its size. If you would like to try some of the operations described below, use a copy of the file.
Acrohelp.pdf was authored with FrameMaker 6.0, and its original size is 5.94 MB; using ‘Save As’ reduces the file size to 5.69 MB.
Using the Audit Space Usage function (Acrobat 5) shows that 11.7% of the rewritten file is used by named destinations. However, only 30% of these destinations are actually used by links in the current PDF, as the Optimize Space function reveals. You can inspect named destinations in Acrobat (Window > Destinations, then the Scan Document button). None of the named destinations for each and every index entry at the end of the file (these start with G23.) is used as the target of links or bookmarks, as is the case with 5539 other named destinations in the Acrobat 5 help file. (The destinations for the A-Z group headings are used by bookmarks.) After removing the unused destinations, the file size is 5.12 MB.
Beware: The Tools > PDF Consultant > Optimize Space function is limited to analysis of stand-alone PDFs, and the ‘Remove Unused Named Destinations’ option, turned on by default, does not check whether destinations are used by links in other files. After running this function on a document collection with cross-file links, many links that were previously valid become invalid.
This function is to be used only when producing a single, stand-alone PDF with no cross-PDF links. As far as I know, there are no links in other PDFs in the Acrobat 5 CD pointing to destinations in the Acrobat Help file.
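The pitfall described above comes down to which references are taken into account when deciding that a destination is unused. The sketch below expresses the safer rule with plain sets. It does not read PDF files: the function name is illustrative and the destination names are hypothetical, and you would need a separate tool to collect the real lists from each document in the collection.

```python
from typing import Iterable, Set


def removable_destinations(defined: Iterable[str],
                           used_locally: Iterable[str],
                           used_by_other_files: Iterable[str] = ()) -> Set[str]:
    """Destinations that nothing points to, in this file or in any other."""
    return set(defined) - set(used_locally) - set(used_by_other_files)


if __name__ == "__main__":
    defined = {"G23.1001", "G23.1002", "G5.100", "G7.200"}  # hypothetical names
    used_locally = {"G5.100"}
    used_by_other_files = {"G7.200"}

    # Looking only at the file itself would also remove G7.200
    # and break the link that another document has to it.
    print(sorted(removable_destinations(defined, used_locally)))
    print(sorted(removable_destinations(defined, used_locally, used_by_other_files)))
```

For a single stand-alone PDF the third argument is simply left empty, which matches the case where the built-in function is safe to use.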
So with these two simple operations, which do not affect functionality in any way, 0.82 MB (about 14% of the original file) has been saved so far.
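The arithmetic behind that figure can be checked with a few lines of Python, using the sizes quoted above (5.94 MB originally, 5.69 MB after ‘Save As’, 5.12 MB after removing unused destinations). The function name is just for illustration.

```python
def percent_saved(original_mb: float, current_mb: float) -> float:
    """Percentage of the original size that has been removed."""
    return (original_mb - current_mb) / original_mb * 100


if __name__ == "__main__":
    original = 5.94
    stages = [
        ("after Save As", 5.69),
        ("after removing unused destinations", 5.12),
    ]
    for label, size in stages:
        saved = original - size
        print(f"{label}: {saved:.2f} MB saved, {percent_saved(original, size):.1f}%")
```

The same function can be reused for any later step by adding another entry to `stages`.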
When inspecting the results of the Audit Space Usage for the new file, 19.4% of the new file is reported as used for Structure information. The structure information was created by FrameMaker 6.0 (turned on in PDF Setup).
‘Structure’ sounds nice, but I have yet to see one use of Structure in the context of the help PDF that makes any sense (see more on structure in my review of PDF-related aspects in FrameMaker 6.0).
After removing the structure information, the resulting PDF file size is 4.8 MB. 18.5% is reported to be taken up by Comments, which is Acrobat 5’s new term for all forms of links and annotations. The Acrobat 5 help file has only simple ‘go to’ links, which are very useful for navigation purposes. Turning on Acrobat’s Link tool to inspect the links, you can drag the links for the next page and previous page buttons in the header and footer areas to reveal duplicate links underneath each: a total of 1148 poor links which never get their chance to be clicked (yet take up around 140K).
Note: When zipping the 4.8 MB PDF, the result is a 1.9 MB file. The relatively large difference can be explained in part by interactive items that Acrobat does not compress (7803 links, 419 bookmarks and 2322 named destinations), but with access to the source files additional issues could have been inspected.
Taking into account only the items above, and without examining how efficiently the source files were constructed and distilled, it is evident that the same Acrobat help PDF could have shed 20% of its body weight effortlessly (wouldn’t many of us like it to be so easy?).