Online PDFs – What You Get Might Not Be What You See

When converting documents to PDF, we expect all attributes in the
original documents to be preserved. This is indeed the case, with
a few exceptions, some of which vary depending on the specific
Acrobat versions, authoring applications and printer drivers
used.

When the PDF file is intended for print and not for on-screen
viewing or interactive use, we don’t have to be concerned with
display or similar problems (assuming of course that the print
results and quality have been tested and verified).

However, if the PDFs are to be used online, we must keep several
additional factors in mind: search/find capabilities, copy/paste
results, display quality and text editability (if desired).

You may find that PostScript drivers and Acrobat implementations
that are fine for print production might not be good enough
for online PDFs.

In this article, I will briefly address a few of the problems
you may encounter, without discussing additional
application-specific aspects. One of my next articles will focus
on PDF display quality issues.

Find, Search, Copy and Paste

You have probably heard of, if not encountered in your own files,
search, copy and paste problems related to using TrueType fonts
in Acrobat. Fortunately, most of these problems have been solved
in Acrobat 4, IF the latest AdobePS drivers are used to produce
the PS file to be distilled (4.3.1 for Windows 95/98 and 5.1.2 on
Windows NT) AND only if Acrobat 4 Reader is used to view the PDF.

Another font-related problem is that some Type 3 fonts might not
be searchable; and in some rare cases, even Type 1 fonts are not
searchable (and text using them in the PDF might not even be
selectable).

Even if you use ‘perfect’ fonts, you might still experience
problems with the Find and Search functions as a result of the
way the text is stored internally in the PDF. In contrast
with your source documents, the PDF can be regarded as a
final-format print file. It has no paragraphs and no flow, and is
stored as isolated lines of text. Thus, for example, if the
phrase you are searching for is split between lines in a page
with multiple columns, or split between pages, it will not be
located. The same goes for hyphenated words split between lines.
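
The effect of line-by-line storage can be sketched in a few lines of Python. This is an illustration only, not Acrobat's actual search code: the function name find_in_lines and the sample page content are assumptions made up for the example.

```python
# Illustrative sketch only: models text stored as isolated lines,
# with no paragraphs and no flow between them.

def find_in_lines(lines, phrase):
    """Return the indexes of the lines that contain the whole phrase."""
    return [i for i, line in enumerate(lines) if phrase in line]

# Hypothetical page content, stored as separate lines.
page_lines = [
    "PDF files can be regarded as final-",
    "format print files with no text",
    "flow between the lines.",
]

print(find_in_lines(page_lines, "print files"))   # phrase sits on one line
print(find_in_lines(page_lines, "text flow"))     # phrase split between lines
print(find_in_lines(page_lines, "final-format"))  # word hyphenated at a line end
```

Only the first search returns a match. The other two phrases are visible to a human reader, but neither exists as a contiguous string in any single stored line.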

Even Acrobat’s treatment of words has its own logic, which has
nothing to do with grammar. It treats space and punctuation
symbols as separators between words, but it also inspects the
physical distance between letters and uses the results to
‘decide’ when a new word begins. Although in most cases the
outcome will be as expected, there might be surprises. A common
instance of this is evident when letter-spacing is used.

The following example is the PostScript description of sample
text, demonstrating this letter-spacing issue. Copy the
following lines to a text file, save it as ‘test.ps’ (without
creating a PS file) and distill it.


% start

/Helvetica findfont 11 scalefont setfont

48 640 moveto
(Electronic Documentation: 2000) show

48 512 moveto
2 0 (Electronic Documentation: 2000) ashow

showpage

% end

When you inspect the resulting PDF, you will notice that in
the second text line, where letter-spacing is applied, there is a
single space between each of the letters (select the text and
paste it to a text editor to see this interesting ‘effect’). This
will also be noticeable in Acrobat when you select the text by
dragging. This is despite the fact that the source text you
distilled only has a single space between words, not between
letters.

In the above PDF, a single space is inserted between each of the
letters. In other cases, for example in the output from
applications such as Word and FrameMaker, the spaces added will be
inconsistent (depending on factors such as kerning and the
specific font in use), so that ‘2000’ might appear as ‘2 00 0’ or
‘2 000’.
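
A distance-based word-break rule of this kind can be sketched as follows. This is a simplified model, not Acrobat's real algorithm: the fixed glyph width, the gap values and the threshold of 1.0 are all assumptions chosen for the example, and the function names are hypothetical.

```python
# Simplified model of distance-based word detection (not Acrobat's code).

def place_glyphs(text, glyph_width, gaps):
    """Lay out text left to right; gaps[i] is the space after character i.

    Returns a list of (char, x_start, x_end) tuples.
    """
    glyphs, x = [], 0.0
    for i, ch in enumerate(text):
        glyphs.append((ch, x, x + glyph_width))
        if i < len(gaps):
            x += glyph_width + gaps[i]
    return glyphs

def extract_text(glyphs, gap_threshold):
    """Rebuild the text, adding a space wherever the gap is too wide."""
    out, prev_end = [], None
    for ch, x_start, x_end in glyphs:
        too_wide = prev_end is not None and x_start - prev_end > gap_threshold
        # Never double up on spaces that are already in the text.
        if too_wide and out[-1] != " " and ch != " ":
            out.append(" ")
        out.append(ch)
        prev_end = x_end
    return "".join(out)

THRESHOLD = 1.0  # assumed value, for illustration only

tight = extract_text(place_glyphs("2000", 6.0, [0.0, 0.0, 0.0]), THRESHOLD)
uniform = extract_text(place_glyphs("2000", 6.0, [2.0, 2.0, 2.0]), THRESHOLD)
uneven = extract_text(place_glyphs("2000", 6.0, [1.2, 0.8, 1.2]), THRESHOLD)

print(repr(tight))    # no letter-spacing
print(repr(uniform))  # uniform letter-spacing, as with ashow
print(repr(uneven))   # gaps that vary, as kerning might produce
print("2000" in uniform)
```

With uniform gaps above the threshold, every letter becomes its own ‘word’. With uneven gaps, only some of them do, which is how a result like ‘2 00 0’ can arise.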

When viewing or printing the PDF, this will not be noticeable.
However, when using Find/Search, the item you are trying to
locate will not be found (unless of course you search for it
using the same irregular spaces between letters…).

Depending on the specific driver used, this may also cause the
wrong words to be highlighted when you use the search function.

If you use letter-spacing in your source files, check and see
what effect it has on your PDFs. I have encountered cases where
even though the letter-spacing was applied only to the running
header, it rendered the Search function useless for all of the
text.

Fake Bold

Several applications let you specify a bold font weight even if you
don’t have the actual bold font installed. In Microsoft Word, for
example, you can select Arial Black and also activate the Bold
property. Arial Black Bold? There is no such font. Word in fact
creates a ‘synthetic bold’ property for that text. But what
happens when you convert the document to a PDF?

With some Microsoft-based PostScript drivers, the synthetic bold
property will be ignored altogether, and you will have regular
text.

If you use the AdobePS driver, the ‘bolded’ attribute is
implemented as multiple instances (usually four) of the same
text, repeated with very small offsets.

The bolded text in the PDF will have a strange display effect. If
you inspect the screen display at high magnification, you
will probably be able to see multiple layers of the same text.
And even if you cannot see the effect, try to edit such text in
the PDF file, and you’ll have to work your way through editing
different layers, which is impractical, to say the least.

Copying and pasting an entire line of ‘bolded’ text will result
in each word being pasted several times.
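
The copy/paste result can be modelled with a small Python sketch. It assumes that the driver draws each word several times in a row with a slight horizontal shift, and that extraction simply follows drawing order. Both are assumptions made for illustration, as are the function names, the offset of 0.2 and the advance value.

```python
# Illustrative model of 'synthetic bold' as repeated, slightly shifted text.

def synthetic_bold(words, copies=4, offset=0.2):
    """Emit each word `copies` times, each copy shifted slightly to the right."""
    ops, x = [], 0.0
    for word in words:
        for n in range(copies):
            ops.append((x + n * offset, word))
        x += 10.0 * len(word)  # hypothetical advance; only the order matters here
    return ops

def copy_text(ops):
    """Naive extraction: join the drawn strings in the order they were drawn."""
    return " ".join(text for _, text in ops)

ops = synthetic_bold(["Fake", "Bold"])
print(len(ops))        # number of separate text objects for two words
print(copy_text(ops))
```

Two visible words turn into eight text objects, which is also why editing such text means working through one layer after another.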

Driver-Level Font Subsetting and Missing Characters

Font subsetting and embedding are often used when distilling
PDFs: embedding so that the text in the PDF preserves its look
and feel, and subsetting to keep the PDF file size down.
This combination makes sense in almost all cases… unless you
need the text to be editable (for example, if you want to have
the option of calling your print producer and asking them to
change the release date on the cover). Text set in a subset font
cannot be edited, since Acrobat does not handle fonts for which
it has only partial character descriptions. So if you need the
text to be editable (to the limited level which is possible in
Acrobat), subsetting has to be turned off.

And note that there are cases where subsetting is turned off in
Distiller’s job options, but fonts are nevertheless subset in the
PDFs, due to driver-level subsetting.

Driver-level font subsetting takes place with some PostScript
drivers. To reduce the amount of data sent to the printer,
character descriptions are sent only for those characters
actually used. Distilling PS files produced by these drivers
(including AdobePS 4.2 – 4.3.1) will therefore result in PDFs
where the fonts are subset even though subsetting is
not activated in the Distiller job options. Again, this is
usually of no special consequence, unless text editability is
required: the subset text is not editable in the PDF.

With PostScript fonts, driver-level subsetting can be disabled by
choosing ‘Don’t Send Fonts’ (in AdobePS 4.x this is done by
accessing the PostScript driver properties, and clicking the
PostScript tab, Advanced). The PS file produced will not include
the fonts; it will be Distiller’s responsibility to access them
and handle them according to the job options.

With TrueType fonts, driver-level font subsetting cannot be
effectively disabled.

Note that the ‘Don’t Send Fonts’ option should not be used with
laser printers, because the fonts will not be sent to the
printer!

Another important implication of driver-level subsetting
(partially resolved in version 4.05) is that Acrobat is not
always aware of it. If you use the Insert/Replace Pages
function, taking pages from different PDFs (where the same font
was subset differently due to different text content), the font
subsets are not merged. The result is that some characters are not
included in the current font subset and therefore disappear in
the resulting PDF, with no warning or error message.

When Acrobat 4.05 encounters different subsets for the same font,
it does not merge the subsets of the different PDFs, but it at
least displays a message saying that it cannot handle different
subsets of the same font, and the insert/replace pages operation
is not carried out.
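
The two behaviours described above can be contrasted in a short Python sketch. It models a font subset as nothing more than a set of characters. The function names, the sample texts and the error wording are assumptions for illustration, not Acrobat's internals or its actual message.

```python
# Illustrative model: a font subset is treated as a plain set of characters.

def missing_characters(page_text, font_subset):
    """Characters the page needs that the retained subset cannot draw."""
    return sorted(set(page_text) - set(font_subset))

def insert_pages(target_subset, source_subset, source_text, refuse_on_mismatch):
    """Model inserting a page that uses the same font with a different subset.

    refuse_on_mismatch=True mimics the 4.05 behaviour (refuse and report);
    False mimics earlier versions, which silently keep the target's subset.
    """
    if refuse_on_mismatch and set(source_subset) != set(target_subset):
        raise ValueError("cannot handle different subsets of the same font")
    return missing_characters(source_text, target_subset)

# Hypothetical texts; each subset holds only the characters its own PDF uses.
target_text = "Release 1999"
source_text = "Release 2000"
target_subset = set(target_text)
source_subset = set(source_text)

# Earlier versions: the insert goes ahead and these characters vanish.
print(insert_pages(target_subset, source_subset, source_text, False))

# 4.05-style behaviour: the operation is refused instead.
try:
    insert_pages(target_subset, source_subset, source_text, True)
except ValueError as err:
    print("refused:", err)
```

In the silent case the inserted page would show ‘Release’ followed by blanks where ‘2’ and ‘0’ should be, because neither character exists in the subset that was kept.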

If you use earlier versions of Acrobat and employ the
Insert/Replace Pages function, inspect the results very carefully.



About the Author: Shlomo Perets
