Getting text out of your PDF

Is there any way to select text off of a pdf file and copy it to a Word document? All I have is Acrobat Reader, and it won’t let me select anything.

In many cases, you can copy the text from a PDF file and paste it.
But there are a number of cases where you might not be able to. It
isn’t always easy to tell what is going on, so first I’ll describe
what happens when it works, then cover why it might not work.

Before you can copy text at all, you need to select it. The trick
is that Acrobat contains a number of different ‘tools’. The default
tool is the ‘hand’ tool, which is just used for scrolling. You
need to use the text selection tool.

In Acrobat Reader 3, this is the letters ‘abc’ surrounded by a
dotted line, while in Reader 4, this is the letter ‘T’ with a
dotted box next to it.

TIP: in Reader 4, some of the tools fold out. Just click the mouse
and hold it on the text selection tool, and you will see more
related tools appear. This is important – if you don’t fold them
out, you’ll miss some of them. If a tool has a tiny triangle in the
bottom right of its icon, then it will fold out. This was new for
Reader 4. You can also use Edit > Select all to select everything
at once, which saves dragging over the text piece by piece.


You drag the mouse to select the text, and then copy it (Edit > Copy,
from the menus, or Ctrl+C in Windows, or Command+C on the Macintosh).
What could go wrong? Actually, quite a few things. So, watch out
for these problems.

It’s different in a browser.

In a browser you can still use the text select tool, but the usual
ways of copying don’t work. That’s because the browser sees
the copy instruction, but doesn’t bother to mention it to Acrobat!
Luckily, Adobe thought of this and added a special COPY button to
the Reader toolbar. It appears only when the PDF is viewed in a browser.
The button shows two tiny pages, side by side. Select all is not
available when viewing a PDF in a browser.

It isn’t text at all.

You can see it – there on the page – text. What else could it be?
Actually, it could be a picture. The text could be a series of shapes
which look like letters, or a scanned page, so the text is actually
a bitmap. In these cases, the text select tool will not seem to find
any text. Unfortunately, there’s not much you can do.

It copies, but it’s complete junk.

Some ways of making a PDF file will give you hopelessly jumbled
fonts. Think of it this way: most fonts have all of the letters
a, b, c and so forth. You can put them in a grid; for most fonts,
the letters will always appear in the same place. But some fonts
have the letters all over the place. Acrobat has no way to know
that this has been done, so it just copies the letters that you’d
get for a normal font. This often happens when creating a PDF
document in Windows, using TrueType fonts, and Acrobat Distiller.
Before Distiller sees the fonts, they are already jumbled up.
Sometimes just a few characters may be junk – these might be
in a different font, or use a special character not available to
other programs. Again, there isn’t much you can do.
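
If it helps to see the grid idea in action, here is a small Python
sketch. It is only an illustration of the explanation above, not how
Acrobat or Distiller actually work inside. The two tables and the
function names are made up for the example: STANDARD stands in for a
normal font's layout, and JUMBLED_GLYPHS for a font whose letters
have been moved around.

```python
# Illustration only: both layouts below are invented for this example.

# A "normal" font: each character code draws the letter you would expect.
STANDARD = {65: "A", 66: "B", 67: "C", 68: "D"}

# A jumbled font: the same codes draw different letter shapes.
JUMBLED_GLYPHS = {65: "C", 66: "A", 67: "D", 68: "B"}


def codes_for(text, glyphs):
    """Return the codes a file would store so that `text` shows on the page."""
    shape_to_code = {shape: code for code, shape in glyphs.items()}
    return [shape_to_code[ch] for ch in text]


def copy_text(codes):
    """Copy by assuming the normal layout, as described in the text above."""
    return "".join(STANDARD[code] for code in codes)


# The page displays "CAB" using the jumbled font...
stored = codes_for("CAB", JUMBLED_GLYPHS)

# ...but copying reads the stored codes as if the font were normal.
copied = copy_text(stored)
print("On the page:", "CAB")
print("After copying:", copied)
```

With a normal font the stored codes and the copied letters agree.
With the jumbled one, what you see and what you paste no longer
match, and nothing in the codes themselves reveals the problem.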

I can’t even select text.

Sometimes, the text select tool is greyed out and can’t be used.
This happens with ‘secure’ PDF files. The creator of any PDF can
protect it – choosing whether or not to allow copying (and printing).
If a document is protected, you would have to contact the copyright
holder and ask for an unprotected copy to use. They might agree,
or might want a fee. Many people forget that almost everything on
the web is copyright, whether or not it is secure, and whether
or not it has a copyright notice.
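
For the curious, here is a rough Python sketch of how those choices
can be read. As far as I understand the PDF specification, a secure
file records its permissions as a single whole number, where one bit
stands for printing (value 4) and another for copying text (value
16). The function name and the sample numbers are my own, chosen for
the example, and the sketch does not open or unlock any real file.

```python
# Sketch under the assumption described above: permissions are one integer,
# with bit 3 (value 4) for printing and bit 5 (value 16) for copying.
PRINT_BIT = 1 << 2  # value 4
COPY_BIT = 1 << 4   # value 16


def permissions(flags):
    """Report which of the two permissions are switched on in `flags`."""
    return {
        "print": bool(flags & PRINT_BIT),
        "copy": bool(flags & COPY_BIT),
    }


# Sample values, chosen for illustration.
print(permissions(-44))  # printing and copying both allowed
print(permissions(-60))  # printing allowed, copying not
print(permissions(-64))  # neither allowed
```

When the copying bit is off, a viewer that honours the setting greys
out the text select tool, which is the behaviour described above.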

The text is in columns.

Acrobat doesn’t understand columns, so trying to copy text
from columns can be painful – it just reads right across the page.
But there is an easy work-around. Just hold the Ctrl key (Windows)
or Alt/Option key (Macintosh) when selecting the text, and you will
find you can drag around any rectangular area. In Acrobat 4,
there’s even a special tool, but remember to fold it out from
the regular text selection tool.
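
To see why the rectangle helps, here is a small Python sketch. It is
a simplified model I made up for illustration, not Acrobat's real
logic: each word is just a position on the page plus its text, and
the page data and function names are invented.

```python
# Illustration only: a made-up two-column page.
# Each word is (x, y, text), with y growing down the page.
WORDS = [
    (10, 10, "Left"), (30, 10, "one"), (110, 10, "Right"), (140, 10, "one"),
    (10, 20, "left"), (30, 20, "two"), (110, 20, "right"), (140, 20, "two"),
]


def read_across(words):
    """Read line by line, straight across the page, ignoring columns."""
    ordered = sorted(words, key=lambda w: (w[1], w[0]))
    return " ".join(w[2] for w in ordered)


def read_rectangle(words, x_min, y_min, x_max, y_max):
    """Read only the words whose position falls inside the rectangle."""
    inside = [
        w for w in words
        if x_min <= w[0] <= x_max and y_min <= w[1] <= y_max
    ]
    return read_across(inside)


print(read_across(WORDS))
print(read_rectangle(WORDS, 0, 0, 100, 100))
```

Reading across mixes the two columns together on every line, while
the rectangle around the left column returns just that column, in
order.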

But it’s dozens of pages!

Acrobat Reader can copy only one page at a time. Acrobat Exchange
(the commercial program) on Windows ONLY can copy the whole file,
so long as you have enough memory (and patience). But if you need
to do this, perhaps you should reconsider; it’s almost always
better to go back to the original file, if you can.


About the Author: Aandi Inston
