PDFs and their content – Part 1

Editor’s Note:
Jim King is a Senior Principal Scientist and PDF Architect at Adobe. This article originally appeared on Inside PDF, and has been reprinted with permission.

I think most technical people share a problem that I have: we have extreme difficulty in expressing ourselves in one simple sentence. I have this problem when responding to questions/issues about PDF. For example, I have a hard time responding to this inaccurate statement in a short sound bite:

"PDF is great because it is not editable and freezes the content."

Technically that statement is totally inaccurate, but there are related statements that are true. For example:

"PDF is great because it not only captures my content but allows me to choose and lock down the look and feel for my content." or

"PDF is great because I can apply a document signature to the file after I create it and then people can detect if it has been tampered with between me and them."

And here is one that I encountered from my financial advisor: "I had always sent my customers paper spreadsheets in the mail because I didn’t want them to have my spreadsheet electronic files that have my intellectual content as far as the calculations and macros. Once I could make PDF files from my spreadsheets I can send them electronically and not worry."  Am I to conclude that his primary value to me was in his spreadsheets?

Editability and Reuse

But to get at the issue of reuse and editability versus frozen content, I have to use quite a few sentences: in fact, the remainder of this blog entry and the following one.

The first issue we have to get straight is whether something is a function of the PDF file format or of the software that processes it. If people are concerned about the PDF file format, then they need to join the ISO committee that is now managing PDF as the ISO 32000 standard. Many of my previous blogs record the process of moving the ownership of PDF from Adobe to ISO, which was completed in January 2008.

But most of the reuse issues are properties of the software, not of the PDF file format. So if someone doesn’t like the behavior of their current software, they might consider looking for other software and/or convincing someone to provide software with the needed function.

Just as an example, there are degrees of reuse that have been incorporated into Adobe’s Acrobat viewer, including the following:

  • Copy/Paste. If the author permits it, I can copy content from a PDF and paste it into other files. Adobe has spent a great deal of time and effort to make this work as well as it does, especially given the complexity of dealing with text.  Please see my previous blog entry about text in PDF.
  • Export. Acrobat supports exporting PDF content into various formats including .rtf, .doc, .html, .eps, .png, .jpeg, .xml, .jpeg2000, .tiff, .xls, .ps, .txt. I was almost alarmed when I opened Acrobat to obtain an accurate list and found so many formats supported. And there are choices and settings for many of these. I assure you that this represents a great investment by Adobe to provide this support for reuse of PDF content. Many of these export functions are imperfect but do provide a strong basic ability to reuse content.
  • Hybrid Files. One can make a "hybrid" PDF document that includes the author’s original source file as an attachment. This is supported as an automated feature by Open Office tools as well as the Acrobat tools that create PDF files from Microsoft Office products. This provides a final form PDF document with the editability of the original source that the author used to create the PDF in the first place.
  • Forms. A more sophisticated kind of hybrid file is supported by PDF fill-in forms. This is so cool that I am going to make the discussion about it a separate Part 2 to this entry. (I wonder if the reason I think this is so cool is that I defined the properties for the Acrobat forms prototype in 1993. Nah! It is just cool!)

If an author wants to inhibit the reuse of the content in their documents, they can set properties within the PDF file to prohibit it. For some authors, the content of their documents represents their intellectual property and they want to protect it.
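
For the curious, here is a rough sketch of what those properties look like under the hood. In a PDF protected with the standard security handler, the encryption dictionary carries an integer permissions entry, /P, in which individual bits grant or deny operations such as printing or copying. The little Python decoder below is only an illustration: the function name and the labels are my own for this example and are not taken from any Adobe tool or library. The bit positions are 1-based, as they are numbered in ISO 32000.

```python
# Illustrative decoder for the /P permissions integer of a PDF encryption
# dictionary (standard security handler). The names used here are
# hypothetical labels for this sketch, not part of any real API.
# Bit positions are 1-based, following the numbering in ISO 32000.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_or_extract": 5,
    "annotate_or_fill_forms": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Map each permission label to True (granted) or False (denied)."""
    # /P is stored as a signed 32-bit integer, so view it as unsigned first.
    p &= 0xFFFFFFFF
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # -44 leaves the print and copy bits set but clears the modify bit.
    for name, granted in decode_permissions(-44).items():
        print(f"{name}: {'granted' if granted else 'denied'}")
```

Notice that the file only records the author's wishes. Whether copy/paste actually gets disabled is up to the software that reads those bits, which is the same format-versus-software distinction made earlier.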

So, if things don’t work to our liking, it may be an author’s decision or the software designer’s decision, but seldom should we hold this as a PDF deficiency. PDF is a cool tool.

About the Author: Jim King