Since its beginnings in the 90s, PDF has grown and changed dramatically, to the point where the ultimate fruits of the original ‘Camelot’ project include a family of formally recognized international document standards. In order to commemorate more than 20 years of PDF, we here at Planet PDF have conducted a new series of interviews with current PDF Masters, movers and shakers in the PDF universe.
In this instalment, we talk to CEO and chief software architect of the iText group, Bruno Lowagie. Like many PDF gurus, he originally did something else, having completed a master’s degree in civil engineering (architecture). He had a long association with Ghent University, including time as a student, developer and IT project manager. In 2000, he released the original version of iText, a PDF development library, as an open source project. He went on to pen two books about iText, published in 2007 and 2010 respectively, and started his journey into entrepreneurship in 2008, when he founded the first iText company. Today, there are three companies under the iText Software name, one of which was ranked #53 by Deloitte in the Technology Fast 500 contest for the EMEA region in 2013.
It seems that Lowagie never lost his interest in education. In addition to his two books, he is a regular speaker at such conferences as JavaOne, and iText hosts its own annual summit where PDF developers can gather to discuss issues, techniques and use cases. iText also posts free, informative videos (including entire sessions from the iText Summit) via its YouTube channel. As a member of the committee behind the next official version of the PDF standard, PDF 2.0 (ISO/DIS 32000-2), Lowagie is also in a good position to comment on the current and future direction of PDF.
PLANET PDF: When and why did you first get involved with Acrobat/PDF?
BRUNO LOWAGIE, CEO, iText: I was working for Ghent University on a project for the Student Administration Department that required a wide variety of documents. I promised I would provide PDF documents because up until then, the documents could only be produced from a CLIPPER application that only worked on DOS and could only print to an HP printer that wasn’t a network printer. Creating PDF in an intranet application would allow the student administration to serve a much broader audience of people because the PDFs would be available on Mac, Linux and other OSs, instead of only on Windows DOS. The year was 1998 and at that time, I couldn’t find any library that was suited to run on a web server, and, more specifically, that could create documents with more than 10,000 pages in a matter of seconds. Soon it became apparent that I would have to write my own library. So I started reading the PDF spec during the Christmas holidays of 1998. At first I really hated PDF. I wrote my first library in about 6 weeks and it had two major flaws: (1) to use it, you needed to be a PDF expert, and (2) I was the only person at the University who understood my code, so I was the only person who could update/debug the application.
The more I got involved in making PDF documents with this first library, the more I started to understand and love PDF. That’s why I started a new project on the side: I wanted to create a library that would unlock the power of PDF even for those developers that didn’t know anything about PDF. Knowing how ubiquitous iText is today and how many copycats have emerged since the first iText release in 2000, I think I have accomplished this mission.
PLANET PDF: For those who don’t know, what is it that you are doing with PDF right now?
LOWAGIE: I’ve gone through a fascinating evolution where I started off as the engineer who wrote a PDF library, then became the writer who wrote a book for Manning Publications, then the compliance manager who cleaned up the IP of the library, and eventually the businessman who founded a group of companies to commercialize the open source library. In the last year, I’ve spent a lot of time professionalizing our business, while at the same time making sure that we keep innovating. As a member of the ISO committees for PDF, I worked on the upcoming PDF 2.0 specs, and we also created a challenging technical roadmap for the further development of iText. In the last couple of years, we have focused on converting XFA to PDF/A and PDF/UA.
PLANET PDF: What’s your next PDF project (as much as you *can* say, at least)?
LOWAGIE: Now that we have mastered creating PDF/A and PDF/UA documents (from scratch, from XML, from XFA), we are taking a look at the billions of unstructured PDFs that exist in the wild. The challenge of ‘Big Data’ in the world today is still making data ‘actionable’ by marking up content, grouping and annotating it for use. We’ve done some projects for customers where we created structure where there was none (for instance: creating a table of contents based on the analysis of the content streams of a document). Our next ambitious project is not only to mine for data in unstructured PDFs, but also to insert the resulting structure into the PDF. We know that it’s impossible to achieve this with every PDF that is out there, but we hope that we can get acceptable results for large sets of PDFs with a ‘predictable’ structure.
PLANET PDF: Briefly describe the most significant change in the evolution or use of the technology since you first began working with PDF. Why do you consider it significant?
LOWAGIE: The first PDF specification I read was PDF 1.2. Things were simple back then when compared to today’s spec. Along the way, many features have been added to PDF and not all of them were successful. Does anybody remember Adobe MARS? I really didn’t understand why you would want to replace PDF syntax with zipped XML files other than to try to mimic XPS. Initially, I had mixed feelings about XFA, but I changed my opinion. In fact, the XML Forms Architecture is a great way to use PDF as a container for dynamic data in a workflow. At the end of the workflow you can convert the interactive document into a static structured document, and digitally sign it if necessary. For example, you can create a PDF/A-3a document where the XML structure resolves into a tagged PDF and where the original XML source is added as an attachment. Unfortunately, there are now forces in the ISO committee who insist on deprecating XFA in PDF 2.0. Also, I have seen some cool use cases for combining SWF (Flash) and PDF, but I soon decided to abandon this path because support for the technology was rather poor.
The most significant event in the history of PDF was when Adobe brought the spec to the ISO. ISO-32000-1 is an important milestone, and I really like the goal of ISO-32000-2: cleaning up the specification and replacing parts of the spec that were open for interpretation with articles that leave no room for ambiguity. Sure, the process of making a new version of the specification under the auspices of ISO is slow, but it’s useful and I’m happy I can be part of it.
PLANET PDF: With Creative Cloud, Adobe is shifting towards more of a focus on a subscription-based model. What impact do you think this will have on the world of PDF, both from a developer/solutions provider standpoint and user perspective?
LOWAGIE: With Adobe focusing on the subscription-based model, they allow other companies to address an unserved audience.
From a developer’s point of view, we are very different from Adobe in the sense that our source code is open. You can look into the code and improve it if you’re not happy with how it works. With iText, you have a programmable PDF engine you can use to enhance your web and other applications with PDF functionality. You can use iText to build your own service or solution to compete with Adobe’s subscription model.
From the point of view of the solutions provider, we are a natural fit. We provide technology at an affordable price; we don’t compete with integrators who sell man hours. Solutions providers can afford to be generalists who can rely on iText Software whenever they need PDF expertise.
As for the customer: at the iText Summit in Cologne, several European industry leaders asked us (1) whether iText was developed in the US or Europe, and (2) whether they would depend on US servers if they became an iText customer. The answer pleased them: with iText they can have a product whose source is mostly written in Europe and completely open. They have complete freedom to choose where to deploy iText (cloud, dedicated server, desktop).
PLANET PDF: Pondering the future of PDF, what most excites you about the next few years?
LOWAGIE: PDF is ubiquitous, but when I look at Document Management Systems and Content Management Systems, I see that many of them still look at PDFs as if they were images. It’s an exciting challenge to help change this by evangelizing new standards such as PDF/A-2, PDF/A-3 and PDF/UA. I know from experience that it’s not easy to change the mentality of people or to convince them to adopt new versions. I hope that we succeed in promoting PDF 2.0 once the spec is officially released, and that many companies choose PDF 2.0 as the new standard for their documents. I realize that this will require updates for PDF/A and PDF/UA as well, but I’m looking forward to being part of that process.
PLANET PDF: Briefly describe a common misconception about or frequent problem you’ve seen with PDF that you’d like to try to clarify for others and/or provide a tip to address.
LOWAGIE: This post on Stack Overflow is a good example. I quote: ‘I have a PDF document which contains placeholders for text that I need to identify and be replaced or just delete that text.’
Some people still think that PDF is a format suited for editing a document.
I also see plenty of people who say: I want to convert my PDF to Excel. I know that some tools claim that this is possible, but the results aren’t always accurate. People often say, ‘When I open the document in Adobe Reader, I see a table. Why can’t I just extract that table?’ When I ask them whether their PDF is tagged, they often don’t understand the question. People really need to be educated about the different standards and the different types of PDFs.
PLANET PDF: What are your favorite PDF tools, applications, SDKs or services (unrelated to your company or business) and why?
LOWAGIE: We’ve built our own SDK and a viewer (RUPS) that doesn’t render the PDF, but that allows us to look inside a PDF, at the PDF structure and syntax. Obviously, we also use Acrobat Pro to create or manipulate PDF documents manually. We use Adobe LiveCycle Designer to create XFA forms. In our test suite, we use Ghostscript to create images of the PDFs we create. We then compare these images pixel by pixel with reference images to find out whether a change in our code resulted in a change to the rendered PDF. Otherwise, I can’t think of any other PDF tool we use on a regular basis, except for a series of viewers to check what our PDFs look like. The advent of viewers such as pdf.js was problematic. Some developers claimed that there was a problem with our SDK, because PDFs created by iText didn’t render correctly. As it turns out, those PDFs conformed with ISO-32000-1, but weren’t rendered correctly because pdf.js didn’t support the full spec.
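Editor's illustration: the pixel-by-pixel regression check mentioned in this answer can be sketched in a few lines of Python. This is only a minimal sketch and not iText's actual test code. It assumes the pages have already been rendered to images elsewhere (for example by Ghostscript) and loaded into memory as rows of RGB tuples; the names differing_pixels and matches_reference are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Image = List[List[Pixel]]      # rows of pixels, top to bottom


def differing_pixels(rendered: Image, reference: Image) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates at which the two images differ.

    Assumption: both images are already decoded into nested lists.
    A size mismatch raises ValueError, because a change in page
    dimensions is itself a rendering difference.
    """
    if len(rendered) != len(reference):
        raise ValueError("images differ in height")
    diffs = []
    for y, (row_a, row_b) in enumerate(zip(rendered, reference)):
        if len(row_a) != len(row_b):
            raise ValueError(f"row {y} differs in width")
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            if pixel_a != pixel_b:
                diffs.append((x, y))
    return diffs


def matches_reference(rendered: Image, reference: Image) -> bool:
    """True when the rendered image is identical to the reference image."""
    return not differing_pixels(rendered, reference)
```

A test would call matches_reference on each freshly rendered page and its stored reference image, and report the coordinates from differing_pixels when the check fails. An exact comparison like this is strict by design: any change in rendering, however small, is flagged for a human to review.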
PLANET PDF: How has developing with PDF changed since the formal recognition of the various PDF standards?
LOWAGIE: After ISO-32000-1 was released, some people at Adobe pushed me to join the ISO committee, so I did and I really enjoy it. Whenever something wasn’t clear to me in the spec, I had contacts at Adobe who were always willing to explain how a specific paragraph should be interpreted, but now I can really participate in the discussion. Personally, I didn’t have a problem with Adobe owning the copyright on the specification, but I notice that it is important for some of our customers that PDF is now an ISO standard. I think Adobe made the right decision at the right time to make PDF indispensable when compared to other document formats.
PLANET PDF: How has the proliferation of relatively powerful mobile devices and widespread data access changed the way people work with PDF?
LOWAGIE: We have a version of our software that runs on Android, but we don’t see a lot of applications that create or manipulate PDF on a mobile phone yet. For now, it still makes more sense to create PDFs in the cloud. There is a market for using iText on tablets though: where documents need to be created off-line (for example, by sales representatives who are on the road, or patients or doctors filling out charts, and so on).
Widespread data access has a much greater impact. Documents can be stored or even created in the cloud so that consumers can access them wherever, whenever. In a broader context, I see a lot of opportunities where PDFs serve as data containers for both human and machine consumption. Take for instance the German ZUGFeRD standard. This is a standard way to create invoices based on PDF/A-3. The PDF is there for human consumption, so that a human being receiving the document knows what it’s about. However, the standard also requires the presence of an XML attachment that allows machines to extract the relevant data from the invoice in an unambiguous way.
We’ve done a project where we had to extract the relevant lines from credit card statements from VISA, MasterCard, and American Express to feed a Big Data database. We know how difficult it is to get data from an unstructured PDF in an accurate way. Finding structure in a document and adding structure to a document, so that the document can be a self-contained, archivable data container, is one of the new raisons d’être for PDF.
PLANET PDF: What impact has the rise of mobile/portable had for those providing PDF-based solutions?
LOWAGIE: One of our customers has built a service that could very well replace all school books in the future. Thieme-Meulenhoff started a project where kids had to buy an iPad instead of school books. Our customer Q42 has built an application on the Google App Engine that managed the book in the Cloud.
Doccle is another nice example of how keeping your documents in the cloud makes sense. Doccle provides a personal cabinet where third parties can post documents that are relevant to you. For instance, when you shop at a Doccle partner, the shop can file your receipts, warranty documents, and so on in your personal cabinet. This way, you never lose an important document. You won’t have to turn your house upside-down to find that warranty document to get a refund. You’ll always know where to find it.
PLANET PDF: Where do you see the most important functional gap in what’s out there? Tell me about your dream PDF tool, SDK or whatever. Why do you think it doesn’t exist yet?
LOWAGIE: I miss a good solution for dynamic forms. That is, I want forms that can communicate with external data sources and forms that allow a PDF document to grow dynamically. For instance, suppose you have a form for an invoice. If you select a type of product, the form would then load a selection of available products for you to select in the form. If you only add a couple of invoice lines, the end result could consist of one page. If you add more invoice lines, an extra page would be created, repeating some elements present on the first page, such as a header and footer.
This functionality more or less exists. That’s what the XML Forms Architecture (XFA) is about, but unfortunately, there has been a vote to deprecate XFA in PDF 2.0 and Adobe seems to be abandoning all things related to LiveCycle.
The main problem I see with XFA is that the spec is huge and there are a dozen different ways to arrive at the same result. My dream is that somebody would clean up the spec (maybe turn it into an ISO standard) so that supporting XFA isn’t such a labor-intensive job.
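Editor's illustration: the growing invoice described in this answer comes down to reflowing a variable number of lines across pages while repeating fixed elements on each page. The Python sketch below shows only that reflow step, under simplifying assumptions. It is not XFA and not an iText API: the function name paginate_invoice, the fixed lines_per_page limit and the dictionary layout are all hypothetical.

```python
from typing import Dict, List


def paginate_invoice(lines: List[str], lines_per_page: int,
                     header: str, footer: str) -> List[Dict[str, object]]:
    """Split invoice lines into pages, repeating the header and footer.

    Assumption: every page holds the same fixed number of lines.
    A real form engine would measure the rendered height of each line.
    """
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    pages = []
    # max(..., 1) guarantees one page even when the invoice is still empty.
    for start in range(0, max(len(lines), 1), lines_per_page):
        pages.append({
            "header": header,
            "lines": lines[start:start + lines_per_page],
            "footer": footer,
        })
    return pages
```

With a limit of three lines per page, an invoice with two lines stays on one page, while an invoice with seven lines grows to three pages, each carrying the same header and footer.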
PLANET PDF: Is there anything important you didn’t feel that we covered? Please tell me and our readers about it!
LOWAGIE: We could discuss every single aspect of PDF in even greater detail. To readers who may be interested in increasing their knowledge of PDF, I recommend reading one of the books I’ve written, called The ABC’s of PDF with iText: PDF Syntax Essentials. It’s a short introduction to the syntax of the Portable Document Format, and it’s available on Leanpub for a nominal price.
Also, if you’re planning on attending the JavaOne conference this year, I will be giving a talk called ‘PDF is dead, long live PDF…and Java!’ A recording of it will be available on the iText YouTube channel shortly after the conference.
PLANET PDF: Thanks for your time!