A while back, I wrote a couple of pieces on the ongoing progress of PDF towards becoming the primary standard for paginated content and on how to optimize the PDF viewing experience for consumers of that content. In this piece, I’ll discuss one particular element of optimization (i.e., linearization, AKA Fast Web View), whose importance is growing chiefly because of PDF’s ever-increasing presence.
To understand why linearization is important (aside from being a requirement in some cases, as it is for FDA submissions), it is first necessary to understand what it is. Basically, linearizing a PDF file allows viewers to download each page as they read it. Since non-linearized PDFs must be completely downloaded before they can be viewed, this process can drastically reduce the lag between clicking on a PDF link and actually seeing that PDF in a viewer. The improvement in performance can be particularly marked when the PDF in question is very large.
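To make the page-at-a-time idea a little more concrete: a viewer gets this behaviour by asking the web server for byte ranges of the file rather than the whole thing, which also means the server has to support such requests. Below is a minimal Python sketch of building one of those requests. The function name and the example URL are my own placeholders, and a real viewer works out which ranges to ask for from the hint information inside the linearized file, which is not shown here.

```python
import urllib.request


def build_range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build an HTTP request for `length` bytes of `url`, beginning at byte `start`.

    Hypothetical helper for illustration only; it does not send the request.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Ask for only the first kilobyte of a (placeholder) PDF.
request = build_range_request("https://example.com/report.pdf", 0, 1024)
print(request.get_header("Range"))
```

A server that honours the request answers with status 206 (Partial Content) and just those bytes; one that does not will typically send the whole file, in which case linearization brings no benefit.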
Linearization can be a key factor in the viability of PDF for viewing content online. Because a locally saved file is already entirely present and doesn’t need to be downloaded, linearization does not substantially affect its performance, so there is no pressing need to decide, document by document, whether or not to linearize. Indeed, the Save As command in Acrobat linearizes PDFs by default, although it refers to the process as enabling Fast Web View.
Nevertheless, linearization is not suitable in all cases. It works extremely well for static PDFs — that is, those created in advance of viewing and whose content and structure do not subsequently change. Ultimately, linearization is a form of post-processing that can somewhat increase file size, and which can only be conducted after a PDF has been created. In a non-linearized PDF, the information relating to particular pages is not organized sequentially. As Karl De Abrew put it:
Believe it or not, PDF isn’t a display format, as such. It is actually a container for content that includes information about how to organize and lay out that content. Behind the scenes, PDF code actually looks a bit like spaghetti, with angled brackets aplenty.
As such, the process of linearization involves reordering objects inside a PDF. Specifically, with due credit to iText’s Bruno Lowagie:
All the objects necessary to render the first page are moved to the first part of the file. Once all objects are present, a cross-reference table is created allowing a viewer to render the page. Then the additional objects necessary to render the second page are moved forward, and again a cross-reference table is created. And so on…
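One practical consequence of this reordering is that a linearized file announces itself near the top: the PDF specification requires a linearization parameter dictionary, containing a /Linearized key, to sit within the first 1024 bytes. The Python sketch below uses that as a rough check. The function names are my own, and this is only a heuristic: it does not confirm that the rest of the file is laid out correctly, and a file that was linearized and later modified with an incremental save can still carry the key.

```python
import re

# The linearization parameter dictionary must lie within the first 1024 bytes.
HEADER_WINDOW = 1024
# Matches the /Linearized key followed by its numeric version value.
LINEARIZED_KEY = re.compile(rb"/Linearized\s+\d")


def looks_linearized(data: bytes) -> bool:
    """Return True if the start of `data` looks like a linearized PDF."""
    head = data[:HEADER_WINDOW]
    return b"%PDF-" in head and LINEARIZED_KEY.search(head) is not None


def file_looks_linearized(path: str) -> bool:
    """Apply the same heuristic to a file on disk, reading only its first bytes."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(HEADER_WINDOW))


# A tiny hand-made header, not a complete PDF, just to exercise the check.
sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 /N 2 >>\nendobj\n"
print(looks_linearized(sample))
```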
The post-processing nature of linearization means that it may be unsuitable for quickly viewing dynamically created PDFs. In the same post quoted above, Lowagie argues that dynamic creation followed by linearization would take longer than simply serving the file. While I haven’t seen any benchmarking figures, this seems pretty plausible. The other situation in which Lowagie argues that linearization could not increase viewing speeds is on mobile platforms. While the free Adobe Reader supports viewing linearized PDFs on Windows, Mac and Linux systems, he pointed out that, at the time of his post (in late February 2012), he was aware of no iOS or Android-based viewer that supported linearization.
Interestingly, the original development of the linearization process predates the relatively recent explosion of portable technology and mobile internet. Instead, it was originally devised with desktops, laptops and slower internet connections in mind. Despite this, the process seems well-suited to optimizing the online viewing of remote, static PDF content on mobile devices. Not only do such devices typically have slower connections than their less portable counterparts, but cellular connections are also more susceptible to interruption than wi-fi or wired ones, particularly when one is on the move.
Further, unlike desktops and laptops, such devices access the internet through multiple, small data requests rather than fewer, larger ones. Given the vulnerability of cellular data connections to interruption, this approach makes sense as a way of minimizing downtime. Nevertheless, it also means that, given an essentially uninterrupted connection, performance will be poorer for such portable devices than for laptops or desktops. You’ll know exactly what I’m talking about if you have ever used a tethered cellular data connection and noticed better performance on your laptop than on your smartphone, which is the actual source of the connection.
So we have now discussed what linearization is, along with why and when it’s important, but how do you do it? Well, as I noted above, it’s pretty easy if you have Acrobat. Acrobat’s Save As command linearizes PDFs automatically, and the feature can be toggled in Acrobat’s PDF Optimizer, where it is switched on by default.
For application developers, Debenu Quick PDF Library supports the creation of linearized PDFs.
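For those who would rather script the step than use either of the tools named above, here is a minimal Python sketch that hands the job to the open-source qpdf command-line tool and its --linearize option. To be clear, qpdf is my own substitution for the purposes of illustration and is not mentioned elsewhere in this piece; the function names and file names are placeholders, and the sketch assumes qpdf is installed and on the PATH.

```python
import shutil
import subprocess


def build_qpdf_command(source: str, destination: str) -> list[str]:
    """Return the qpdf invocation that writes a linearized copy of `source`."""
    return ["qpdf", "--linearize", source, destination]


def linearize(source: str, destination: str) -> None:
    """Write a linearized copy of `source` to `destination` using qpdf."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf was not found on PATH")
    # check=True raises CalledProcessError on any non-zero exit status.
    subprocess.run(build_qpdf_command(source, destination), check=True)


# Show the command that would be run, without running it.
print(build_qpdf_command("report.pdf", "report-linearized.pdf"))
```

Calling linearize("report.pdf", "report-linearized.pdf") would then produce the linearized copy alongside the original, leaving the original file untouched.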
As for my advice about when to linearize, I suggest that, when a static PDF file is intended for remote viewing on a range of platforms, such as from one’s own website, you just go ahead and do it. While most mobile viewers probably still don’t support the viewing of linearized PDFs, those using desktops or laptops, or running native viewer technologies, will likely see the benefits. While the relatively small increase in file size caused by linearization is unlikely to substantially impact performance, some may still prefer to also post alternative, non-linearized ‘compact’ or ‘mobile’ versions for those who wish to download entire files. This would also have the fringe benefit of allowing content providers to further optimize versions of their PDF files specifically for mobile devices (for initial guidance, see my earlier piece on optimization). While linearization can have great utility for static PDF files, I agree with Lowagie that linearizing dynamically created PDFs before viewing would seem to be counter-productive, because the extra processing step delays delivery of the file.
All in all, the use of linearization can substantially improve the PDF viewing experience, but it is not necessarily suitable when PDFs are dynamically generated, or when the primary audience will likely download the entire file rather than view it remotely. Like all content-related decisions, determining whether to linearize can be informed by regulatory or technical requirements, along with the viewing habits of the intended audience. My advice, though, is this: when in doubt, linearize.