Scanned documents to highly compressed PDF

Color scanning is state of the art today. Scanner hardware, from small MFP scanners up to high-end production scanners, can produce high-quality color scans.

Scanners also deliver high performance for color scanning at 300 dpi, the typical resolution in business environments.

A high-quality color scan at 300 dpi creates a large data volume: about 25 MB for one letter-size page as a completely uncompressed TIFF. This is not practical for networks and applications. For that reason, black-and-white TIFF, or JPEG for color, has been used instead, but both have obvious disadvantages.
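The 25 MB figure can be checked with simple arithmetic. The following is a minimal sketch, assuming a US letter page of 8.5 x 11 inches, 24-bit RGB color and no file headers or row padding; the function name is chosen for illustration only.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw raster size in bytes, ignoring file headers and row padding."""
    width_px = round(width_in * dpi)    # 8.5 in * 300 dpi = 2550 px
    height_px = round(height_in * dpi)  # 11 in * 300 dpi = 3300 px
    return width_px * height_px * bits_per_pixel // 8


# Assumption: letter size, 300 dpi, 24 bits per pixel (8 bits each for R, G, B).
color_page = uncompressed_size_bytes(8.5, 11, 300, 24)
print(f"{color_page / 1e6:.1f} MB")  # prints "25.2 MB"
```

The same page scanned as 1-bit black-and-white would need only 1/24 of that raw size, which is one reason black-and-white TIFF was a common choice.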

PDF is a modern file format that allows for advanced compression of scanned documents. PDF supports compression schemes such as JPEG 2000 and JBIG2. In particular, so-called Mixed Raster Content (MRC) compression allows for high compression while preserving very good quality of the scanned document.

Compression and quality depend heavily on the tools used for PDF conversion. Sophisticated solutions can create color scans of approx. 40-80 KB per page, which is normally the size of a black-and-white TIFF.
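To put the 40-80 KB range into perspective, the compression ratio against the uncompressed page can be estimated. This is a rough sketch, assuming a letter-size page at 300 dpi in 24-bit color (2550 x 3300 pixels) and 1 KB = 1000 bytes; actual results depend on the document content and the conversion tool.

```python
# Assumption: letter page at 300 dpi, 3 bytes per pixel (24-bit RGB).
RAW_PAGE_BYTES = 2550 * 3300 * 3


def compression_ratio(raw_bytes, compressed_bytes):
    """How many times smaller the compressed page is than the raw raster."""
    return raw_bytes / compressed_bytes


for kb in (40, 80):
    ratio = compression_ratio(RAW_PAGE_BYTES, kb * 1000)
    print(f"{kb} KB per page -> about {ratio:.0f}:1")
# prints:
# 40 KB per page -> about 631:1
# 80 KB per page -> about 316:1
```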

Another major benefit of PDF for scanned documents is that it makes it easy to turn the pixel raster images from a scanner into full-text searchable documents. By applying OCR to the scanned document, the document becomes ‘intelligent’ instead of being a simple image. Advanced solutions create full-text searchable PDF files and offer the option of extracting the OCR results separately, e.g. for adding them to a full-text search database covering all documents. Seen from the archiving perspective, this is more of a ‘by the way’ benefit. Very often, the scanned documents are first processed in the corresponding business workflows and are stored in the archive at the end of processing.
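How separately extracted OCR text can feed a full-text search can be illustrated with a small inverted index. This is only a sketch using the Python standard library: the file names and OCR texts are hypothetical sample data, and a production system would typically use a dedicated search engine or database instead.

```python
import re
from collections import defaultdict


def build_index(ocr_texts):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in ocr_texts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR results, keyed by the scanned file they came from.
ocr_texts = {
    "scan-001.pdf": "Invoice 4711 for office supplies",
    "scan-002.pdf": "Insurance claim for water damage",
    "scan-003.pdf": "Reminder: invoice 4711 is overdue",
}

index = build_index(ocr_texts)
print(sorted(search(index, "invoice 4711")))  # ['scan-001.pdf', 'scan-003.pdf']
```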

Long-term archiving is the goal of the ISO standard PDF/A, which ensures that these important business documents can be reproduced identically in the unknown future. Simply put, PDF/A guarantees long-term safe digital paper.

For scanned documents, PDF/A-2u has been best practice for many years. The digital mailroom is one of the main applications: all incoming paper letters are scanned right at the start. Many enterprises already have digital mailroom applications in place, and color scanning, OCR and PDF/A are reasonable optimization steps for this operation.

Organisations also run many digitization projects in which existing paper records such as insurance files, customer files, etc. are digitized for a complete document management system.


About the Author: Thomas Zellman
