PDFlib releases new text extraction package

The latest addition to Munich-based PDFlib’s suite of developer products is the PDFlib Text Extraction Toolkit. Also known as PDFlib TET, the software package extracts text from PDF documents, converting it to Unicode strings while preserving font and glyph information. The toolkit is currently available as a library, a component and a command-line tool.

Suggested uses include the development of software for searching text, implementing a search engine to process large PDF archives, extracting text for storage or translation, converting PDF text into other formats, content-based processing of PDF documents (e.g. highlighting keywords) and comparing text between multiple PDF documents.

TET has been designed for standalone use and does not require any third-party software to run. The product is also robust enough for multi-threaded server use, which significantly increases its processing capacity. Versions for Windows, Macintosh and several UNIX platforms are available, along with language bindings for various programming environments.

PDFlib TET is currently available for purchase and download.

About the Author: Dan Shea
