Version 1.2 of PDFTextStream, the Java-based PDF text and metadata extraction library, adds the ability to extract metadata streams in Adobe XMP (Extensible Metadata Platform) format.
Such streams are important in the specialized PDF document workflows routinely seen in the pre-press and document management industries. With XMP support added to its existing text extraction facilities, PDFTextStream can more readily serve Java application developers and service providers in these industries.
PDFTextStream supports all versions of the PDF document specification, and can read both 40- and 128-bit encrypted PDF files. It also gives Java applications and web services seamless access to the text and metadata held in PDF files. In addition, the library includes a module providing drop-in integration with the popular Jakarta Lucene full-text indexing and search component.
Version 1.2 of PDFTextStream is now available.