We recently received a message asking, ‘What’s the best tool to do text searches of PDF files, regardless of whether they are text on image or not?’
In my most lawyerly voice, I reply, ‘It depends.’ I would like to cover the key aspects of this question in more than one post, because, like everything related to PDF, there’s a lot lurking beneath the surface.
First, some distinctions and definitions. There is a crucial difference between ‘text’ and ‘image.’ If you are talking about documents that were created directly from another program (like MS Word), you don’t really have to worry about separating those concepts. (Well, not yet…) However, if you are dealing with pages that were scanned from hard copy, you’ve got to do some conceptual work. Second, there is a difference (in Acrobat) between the Search command and the Find command.
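For readers who think in code, here is a rough sketch of why the text/image distinction matters for searching. It is a toy model, not how Acrobat or any real PDF library works: the `Page` class, its `has_image` and `text_layer` fields, and the `search` function are all hypothetical names invented for illustration. The assumption is simply that a search can only match against text stored with the page, so a raw scan with no text behind it has nothing to find.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one PDF page (not a real PDF library type)."""
    has_image: bool             # True if the page is a scanned picture
    text_layer: Optional[str]   # searchable text, or None if there is none


def search(pages: List[Page], term: str) -> List[int]:
    """Return the 1-based numbers of pages whose text layer contains the term."""
    hits = []
    for number, page in enumerate(pages, start=1):
        # An image-only page has no text to match against, so it is skipped.
        if page.text_layer and term.lower() in page.text_layer.lower():
            hits.append(number)
    return hits


document = [
    Page(has_image=False, text_layer="Lease agreement dated May 1"),  # made in a word processor
    Page(has_image=True, text_layer=None),                            # raw scan, image only
    Page(has_image=True, text_layer="Lease renewal notice"),          # scan with text added
]

print(search(document, "lease"))  # prints [1, 3]
```

The second page may well show the word ‘lease’ to a human reader, but in this model the search skips it, because a picture of a word is not the word itself.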