The process of making a scanned PDF searchable is often referred to as ‘OCR’, which simply means ‘optical character recognition.’ I don’t typically OCR my office files, but I do OCR documents that are used in my cases. Why not OCR every document that one scans? Quite simply, it isn’t worth the extra time it takes to run the OCR.
When I am scanning day-to-day stuff, I want to get the documents digitized quickly and then toss the paper. If I had to OCR everything I scan each day, the process would take at least four times as long. But with case documents I’m willing to OCR because (1) I tend to scan in large batches, as opposed to individual documents, and (2) I’m much more likely to take advantage of searchable text, so the extra time it takes to get the documents digitized is worth it.
Of course, it’s possible to batch OCR a bunch of PDFs at once. If you want to do this, I recommend Rick Borstein’s excellent blog post on the subject. One thing that Rick’s article doesn’t cover: what do you do if you want the batch process to run automatically every night?
I’m not really sure, because I’ve never used any software to do this, but I can point to a couple of possible solutions (both of them Windows-only, and neither of them inexpensive): (1) Autobahn DX, which costs between $1,600 and $2,695 depending on which level you buy, and (2) File Convert, which has a $600 entry-level version.
If any of you have addressed this issue and have suggestions, I’d love to hear them. And if anyone knows a Mac way of running OCR in batch at regular intervals, that would be appreciated as well.
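For anyone comfortable with a little scripting, here is a minimal sketch of one possible do-it-yourself approach to the nightly batch question. It is not something described above or in Rick’s post, and it rests on some assumptions: that the open-source OCRmyPDF command-line tool is installed and on your PATH, that its `--skip-text` option (which leaves pages that already contain text alone) is available in your version, and that you keep scans in one "inbox" folder and want searchable copies in another. The function names and folder layout are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
from pathlib import Path


def find_pdfs(inbox: Path) -> list[Path]:
    """Return the PDFs sitting directly in `inbox`, in a predictable order."""
    return sorted(
        p for p in inbox.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_command(src: Path, dest: Path) -> list[str]:
    # Assumption: the `ocrmypdf` CLI is installed and accepts --skip-text,
    # which skips pages that already have a text layer.
    return ["ocrmypdf", "--skip-text", str(src), str(dest)]


def ocr_folder(inbox: Path, outbox: Path, run=subprocess.run) -> list[Path]:
    """OCR every PDF in `inbox` that has no counterpart in `outbox` yet.

    `run` is injectable so the logic can be exercised without OCRmyPDF.
    Returns the list of output files that were created successfully.
    """
    outbox.mkdir(parents=True, exist_ok=True)
    done = []
    for src in find_pdfs(inbox):
        dest = outbox / src.name
        if dest.exists():
            # Already processed on an earlier night; don't redo the work.
            continue
        result = run(build_command(src, dest))
        if result.returncode == 0:
            done.append(dest)
    return done


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: nightly_ocr.py INBOX_FOLDER OUTPUT_FOLDER")
    finished = ocr_folder(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"OCRed {len(finished)} file(s)")
```

To make it run every night, you would hand the script to whatever scheduler your system provides: cron or launchd on a Mac, Task Scheduler on Windows. As an illustration only, a cron entry along the lines of `0 2 * * * python3 /path/to/nightly_ocr.py /path/to/inbox /path/to/searchable` would start it at 2 a.m.; the paths are placeholders. Because the script skips files that already have an output copy, each run only touches what was scanned since the last one.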
Update: if you are interested in how to OCR PDFs inside a Portfolio, Rick Borstein has a great article on that as well.