The DEVONtechnologies Blog

A Word on OCR and Why It's Missing

February 2, 2009 — Eric Böhnisch-Volkmann

Last Friday we had to release DEVONthink Pro Office 2.0 public beta 2 without an embedded OCR engine, which is truly annoying for you and for us.

Starting with version 2.0 of DEVONthink Pro Office, we are moving from the IRIS engine to the ABBYY FineReader engine, which produces much smaller PDFs, is more accurate, and is far easier to embed in our application. The IRIS engine was an external program that DEVONthink controlled remotely. The ABBYY engine, by contrast, is a true framework that we can embed and control directly. You will soon see the benefits: a much improved 'OCR Activity' panel, and no more separate OCR windows popping up for every page.

So why did we leave it out of this release? Since version 2.0 is based on ABBYY's OCR, the license we have with IRIS does not cover IRIS-based OCR in a DEVONthink Pro Office 2.0 beta unless we pay for every single license on top of the ABBYY license. At the same time, we still have technical difficulties with the ABBYY engine: it simply crashes when you feed it PDF files. We cannot ship it in that state, but we had to release a new public beta last Friday because public beta 1 expired this weekend. A catch-22.

We apologize for the inconvenient timing. We are working hard with the ABBYY technicians in Moscow to solve this issue quickly, and we will deliver either a new public beta or an updated OCR component as soon as possible so that you can run OCR on your scans again. I will keep you updated on all progress here on my blog.

In the meantime, please simply add your PDFs to your database as you did in the past. As soon as the ABBYY OCR component is back, you can convert them to searchable documents using Data > Convert > To searchable PDF. To find all PDFs in your databases that have not been OCRed yet, create a smart group that looks for kind 'PDF/PS' and a word count of zero.