Thursday, 17 May 2012

Library Updates

The new version of Ubuntu does not support tesseract v2. The OCR libraries have therefore been updated to the latest version. This, and the inevitable refactor that followed, updated several other key elements of openDIAS.
  • Leptonica has replaced libtiff and freeimage, simply because it is the library that's integrated into tesseract, and it can be used for all the other image processing we are doing.
  • Poppler has been introduced to parse PDF files. We can now get thumbnail images for PDFs as well as for imported images and scanned docs. A temporary hook has been put in place that allows PDFs that have already been imported to be re-parsed to get accurate OCR text and a thumbnail image.