Wednesday, 30 May 2012
Well, that was much easier than I'd imagined it would be. The end product is almost exactly as described in my earlier post. We just need to move all the current 'hard coded' English text into a language pack and replace it with the framework variables.
Then it's time to localise the EN language pack into some suitable languages. Gulp!
Tuesday, 22 May 2012
As an experiment, I'm looking into how much work is required to add a localisation framework. The idea is that the app's web frontend will allow the user to specify a language (from the allowed 'installed languages'), or leave it to the browser client to negotiate the best language based on the user's machine settings. The available languages will depend on which 'language packs' are installed on the backend, and the app will default to English if all other options fail.
Thursday, 17 May 2012
The new version of Ubuntu does not support tesseract v2. The OCR libraries have therefore been updated to the latest version. This, and the inevitable refactor that followed, updated several other key elements of openDIAS.
- Leptonica has replaced libtiff and freeimage, simply because it is the library integrated into tesseract, and it can be used for all the other image processing we are doing.
- Poppler has been introduced to parse PDF files. We can now get thumbnail images for PDFs as well as for imported images and scanned docs. A temporary hook has been put in place that will allow PDFs that have already been imported to be re-parsed to get the accurate OCR text and thumbnail image.