Downloads: 753
Submitted: Mar 9 2010 Updated: Oct 3 2010
Description:
This Dolphin/Konqueror service menu lets you OCR images conveniently from your file manager window.
This is a very simple program. It OCRs a document and writes the text to a file that has the same name as the source image file, but with a txt extension.
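The naming rule described above can be sketched in shell as follows. This is only an illustration: the function name ocr_output_name is hypothetical and is not taken from the service menu script itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the service menu): derive the name of the
# text file from the path of the image, keeping the directory and replacing
# the last extension with .txt.
ocr_output_name() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    # ${base%.*} drops the shortest trailing ".something" from the file name
    # only, so dots in directory names are left alone.
    printf '%s/%s.txt\n' "$dir" "${base%.*}"
}
```

For example, ocr_output_name "/tmp/my scans/page 1.TIF" prints "/tmp/my scans/page 1.txt".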
For the menu to be visible and provide basic functionality (OCR of tif files), tesseract-ocr must be installed and in your path, along with the language packages you need. (The menu has been tested against tesseract-ocr v. 2.03 and 2.04.)
To OCR png and jpeg images you also need imagemagick installed. To OCR pdf files you need ghostscript installed.
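A rough way to check these requirements from a console is sketched below. The function names are hypothetical, and the sketch assumes the usual command names: "tesseract", "convert" (imagemagick) and "gs" (ghostscript). It is not the check the service menu itself performs.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the service menu).

# Print the extra command an input file needs, judged by its extension.
# The extension is lower-cased first so that .TIF and .tif are treated alike.
required_tool() {
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        tif|tiff)         echo "none" ;;        # read by tesseract directly (tiff is assumed)
        png|gif|jpg|jpeg) echo "convert" ;;     # imagemagick
        pdf)              echo "gs" ;;          # ghostscript
        *)                echo "unsupported" ;;
    esac
}

# Print the commands from the argument list that are not found in the path,
# separated by spaces. Prints an empty line if nothing is missing.
check_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    printf '%s\n' "${missing# }"
}
```

Running check_tools tesseract convert gs prints whichever of the three commands could not be found.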
TRANSLATION: Find the translatable strings at http://pastebin.com/QV7vV7jn, translate them, and send the result to me via a personal message or email.
INSTALLATION: Install through the Dolphin settings menu. If it does not work, LET ME KNOW and see the readme.txt file for alternative installation methods.
KNOWN PROBLEMS: none at present.
TROUBLESHOOTING: If you experience problems and get no output:
1. Make sure that tesseract is installed, along with the language packs you use. Also check that imagemagick and ghostscript are installed if you want to process anything other than plain tif images. On Debian/Ubuntu, the commands "dpkg -l | grep tesseract", "dpkg -l | grep imagemagick" and "dpkg -l | grep ghostscript" will tell you what you have installed.
2. Check whether the problem lies in the tesseract engine itself. To do this, download this image (http://ftp.akl.lt/users/dgvirtual/ocr_using_tesseract/testimage.tif) and run tesseract against it in a console, first using English ("tesseract testimage.tif testoutput") and then the language you have problems with (say, Spanish: "tesseract testimage.tif testoutput -l spa"). If you get a file testoutput.txt containing text in both cases (the text does not have to be accurate), then the problem is not in tesseract or your tesseract installation. Otherwise, consult the tesseract-ocr website and forums for a solution.
3. To troubleshoot problems related to the service menu, test it against the image you downloaded in the previous step. If it works, test it against the other images in http://ftp.akl.lt/users/dgvirtual/ocr_using_tesseract/; if those fail, there must be a problem with the imagemagick (png, gif, jpg) or ghostscript (pdf) installation.
4. To find out what problems there could be in the script and to get help from me, run the shell script ocr_using_tesseract.sh from the service menu archive against the problematic image like this: "ocr_using_tesseract.sh en image.png" (note "en", not "eng", here), and send me the image, the output txt file and the output printed by the command (my email is in the readme.txt file).
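The tesseract check from the list above can be wrapped in a small shell function, sketched here. The function name tesseract_check and the TESSERACT variable are hypothetical; the variable exists only so that the command can be swapped out, and defaults to the real "tesseract" binary. The tesseract call uses the same arguments as the commands quoted in the list.

```shell
#!/bin/sh
# Hypothetical helper (not part of the service menu).
# TESSERACT is an assumed override hook; it defaults to the real command.
: "${TESSERACT:=tesseract}"

# Usage: tesseract_check IMAGE OUTPUT_BASE [LANGUAGE]
# Prints "ok" if OUTPUT_BASE.txt exists and is not empty afterwards,
# "failed" otherwise.
tesseract_check() {
    image=$1
    outbase=$2
    lang=$3   # optional, e.g. "spa"; if empty, tesseract uses its default language
    rm -f -- "$outbase.txt"
    # A failing tesseract run is not fatal here: it is detected below
    # through the missing or empty output file.
    if [ -n "$lang" ]; then
        "$TESSERACT" "$image" "$outbase" -l "$lang" >/dev/null 2>&1 || true
    else
        "$TESSERACT" "$image" "$outbase" >/dev/null 2>&1 || true
    fi
    # -s is true only for a file that exists and has a size greater than zero.
    if [ -s "$outbase.txt" ]; then
        echo "ok"
    else
        echo "failed"
    fi
}
```

With the test image downloaded, tesseract_check testimage.tif testoutput followed by tesseract_check testimage.tif testoutput spa should both print "ok" on a working installation with the Spanish language pack.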
Changelog:
v. 0.3 – 2010-09-30
- The service menu is now fully localizable
- Limitation for uppercase extensions removed
v. 0.2.1 – 2010-09-27
- fixed a bug that prevented the service menu from being displayed for PDF files
- fixed the accidental overwriting of an existing file with the same name
- got the readme.txt file back
- added German translation to the service menu thanks to Rettich
v. 0.2 – 2010-09-24
- attempted to make it knewstuff3 compatible; it should be installable through the Dolphin services.
- simplified operation: a dialog asks you to choose the language, and there is only one service menu entry now.
- fixed progress bar error.
- it seems that the problem with directory names with spaces is gone.
v. 0.1 – 2010-03-10
- Initial creation of the service menu.
License: GPL