Thursday, July 1, 2010

OCR for Linux

I tried a few OCR tools on a clean image containing the address alan0@hotmail.com.
Install them (plus ImageMagick, which provides convert) with: sudo apt-get install gocr tesseract-ocr ocrad imagemagick

$ gocr -i email.png
aIan0hdmaiI.com

$ convert email.png email.tif # this tesseract version wants TIFF input
$ tesseract email.tif out
$ cat out.txt
Ina rrykeeg a

$ convert email.png email.ppm # ocrad wants PBM/PGM/PPM input
$ ocrad email.ppm
alano_no_mall.


Upscaling the image improved the results slightly, but none of the tools read the address correctly.
Disappointing...
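To repeat the upscaling experiment in one go, here is a minimal sketch of a shell function that enlarges the image with ImageMagick and then runs the same three commands as above on the enlarged copy. The function name ocr_all, the file name ocr_all.sh, the "-big" suffix and the 200% default are all hypothetical choices for illustration, not anything the tools require; it assumes convert, gocr, tesseract and ocrad are on the PATH and simply skips any OCR tool that is missing.

```shell
# Hypothetical helper (not part of any of these packages): upscale an image
# with ImageMagick, then run gocr, tesseract and ocrad on the enlarged copy.
# Usage: ocr_all IMAGE [SCALE]   e.g. ocr_all email.png 300%
ocr_all() {
    img=$1
    scale=${2:-200%}          # arbitrary default upscale factor
    if [ ! -f "$img" ]; then
        echo "ocr_all: no such file: $img" >&2
        return 1
    fi
    if ! command -v convert >/dev/null 2>&1; then
        echo "ocr_all: ImageMagick convert not found" >&2
        return 1
    fi
    base=${img%.*}            # strip the extension: email.png -> email
    big="$base-big"           # intermediate files are left in place
    convert "$img" -resize "$scale" "$big.png" || return 1

    if command -v gocr >/dev/null 2>&1; then
        echo "== gocr"
        gocr -i "$big.png"
    fi
    if command -v tesseract >/dev/null 2>&1; then
        echo "== tesseract"
        # tesseract writes its result to "<outbase>.txt"
        convert "$big.png" "$big.tif" &&
            tesseract "$big.tif" "$big-out" &&
            cat "$big-out.txt"
    fi
    if command -v ocrad >/dev/null 2>&1; then
        echo "== ocrad"
        convert "$big.png" "$big.ppm" && ocrad "$big.ppm"
    fi
}
```

Save it as ocr_all.sh, load it with ". ./ocr_all.sh", then try a few factors, e.g. "ocr_all email.png 300%".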
