Author Topic: Xsane OCR setup [SOLVED]  (Read 1893 times)

Offline wharfhouse

  • Full Member
  • ***
  • Posts: 158
Xsane OCR setup [SOLVED]
« on: December 16, 2011, 02:36:31 AM »
Hi All,

Does anybody know how to set up tesseract OCR with Xsane under preferences>setup>OCR, and then how you actually go about performing the task?

I'm having a go for the first time and can't seem to make sense of it.  I've searched the internet and here, but there's precious little out there on this subject and what little there is is often contradictory.  Even Oliver Rauch doesn't go into any detail!

Any tips gratefully received!  :D
« Last Edit: December 23, 2011, 02:55:30 AM by wharfhouse »

Offline rich2005

  • Sr. Member
  • ****
  • Posts: 260
Re: Xsane OCR setup
« Reply #1 on: December 16, 2011, 10:41:17 AM »
Ask for YAGF to be included in the repo.  I used to use it in Mepis and just (10 minutes ago) stole it from a Debian distro. That's how long it takes to unpack (2 files) and set up. I know, not very correct, but ....

It is a very simple front-end; you can probably compile it yourself.  It works very well with tesseract (or at least as well as any Linux OCR works).

screenshot: http://i.imgur.com/mWrRw.jpg

might even keep this one.

Offline rubentje1991

  • PCLinuxOS Tester
  • Hero Member
  • *******
  • Posts: 2108
  • Rubenus Parvus MCMXCI
Re: Xsane OCR setup
« Reply #2 on: December 16, 2011, 01:22:23 PM »
Quote
Ask for YAGF to be included in the repo.

In Package Suggest you can ask for this package  ;)
Thanks for the info rich2005

Offline wharfhouse

  • Full Member
  • ***
  • Posts: 158
Re: Xsane OCR setup
« Reply #3 on: December 19, 2011, 03:28:03 AM »
Thanks Guys,

But it didn't actually give me the answer I was looking for!!  Does that mean you don't have to do anything in Xsane setup OCR?  I did look at YAGF on its website and it actually looks a nice little frontend/gui (is that the right term?).  I downloaded OCRFeeder last night from Synaptic and gave it a go, but found it very poor in terms of user-friendliness and performance.

It begs the question, do you need Xsane to OCR a document?  Surely it's logical to simply open up an OCR application and import an image/document straight from a scanner via Sane in the background to then perform OCR and save the output to .txt or .odt.

I'll now suggest YAGF for the repo, be nice to give that a test drive to compare.  :) :)

Offline rich2005

  • Sr. Member
  • ****
  • Posts: 260
Re: Xsane OCR setup
« Reply #4 on: December 19, 2011, 05:19:26 AM »
You don't need xsane to use tesseract. I only suggested a front-end (such as YAGF) because it will call xsane for you, then take the output, OCR it and save.  If you already have a scanned document in a suitable format (jpg, tiff etc.), then the tesseract command line (as per the man page) is given below. As you previously noted, there is not a lot of info about tesseract.

tesseract   image.ext   text_file

and tesseract will OCR to a new file text_file.txt
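
For anyone following along, the command above can be wrapped in a small shell function. This is only a sketch: ocr_image and ocr_output_name are made-up helper names, and it assumes tesseract is on the PATH and that the image is in a format your tesseract build can read.

```shell
#!/bin/sh
# Sketch only: ocr_output_name and ocr_image are hypothetical helper names.

# Print the file name tesseract will write for a given output base name.
# tesseract appends ".txt" to the base name itself.
ocr_output_name() {
    printf '%s.txt\n' "$1"
}

# OCR one image.
# $1 = image file, $2 = output base (defaults to the image name minus extension).
ocr_image() {
    image=$1
    base=${2:-${image%.*}}
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found in PATH" >&2
        return 127
    fi
    # On success, print the name of the text file that was written.
    tesseract "$image" "$base" && ocr_output_name "$base"
}
```

Called as `ocr_image image.tif text_file`, this should leave the recognised text in text_file.txt, the same as typing the command by hand.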

One more advantage of YAGF, it will open a pdf for OCR which the command line does not. Still does not support retaining doc formatting however, at least not on my machine.

Never had much success setting up the OCR feature in xsane, which I think is your question, but then I was using gocr and the recognition rate was poor at best.  I will give it a try with tesseract and see what happens.

edit:
Looks like xsane OCR will not call tesseract directly, i.e. the -i and -o parameters do not work. There is a bash script here
http://doc.u b u n t u-fr.org/xsane2tess     (come on guys, it is only for info - muxed 'tu reference in the hope of...)
which you can use instead of gocr in the xsane OCR setup. The site is French but the script is in English. Make it executable and copy it to /usr/bin.  I tried it and it does work (I had to remove the -l parameter on line 78). It needs ImageMagick, which is in the repo.
If you want more details send me a PM.
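
The install steps described above could be sketched roughly like this. check_deps and install_script are hypothetical helper names, and it is an assumption that the script was saved as xsane2tess in the current directory and that `convert` is the ImageMagick command it relies on.

```shell
#!/bin/sh
# Sketch only: check_deps and install_script are hypothetical helper names.

# Report which of the named commands are missing; returns non-zero if any are.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Copy a script into a directory and make it executable (mode 755).
# $1 = script file, $2 = destination directory (default /usr/bin, needs root).
install_script() {
    install -m 755 "$1" "${2:-/usr/bin}/"
}
```

Run as root, something like `check_deps tesseract convert && install_script xsane2tess` would only copy the script once both tools are found.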
« Last Edit: December 19, 2011, 06:08:13 AM by rich2005 »

Offline wharfhouse

  • Full Member
  • ***
  • Posts: 158
Re: Xsane OCR setup
« Reply #5 on: December 20, 2011, 01:54:54 AM »
Good Morning Rich,

Thanks for your reply... useful info there.  I've got the French U b u n t u site open on another tab so I'll copy the script and make it executable and see what that's like  ;)

Meanwhile you probably noticed I put in a request for YAGF and promptly got my knuckles rapped by the ManBear!!  Does he always act like a bouncer at a nightclub?  ;D

Quote
If you want more details send me a PM.

Excuse my ignorance... PM??

Cheers!
 :)

Offline rich2005

  • Sr. Member
  • ****
  • Posts: 260
Re: Xsane OCR setup
« Reply #6 on: December 20, 2011, 05:56:04 AM »
Quote
I've got the French another distro site open on another tab so I'll copy the script and make it executable and see what that's like

My French is very, very average, but looking at the page, you use the -l (language) switch in the line replacing gocr,
i.e.  xsane2tess -l eng    That would save editing the script to make it default to English.
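
To show how a -l switch with a fallback language typically works in a shell script, here is a minimal getopts sketch. It is not the real xsane2tess: parse_args, IN_FILE and OUT_FILE are made-up names, and the "eng" default is an assumption for illustration (TES_LANG is the variable name the script itself uses).

```shell
#!/bin/sh
# Sketch only: not the real xsane2tess. parse_args, IN_FILE and OUT_FILE
# are hypothetical names; the "eng" default is an assumption.

parse_args() {
    IN_FILE=
    OUT_FILE=
    TES_LANG=eng          # used when no -l switch is given
    OPTIND=1              # reset so the function can be called more than once
    while getopts "i:o:l:" opt; do
        case $opt in
            i) IN_FILE=$OPTARG ;;    # input image handed over by xsane
            o) OUT_FILE=$OPTARG ;;   # output text file requested by xsane
            l) TES_LANG=$OPTARG ;;   # language override, e.g. -l eng
            *) return 2 ;;           # unknown switch
        esac
    done
}
```

With this pattern, passing -l on the xsane OCR command line changes the language without touching the script, which is the point made above.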

Quote
Meanwhile you probably noticed I put in a request for YAGF and promptly got my knuckles rapped by the ManBear!!  Does he always act like a bouncer at a nightclub?

no real surprise there  

Quote
Excuse my ignorance... PM??
 

Personal message via the forum, or an email to the address on my forum profile.

Offline wharfhouse

  • Full Member
  • ***
  • Posts: 158
Re: Xsane OCR setup
« Reply #7 on: December 22, 2011, 02:36:03 AM »
Hi Rich,

Thought you'd like an update on progress (or lack of it!).

I managed to get the web page translated just in case I missed something important, copied the script into ~/usr/bin, made executable... didn't work for me!

On your recommendation I removed the -l parameter on line 78 - no change; put it back - no change; removed all the options in Xsane>preferences>OCR tab (-i -o -x) - no change; put them back - no change!

Couple of points that may account for this I'd like to run by you...
1)  according to the translation, in Xsane "Save" mode window > Save as filetype... they suggest "text" and yet by all accounts ocr has to be in TIFF format (.tif).  So, is this right?
2)  Xsane will only produce an output when "full colour mode" is selected for me.  Greyscale and black & white (lineart) just produces a jet black image.  This is in all Xsane modes.  Research would indicate that I'm stuck with this one due to the Sane hp3900 back-end (my scanner's an HP 3970).  I don't know if this matters or not.

So what does it do?  Well, I got error logs (xsane2tess.log) when I scanned in grey or lineart with messages like it didn't recognize the image file, no error logs with colour.  As an output ocr'd txt file the result was a blank document.

Reading the script would suggest that there should be some user interface while the process is running (echo lines).  There was no indication of any ocr activity... I don't know if this is normal or not.  I thought as it worked for you, you could comment on what I should be seeing.

A bit more research and I've found a replacement perl script for xsane2tess as there seems to be a common problem in trying to get the script to work.  I'll have another go tonight and report back in the morning.  :-\

Thanks for your help  :)

Offline rich2005

  • Sr. Member
  • ****
  • Posts: 260
Re: Xsane OCR setup
« Reply #8 on: December 22, 2011, 06:34:36 AM »
Quote
copied the script into ~/usr/bin

~/usr/bin would indicate a folder under your home directory (a typo?). The script needs to go in the system /usr/bin.

Never surprised when something does not work.

The alteration to the script at line 78 - for confirmation
this is the original line
tesseract "$TIF_FILE" "$TXT_FILE" -l "$TES_LANG" 1>&2
and I changed it to
tesseract "$TIF_FILE" "$TXT_FILE"  1>&2
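
As an alternative to deleting -l outright, the call could pass the switch only when a language is actually set. This is a sketch, not what the script does: run_tesseract and TESSERACT_CMD are hypothetical names added here so the command can be swapped out for testing, while TIF_FILE, TXT_FILE and TES_LANG are the variables from the line quoted above.

```shell
#!/bin/sh
# Sketch only: run_tesseract and TESSERACT_CMD are hypothetical names.
# TIF_FILE, TXT_FILE and TES_LANG are the variables from the quoted line.

run_tesseract() {
    # Start with the two arguments every call needs.
    set -- "$TIF_FILE" "$TXT_FILE"
    # Add -l only when a language was actually chosen.
    if [ -n "${TES_LANG:-}" ]; then
        set -- "$@" -l "$TES_LANG"
    fi
    # Send tesseract's messages to stderr, as the original line does.
    "${TESSERACT_CMD:-tesseract}" "$@" 1>&2
}
```

With TES_LANG empty this behaves like the edited line; with TES_LANG=eng it behaves like the original one.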

I am not having any problems here with a grayscale scan.

just for info: some screenshots
the xsane OCR setup  http://i.imgur.com/l1bHF.jpg
the save dialog, such as it is  http://i.imgur.com/uR8dM.jpg
the saved text: http://i.imgur.com/iIrLj.jpg


Offline wharfhouse

  • Full Member
  • ***
  • Posts: 158
Re: Xsane OCR setup
« Reply #9 on: December 23, 2011, 02:54:47 AM »
Rich... we have success... I think!!

Your pictures told a thousand words; for a start I was in the wrong mode, you were in "viewer" mode whereas I was in "save" mode, and there are still a lot of anomalies, but I did get an ocr and pretty damn good recognition as well.  This is what I did/found...

  • Edited the script as you recommended and checked it was in the right location (/usr/bin), which it was.
  • Slightly altered the entries/options under xsane>preferences>setup>ocr tab.
  • Went into viewer mode and made settings in there as per your pic.
  • Previewed scan... jet black page! So, as mentioned previously, did it in colour.
  • Preview came up looking ok... pressed scan.
  • Scan took place and a "Viewer" window came up (never seen this before) with an OCR button on it!
  • The scan looked awful, with the text shadowed with RGB like a rainbow... that was no good, no point in even trying ocr.
  • Tried again in "greyscale"... success, even though it wouldn't preview.
  • Adjusted gamma, brightness and contrast to sharpen the scan and rescanned/adjusted till I got a good clear scan.
  • Finally pressed OCR and the SaveAs window came up as per your pic.
Now here I'm not sure... the bit at the bottom where it says "Type".  Should it be by ext or Text?  I tried both ways.

Anyway, clicked Save and here I came unstuck again.... kept producing empty text files (which was what happened before).  So I kept deleting them and tried other settings and so on and so forth, taking up several hours!!  But in doing so I inadvertently opened up a previous test file (txt) and found text in it!  What I hadn't realized is that it takes time for the ocr process to take place!!  Problem solved! ;D
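
Since the text file stays empty while the OCR is still running, a small wait loop can save deleting it too early. This is a sketch under assumptions: wait_for_text is a made-up helper name and the 60 second default is arbitrary.

```shell
#!/bin/sh
# Sketch only: wait_for_text is a hypothetical helper name.
# $1 = text file to watch, $2 = timeout in seconds (default 60).
wait_for_text() {
    file=$1
    remaining=${2:-60}
    while [ "$remaining" -gt 0 ]; do
        [ -s "$file" ] && return 0    # -s: file exists and is not empty
        sleep 1
        remaining=$((remaining - 1))
    done
    [ -s "$file" ]                    # final check decides the exit status
}
```

For example, `wait_for_text scan.txt 120 && cat scan.txt` only prints the file once something has been written to it. A non-empty file does not prove the OCR has finished, so give it a moment longer on big pages.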

So all very clunky and I need to now experiment to make the process smoother; but I will suggest YAGF  when they re-open in the New Year.

I still don't know why greyscale (AND lineart) won't preview but does scan after all (as shown in the Viewer window), and I don't know what part ImageMagick plays in all this, if any, but I'll mark the post as solved, which just leaves me to thank you for helping me out on this one.  It's a pity that all these Linux applications only tell you half a story and leave you to muddle through the rest! But I suppose otherwise we would never learn anything!! ;D ;D ;D

I wish you a very Happy Christmas Rich and thanks.
 ;)