Discussion:
simple OCR in php
Ray
2007-06-29 16:24:45 UTC
Hello all,
I am looking for a way to incorporate some simple OCR into a PHP script. The
user will bulk scan a pile of invoices. I want the PHP script to look at each
invoice and read a number off the invoice. The image will then be renamed and
filed into a directory, and the file name will be added to a database. (All of
these steps are straightforward once the number is read.)
I have no problem with a system that requires a special OCR font and/or some
sort of registration mark to help locate the Invoice number. Can anybody tell
me of any tools out there that can do this?
Thanks,
Ray
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Jay Blanchard
2007-06-29 16:32:34 UTC
[snip]
I am looking for a way to incorporate some simple OCR into a php script.
The user will bulk scan a pile of invoices. I want the php script to look
at each invoice and read a number off the invoice. The image will then be
renamed, and be organized into a directory and the file name will be added
to a database. (all of these steps are straight forward once the number is
read.) I have no problem with a system that requires a special OCR font
and/or some sort of registration mark to help locate the Invoice number.
Can anybody tell me of any tools out there that can do this?
[/snip]

In short, PHP cannot perform OCR functions. You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
Ray
2007-06-29 16:44:23 UTC
Post by Jay Blanchard
In short PHP cannot perform OCR functions. You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
Exactly.
Are there any third party tools or OCR applications that anybody can
recommend? (I suppose you could write an OCR application in PHP, but that
sounds like an awful lot of work.)
Thanks
Ray
Robert Cummings
2007-06-29 17:01:00 UTC
Post by Jay Blanchard
In short PHP cannot perform OCR functions. You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
Why can PHP not perform OCR functions?

Cheers,
Rob.
--
.------------------------------------------------------------.
| InterJinn Application Framework - http://www.interjinn.com |
:------------------------------------------------------------:
| An application and templating framework for PHP. Boasting |
| a powerful, scalable system for accessing system services |
| such as forms, properties, sessions, and caches. InterJinn |
| also provides an extremely flexible architecture for |
| creating re-usable components quickly and easily. |
`------------------------------------------------------------'
M. Sokolewicz
2007-06-29 20:57:42 UTC
Post by Robert Cummings
Post by Jay Blanchard
In short PHP cannot perform OCR functions. You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
Why can PHP not perform OCR functions?
Cheers,
Rob.
It's not so much impossible to perform OCR in PHP as it is unrealistic.
PHP contains only the most basic support for such things, and there are
barely any (OS) libraries for PHP that do this, so it's simply not
realistic to try to write one if you're "on your own". Other languages
(e.g. C) have libraries written by people specifically for this
purpose. So it seems clear to me: PHP is (currently) not the most
realistic language to do OCR with; it'd be a lot easier (at this point)
to "borrow" from another language. Whether that is a low-level library
in C or a full-blown package doesn't matter.
Tijnema
2007-06-29 22:13:12 UTC
Post by M. Sokolewicz
Post by Robert Cummings
Post by Jay Blanchard
In short PHP cannot perform OCR functions. You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
Why can PHP not perform OCR functions?
Cheers,
Rob.
It's not so much impossible to perform OCR as it is unrealistic to do
so. PHP contains only the very most basic support for such things, not
to mention there are barely any (OS) libraries for PHP to do this, it's
simply not realistic to try and make one if you're "on your own". Other
languages (ie. C) have libraries, written by people specifically for
this purpose, so it seems clear to me: PHP is (currently) not the most
realistic language to try and do OCR with, instead it'd be a lot easier
(at this point) to "borrow" from another. Be this a low-level library in
C, or a full-blown package, it doesn't matter.
Take the library of a C OCR program, write some PHP extension code in C
around it, and a new extension is born :)

Tijnema
--
Vote for PHP Color Coding in Gmail! -> http://gpcc.tijnema.info
Robert Cummings
2007-06-29 22:15:52 UTC
Post by M. Sokolewicz
Post by Robert Cummings
Post by Jay Blanchard
In short PHP cannot perform OCR functions. You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
Why can PHP not perform OCR functions?
Cheers,
Rob.
It's not so much impossible to perform OCR as it is unrealistic to do
so. PHP contains only the very most basic support for such things, not
to mention there are barely any (OS) libraries for PHP to do this, it's
simply not realistic to try and make one if you're "on your own". Other
languages (ie. C) have libraries, written by people specifically for
this purpose, so it seems clear to me: PHP is (currently) not the most
realistic language to try and do OCR with, instead it'd be a lot easier
(at this point) to "borrow" from another. Be this a low-level library in
C, or a full-blown package, it doesn't matter.
Well I agree with that, but as you indicate, it is possible to do it in
PHP, just not particularly practical :)

Cheers,
Rob.
Crayon Shin Chan
2007-06-29 17:02:32 UTC
Post by Jay Blanchard
In short PHP cannot perform OCR functions.
Why? PHP provides all requisite functions/features so if someone was
sadistic enough and talented enough there's nothing to stop them writing
an OCR app using it.
Post by Jay Blanchard
You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
That would be the smart choice though.
--
Crayon
Jay Blanchard
2007-06-30 17:12:52 UTC
[snip]
Post by Jay Blanchard
In short PHP cannot perform OCR functions.
Why? PHP provides all requisite functions/features so if someone was
sadistic enough and talented enough there's nothing to stop them writing
an OCR app using it.
[/snip]

Sure, but then the scanning device would have to be connected to the
server. I suppose you could open a socket and stream the information to
the server and then have PHP read and interpret the stream as it
arrives. See how complex this is becoming?
Robert Cummings
2007-06-30 17:36:54 UTC
Post by Jay Blanchard
[snip]
Post by Jay Blanchard
In short PHP cannot perform OCR functions.
Why? PHP provides all requisite functions/features so if someone was
sadistic enough and talented enough there's nothing to stop them writing
an OCR app using it.
[/snip]
Sure, but then the scanning device would have to be connected to the
server. I suppose you could open a socket and stream the information to
the server and then have PHP read and interpret the stream as it
arrives. See how complex this is becoming?
It was JUST as complex the first time someone did it in C, or Java, or
what have you, for a chosen language.

Cheers,
Rob.
Richard Lynch
2007-07-03 06:03:54 UTC
Post by Robert Cummings
It was JUST as complex the first time someone did it in C, or Java, or
what have your for a chosen language.
No, it was more complex, because it wasn't PHP.
:-)
--
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
Robert Cummings
2007-07-03 06:14:03 UTC
Post by Richard Lynch
Post by Robert Cummings
It was JUST as complex the first time someone did it in C, or Java, or
what have your for a chosen language.
No, it was more complex, because it wasn't PHP.
:-)
Good point :)

Cheers,
Rob.
Stut
2007-06-30 17:50:07 UTC
Post by Jay Blanchard
[snip]
Post by Jay Blanchard
In short PHP cannot perform OCR functions.
Why? PHP provides all requisite functions/features so if someone was
sadistic enough and talented enough there's nothing to stop them writing
an OCR app using it.
[/snip]
Sure, but then the scanning device would have to be connected to the
server. I suppose you could open a socket and stream the information to
the server and then have PHP read and interpret the stream as it
arrives. See how complex this is becoming?
Maybe it's just me, but while OCR and scanning are certainly related,
they are by no means dependent on each other. Is it becoming complex or
are you over-complicating it?

-Stut
--
http://stut.net/
Richard Lynch
2007-07-03 06:03:02 UTC
Post by Jay Blanchard
[snip]
Post by Jay Blanchard
In short PHP cannot perform OCR functions.
Why? PHP provides all requisite functions/features so if someone was
sadistic enough and talented enough there's nothing to stop them writing
an OCR app using it.
[/snip]
Sure, but then the scanning device would have to be connected to the
server. I suppose you could open a socket and stream the information to
the server and then have PHP read and interpret the stream as it
arrives. See how complex this is becoming?
No, the scanning device could be on a desktop that builds a folder of
files with names that can be tied back to the documents somehow.

Or, for what the OP asked for, the whole thing could be on a "server"
which is really a "desktop" where having the scanner connected would
be pretty normal.
Andrei
2007-07-03 08:46:50 UTC
Post by Richard Lynch
Post by Jay Blanchard
[snip]
Post by Jay Blanchard
In short PHP cannot perform OCR functions.
Why? PHP provides all requisite functions/features so if someone was
sadistic enough and talented enough there's nothing to stop them writing
an OCR app using it.
[/snip]
Sure, but then the scanning device would have to be connected to the
server. I suppose you could open a socket and stream the information to
the server and then have PHP read and interpret the stream as it
arrives. See how complex this is becoming?
No, the scanning device could be on a desktop that builds a folder of
files with names that can be tied back to the documents somehow.
Or, for what the OP asked for, the whole thing could be on a "server"
which is really a "desktop" where having the scanner connected would
be pretty normal.
It's better to focus on the OCR code that reads and parses an image
file (usually a TIFF file). Obtaining the image is not that hard (at
least on Linux).

Andy
Ray
2007-07-05 01:06:46 UTC
Post by Richard Lynch
Post by Jay Blanchard
[snip]
Post by Jay Blanchard
In short PHP cannot perform OCR functions.
Why? PHP provides all requisite functions/features so if someone was
sadistic enough and talented enough there's nothing to stop them writing
an OCR app using it.
[/snip]
Sure, but then the scanning device would have to be connected to the
server. I suppose you could open a socket and stream the information to
the server and then have PHP read and interpret the stream as it
arrives. See how complex this is becoming?
No, the scanning device could be on a desktop that builds a folder of
files with names that can be tied back to the documents somehow.
Or, for what the OP asked for, the whole thing could be on a "server"
which is really a "desktop" where having the scanner connected would
be pretty normal.
Actually, the scanner is on a desktop, and (to make a long story short) the
server has part of the desktop's hard drive mounted as a network drive. A cron
job tells the PHP script to look in a given folder on the desktop, process all
the files, and clean up after itself, all at 3 or 4 am.
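
The nightly batch described above could be sketched roughly as follows. This is only an illustration under stated assumptions: processScans() and both callbacks are hypothetical names, the list of file extensions is a guess, and "organized into a directory" is taken to mean a single target directory with files renamed to <number>.<extension>.

```php
<?php
// Sketch of the nightly batch. $readNumber and $record are hypothetical
// callbacks: the first wraps whatever OCR tool is chosen and returns the
// invoice number (or null), the second stores the new file name in the database.
function processScans(string $inDir, string $outDir, callable $readNumber, callable $record): array
{
    $summary = ['filed' => [], 'unread' => []];
    if (!is_dir($outDir)) {
        mkdir($outDir, 0775, true);
    }
    foreach (scandir($inDir) as $name) {
        $source = $inDir . '/' . $name;
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        // Only look at regular files with an (assumed) image extension.
        if (!is_file($source) || !in_array($ext, ['tif', 'tiff', 'png', 'jpg'], true)) {
            continue;
        }
        $number = $readNumber($source);
        $target = $outDir . '/' . $number . '.' . $ext;
        // Leave the scan in place for manual review when no usable number
        // was read or when the target name is already taken.
        if ($number === null || !preg_match('/^\d+$/', $number) || file_exists($target)) {
            $summary['unread'][] = $name;
            continue;
        }
        rename($source, $target);
        $record($number, $target);
        $summary['filed'][] = $target;
    }
    return $summary;
}
```

A cron entry would then run a small script that calls processScans() with the mounted folder as $inDir. Files that could not be read stay where they are, so they can be checked by hand the next morning.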
Richard Lynch
2007-07-03 05:51:19 UTC
Post by Jay Blanchard
In short PHP cannot perform OCR functions. You could insert an OCR
application into the process and have the OCR app pass PHP the
information.
Really?

So that OCR routine I wrote to hack a CAPTCHA doesn't exist?

Weird.

:-)

If you really do want to write OCR in PHP, it's pretty trivial:

http://php.net/imagecolorat

You'll need to build up a "dictionary" of known characters and define
a "distance" function to decide when two characters "match" or not,
but it's not rocket science.
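
For illustration, a minimal sketch of the "dictionary" plus "distance" idea might look like the following. It is not the CAPTCHA routine mentioned above. The function names, the toy 3x5 glyph templates and the choice of Hamming distance (the count of differing pixels) are all assumptions, and glyphBits() assumes a truecolor GD image.

```php
<?php
// Convert a rectangular region of a GD truecolor image into a string of
// '1' (dark pixel) and '0' (light pixel), row by row. Requires the GD extension.
function glyphBits($img, int $x0, int $y0, int $w, int $h): string
{
    $bits = '';
    for ($y = $y0; $y < $y0 + $h; $y++) {
        for ($x = $x0; $x < $x0 + $w; $x++) {
            $rgb = imagecolorat($img, $x, $y);
            $r = ($rgb >> 16) & 0xFF;
            $g = ($rgb >> 8) & 0xFF;
            $b = $rgb & 0xFF;
            // Treat anything darker than mid-grey as ink.
            $bits .= (($r + $g + $b) / 3 < 128) ? '1' : '0';
        }
    }
    return $bits;
}

// "Distance" between two equal-length bit strings: the number of differing pixels.
function hammingDistance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Glyphs must have the same size');
    }
    $diff = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) {
            $diff++;
        }
    }
    return $diff;
}

// Return the dictionary character closest to $bits, or null when even the
// best candidate differs in more than $maxDistance pixels.
function classifyGlyph(string $bits, array $dictionary, int $maxDistance): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($dictionary as $char => $template) {
        $d = hammingDistance($bits, $template);
        if ($d < $bestDistance) {
            $bestDistance = $d;
            $best = (string) $char;
        }
    }
    return $bestDistance <= $maxDistance ? $best : null;
}

// Toy dictionary of 3x5 glyphs, written row by row (hypothetical shapes).
$dictionary = [
    '1' => '010' . '110' . '010' . '010' . '111',
    '7' => '111' . '001' . '010' . '010' . '010',
];
$noisySeven = '111' . '001' . '010' . '011' . '010'; // a "7" with one flipped pixel
echo classifyGlyph($noisySeven, $dictionary, 3), "\n";
```

A real dictionary would be built by cutting known characters out of sample scans at a fixed size, which is where a special OCR font and a registration mark would help.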

It doesn't even qualify as Artificial Intelligence anymore. :-)

But since you have standard un-obfuscated content, using exec() to run
a well-established OCR package might be easier.

Or not, as I could never get the dang things to work in the first
place, personally. :-v

YMMV
NAIAA
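
As a rough sketch of the exec() route, assuming a reasonably recent `tesseract` binary is installed and on the PATH (passing "stdout" as the output base to print the text is a feature of newer Tesseract releases, so treat it as an assumption). The function names and the "Invoice ... digits" pattern are hypothetical and would need adjusting to the real invoice layout.

```php
<?php
// Build the shell command. escapeshellarg() protects against odd file names.
// "stdout" asks tesseract to print the recognised text instead of writing a file.
function buildOcrCommand(string $imagePath): string
{
    return 'tesseract ' . escapeshellarg($imagePath) . ' stdout 2>/dev/null';
}

// Pull the first run of 4 to 10 digits that follows the word "Invoice".
// The pattern is a guess at the layout, not something taken from the thread.
function extractInvoiceNumber(string $ocrText): ?string
{
    if (preg_match('/invoice\D{0,20}(\d{4,10})/i', $ocrText, $m)) {
        return $m[1];
    }
    return null;
}

// Run the OCR tool on one image and return the invoice number, or null
// when the tool fails or no number is found.
function readInvoiceNumber(string $imagePath): ?string
{
    exec(buildOcrCommand($imagePath), $lines, $status);
    if ($status !== 0) {
        return null; // OCR tool missing or failed
    }
    return extractInvoiceNumber(implode("\n", $lines));
}
```

Keeping the command building and the text parsing in separate functions means the parsing can be tried on sample OCR output without the OCR tool installed.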
Manuel Lemos
2007-06-30 02:51:19 UTC
Hello,
Post by Ray
Hello all,
I am looking for a way to incorporate some simple OCR into a php script. The
user will bulk scan a pile of invoices. I want the php script to look at each
invoice and read a number off the invoice. The image will then be renamed,
and be organized into a directory and the file name will be added to a
database. (all of these steps are straight forward once the number is read.)
I have no problem with a system that requires a special OCR font and/or some
sort of registration mark to help locate the Invoice number. Can anybody tell
me of any tools out there that can do this?
I think you are looking for something like this:

http://www.phpclasses.org/phpocr
--
Regards,
Manuel Lemos

Metastorage - Data object relational mapping layer generator
http://www.metastorage.net/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
Zeb Packard
2007-07-02 19:27:08 UTC
Linux Journal had an article on Tesseract:

code.google.com/p/tesseract-ocr

The files needed to be cleaned up first, though (high-contrast black text
on a white background), so understanding GIMP or some other equally
functional command-line image editor is essential. The suggested
alternative for image editing was netpbm.sourceforge.net, and for OCR
the alternative was ocrad. It was suggested that the images be scanned
in at 150 dpi or greater.
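
The clean-up step (black text on a white background) can also be sketched in PHP itself, as a swapped-in alternative to GIMP or netpbm rather than what the article described. This is a plain fixed-threshold binarisation. The function names and the default threshold of 128 are assumptions, and greyLevels() assumes a truecolor GD image.

```php
<?php
// Turn a grid of grey levels (0 = black, 255 = white) into pure black and
// white. The default threshold of 128 is an arbitrary starting point.
function binarize(array $grey, int $threshold = 128): array
{
    $out = [];
    foreach ($grey as $y => $row) {
        foreach ($row as $x => $level) {
            $out[$y][$x] = $level < $threshold ? 0 : 255;
        }
    }
    return $out;
}

// Read grey levels from a GD truecolor image (requires the GD extension).
function greyLevels($img): array
{
    $grid = [];
    for ($y = 0, $h = imagesy($img); $y < $h; $y++) {
        for ($x = 0, $w = imagesx($img); $x < $w; $x++) {
            $rgb = imagecolorat($img, $x, $y);
            // Average the red, green and blue channels.
            $grid[$y][$x] = intdiv((($rgb >> 16) & 0xFF) + (($rgb >> 8) & 0xFF) + ($rgb & 0xFF), 3);
        }
    }
    return $grid;
}
```

Scans with uneven lighting would probably need a better threshold than a fixed value, which is one reason a dedicated image tool may still be the easier choice.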