Sunday, February 19, 2012

How to do field content search in sql

Hi everyone,
I have a silly question about SQL, and I wonder whether it can be done with
existing techniques.
I have a binary field (e.g. of type Image) in SQL, and I need to store image
files (scanned from original documents) in that field. I don't think it is
possible, but my boss wants me to give him at least an alternative
solution: can I search the text in this field?
To make it clear:
I scan a paper document and get a JPEG file. The original document contains
printed text and handwriting, including the word "Bush". I store this JPEG
file in a field of type Image in a SQL database. Now I want to search that
field in the database for "Bush".
Can I do that?
I told him it is impossible, but he is looking for some kind of alternative
solution and doesn't care about the cost.
I told him I can search text: if we store a description alongside each
image, then I can search those descriptions. But he wants to search the
whole document.
Can I do some sort of recognition, extract the content from the scanned
JPEG, and then store the extracted content in SQL so that I can search it?
Thanks.

You can't do that out-of-the-box with SQL Server. However, there are
plenty of document imaging systems that will scan and index documents
into a database for you. You should probably take a look at some of
those packages.
David Portas
SQL Server MVP

Raymond,
David, this can actually be done with a small and very easy modification to
Raymond's procedure...
Raymond, when you scan a paper document, can your scanner or scanner software
save the file as a Tagged Image File Format (TIFF) file? Most scanners can
store files as TIFF as well as do OCR... If yours can, then you can use the
TIFF IFilter (mspfilt.dll), which filters files with the TIFF extension. This
filter is installed by MS Office 2003 and MS Office XP. If it cannot, you
could possibly use one of several JPEG IFilters: for example, the XMP IFilter
lets you index JPEG, GIF, TIFF, PNG, PS, EPS, PSD, AI and SVG files (see
http://www.ifiltershop.com/xmpfilter.html), or JPEG IFilter, a JPEG Content
Filter for Microsoft Indexing Service (see
http://www.aimingtech.com/jpeg_ifilter/). I've not tested these JPEG
IFilters with SQL Server 2000 FTS, but as they say, an IFilter is an
IFilter is an IFilter...
These IFilters work the same way and use the same technology as Adobe's PDF
IFilter, which can be used with SQL Server 2000 Full-Text Search (FTS). You
store the binary file in a column defined with the IMAGE datatype, then alter
your table to add a "file extension" column. Either define this column as
char(3) and populate it with values like "pdf", or define it as varchar(4) or
sysname and populate it with values like ".pdf".
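To make the role of the extension column more concrete, here is a toy Python model. It is an illustration under assumptions, not SQL Server's implementation: each row holds the binary document plus its extension, and the "indexer" picks a filter by the normalized extension, so "pdf" (char(3) style) and ".pdf" (varchar(4)/sysname style) map to the same filter. The `FILTERS` table and both function names are hypothetical; rows whose extension has no registered filter produce no searchable text, which mirrors what happens when the matching IFilter is not installed.

```python
# Toy model (hypothetical, not SQL Server internals) of how a full-text
# indexer can use a per-row "file extension" column to choose a filter.
from typing import Callable, Dict, Optional


def normalize_extension(ext: Optional[str]) -> Optional[str]:
    """Map 'pdf', '.pdf' and ' PDF ' to the same key 'pdf'; blank -> None."""
    if ext is None:
        return None
    key = ext.strip().lower().lstrip(".")
    return key or None


# Hypothetical stand-ins for installed filters: each turns raw bytes into text.
# A real TIFF or PDF IFilter would parse (and possibly OCR) the file; this
# sketch only registers a plain-text filter to stay self-contained.
FILTERS: Dict[str, Callable[[bytes], str]] = {
    "txt": lambda data: data.decode("utf-8", errors="replace"),
}


def extract_text(blob: bytes, ext: Optional[str]) -> Optional[str]:
    """Return indexable text, or None if no filter handles this extension."""
    key = normalize_extension(ext)
    handler = FILTERS.get(key) if key else None
    return handler(blob) if handler else None


rows = [
    (b"John Smith memo", "txt"),  # char(3)-style value
    (b"John again", ".txt"),      # varchar(4)/sysname-style value
    (b"%PDF-1.4 ...", "pdf"),     # no PDF filter registered in this toy
]
for blob, ext in rows:
    print(ext, "->", extract_text(blob, ext))
```

In this model the "pdf" row yields no text at all, just as a PDF stored in the IMAGE column stays unsearchable until a PDF IFilter is installed.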
Below are a couple of KB articles related to the TIFF IFilter:
Q321820 FIX: Non-OCR/Non-Display TIFF Data Indexed by SQL Server Full-Text
http://support.microsoft.com/defaul...b;en-us;Q321820
Q283950 SPS: Some Character Sets Are Not Supported for OCR by the TIFF Index Filter
http://support.microsoft.com/defaul...b;en-us;Q283950
Q291835 SPS: TIFF Filter Stops Working After You Start the Windows Components Wizard
http://support.microsoft.com/defaul...b;en-us;q291835
Q294303 SPS: TIFF Filter Does Not Perform OCR on Indexed Files
http://support.microsoft.com/defaul...b;en-us;Q294303
Q. Can I do sort of recognition and extract content from the scanned jpeg,
and then store these extracted content in sql so that I can do some search?
A. Yes, if your scanner and/or scanner software supports OCR (Optical
Character Recognition), then you will be able to extract the content.
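The extract-then-store approach in this answer can be sketched as follows. This is a minimal illustration under stated assumptions: Python's built-in sqlite3 stands in for SQL Server, the `documents` table and its columns are made up, and `ocr_image()` is a hypothetical placeholder for whatever OCR your scanner software provides. It does not perform real recognition; it just decodes fake "image" bytes so the example runs.

```python
# Sketch: store the scanned image and its OCR text side by side, then search
# the text column. sqlite3 stands in for SQL Server; ocr_image() is a
# hypothetical placeholder for the scanner software's OCR step.
import sqlite3


def ocr_image(image_bytes: bytes) -> str:
    """Hypothetical OCR step: pretends the bytes already contain the text."""
    # A real implementation would run OCR on the image; this stub only
    # decodes a fake payload so the sketch works without an OCR engine.
    return image_bytes.decode("ascii", errors="ignore")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " scan BLOB NOT NULL,"       # the original JPEG/TIFF bytes
    " extracted_text TEXT)"      # OCR output, which is what we search
)


def store_scan(image_bytes: bytes) -> int:
    """Insert the image together with its extracted text; return the row id."""
    cur = conn.execute(
        "INSERT INTO documents (scan, extracted_text) VALUES (?, ?)",
        (image_bytes, ocr_image(image_bytes)),
    )
    conn.commit()
    return cur.lastrowid


def search(word: str) -> list:
    """Naive substring search; a full-text index would handle word breaking."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE extracted_text LIKE ? ORDER BY id",
        (f"%{word}%",),
    )
    return [row[0] for row in rows]


store_scan(b"Memo from President Bush")
store_scan(b"Quarterly budget report")
print(search("Bush"))
```

With real OCR output in the text column, that column could be covered by a full-text index instead of the naive `LIKE` match used here.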
Hope that helps!
John
--
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

Hi John and David,
Thank you for your replies.
John, I tried the script on your blog to set up full-text search, but I
still couldn't get any matching file when I searched. For example, I
inserted a .txt file containing "John" into the image column of table
FTSTable and then searched for it in that table, with no luck.
Did I miss anything? I followed every step of the script on your blog.
Thanks.

Raymond,
Could you review your server's Application event log for any "Microsoft
Search" or MssCi source events? Look specifically for MssCi events
(informational, warnings, and errors), as these may indicate why the Full
Population did not succeed for that file type. Can I assume you are using
English as the "Language for Word Breaker" and that the documents are in
English?
Additionally, could you confirm that you have correctly associated the file
type, e.g. "doc" for MS Word, in the file extension column defined as
char(3)?
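The file-extension check above can be done mechanically before re-running a population. The following Python function is a hypothetical pre-flight check, not a SQL Server tool: it takes (row id, extension) pairs as you might read them from the table, and flags values that are missing, wider than the column (".doc" does not fit in char(3)), or have no installed filter. The `installed` set is an assumption you would fill in with the extensions whose IFilters are actually on your server; the event log remains the authoritative place to see what the indexer did.

```python
# Hypothetical pre-flight check for the "file extension" column used by
# full-text indexing of binary documents. The set of installed filters is
# an assumption: list the extensions whose IFilters your server really has.
from typing import Iterable, List, Optional, Set, Tuple


def check_extension_column(
    rows: Iterable[Tuple[int, Optional[str]]],
    column_width: int,
    installed: Set[str],
) -> List[Tuple[int, str]]:
    """Return (row_id, problem) pairs for rows the indexer may not handle."""
    problems = []
    for row_id, ext in rows:
        if ext is None or not ext.strip():
            problems.append((row_id, "missing extension"))
            continue
        if len(ext) > column_width:
            problems.append((row_id, f"{ext!r} does not fit char({column_width})"))
            continue
        # Treat "doc" and ".doc" (and char padding) as the same extension.
        key = ext.strip().lower().lstrip(".")
        if key not in installed:
            problems.append((row_id, f"no filter installed for {key!r}"))
    return problems


sample = [(1, "doc"), (2, ".doc"), (3, None), (4, "pdf")]
for row_id, issue in check_extension_column(sample, 3, {"doc", "txt"}):
    print(row_id, issue)
```

Here row 1 passes, while rows 2, 3 and 4 are flagged for the three different reasons.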
Feel free to email me directly, at jt-kane at comcast dot net
Thanks,
John

Raymond,
A follow-up... Did you install Adobe's PDF IFilter? If not, you can download
the PDF IFilter from the following blog entry:
"IFilters or Indexing Filters used with SQL FTS..." at:
http://spaces.msn.com/members/jtkane/Blog/cns!1pWDBCiDX1uvH5ATJmNCVLPQ!374.entry
Click on "PDF - Adobe" and then save and install the ifilter60.exe file on
the server where SQL Server 2000 is installed. Then re-run a Full
Population and check the server's Application event log for a successful
Full Population!
Thanks,
John