Extract Data from .PDF files

I need to extract data from .PDF files and load it into SQL Server 2008. Can anyone tell me how to proceed?

4 Answers

挽巷 (original poster)

2020-12-07 17:47

First use a tool to extract the text from the PDF, then read the extracted output (for example with a binary reader) and store it in your database. Several tools can handle the extraction. The main ones are:
iTextSharp, a downloadable library for reading, building, and editing PDF documents in depth. Many examples are available online, along with a full book that explains the ins and outs of it.
Adobe PDF iFilter, Adobe's implementation of the IFilter interface, which exposes the text inside PDF files so that it can be indexed and extracted.
Foxit iFilter, a similar component that can do just what you are asking for.
PDFBox will also serve you.

These are the best known and best documented options. Try the following examples on CodeProject:
Parsing PDF files in .NET using PDFBox and IKVM.NET.
A simple class to extract plain text from PDF documents with ITextSharp
Using the IFilter interface to extract text from various document types
A parser for PDF Forms written in C#.NET
These do the job and they ain't hard to understand. Hope they help you :-)

A final note: personally, I would use iTextSharp, as it is the best documented library with the most available examples.
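
To make the "extract first, then store" flow concrete, here is a minimal sketch. It is in Python rather than C#, and it rests on several assumptions: the table name `pdf_documents`, the function `load_pdf_text`, and the stub `fake_extract` are all hypothetical, and the standard-library `sqlite3` module stands in for SQL Server 2008 only so that the example runs anywhere. Against a real SQL Server you would connect through a driver of your choice and adjust the `CREATE TABLE` statement, and you would swap the stub for a real extractor built on one of the tools listed above.

```python
import sqlite3
from typing import Callable, Iterable


def load_pdf_text(
    conn: sqlite3.Connection,
    pdf_paths: Iterable[str],
    extract_text: Callable[[str], str],
) -> int:
    """Extract text from each PDF and store one row per file.

    Returns the number of rows inserted.
    """
    # Hypothetical table layout: one row per source file.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_documents ("
        "id INTEGER PRIMARY KEY, "
        "file_name TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    # Step 1: run the extractor over every file.
    rows = [(path, extract_text(path)) for path in pdf_paths]
    # Step 2: parameterized insert, so the driver handles quoting
    # of whatever characters the extracted text contains.
    conn.executemany(
        "INSERT INTO pdf_documents (file_name, content) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def fake_extract(path: str) -> str:
    # Placeholder extractor (assumption): a real one would open the
    # PDF with a PDF library and return the text of its pages.
    return f"text extracted from {path}"


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    inserted = load_pdf_text(connection, ["a.pdf", "b.pdf"], fake_extract)
    print(inserted, "documents stored")
```

Passing the extractor in as a function keeps the database step independent of whichever PDF tool you settle on, so you can try several of them without touching the loading code.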