Extract Data from .PDF files

Happy的楠姐 2020-12-07 17:01

I need to extract data from .PDF files and load it into SQL Server 2008. Can anyone tell me how to proceed?

4 answers
  •  挽巷 (original poster)
    2020-12-07 17:47

    First use a tool to extract the text from the PDF, then read the extracted output in your code (for example, with a binary reader) and store it in your database. Several tools can do the extraction. The first ones to mention are:

  • iTextSharp, a library you can download and use for extensive, in-depth editing and building of PDF documents. Plenty of examples are available online, along with a full book that explains its ins and outs.
  • Adobe PDF iFilter, Adobe's IFilter component, which lets indexing services such as Windows Search and SQL Server full-text search extract the text from PDF files.
  • Foxit iFilter, a similar component that can do just what you are asking for.
  • PDFBox will also serve you.

    These are the best known and best documented tools. Try the following examples on Code Project:

  • Parsing PDF files in .NET using PDFBox and IKVM.NET.
  • A simple class to extract plain text from PDF documents with ITextSharp
  • Using the IFilter interface to extract text from various document types
  • A parser for PDF Forms written in C#.NET
    These do the job and they aren't hard to understand. Hope they help you :-)

    A final note: I would use iTextSharp, as it is the best-documented library with the most available examples.
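
The extract-then-store workflow described in this answer can be sketched roughly as follows. This is only an illustration under several assumptions, not code from the thread: it is written in Python rather than .NET, the standard-library `sqlite3` module stands in for SQL Server 2008, and the function `store_pdf_text`, the table `pdf_documents`, and the `extract_text` callable are hypothetical names. In practice `extract_text` would wrap whichever tool you pick (iTextSharp, an IFilter, PDFBox).

```python
import sqlite3
from typing import Callable, Iterable


def store_pdf_text(
    conn: sqlite3.Connection,
    pdf_paths: Iterable[str],
    extract_text: Callable[[str], str],
) -> int:
    """Extract text from each PDF and insert one row per file.

    `extract_text` is a hypothetical hook: it takes a file path and
    returns the plain text produced by the extraction tool of your choice.
    Returns the number of rows inserted.
    """
    # Assumed schema: one row per PDF, holding its name and extracted text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_documents ("
        "id INTEGER PRIMARY KEY, "
        "file_name TEXT NOT NULL, "
        "body TEXT NOT NULL)"
    )

    # Step 1: run the extraction tool on every file.
    rows = [(path, extract_text(path)) for path in pdf_paths]

    # Step 2: store the results with a parameterized insert, so the
    # extracted text is never spliced into the SQL string itself.
    conn.executemany(
        "INSERT INTO pdf_documents (file_name, body) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

To try it without any PDF library installed, pass an in-memory connection from `sqlite3.connect(":memory:")` and a stub such as `lambda path: "text of " + path` as `extract_text`. Keeping the extractor behind a plain callable means the storage step does not change if you later switch extraction tools.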
