PDF Parsing with Swift

无人及你 2021-02-07 13:32

I want to parse a PDF that has no images, only text. I'm trying to find specific pieces of text. For example, I want to search for the string "Name:" and read the characters that follow the ":".

2 Answers
  •  耶瑟儿~
    2021-02-07 14:07

    You can use PDFKit to do this. It is available on both iOS and macOS: on macOS it is part of the Quartz framework, while on iOS you import PDFKit directly. It is also pretty fast; I was able to search through a PDF with over 15,000 characters in just 0.07 s.

    Here is an example:

    import Quartz
    
    let pdf = PDFDocument(url: URL(fileURLWithPath: "/Users/...some path.../test.pdf"))
    
    guard let contents = pdf?.string else {
        print("could not get string from pdf: \(String(describing: pdf))")
        exit(1)
    }
    
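    // Split on the marker; index 1 only exists if the marker was actually found.
    let parts = contents.components(separatedBy: "FOOT NOTE: ")
    guard parts.count > 1 else {
        print("marker not found in pdf text")
        exit(1)
    }
    let footNote = parts[1] // all the text after the first foot note
    
    print(footNote.components(separatedBy: "\n")[0]) // print the first line of that text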
    
    // Output: "The operating system being written in C resulted in a more portable software."
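
    For the "Name:" case from the question, the same idea can be wrapped in a small helper. This is only a sketch: the function name value(after:in:) is made up for illustration, and it assumes the value you want sits on the same line as the label in the extracted text, which depends on how the PDF was laid out.

    ```swift
    import Foundation

    /// Hypothetical helper: returns the text that follows the first occurrence
    /// of `label`, up to the end of that line, or nil if the label is absent.
    func value(after label: String, in text: String) -> String? {
        // Locate the label; bail out instead of crashing when it is missing.
        guard let labelRange = text.range(of: label) else { return nil }
        // Everything after the label...
        let rest = text[labelRange.upperBound...]
        // ...cut at the first line break.
        let line = rest.prefix(while: { !$0.isNewline })
        // Drop the spaces that usually follow the colon.
        return line.trimmingCharacters(in: .whitespaces)
    }
    ```

    With the contents string from the example above, value(after: "Name:", in: contents) would give back the rest of the "Name:" line, or nil if the PDF text has no such label.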
    

    You can also still access most (if not all) of the other PDFDocument properties and methods, such as pdf?.pageCount for the number of pages and pdf?.page(at:) to get a specific page.
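
    If you need to know which page a piece of text is on, pageCount and page(at:) can be combined with the string property of each page. A rough sketch, where pageTexts(of:) and pageIndices(containing:in:) are hypothetical helper names, and the PDFKit part is wrapped in canImport so the search logic can also be tried on plain strings:

    ```swift
    import Foundation
    #if canImport(PDFKit)
    import PDFKit
    #endif

    /// Hypothetical helper: zero-based indices of the pages whose text contains `needle`.
    func pageIndices(containing needle: String, in pageTexts: [String]) -> [Int] {
        return pageTexts.enumerated()
            .filter { $0.element.contains(needle) }
            .map { $0.offset }
    }

    #if canImport(PDFKit)
    /// Hypothetical helper: the text of every page, in order.
    /// Pages with no extractable text are represented by an empty string.
    func pageTexts(of document: PDFDocument) -> [String] {
        return (0..<document.pageCount).map { index in
            document.page(at: index)?.string ?? ""
        }
    }
    #endif
    ```

    Usage would be along the lines of pageIndices(containing: "Name:", in: pageTexts(of: document)), where document is an already unwrapped PDFDocument.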
