Whitespace gone from PDF extraction, and strange word interpretation

爱一瞬间的悲伤 2020-12-01 11:26

Using the snippet below, I've attempted to extract the text from this PDF file.

import pyPdf

def get_text(path):
    # Load PDF into pyPDF
    pdf = p         
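Missing spaces in extracted text often come from the PDF itself: many PDFs never store a space character between words and instead draw each word at its own x position, so an extractor has to infer word breaks from the gaps. The following toy illustration of that inference is not pyPdf code. The `(char, x, width)` glyph tuples, the function names `naive_join`/`join_glyphs`, and the 0.3 gap ratio are all hypothetical, chosen only to show the idea.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x position, advance width), all in
# the same text-space units. A real extractor reads these from the page's
# content stream; here they are hard-coded for illustration.
Glyph = Tuple[str, float, float]


def naive_join(glyphs: List[Glyph]) -> str:
    """Concatenate the characters and ignore their positions, which runs
    words together whenever the PDF contains no explicit space glyph."""
    return "".join(ch for ch, _x, _w in glyphs)


def join_glyphs(glyphs: List[Glyph], gap_ratio: float = 0.3) -> str:
    """Insert a space wherever the horizontal gap after a glyph exceeds
    gap_ratio times that glyph's width (0.3 is an arbitrary choice)."""
    if not glyphs:
        return ""
    out = [glyphs[0][0]]
    for (_pch, prev_x, prev_w), (ch, x, _w) in zip(glyphs, glyphs[1:]):
        gap = x - (prev_x + prev_w)  # empty space between the two glyphs
        if gap > gap_ratio * prev_w:
            out.append(" ")
        out.append(ch)
    return "".join(out)


# "Hi" and "there" are drawn 5 units apart, with no space character between them.
line: List[Glyph] = [
    ("H", 0, 5), ("i", 5, 2),
    ("t", 12, 3), ("h", 15, 5), ("e", 20, 5), ("r", 25, 3), ("e", 28, 5),
]
print(naive_join(line))   # Hithere
print(join_glyphs(line))  # Hi there
```

The threshold is the weak point. If it is set too high, words merge; if it is set too low, single words split apart. This is one reason different tools can give different output for the same file.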


        
6 answers
  •  囚心锁ツ
    2020-12-01 12:07

    I solved this issue in R with the pdftools package:

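    library(pdftools)                 # PDF utilities built on the Poppler library
    pdf_file <- "xxx/untitled.pdf"
    text <- pdf_text(pdf_file)        # character vector, one element per page
    cat(text[1])                      # print the text of the first page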
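Because pdftools builds on Poppler, a roughly comparable route from Python is to call Poppler's `pdftotext` command-line tool. The following is a sketch under stated assumptions, not a port of pdftools. It assumes `pdftotext` (for example from poppler-utils) is installed and on PATH. The helper names `build_pdftotext_cmd`, `split_pages` and `pdf_text` are hypothetical. It relies on `pdftotext` writing to stdout when the output file is `-` and separating pages with form-feed characters.

```python
import shutil
import subprocess
from typing import List


def build_pdftotext_cmd(pdf_path: str, layout: bool = True) -> List[str]:
    """Command line for Poppler's pdftotext, writing UTF-8 text to stdout ("-")."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        cmd.append("-layout")  # try to keep the physical layout, including spacing
    cmd += [pdf_path, "-"]
    return cmd


def split_pages(raw: str) -> List[str]:
    """Split pdftotext output on form feeds to get one string per page,
    similar in shape to what pdftools::pdf_text() returns in R."""
    pages = raw.split("\f")
    if pages and pages[-1] == "":
        pages.pop()  # drop the empty piece after a trailing form feed
    return pages


def pdf_text(pdf_path: str) -> List[str]:
    """Extract text page by page; requires pdftotext on PATH."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found; install Poppler's utilities")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return split_pages(result.stdout)
```

Calling `pdf_text("xxx/untitled.pdf")[0]` would then correspond to `text[1]` in the R snippet, because R indexes from 1 and Python from 0.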
    
