How to extract text from a PDF in Python 3.7

后悔当初 2020-12-29 10:19

I am trying to extract text from a PDF file using Python. My main goal is to create a program that reads a bank statement and extracts its text to update an exce

10 answers
  •  南笙 (original poster)
    2020-12-29 10:59

    Using tika worked for me!

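    from tika import parser

    # from_file returns a dict; the extracted text is stored under 'content'
    rawText = parser.from_file('January2019.pdf')

    # Split the extracted text into a list with one entry per line
    rawList = rawText['content'].splitlines()
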

    This made it really easy to separate each line in the bank statement into a list.
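
    A possible refinement, sketched under a few assumptions: the extracted text often contains blank lines and stray whitespace, and 'content' may come back as None when no text layer is found (for example, a scanned statement). The helper names content_to_lines and extract_lines below are hypothetical and not part of tika; only parser.from_file comes from the library.

```python
from typing import List, Optional


def content_to_lines(content: Optional[str]) -> List[str]:
    """Split extracted text into stripped, non-empty lines."""
    if not content:
        # Assumption: content can be None or empty when no text was extracted
        return []
    return [line.strip() for line in content.splitlines() if line.strip()]


def extract_lines(pdf_path: str) -> List[str]:
    """Hypothetical helper: parse a PDF with tika and return its text lines."""
    # Imported lazily; requires the tika package and a Java runtime
    from tika import parser

    parsed = parser.from_file(pdf_path)
    return content_to_lines(parsed.get("content"))
```

    With this in place, extract_lines('January2019.pdf') would return only the lines that contain text, which may be easier to match against statement rows.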
