I am trying to extract text from a PDF file using Python. My main goal is to create a program that reads a bank statement and extracts its text to update an exce
Using tika worked for me!
from tika import parser
rawText = parser.from_file('January2019.pdf')
rawList = rawText['content'].splitlines()
This made it really easy to separate each line in the bank statement into a list.
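One thing to watch for: the extracted text often contains blank lines and stray whitespace, and as far as I know `content` can come back as `None` when no text could be extracted (for example, a scanned statement). Here is a minimal sketch of a cleanup step, assuming you only want the non-empty lines. The helper name `clean_lines` and the sample text are made up for illustration; they are not part of tika.

```python
def clean_lines(content):
    """Return the non-empty lines of extracted text, stripped of whitespace.

    `content` is expected to be a string, or None if nothing was extracted.
    """
    if content is None:
        return []
    # Split into lines, trim each one, and drop the lines that end up empty.
    return [line.strip() for line in content.splitlines() if line.strip()]


# Made-up sample standing in for the extracted text of a statement.
sample = "\n\nStatement January 2019\n  01/02 Coffee 3.50  \n\n01/03 Rent 900.00\n"
lines = clean_lines(sample)
```

With the tika result from above, this would be called as `clean_lines(rawText['content'])` in place of the plain `splitlines()` call.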