Python 3.x, How do I train the model on pdf invoices that has email and other document content attached, including images and handwritten text? I only need the information , se