Extract images from PDF: how to handle JBIG2-encoded images
I have a bunch of PDF files. Some of them are pure text, but others are fully or partially saved as "one image per page" because they were generated by a scanner. I need to extract all the images contained in each PDF and then examine each image separately.

I was able to extract most of the images with a Python script found here on SO (see the question "Extract images from PDF without resampling, in python?"). However, some of the included images were encoded using JBIG2, and I could not find any Python or other tool