pdf

How to convert an A4 JPEG scanned page to an A4 PDF

放荡痞女 submitted on 2020-12-04 08:55:31
Question: I have a scanned JPEG page (toto.jpg) that I want to convert to an A4 PDF file with the ImageMagick convert command. I have tried the -page A4, -resize 595x842 and -define pdf:fit-page=A4 options, but none of them works; the resulting PDF does not have the correct size:

    $ identify toto.jpg
    toto.jpg JPEG 1644x2304 1644x2304+0+0 8-bit DirectClass 902KB 0.000u 0:00.000
    $ convert -density 300 -page a4 toto.jpg toto.pdf
    $ identify toto.pdf
    toto.pdf PDF 143x202 143x202+0+0 16-bit Bilevel DirectClass 3.7KB 0.000u 0
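One possible reading of the 143x202 result, offered as an assumption rather than a confirmed diagnosis: PDF page sizes are measured in points (1/72 inch), and an A4 page is roughly 595x842 points. If the 595x842 page geometry is treated as pixels at the requested 300 dpi, it shrinks to about 143x202 points, which matches the identify output. The Python sketch below only reproduces that arithmetic and estimates the density at which the 1644x2304 scan would span an A4 page. The function names are hypothetical, and the code does not call ImageMagick.

```python
# A4 page size in PDF points (1 point = 1/72 inch), rounded as in the question.
A4_POINTS = (595, 842)


def pixels_to_points(pixels, density_dpi):
    """Convert a pixel length to PDF points at the given density (dots per inch)."""
    return round(pixels * 72 / density_dpi)


def density_to_fill(pixels, page_points):
    """Density (dpi) at which `pixels` would span exactly `page_points` points."""
    return pixels * 72 / page_points


# 595x842 interpreted as pixels at 300 dpi gives the size reported for toto.pdf.
shrunk = (pixels_to_points(595, 300), pixels_to_points(842, 300))

# Density at which the 1644x2304 scan would cover an A4 page in each direction.
fit_width_dpi = density_to_fill(1644, A4_POINTS[0])
fit_height_dpi = density_to_fill(2304, A4_POINTS[1])

print(shrunk, fit_width_dpi, fit_height_dpi)
```

If this reading is right, a density just under 200 dpi would map the scan onto A4 fairly closely. That is an inference from the numbers in the question, not something tested against toto.jpg.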

PyPDF2 won't extract all text from PDF

别来无恙 submitted on 2020-12-01 11:46:29
Question: I'm trying to extract text from a PDF (https://www.sec.gov/litigation/admin/2015/34-76574.pdf) using PyPDF2, and the only result I get is the following string: b''

Here is my code:

    import PyPDF2
    import urllib.request
    import io
    url = 'https://www.sec.gov/litigation/admin/2015/34-76574.pdf'
    remote_file = urllib.request.urlopen(url).read()
    memory_file = io.BytesIO(remote_file)
    read_pdf = PyPDF2.PdfFileReader(memory_file)
    number_of_pages = read_pdf.getNumPages()
    page = read_pdf.getPage(1)
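Two things may be worth checking here, both stated as assumptions because the code in the question is cut off: getPage(1) returns the second page, since PyPDF2 page indices start at 0, and text has to be requested page by page. The sketch below walks every page using the legacy PyPDF2 1.x method names that appear in the question (getNumPages, getPage) together with extractText. The helper name extract_all_text is hypothetical, and newer PyPDF2/pypdf releases rename these methods.

```python
def extract_all_text(reader):
    """Return the extracted text of every page, in page order.

    `reader` is expected to behave like a legacy PyPDF2.PdfFileReader:
    it must provide getNumPages() and getPage(index), and each page
    must provide extractText().
    """
    texts = []
    # Page indices are zero-based, so index 0 is the first page.
    for index in range(reader.getNumPages()):
        page = reader.getPage(index)
        texts.append(page.extractText())
    return texts


# Intended use with the objects from the question (not executed here):
#   read_pdf = PyPDF2.PdfFileReader(memory_file)
#   for number, text in enumerate(extract_all_text(read_pdf), start=1):
#       print(number, repr(text[:80]))
```

If every page still yields an empty string, the file may not contain a text layer that this extractor can read; that would need to be confirmed with another tool.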

Inkscape “PDF + Latex” export

*爱你&永不变心* submitted on 2020-12-01 10:53:09
Question: I'm using Inkscape to produce vector figures, saving them in SVG format and exporting them later as "PDF + Latex", much in the vein of the TUG inkscape+pdflatex guide. Producing even a simple figure, however, turns out to be extremely frustrating. The first figure, http://i.stack.imgur.com/jIo3z.png, is an example of what I would like to export as "PDF + Latex" (shown here in PNG format). If I export this to a PDF figure without LaTeX macros, the PDF produced looks exactly the same,

ps2pdf: preserve page size

老子叫甜甜 submitted on 2020-12-01 09:32:49
Question: I have myfile.ps with a vector image included. When I run ps2pdf myfile.ps, the output page size appears to be A4: the vector image is too large and gets cut off, so about one inch is lost. The following pseudo-header is printed in the output PDF file, in addition to the original vector image:

    PLOT SIZE:8.02x8.62Inches
    Magnification:7354.21X

Is there any option or any way to convert the PS file to a PDF while preserving the original paper size?

Answer 1: I doubt your quoted 2 lines are really
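One approach sometimes used for this kind of problem, which is not necessarily what the truncated answer above goes on to recommend, is to pass an explicit page size to Ghostscript through its -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS options, which ps2pdf forwards when they precede the file names. The sketch below only builds such a command line. It assumes, without confirmation, that the 8.02x8.62 inch figure from the pseudo-header is the intended page size; the helper name and the output file name are hypothetical.

```python
import math


def ps2pdf_command(ps_path, pdf_path, width_in, height_in):
    """Build a ps2pdf argument list with an explicit page size.

    Sizes are given in inches and converted to PDF points (72 per inch),
    rounded up so the page is never smaller than requested.
    """
    width_pt = math.ceil(width_in * 72)
    height_pt = math.ceil(height_in * 72)
    return [
        "ps2pdf",
        f"-dDEVICEWIDTHPOINTS={width_pt}",
        f"-dDEVICEHEIGHTPOINTS={height_pt}",
        ps_path,
        pdf_path,
    ]


# Assumption: the PLOT SIZE values describe the page that should be kept.
command = ps2pdf_command("myfile.ps", "myfile.pdf", 8.02, 8.62)
print(" ".join(command))

# To actually run it (requires Ghostscript to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

If the PostScript file sets its own page size internally, these options may be overridden, so the result should be checked with a PDF viewer or a page-size inspection tool.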

How do I determine programmatically if a PDF is searchable?

烂漫一生 submitted on 2020-12-01 07:34:29
Question: I have a CSV with a list of URLs pointing to PDFs. Some of these PDFs are searchable; some are not. I want to determine which PDFs on my list are searchable. Is there an easy way to do this?

Answer 1: On the command line, I'd use pdffonts to determine which fonts the file uses. This runs rather fast as well...

Example 1: PDF containing text

    pdffonts bash-manpage.pdf
    name type encoding emb sub uni object ID
    ------------------------------- ------------- --------------- ---
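The pdffonts idea from the answer can be scripted roughly as follows. This is a minimal sketch under stated assumptions: it treats a PDF as probably searchable when pdffonts lists at least one font below the dashed separator line, which is a heuristic rather than proof that the text is extractable. The function names count_fonts and looks_searchable are hypothetical, and the pdffonts program (part of Poppler/Xpdf) must be installed for the second function to work.

```python
import subprocess


def count_fonts(pdffonts_output):
    """Count the font rows that follow the dashed separator in pdffonts output."""
    lines = pdffonts_output.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("---"):
            # Every non-empty line after the separator describes one font.
            return sum(1 for row in lines[index + 1:] if row.strip())
    # No separator found: treat the output as listing no fonts.
    return 0


def looks_searchable(pdf_path):
    """Heuristic: a PDF with at least one font probably contains text."""
    result = subprocess.run(
        ["pdffonts", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_fonts(result.stdout) > 0


# Intended use on already-downloaded files (not executed here):
#   for path in ["a.pdf", "b.pdf"]:
#       print(path, looks_searchable(path))
```

Parsing is kept separate from the subprocess call so the counting logic can be checked against saved pdffonts output without the tool being present.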

Generate PDF from HTML using Django and Reportlab

只谈情不闲聊 submitted on 2020-12-01 07:33:05
Question: I am back with a new question that I have been unable to answer after scratching my head over it all day. I want to generate a PDF from a web page when the user clicks a "Download PDF" button. I tried several modules, including Reportlab and XHTML2PDF, but I can neither generate a PDF nor download it. Here is what I did with Reportlab, following "Render HTML to PDF in Django site":

views.py:

    import cStringIO as StringIO
    import ho.pisa as pisa
    from django.template.loader import get
