ghostscript

Handling online-preview PDF documents in a Python crawler

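会有一股神秘感。 posted on 2020-04-09 11:24:12
Introduction
I was recently crawling a website and found that the target content on its detail pages is shown as an online PDF preview, for example: https://camelot-py.readthedocs.io/en/master/_static/pdf/foo.pdf
From my analysis, this kind of online preview is rendered with pdf.js, so a crawler cannot get at the content inside the PDF directly. Note the word "directly": the workaround is to download the PDF to a local file first and then convert it with a tool. After reading a lot of related material I found plenty of options:
  1. Use a third-party PDF conversion website.
  2. Use a Python package such as pyPdf, PyPDF2, PyPDF4 or pdfrw.
PyPI has plenty of these tools, but the only way to know how well each one works is to try them one by one. In the end I found a library that fits my needs very well: camelot. camelot reads the data in a PDF file and converts it automatically into a pandas DataFrame, which can then be exported as CSV, JSON or HTML. My goal is HTML output, so let's get started.
Parsing
1. Install camelot: pip install camelot-py, then pip install opencv-python (camelot needs OpenCV, which is imported as cv2 but is not published on PyPI under the name cv2).
2. Download the PDF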
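The workflow described above (download the PDF locally, parse it with camelot, export each table as HTML) can be sketched roughly as follows. This is a minimal sketch, not the author's code: the helper names local_name_for, download_pdf and pdf_tables_to_html are hypothetical, and it assumes camelot-py and its dependencies are installed.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_name_for(url, default="download.pdf"):
    """Derive a local file name from the last path segment of the URL."""
    name = os.path.basename(urlparse(url).path)
    return name if name.lower().endswith(".pdf") else default


def download_pdf(url, target_dir="."):
    """Save the PDF locally, since the pdf.js preview cannot be scraped directly."""
    path = os.path.join(target_dir, local_name_for(url))
    urllib.request.urlretrieve(url, path)
    return path


def pdf_tables_to_html(pdf_path):
    """Return one HTML string per table that camelot detects in the PDF."""
    import camelot  # third-party; imported lazily so the helpers above work without it

    tables = camelot.read_pdf(pdf_path)  # defaults: first page, lattice flavor
    # Each detected table exposes a pandas DataFrame as .df
    return [table.df.to_html() for table in tables]


# Example usage (needs network access and camelot installed):
# url = "https://camelot-py.readthedocs.io/en/master/_static/pdf/foo.pdf"
# for html in pdf_tables_to_html(download_pdf(url)):
#     print(html)
```

The same DataFrame can be exported with to_csv or to_json if HTML is not the target format.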

How to print smart PDF files from OrCAD

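依然范特西╮ posted on 2020-03-23 10:54:06
In daily work we often print PDF files from Cadence OrCAD. Printing a smart PDF instead makes the schematic easier to browse and analyze, and saves review time.
Step 1 (very important): two programs are required, FreePDF and Ghostscript. Install both on your PC first (installation is not covered here), then configure the settings as shown in Figure 1.
Step 2: Configure the PDF export as shown in Figure 2.
Step 3: Set the PDF output path, file name, printer name and Ghostscript path as shown in Figures 3 and 4, then click OK. If there are many pages, printing may take a while. Note: the Ghostscript path must be set correctly, otherwise printing may fail. For example, mine is installed at "C:\ProgramFiles\gs\gs9.27\bin\gswin64c.exe", so I set it as shown in Figure 4.
Step 4: After printing, open shanyingzhizuo.pdf. It is now a smart schematic: clicking an entry in the information panel on the left opens the corresponding item. For example (Figure 5), clicking C10 jumps to the position on the page where C10 is located, which is convenient and saves lookup time.
Source: oschina Link: https://my.oschina.net/u/4228486/blog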

Why does this PostScript/PS file create way more top-margin than specified?

你离开我真会死。 posted on 2020-03-05 04:04:49
Question
The PS script takes a plaintext document and produces a PDF from it. A big thank you to @RedGrittyBrick for digging up this snippet:
%!
%
% From: Jonathan Monsarrat (jgm@cs.brown.edu)
% Subject: PostScript -> ASCII *and* ASCII -> PostScript programs
% Newsgroups: comp.lang.postscript
% Date: 1992-10-01 04:45:38 PST
%
% "If anyone is interested, here is an interesting program written by
% Professor John Hughes here at Brown University that formats ASCII
% in PostScript without a machine

Getting the page sizes of a PostScript document

半腔热情 posted on 2020-03-05 03:55:41
Question
I want to get the page size of each page of a PostScript document in a simple program or shell script. Is there any program or library that can get the page sizes? (Like pdfinfo, but dealing with PostScript.)
Answer 1: No doubt there's some program for that, but you can try using Ghostscript:
gs -q -sDEVICE=nullpage -dBATCH -dNOPAUSE \
  -c '/showpage{currentpagedevice /PageSize get{=}forall showpage}bind def' \
  -f test.ps
But then you may need to filter out any warnings or DSC comments. E.g. one of
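For use from a program rather than the shell, the Ghostscript command from the answer can be wrapped and its output filtered. The following is a sketch under assumptions: the function names are hypothetical, gs is assumed to be on PATH, and the parser assumes that `{=}forall` prints the width and then the height of each page on separate lines, with any non-numeric lines (warnings, DSC comments) being safe to skip.

```python
import subprocess

# PostScript hook from the answer: print PageSize each time showpage runs.
PS_HOOK = "/showpage{currentpagedevice /PageSize get{=}forall showpage}bind def"


def build_gs_command(ps_path):
    """Build the argument list for the gs invocation quoted in the answer."""
    return ["gs", "-q", "-sDEVICE=nullpage", "-dBATCH", "-dNOPAUSE",
            "-c", PS_HOOK, "-f", ps_path]


def parse_page_sizes(output):
    """Pair up numeric lines as (width, height) in points; skip everything else."""
    numbers = []
    for line in output.splitlines():
        try:
            numbers.append(float(line.strip()))
        except ValueError:
            continue  # not a number: warning text, DSC comment, blank line
    return list(zip(numbers[0::2], numbers[1::2]))


def page_sizes(ps_path):
    """Run Ghostscript on the file and return a list of (width, height) tuples."""
    result = subprocess.run(build_gs_command(ps_path),
                            capture_output=True, text=True, check=True)
    return parse_page_sizes(result.stdout)
```

If the document itself prints bare numbers to standard output, the pairing would be thrown off, so this filter is only as reliable as that assumption.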

CentOS PHP: converting PPT to images

左心房为你撑大大i posted on 2020-02-26 05:34:42
Reference: https://blog.csdn.net/aituochang1886/article/details/101167564
Installing Unoconv, references:
https://www.licongying.cn/2018/10/linux-centos-install-unoconv-liboffice/
https://blog.csdn.net/qq_42975335/article/details/102747587
Installing ImageMagick, references:
https://www.cnblogs.com/yzeng/p/11569598.html
https://blog.csdn.net/mytt_10566/article/details/80902059
Error fixes:
convert: error while loading shared libraries: libMagickCore-7.Q16HDRI.so.7: cannot open shared object file: No such file or directory
https://blog.csdn.net/lvshuocool/article/details/89455700
convert: no images defined `ffcl.png' @ error/convert.c

How to adjust screening (frequency and angle) using Ghostscript?

岁酱吖の posted on 2020-02-25 04:25:06
Question
I have an EPS file that contains four channels: C, M, Y and K. I am trying to use Ghostscript to separate it into four 1-bit TIFF images, but I don't know how screening works in the gs command. Does anyone know how to adjust screening with gs? How do I set the frequency (lpi) and angle of the 1-bit TIFF for each separation color?
Answer 1: I presume you mean you have an EPS file, not an ESP file. If so then the EPS program should not contain any halftoning information. You can use the -c "..." -f

Converting PDF without any images to CMYK

谁都会走 posted on 2020-02-12 05:27:05
Question
I read this post about how to convert a PDF to CMYK, but when I try the accepted solution
gs \
  -o test-cmyk.pdf \
  -sDEVICE=pdfwrite \
  -sProcessColorModel=DeviceCMYK \
  -sColorConversionStrategy=CMYK \
  -sColorConversionStrategyForImages=CMYK \
  test.pdf
I do not get a PDF with a CMYK color space if my original PDF does not contain an image. If I add an image to it, I get the right result (checked with identify). For example, if I create an SVG with Inkscape with one rectangle, export it to PDF,

How to center image when creating images from PDF using GhostScript

送分小仙女□ posted on 2020-02-08 08:37:27
Question
I have several PDF files with different sizes and different width-to-height ratios. Now I want to create fixed-size thumbnails from the first page of these files. I do this using this command:
gs -dNOPAUSE -sDEVICE=jpeg -dFirstPage=1 -dLastPage=1 -sOutputFile=d:\test\a.jpeg -dJPEGQ=100 -g509x750 -dUseCropBox=true -dPDFFitPage=true -q d:\test\a.pdf -c quit
Since the original files are of different widths and heights but thumbnails should be of the same size, there will be white margins in the right
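The arithmetic behind those white margins can be made concrete. The sketch below is not the thread's answer and does not claim how Ghostscript positions the page; it only computes, for an aspect-preserving fit of a page into the 509x750 box from the command, the scale factor and the offsets that would center the scaled page. The function name fit_into_box is hypothetical, and the box is treated as 509x750 points, which assumes the device's default resolution of 72 dpi.

```python
def fit_into_box(page_w, page_h, box_w=509, box_h=750):
    """Return (scale, dx, dy) for an aspect-preserving fit of a page into a box.

    scale is the largest factor at which the whole page still fits;
    dx and dy are the offsets that would center the scaled page,
    i.e. half of the leftover width and half of the leftover height.
    """
    scale = min(box_w / page_w, box_h / page_h)
    scaled_w = page_w * scale
    scaled_h = page_h * scale
    return scale, (box_w - scaled_w) / 2, (box_h - scaled_h) / 2


# A tall, narrow page leaves spare width; a wide page leaves spare height.
# scale, dx, dy = fit_into_box(100, 200)
```

Only one of the two offsets is non-zero for any given page, because the fit is limited by either the width or the height of the box.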

How to reduce the size of the PDF generated by tesseract?

北战南征 posted on 2020-01-22 20:48:06
Question
The setup of my (web) app is the following: I get user-uploaded PDF files, run OCR on them and show the users the OCRed PDF. Since everything is online, minimizing the size of the resulting PDF file is key to reducing loading and wait time for the user. The file I receive from the user is sample.pdf (I've created an archive with the original files as well as those that I generate here: https://dl.dropboxusercontent.com/u/1390155/tess-files/sample.zip). I use tesseract 3.04 and do the following

Linux: Command Line Utility Convert RTF to PDF?

吃可爱长大的小学妹 posted on 2020-01-22 14:13:46
Question
Any recommendations for converting an RTF to a PDF? I need to do this from my LAMP application, so a command-line utility like Ghostscript would be ideal.
Answer 1:
sudo apt-get install ted
/usr/share/ted/Ted/rtf2pdf.sh source-file dest-file
or visit this link
Answer 2: Alternatively, you can use libreoffice for this task:
libreoffice --headless --invisible --norestore --convert-to pdf source-file.rtf
Answer 3: In my Ubuntu 10.4 I have unrtf, which "converts RTF to HTML, LaTeX, Postscript". From Postscript it
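For calling the LibreOffice route from an application, the command in the second answer can be built as an argument list and run as a subprocess. This is a sketch under assumptions: the function names are hypothetical, the binary is assumed to be on PATH as libreoffice, and the optional --outdir argument is an addition that does not appear in the answer.

```python
import shutil
import subprocess


def build_convert_command(rtf_path, out_dir=None, binary="libreoffice"):
    """Build the headless LibreOffice command quoted in the answer."""
    cmd = [binary, "--headless", "--invisible", "--norestore",
           "--convert-to", "pdf"]
    if out_dir is not None:
        # Not part of the original answer: choose where the PDF is written.
        cmd += ["--outdir", out_dir]
    cmd.append(rtf_path)
    return cmd


def rtf_to_pdf(rtf_path, out_dir=None):
    """Convert one RTF file to PDF; raises if LibreOffice is missing or fails."""
    if shutil.which("libreoffice") is None:
        raise FileNotFoundError("libreoffice not found on PATH")
    subprocess.run(build_convert_command(rtf_path, out_dir), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.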