docx

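(Repost) Reading PDF and Word documents: PyPDF2, docx

心已入冬 submitted on 2020-02-15 23:40:29
https://www.cnblogs.com/Ting-light/p/9548127.html
#!python3
#-*- coding:utf8 -*-
# PyPDF2 may fail to open some PDF documents, and it cannot extract images, charts, or other media from a PDF file. It can, however, extract text from a PDF as a string.
import PyPDF2
# Open a PDF file in binary read mode (raw string so the backslashes are taken literally)
pdfFileObj=open(r'e:\work\data_service.pdf','rb')
# Read the PDF document
pdfReader=PyPDF2.PdfFileReader(pdfFileObj)
# The total number of pages in the PDF document
print(pdfReader.numPages)
# Get a single page; page numbers start at 0
pageObj=pdfReader.getPage(0)
# Return the text content of that page
pageObj.extractText()
# For an encrypted PDF, the reader object has an isEncrypted attribute
print(pdfReader.isEncrypted)
# If the document is encrypted, the attribute is True, and getting a page's text directly raises an error.
# After passing the password (as a string) to decrypt(), the text can be read normally.
#pdfReader.decrypt('rosebud')
# Writing a PDF document
# Create a PDF writer object
pdfWriter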

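Replacing content in Word templates with the docxtpl module

房东的猫 submitted on 2020-02-12 15:07:17
1. Introduction
This package uses two major packages: python-docx, for reading, writing, and creating sub-documents; and jinja2, for managing the tags inserted into the template docx.
python-docx-template was created because python-docx is powerful for creating documents but not for modifying them.
The idea is to start by creating, in Microsoft Word, an example of the document you want to generate. It can be as complex as you like: pictures, index tables, footers, headers, variables, anything you can do with Word. Then, since you are still editing the document in Microsoft Word, you insert jinja2-like tags directly into it. You save the document as a .docx file (XML format): this is your .docx template file.
You can now use python-docx-template to generate as many Word documents as you want from the .docx template and its associated context variables.
For more advanced usage, see the documentation; a simple example follows. https://docxtpl.readthedocs.io/en/latest/#jinja2-like-syntax
2. Code
from docxtpl import DocxTemplate
def temp_word(tmep_path,word_apth):
    tpl = DocxTemplate(tmep_path)
    # The content to replace is supplied as key: value pairs
    context = {
        "name":"上海市XXXXXX公司",
        "num":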

Open source php doc/x to pdf conversion?

走远了吗. submitted on 2020-02-03 10:52:26
Question: Are there any open source PHP tools that I can use to convert .doc / .docx to PDF? Any good tutorials or tools would be greatly appreciated. I was looking into phpLiveDocx, but it looks like they charge monthly. Or maybe .odt to PDF in PHP or on Linux?
Answer 1: Try FPDF(dot)org. Dunno if it's open source, but it seems easy to understand and use. EDIT: Didn't notice that it doesn't do conversions. Maybe this blog post will help: Word to PDF conversion using OpenOffice on Windows.
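
One commonly used open source route for this kind of conversion is to run LibreOffice without a GUI. The sketch below is in Python rather than PHP and only illustrates the idea; it assumes a LibreOffice install whose `soffice` binary is on the PATH. The function names and the 120-second timeout are hypothetical choices for illustration, not part of any library.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, binary="soffice"):
    """Build the LibreOffice command line that converts one document to PDF."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(src)]


def convert_to_pdf(src, outdir, binary="soffice", timeout=120):
    """Run the conversion and return the path where the PDF is expected.

    Assumes LibreOffice is installed; raises if the binary cannot be found.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    subprocess.run(build_convert_command(src, outdir, binary),
                   check=True, timeout=timeout)
    # LibreOffice names the output after the input file's stem.
    return Path(outdir) / (Path(src).stem + ".pdf")
```

From PHP, the same command line could be passed to a shell call; only the command itself matters here.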

Docx to pdf using openoffice headless way too slow

依然范特西╮ submitted on 2020-02-02 02:08:29
Question: I've been using PHPWord to generate docx files, and it's been working great. But now I also need to make some of those files available as PDF. After some research I found PyODConverter, which uses OOo. It seemed quite a good option, since I don't want to depend on third-party web services. I tried it out on my machine and it works fine, so I've set it up on my server as well. It took a little longer, but I've managed to get it working there too. There is however an (bad)

Version-controlling zipped files (docx, odt)

感情迁移 submitted on 2020-01-30 16:26:38
Question: There are formats that are actually zip files in disguise, e.g. docx or odt. If I store them directly in version control, they are handled as binary files. My ideal solution would be to:
- have a hook that creates a foo.docx/ directory for each foo.docx file before commit, unzipping all files into it
- optionally, have a hook that reindents the XML files
- have a hook that recreates foo.docx from the stored files after update
I don't want the docx files themselves to be version-controlled. (I am aware
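
The unzip and rezip steps described in the question can be sketched in Python with the standard `zipfile` module. This is a minimal sketch, not the asker's solution: the function names `unpack` and `repack` are hypothetical, the optional XML reindenting step is left out, and wiring the functions into pre-commit or post-update hooks depends on the version control system in use.

```python
import zipfile
from pathlib import Path


def unpack(container_path, target_dir):
    """Extract every member of the zip container (docx, odt) into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(container_path) as zf:
        zf.extractall(target)


def repack(source_dir, container_path):
    """Recreate the zip container from the files stored in source_dir."""
    source = Path(source_dir)
    names = sorted(
        p.relative_to(source).as_posix()
        for p in source.rglob("*") if p.is_file()
    )
    # ODF packages expect "mimetype" as the first entry, stored uncompressed.
    names.sort(key=lambda name: name != "mimetype")
    with zipfile.ZipFile(container_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zf.write(source / name, name, compress_type=method)
```

The rebuilt archive contains the same members, though it is not guaranteed to be byte-identical to the original file.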

Open .docx with Apache POI and save it with password

我的未来我决定 submitted on 2020-01-30 11:34:06
Question: The goal is to open an existing .docx document and save it encrypted with a password. I use the Apache POI library for that. The code below works fine and makes the document encrypted and password protected. But after creating the file I can open it with LibreOffice, but not with MS Word or OpenOffice Writer. It seems that the file has no content type part, because OpenOffice asked me for the file's filter. But when I chose "Microsoft Word 2007 XML" I got the "Common Input-Output error" from the

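Reading and writing Word docx files with POI

五迷三道 submitted on 2020-01-28 11:23:00
Using POI to read and write Word docx files
Contents
1 Reading docx files
1.1 Reading with XWPFWordExtractor
1.2 Reading with XWPFDocument
2 Writing docx files
2.1 Generating directly with XWPFDocument
2.2 Using a docx file as a template
POI reads and writes Word docx files through its xwpf module, whose core is XWPFDocument. One XWPFDocument represents one docx document, and it can be used both to read and to write that document. An XWPFDocument mainly contains the following kinds of objects:
- XWPFParagraph: a paragraph.
- XWPFRun: a run of text that shares the same properties.
- XWPFTable: a table.
- XWPFTableRow: a row of a table.
- XWPFTableCell: a cell of a table.
1 Reading docx files
As with doc files, POI offers two ways to read a docx file: through XWPFWordExtractor and through XWPFDocument. Internally, XWPFWordExtractor still obtains its information through XWPFDocument.
1.1 Reading with XWPFWordExtractor
When reading a docx document with XWPFWordExtractor, we can only get its text, not the property values associated with that text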

Fetch text from Docx using PHP without losing HTML format

让人想犯罪 __ submitted on 2020-01-25 09:37:05
Question: Brief description: I have a Docx file. I am able to export data from it using PHP code, but the data loses its HTML format. How can I keep that intact and extract the data? My PHP code so far is below:
<?php
function read_file_docx($filename){
    $striped_content = '';
    $content = '';
    if(!$filename || !file_exists($filename)) return false;
    $zip = zip_open($filename);
    if (!$zip || is_numeric($zip)) return false;
    while ($zip_entry = zip_read($zip)) {
        if (zip_entry_open($zip, $zip_entry) == FALSE)
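
One way to keep some formatting is to read the run properties in word/document.xml instead of stripping all tags. The following is a minimal sketch in Python rather than PHP, using only the standard library; it is not the asker's code. The function names are hypothetical, and it assumes that only bold and italic need to survive, ignoring styles, tables, lists, and images.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(run_props, tag):
    """True if the run properties switch the given toggle (e.g. 'b') on."""
    if run_props is None:
        return False
    element = run_props.find(W + tag)
    if element is None:
        return False
    # <w:b/> means on; <w:b w:val="0"/> or "false" means off.
    return element.get(W + "val", "true") not in ("0", "false")


def docx_to_html(source):
    """Return simple HTML for the paragraphs of a .docx file.

    `source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        parts = []
        for run in paragraph.iter(W + "r"):
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            if not text:
                continue
            text = html.escape(text)
            props = run.find(W + "rPr")
            if _is_on(props, "i"):
                text = "<i>" + text + "</i>"
            if _is_on(props, "b"):
                text = "<b>" + text + "</b>"
            parts.append(text)
        paragraphs.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(paragraphs)
```

The same approach carries over to PHP by opening the archive and walking the `w:p` and `w:r` elements with an XML parser instead of removing the markup.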