docx

Jinja2 for word templating

让人想犯罪 __ posted on 2019-12-03 17:32:05
I would like to use jinja2 for Word templating, as mentioned in this short article. The problem I'm facing is as follows: if I put {{title}} in my Word file, the resulting XML can look like this: <w:r><w:t>{{</w:t></w:r><w:proofErr w:type="gramStart"/><w:r><w:t>title</w:t></w:r><w:proofErr w:type="gramEnd"/><w:r><w:t>}}</w:t></w:r></w:p> so jinja cannot find the placeholder and replace it. Is there a way to prevent Word from splitting {{title}} into separate text elements? (If I copy the placeholder from a text editor, it works fine.) This is an issue in Word, relating to the proofErr tag.
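One possible workaround, sketched here under assumptions rather than taken from the original thread, is to clean the document XML before handing it to jinja2: drop the `<w:proofErr/>` markers and merge adjacent plain runs so the placeholder ends up in a single `<w:t>` element. The function name `join_split_placeholders` is hypothetical, and the sketch only handles runs that carry no run properties (`<w:rPr>`), as in the XML quoted above.

```python
import re

# Proofing markers that Word inserts between runs, e.g. <w:proofErr w:type="gramStart"/>
PROOF_ERR = re.compile(r'<w:proofErr\b[^>]*/>')

# Boundary between two adjacent runs that have no run properties.
# Assumption: both runs share the same formatting, so merging them is harmless.
RUN_BOUNDARY = re.compile(r'</w:t></w:r><w:r><w:t(?: xml:space="preserve")?>')


def join_split_placeholders(xml: str) -> str:
    """Remove proofErr markers, then merge adjacent plain runs into one."""
    xml = PROOF_ERR.sub('', xml)
    return RUN_BOUNDARY.sub('', xml)


if __name__ == "__main__":
    sample = (
        '<w:p><w:r><w:t>{{</w:t></w:r><w:proofErr w:type="gramStart"/>'
        '<w:r><w:t>title</w:t></w:r><w:proofErr w:type="gramEnd"/>'
        '<w:r><w:t>}}</w:t></w:r></w:p>'
    )
    print(join_split_placeholders(sample))
```

After this step the cleaned XML contains the placeholder as one piece of text, so it can be rendered as an ordinary jinja2 template. Runs with differing formatting are deliberately left alone, which means a placeholder typed with mixed formatting would still be split.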

Docx4j - How to replace placeholder with value

試著忘記壹切 posted on 2019-12-03 14:31:46
Question: I've been trying to work through the examples FieldMailMerge and VariableReplace but can't seem to get a local test case running. I'm basically trying to start with one docx template document and have it create x docx documents from that one template with the variables replaced. In the code below, docx4jReplaceSimpleTest() tries to replace a single variable but fails to do so. The ${} values in the template files are removed as part of the processing, so I believe it's finding them but

How do I extract data from a doc/docx file using Python

本秂侑毒 posted on 2019-12-03 13:55:20
Question: I know there are similar questions out there, but I couldn't find something that would answer my prayers. What I need is a way to access certain data from MS Word files and save it in an XML file. Reading up on python-docx did not help, as it only seems to allow one to write into Word documents, rather than read them. To present my task exactly (or how I chose to approach it): I would like to search for a keyword or phrase in the document (the document contains tables) and extract text data
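A minimal sketch of the keyword search described above, using only the Python standard library: a .docx file is a ZIP archive whose main text lives in `word/document.xml`, so it can be read without any Word-specific package. The function name `find_paragraphs` is hypothetical, and the sketch assumes the text of interest is in the main document body (paragraphs inside table cells are included, headers and footers are not).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraph (w:p) and text (w:t) elements
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_paragraphs(docx_file, keyword):
    """Return the text of each paragraph (table cells included) containing keyword.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    hits = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text may be spread over several runs, so join all w:t nodes
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if keyword in text:
            hits.append(text)
    return hits


if __name__ == "__main__":
    import sys
    for line in find_paragraphs(sys.argv[1], sys.argv[2]):
        print(line)
```

The matches can then be written to an XML file with `xml.etree.ElementTree` as well. Joining the runs before searching matters because Word often splits a single phrase across several runs.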

Converting HTML to odt, doc, docx

拥有回忆 posted on 2019-12-03 12:35:56
Is there an easy way to convert HTML (with CSS styles and embedded images) to ODT, DOCX, or DOC from the command line on a Linux server? I have searched a lot but have not found a good option. I had the same problem converting to PDF, which I solved with wkhtmltopdf. Perhaps there are ways to convert the resulting PDF documents to other formats?
Zsolt Botykai: Converting to odt is pretty easy after installing pandoc. After that relatively hard part, from odt (or even html) you can script (Open|Libre)Office via e.g. unoconv. Or you can run something like: abiword --to=doc filename.odt Also see this thread, and this
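The two-step route from the answer (pandoc to get an .odt, then abiword to get a .doc) could be scripted roughly as follows. This is a sketch under assumptions: both tools are installed and on the PATH, the output is written next to the input file, and the helper names `conversion_commands` and `convert` are hypothetical.

```python
import subprocess
from pathlib import Path


def conversion_commands(html_path):
    """Build the command lines for HTML -> ODT (pandoc) and ODT -> DOC (abiword)."""
    html = Path(html_path)
    odt = html.with_suffix(".odt")
    return [
        ["pandoc", str(html), "-o", str(odt)],   # pandoc infers the format from the .odt suffix
        ["abiword", "--to=doc", str(odt)],       # invocation quoted in the answer above
    ]


def convert(html_path):
    """Run each step in order and stop at the first failing command."""
    for cmd in conversion_commands(html_path):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    import sys
    convert(sys.argv[1])
```

Keeping command construction separate from execution makes it easy to inspect or log the exact commands before running them on a server.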

Replace text templates inside .docx (Apache POI, Docx4j or other)

Anonymous (unverified) posted on 2019-12-03 10:24:21
Question: I want to do replacements in an MS Word (.docx) document using regular expressions (Java regex). I tried to find text templates (like %SOME_TEXT%) with Apache POI (XWPF) and replace the text, but the replacement is not guaranteed to work because POI splits the text into separate runs, so I get something like this (from System.out.println(run.getText(0))): Code example: FileInputStream fis = new FileInputStream(new File("document.docx")); XWPFDocument document = new XWPFDocument(fis); List<XWPFParagraph> paragraphs = document.getParagraphs(); paragraphs.forEach(para -> { para.getRuns(
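A common way around split runs is to treat the paragraph as a whole: join the text of all runs, apply the regular expression to the joined string, write the result into the first run, and empty the rest. The sketch below shows only that idea on plain strings, in Python rather than with the POI API; `replace_across_runs` is a hypothetical helper, and the approach assumes it is acceptable for the replaced paragraph to take the first run's formatting.

```python
import re


def replace_across_runs(run_texts, pattern, replacement):
    """Apply a regex to the joined text of a paragraph's runs.

    Returns a list with the same number of entries as run_texts. If anything
    was replaced, the full new text goes into the first entry and the other
    entries become empty strings; otherwise the input is returned unchanged.
    """
    joined = "".join(run_texts)
    replaced = re.sub(pattern, replacement, joined)
    if replaced == joined or not run_texts:
        return list(run_texts)
    return [replaced] + [""] * (len(run_texts) - 1)


if __name__ == "__main__":
    # The placeholder is split over three runs, as in the question
    print(replace_across_runs(["Dear %SOME", "_TEXT", "%,"], r"%SOME_TEXT%", "Alice"))
```

In a real document the returned strings would be written back to the corresponding runs, so the run count stays the same and only their text changes.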

Converting docx to pdf with pure python (on linux, without libreoffice)

自闭症网瘾萝莉.ら posted on 2019-12-03 10:24:07
I'm dealing with a problem trying to develop a web-app, part of which converts uploaded docx files to pdf files (after some processing). With python-docx and other methods, I do not require a windows machine with word installed, or even libreoffice on linux, for most of the processing (my web server is pythonanywhere - linux but without libreoffice and without sudo or apt install permissions). But converting to pdf seems to require one of those. From exploring questions here and elsewhere, this is what I have so far: import subprocess try: from comtypes import client except ImportError: client

Python win32com.client.Dispatch looping through Word documents and export to PDF; fails when next loop occurs

你。 posted on 2019-12-03 09:43:08
Based on the script here: .doc to pdf using python I've got a semi-working script to export .docx files to pdf from C:\Export_to_pdf into a new folder. The problem is that it gets through the first couple of documents and then fails with: (-2147352567, 'Exception occurred.', (0, u'Microsoft Word', u'Command failed', u'wdmain11.chm', 36966, -2146824090), None) This is apparently an unhelpful general error message. If I step through the script slowly using pdb, I can loop through all files and export them successfully. If I also keep an eye on the processes in Windows Task Manager, I can see that WINWORD starts

In git how to diff microsoft word documents?

Anonymous (unverified) posted on 2019-12-03 09:06:55
Question: I've been following this guide here on how to diff Microsoft Word documents, but I ran into this error:
    Usage: /usr/bin/docx2txt.pl [infile.docx|-|-h] [outfile.txt|-]
           /usr/bin/docx2txt.pl < infile.docx
           /usr/bin/docx2txt.pl < infile.docx > outfile.txt
    In second usage, output is dumped on STDOUT.
    Use '-h' as the first argument to get this usage information.
    Use '-' as the infile name to read the docx file from STDIN.
    Use '-' as the outfile name to dump the text on STDOUT.
    Output is saved in infile.txt if second argument is omitted.
    Note:

Convert .doc to .docx using C# [closed]

Anonymous (unverified) posted on 2019-12-03 09:05:37
Question: I convert a PDF file to a Word file using the PDFFocus.net DLL, but my system needs a .docx file. I tried different approaches. Some libraries are available, but they are not free. This is my PDF-to-DOC conversion code: using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using iTextSharp.text; using iTextSharp.text.pdf; namespace ConsoleApplication { class Program { static void Main(string[] args) { SautinSoft.PdfFocus f = new SautinSoft.PdfFocus(); f.OpenPdf(@"E:\input.pdf"); f.ToWord(@

Linux SCP command

▼魔方 西西 posted on 2019-12-03 08:31:37
Introduction
scp (secure copy) copies files and directories between Linux hosts over an encrypted connection.

Copying from local to remote
    [root@localhost aa]# scp /root/aa/a.docx root@172.16.4.22:/root/
    root@172.16.4.22's password:
    a.docx
    [root@localhost ~]# scp -r /root/bb root@172.16.4.22:/root/
    root@172.16.4.22's password:
    b.docx 100% 15MB 65.3MB/s 00:00

Copying from remote to local
    scp root@172.16.4.22:/root/c.docx /root/
    root@172.16.4.22's password:
    c.docx
    scp -r root@172.16.4.22:/root/aa /root/
    root@172.16.4.22's password:
    b.docx

Summary
To transfer a file, use: scp <source file path> root@<destination IP>:<path>, where root is the account on the target host. To transfer a directory, use: scp -r <source directory path> root@<destination IP>:<path>

Source: https://www.cnblogs.com/longlogs/p/11785003.html
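The copy syntax summarized above can be wrapped in a small helper when transfers need to be scripted. This is only a sketch: the names `scp_command` and `copy_to_remote` are hypothetical, and it assumes `scp` is installed and that authentication (a password prompt or SSH key) is handled by scp itself.

```python
import subprocess


def scp_command(source, user, host, dest, recursive=False):
    """Build an scp command line of the form: scp [-r] SOURCE USER@HOST:DEST"""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # required when SOURCE is a directory
    cmd += [source, f"{user}@{host}:{dest}"]
    return cmd


def copy_to_remote(source, user, host, dest, recursive=False):
    """Run scp and raise CalledProcessError if the transfer fails."""
    subprocess.run(scp_command(source, user, host, dest, recursive), check=True)


if __name__ == "__main__":
    # Same transfer as the first example in the article
    print(" ".join(scp_command("/root/aa/a.docx", "root", "172.16.4.22", "/root/")))
```

Note the colon between the host and the remote path: without it, scp treats the target as a local file name.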