docx

Searching for a string in different file types (doc, txt, pdf) with Python

你说的曾经没有我的故事 submitted on 2019-12-16 11:46:05
Searching for a string in different file types with Python

TXT files:

    import os

    def txt_handler(self, f_name, find_str):
        """
        Handle a txt file.
        :param f_name: path of the file to search
        :param find_str: string to look for
        :return: dict with the file name and the number of the first matching line
        """
        line_count = 1
        file_str_dict = {}
        if os.path.exists(f_name):
            # The with-block closes the file even when we break out early
            with open(f_name, 'r', encoding='utf-8') as f:
                for line in f:
                    if find_str in line:
                        file_str_dict['file_name'] = f_name
                        file_str_dict['line_count'] = line_count
                        break
                    else:
                        line_count += 1
        return file_str_dict

docx files

These require the docx package: pip install python-docx
See https://python-docx.readthedocs.io/en/latest/

    from docx import Document

    def docx_handler(self, f_name, find_str):
        """
        Handle a Word docx file.
        :param f_name:
        :return:
        """
        # line_count = 1;
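Since the docx handler is cut off in this excerpt, here is a minimal sketch of how the same first-match search might look for a docx file. It assumes python-docx's `Document(path).paragraphs` and each paragraph's `.text`; the names `find_in_paragraphs` and `docx_find` are hypothetical and not from the original post, and paragraph numbers stand in for line numbers.

```python
def find_in_paragraphs(paragraph_texts, find_str):
    """Return the 1-based index of the first text containing find_str, or None."""
    for index, text in enumerate(paragraph_texts, start=1):
        if find_str in text:
            return index
    return None


def docx_find(f_name, find_str):
    """Search the body paragraphs of a docx file (hypothetical helper)."""
    # Imported lazily so the pure search helper works without python-docx installed.
    from docx import Document

    paragraphs = Document(f_name).paragraphs
    hit = find_in_paragraphs((p.text for p in paragraphs), find_str)
    if hit is None:
        return {}
    return {'file_name': f_name, 'paragraph_count': hit}
```

Note that `Document.paragraphs` only covers body paragraphs; text inside tables would need a separate pass over `Document.tables`.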

How to read doc file using Poi?

杀马特。学长 韩版系。学妹 submitted on 2019-12-14 03:27:17
Question: I am trying to view a Word file in my editor pane. I tried these lines:

    import java.awt.Dimension;
    import java.awt.GridLayout;
    import java.io.File;
    import java.io.FileInputStream;
    import javax.swing.JEditorPane;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;

    public class editorpane extends JEditorPane {
        public editorpane(File file) {
            try {
                FileInputStream fis = new FileInputStream(file.getAbsolutePath());
                HWPFDocument hwpfd = new HWPFDocument(fis);

Is there a way to count doc, docx, pdf pages with only js (without Node.js)?

自古美人都是妖i submitted on 2019-12-14 02:44:58
Question: I need to count the number of pages in doc, docx and pdf files. I know that it is possible with PHP or NodeJS, but is it possible with only JavaScript if the file is on the server?

Answer 1: https://www.npmjs.com/package/docx-pdf-pagecount can be used to get the docx and pdf page count.

    const getPageCount = require('docx-pdf-pagecount');

    getPageCount('E:/sample/document/aa/test.docx')
        .then(pages => {
            console.log(pages);
        })
        .catch((err) => {
            console.log(err);
        });

    getPageCount('E:/sample/document/vb

Converting a docx containing a chart to PDF

冷暖自知 submitted on 2019-12-14 02:21:43
Question: I've got a docx4j-generated file which contains several tables, titles and, finally, an Excel-generated curve chart. I have tried many approaches to convert this file to PDF, but none of them produced a successful result. Docx4j with XSL-FO did not work: most of the things included in the docx file are not yet implemented and show up in red text as "not implemented". JODConverter did not work either: I got a resulting PDF in which everything was pretty good (just little formatting/styling

Docx missing attributes

£可爱£侵袭症+ submitted on 2019-12-14 01:54:35
Question: I'm trying to work with a Word document using the docx library in Python. The problem is that whatever I import, I get an error message about 'no attribute'. For example, importing Document with

    from docx import Document

gives the output

    cannot import name Document

and any attempt to use Document ends with the error

    AttributeError: 'module' object has no attribute 'Document'

The syntax seems to be correct. I'm using docx module version 0.2.4. Thanks for all help.

Answer 1: From the official documentation: python-docx versions
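One way to check which distribution is providing the `docx` module is to ask the packaging metadata. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_version` is hypothetical. On PyPI, `docx` and `python-docx` are separate distributions that both install a module named `docx`, and the `Document` class used in the question belongs to python-docx.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report both distributions that can provide a module named "docx".
    for name in ("docx", "python-docx"):
        print(name, installed_version(name))
```

If only `docx` reports a version (such as the 0.2.4 mentioned in the question), that would be consistent with `from docx import Document` failing as described.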

Converting docx to pdf using openxml and pdfcreator in c#

六眼飞鱼酱① submitted on 2019-12-14 00:27:45
Question: I need to convert a docx file to PDF on the server. I have seen that PDFCreator can do this, based on the link below (http://sourceforge.net/projects/pdfcreator/). I need some suggestions on the following points: Can I use PDFCreator on the server side? Can I convert docx to PDF with OpenXML using the PDFCreator API, without creating a Word object? Please reply soon.

Answer 1: You can use docx4j.NET to convert a docx to XSL FO, and from there, to PDF. Or, indeed, to any of the other output formats supported by

Extract GPS coordinates from .docx file with python

吃可爱长大的小学妹 submitted on 2019-12-13 22:05:05
Question: I have a hectic task for which I need some help from Python. Please see this Word document. I need to extract the text and GPS coordinates from each row. There are currently over 100 coordinates in 10 docx files. My "hefty" Python knowledge got me this far:

    from docx import Document
    import re

    main_file = Document("D:/DOCUMENTS/Google_Link/1 Category I/1 Category I.docx")
    table = main_file.tables[1]  # this is the same for every document
    data = []
    keys = None
    for i, row in enumerate(table.rows):
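The coordinate format in the linked document is not shown here, so the following is only a sketch of the regular-expression step, assuming the coordinates appear as decimal-degree pairs such as `27.7172, 85.3240` (latitude first). The pattern `COORD_RE` and the helper `extract_coordinates` are hypothetical names, not part of the original post.

```python
import re

# Assumed format: two signed decimal numbers separated by a comma and/or spaces.
# The lookbehind keeps the match from starting in the middle of a longer number.
COORD_RE = re.compile(r'(?<![\d.])(-?\d{1,3}\.\d+)\s*[,\s]\s*(-?\d{1,3}\.\d+)')


def extract_coordinates(text):
    """Return (lat, lon) float pairs found in text that fall in valid ranges."""
    pairs = []
    for lat_str, lon_str in COORD_RE.findall(text):
        lat, lon = float(lat_str), float(lon_str)
        # Discard number pairs that cannot be a latitude/longitude.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs
```

With python-docx, such a helper could be applied to `cell.text` for each cell in `row.cells` inside the loop over `table.rows`. Coordinates written in degrees, minutes and seconds would need a different pattern.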

Inserting a bullet point and styling to [onshow.] entries in openTBS

筅森魡賤 submitted on 2019-12-13 21:23:36
Question: I was wondering if there is a way to pass a bullet point, with basic CSS colour styling for the bullet point, through the variable that gets applied via onshow. I.e.

    $string = '<span style="color:red">&#149;</span> The rest of the string';
    $TBS->VarRef['bulletPoint'] = $string;

and then in the docx template have [onshow.bulletPoint], which gets replaced with

    • The rest of the string

but with the bullet point red in this case.

Answer 1: For the bullet, you can use the UTF8 common character.

Clear new lines in docx

半城伤御伤魂 submitted on 2019-12-13 19:59:56
Question: I have a docx file that contains a lot of new lines between sections. I need to remove a new line when it appears more than once consecutively. I unzip the file using:

    z = zipfile.ZipFile('File.docx','a')
    z.extractall()

Inside the word directory there is a file, document.xml, that contains all the data, but I don't know how to tell where a new line is in the XML. I know that extracting it is not the solution (I use it here only to show where the file is). I think I can use: z.write('Document
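In WordprocessingML each paragraph is a `w:p` element, so a blank line made by pressing Enter typically shows up in document.xml as a `w:p` with no text runs (`w:r`). The sketch below collapses consecutive empty paragraphs to one using only the standard library, under the assumption that the blank lines are empty paragraphs rather than `w:br` line breaks. The function names are hypothetical. One caveat: ElementTree can rename or drop namespace prefixes that files saved by Word declare, so for real documents a prefix-preserving parser such as lxml may be needed.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P_TAG = f"{{{W_NS}}}p"
R_TAG = f"{{{W_NS}}}r"
SECTPR_TAG = f"{{{W_NS}}}sectPr"
ET.register_namespace("w", W_NS)  # keep the usual "w:" prefix on output


def is_empty_paragraph(elem):
    """True for a w:p that contains no runs and no section properties."""
    if elem.tag != P_TAG:
        return False
    return all(d.tag not in (R_TAG, SECTPR_TAG) for d in elem.iter())


def collapse_empty_paragraphs(document_xml):
    """Return document.xml bytes with runs of empty paragraphs reduced to one."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    previous_was_empty = False
    for child in list(body):
        empty = is_empty_paragraph(child)
        if empty and previous_was_empty:
            body.remove(child)
        previous_was_empty = empty
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def rewrite_docx(src, dst):
    """Copy a docx archive to dst, replacing only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = collapse_empty_paragraphs(data)
            zout.writestr(item, data)
```

Writing to a new archive avoids the problem that `zipfile` cannot replace an existing entry in place; `rewrite_docx('File.docx', 'File_clean.docx')` would be one way to call it, with the output name chosen freely.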

How to add item transform to VS2012 .proj msbuild file

烂漫一生 submitted on 2019-12-13 18:15:49
Question: Based on this answer describing an item transform to convert image files from jpg to png, I made an item transform that converts .docx files to .pdf. When I call it from my projectname.proj build file I get this error message:

    Error 1 The condition " '%(Extension)' == '.docx' " on the "WordToPdf" target has a reference to item metadata. References to item metadata are not allowed in target conditions unless they are part of an item transform. [project path]\.build\WordToPdf.Tasks.target 7 9