docx | 易学教程

Searching for a string in different file types (doc, txt, pdf) with Python

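Searching for a string in different file types with Python

TXT files:

    import os

    def txt_handler(self, f_name, find_str):
        """
        Handle a txt file.
        :param f_name: path of the file to search
        :param find_str: string to look for
        :return: dict with the file name and the 1-based number of the first matching line
        """
        line_count = 1
        file_str_dict = {}
        if os.path.exists(f_name):
            # Use a context manager so the file is closed after the search
            with open(f_name, 'r', encoding='utf-8') as f:
                for line in f:
                    if find_str in line:
                        file_str_dict['file_name'] = f_name
                        file_str_dict['line_count'] = line_count
                        break
                    else:
                        line_count += 1
        return file_str_dict

docx files require the docx package: pip install python-docx (see https://python-docx.readthedocs.io/en/latest/)

    from docx import Document

    def docx_handler(self, f_name, find_str):
        """
        Handle a Word docx file.
        :param f_name:
        :return:
        """
        # line_count = 1;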
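The txt search described above can also be written as a standalone function. This is only a minimal sketch using the standard library: the name `find_first_match` is hypothetical, and unlike the excerpt it returns `None` rather than an empty dict when the file is missing or nothing matches.

```python
from pathlib import Path
from typing import Optional


def find_first_match(path: str, needle: str, encoding: str = "utf-8") -> Optional[dict]:
    """Return the file name and 1-based line number of the first line containing needle.

    Hypothetical helper for illustration; returns None if the file does not
    exist or no line contains the search string.
    """
    file_path = Path(path)
    if not file_path.is_file():
        return None
    with file_path.open("r", encoding=encoding) as handle:
        # enumerate(..., start=1) replaces the manual line counter
        for line_number, line in enumerate(handle, start=1):
            if needle in line:
                return {"file_name": str(file_path), "line_count": line_number}
    return None
```

Reading the file line by line keeps memory use flat for large files, and the `with` block closes the handle even when the loop returns early.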

How to read doc file using Poi?

Question: I am trying to view a Word file in my editor pane. I tried these lines:

    import java.awt.Dimension;
    import java.awt.GridLayout;
    import java.io.File;
    import java.io.FileInputStream;
    import javax.swing.JEditorPane;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;

    public class editorpane extends JEditorPane {
        public editorpane(File file) {
            try {
                FileInputStream fis = new FileInputStream(file.getAbsolutePath());
                HWPFDocument hwpfd = new HWPFDocument(fis);

Is there a way to count doc, docx, pdf pages with only js (without Node.js)?

Question: I need to count the number of pages in doc, docx and pdf files. I know that this is possible with PHP or Node.js, but is it possible with only JavaScript if the file is on the server?

Answer 1: https://www.npmjs.com/package/docx-pdf-pagecount can be used to get the docx and pdf page count.

    const getPageCount = require('docx-pdf-pagecount');

    getPageCount('E:/sample/document/aa/test.docx')
      .then(pages => {
        console.log(pages);
      })
      .catch((err) => {
        console.log(err);
      });

    getPageCount('E:/sample/document/vb

Converting a docx containing a chart to PDF

Question: I've got a docx4j-generated file which contains several tables, titles and, finally, an Excel-generated curve chart. I have tried many approaches to convert this file to PDF, but none produced a successful result. Docx4j with XSL-FO did not work: most of the things included in the docx file are not yet implemented and show up in red text as "not implemented". JODConverter did not work either; I got a resulting PDF in which everything was pretty good (just little formatting/styling

Docx missing attributes

Question: I'm trying to work with Word documents using the docx library in Python. The problem is that whatever I import, I get an error message about 'no attribute'. For example, from docx import Document gives the output cannot import name Document, and any attempt to use Document ends with the error AttributeError: 'module' object has no attribute 'Document'. The syntax seems to be correct. I'm using docx module version 0.2.4. Thanks for all help.

Answer 1: from official documentation python-docx versions

Converting docx to pdf using openxml and pdfcreator in c#

Question: I need to convert a docx file to PDF on the server. I have seen that PDFCreator can do this, based on the link below (http://sourceforge.net/projects/pdfcreator/). I need some suggestions on the following points. Can I use PDFCreator on the server side? Can I convert docx to PDF with OpenXML by using the PDFCreator API, without creating a Word object? Please reply soon.

Answer 1: You can use docx4j.NET to convert a docx to XSL FO, and from there, to PDF. Or, indeed, to any of the other output formats supported by

Extract GPS coordinates from .docx file with python

Question: I have a hectic task for which I need some help from Python. Please see this Word document. I need to extract the text and GPS coordinates from each row. There are currently over 100 coordinates in 10 docx files. My "hefty" Python knowledge got me this far:

    from docx import Document
    import re

    main_file = Document("D:/DOCUMENTS/Google_Link/1 Category I/1 Category I.docx")
    table = main_file.tables[1]  # this is the same for every document
    data = []
    keys = None
    for i, row in enumerate(table.rows):
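Once the text of each table cell has been read, pulling out coordinates could look roughly like the sketch below. The coordinate format in the asker's document is not shown in the excerpt, so this assumes decimal-degree latitude/longitude pairs such as `33.6844, 73.0479`; the pattern and the name `extract_coordinates` are hypothetical and would need adapting to degrees-minutes-seconds or other notations.

```python
import re

# Assumed format: two decimal numbers separated by a comma, semicolon or whitespace.
# The lookbehind avoids starting a match in the middle of a longer number.
COORD_PATTERN = re.compile(r"(?<![\d.])(-?\d{1,3}\.\d+)\s*[,;\s]\s*(-?\d{1,3}\.\d+)")


def extract_coordinates(text):
    """Return (latitude, longitude) tuples found in text, keeping only plausible ranges."""
    pairs = []
    for lat_text, lon_text in COORD_PATTERN.findall(text):
        lat, lon = float(lat_text), float(lon_text)
        # Discard number pairs that cannot be a latitude/longitude
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            pairs.append((lat, lon))
    return pairs
```

The function works on plain strings, so it can be applied to each cell's text regardless of how the cells were read from the document.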

Inserting a bullet point and styling to [onshow.] entries in openTBS

Question: I was wondering if there is a way to pass a bullet point, with basic CSS colour styling for the bullet point, through the variable that gets applied via onshow. I.e.

    $string = '<span style="color:red"></span> The rest of the string';
    $TBS->VarRef['bulletPoint'] = $string;

and then in the docx template have [onshow.bulletPoint], which gets replaced with

    The rest of the string

but with the bullet point in red in this case.

Answer 1: For the bullet, you can use the UTF8 common character.

Clear new lines in docx

Question: I have a docx file that contains a lot of new lines between sections. I need to remove a new line whenever it appears more than once consecutively. I unzip the file using:

    z = zipfile.ZipFile('File.docx','a')
    z.extractall()

Inside the word directory is a file, document.xml, which contains all the data, but I don't know how to tell where a new line is in the XML. I know that extracting is not the solution (I use it here only to show where the file is). I think I can use: z.write('Document
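In WordprocessingML each paragraph is a `w:p` element, and a blank line made by pressing Enter is usually a paragraph with no runs (`w:r`). One possible approach, sketched here with the standard library only, is to drop every run-less top-level paragraph that directly follows another one and then write a new archive. The helper names are hypothetical. The sketch ignores paragraphs inside tables and treats a manual line break (`w:br`) inside a run as content. ElementTree may also rename namespace prefixes it does not know, which Word can reject in real documents that use `mc:Ignorable`, so lxml or python-docx may be a safer choice in practice.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix on output


def is_blank_paragraph(element):
    """True for a w:p with no runs and no section properties (assumed to be a blank line)."""
    if element.tag != f"{{{W_NS}}}p":
        return False
    has_run = next(element.iter(f"{{{W_NS}}}r"), None) is not None
    has_section = next(element.iter(f"{{{W_NS}}}sectPr"), None) is not None
    return not has_run and not has_section


def collapse_blank_paragraphs(document_xml):
    """Keep only the first paragraph of each run of consecutive blank paragraphs."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return document_xml
    previous_blank = False
    for child in list(body):
        blank = is_blank_paragraph(child)
        if blank and previous_blank:
            body.remove(child)
        previous_blank = blank
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def rewrite_docx(source, target):
    """Copy every archive member to target, replacing word/document.xml with the cleaned XML."""
    with zipfile.ZipFile(source) as src, zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = collapse_blank_paragraphs(data)
            dst.writestr(item, data)
```

Writing to a new archive avoids the duplicate `word/document.xml` entry that appending to the original file in `'a'` mode would produce.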

How to add item transform to VS2012 .proj msbuild file

Question: Based on this answer describing an item transform that converts image files from jpg to png, I made an item transform that converts .docx files to .pdf. When I call it from my projectname.proj build file, I get this error message:

    Error 1 The condition " '%(Extension)' == '.docx' " on the "WordToPdf" target has a reference to item metadata. References to item metadata are not allowed in target conditions unless they are part of an item transform. [project path]\.build\WordToPdf.Tasks.target 7 9