document-conversion

How to convert multiple documents using the Document Conversion service in a bash script?

不问归期 submitted on 2020-01-11 09:58:05
Question: How can I convert more than one document using the Document Conversion service? I have between 50 and 100 MS Word and PDF documents that I want to convert using the convert_document API method. For example, can you supply multiple *.pdf or *.doc files like this? curl -u "username":"password" -X POST -F "config={\"conversion_target\":\"ANSWER_UNITS\"};type=application/json" -F "file=@\*.doc;type=application/msword" "https://gateway.watsonplatform.net/document-conversion-experimental/api/v1/convert
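
As far as I know, curl's `-F "file=@..."` field takes one literal path and does not expand wildcards, so one way to handle a batch is to issue one request per file. The sketch below only builds the argument lists for those requests. The helper names (`build_curl_command`, `build_batch`) are hypothetical, and the service URL and credentials are passed in by the caller because the URL in the question is cut off.

```python
from pathlib import Path

# MIME types for the two formats mentioned in the question.
MIME_TYPES = {
    ".doc": "application/msword",
    ".pdf": "application/pdf",
}

# Same config field as the curl call in the question.
CONFIG_FIELD = 'config={"conversion_target":"ANSWER_UNITS"};type=application/json'


def build_curl_command(path, url, username, password):
    """Return the argv list for uploading exactly one document."""
    path = Path(path)
    mime = MIME_TYPES[path.suffix.lower()]
    return [
        "curl", "-u", f"{username}:{password}", "-X", "POST",
        "-F", CONFIG_FIELD,
        "-F", f"file=@{path};type={mime}",
        url,  # assumption: supplied by the caller, not taken from the excerpt
    ]


def build_batch(directory, url, username, password):
    """Build one curl command per .doc/.pdf file in a directory."""
    paths = sorted(
        p for p in Path(directory).iterdir()
        if p.suffix.lower() in MIME_TYPES
    )
    return [build_curl_command(p, url, username, password) for p in paths]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`; in a bash script the equivalent is a `for` loop over the files with one curl call per iteration.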

Convert pdf, doc, ppt to html5 [closed]

三世轮回 submitted on 2019-12-31 08:06:07
Question: I've googled (without any luck) for open source software that can convert doc, ppt, and pdf to HTML5, which is exactly what Scribd does. Are there open source equivalents to the type of conversion Scribd does? If anyone knows of a paid service, that would also work. Scribd has an API, but that's for use with the flash

How do I send a PDF to Watson's Document Conversion service without writing it to disk first?

落花浮王杯 submitted on 2019-12-25 18:24:06
Question: I am trying to convert this document (http://www.redbooks.ibm.com/redbooks/pdfs/ga195486.pdf) to answer units in Watson's Document Conversion service using the watson-developer-cloud node.js library. In the actual program (not this test program), I retrieve the document and convert it on the fly, without writing it to disk first. I have done this before with other documents, but the latest version of the library (v1.7.0) seems to have changed and it no longer works the way I was

IBM Watson Document Conversion not working

南楼画角 submitted on 2019-12-25 08:46:46
Question: I recently implemented the Document Conversion API from IBM Watson. I always get an encoding error when converting a PDF document. #!/usr/bin/env python #coding: utf-8 import json from watson_developer_cloud import DocumentConversionV1 from io import open document_conversion = DocumentConversionV1( username='{XXXXXXXXXXX}', password='{XXXXXXXXXXXXX}', version='2015-12-15' ) config = { 'conversion_target': 'ANSWER_UNITS', # Use a custom configuration. 'word': { 'heading': { 'fonts': [ {'level':
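
The excerpt cuts off before the file is opened, so the actual cause of the error is not visible. One common source of encoding errors with PDFs, shown here purely as an assumption, is opening the file in text mode: `io.open` then tries to decode binary data. The sample bytes and function names below are made up for illustration.

```python
import os
import tempfile

# Illustrative bytes only: a PDF header followed by non-UTF-8 binary data.
SAMPLE = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"


def read_as_text(path):
    """Text mode: the bytes are decoded, which fails on binary content."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def read_as_bytes(path):
    """Binary mode: the bytes are returned untouched."""
    with open(path, "rb") as handle:
        return handle.read()


def demo():
    """Return (text_mode_worked, binary_mode_round_trips)."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "sample.pdf")
        with open(path, "wb") as handle:
            handle.write(SAMPLE)
        try:
            read_as_text(path)
            text_ok = True
        except UnicodeDecodeError:
            text_ok = False
        return text_ok, read_as_bytes(path) == SAMPLE
```

If the original script opens the PDF without `'rb'`, switching to binary mode before handing the file to the client would be the first thing to try.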

An efficient way to convert document to pdf format

主宰稳场 submitted on 2019-12-18 11:06:19
Question: I have been trying to find an efficient way to convert documents (e.g. doc, docx, ppt, pptx) to pdf. So far I have tried docsplit and oowriter, but both took more than 10 seconds to convert a 1.7 MB pptx file. Can anyone suggest a better way, or improvements to my approach? What I have tried: from subprocess import Popen, PIPE import time def convert(src, dst): d = {'src': src, 'dst': dst} commands = [ '/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d, 'oowriter -
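
To compare approaches on equal terms, it helps to time each converter the same way. The sketch below times LibreOffice's headless `--convert-to pdf` mode, which is a substitute for the truncated `oowriter` command in the question and not the asker's exact invocation. The function names are hypothetical, and nothing here claims this route is faster.

```python
import subprocess
import time
from pathlib import Path


def soffice_pdf_command(src, out_dir, binary="soffice"):
    """Build the headless LibreOffice command that converts src to PDF."""
    return [
        binary, "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(src),
    ]


def timed_convert(src, out_dir, run=subprocess.run):
    """Run the conversion and return (expected_pdf_path, seconds_elapsed).

    `run` is injectable so the timing logic can be exercised without
    LibreOffice installed.
    """
    cmd = soffice_pdf_command(src, out_dir)
    start = time.perf_counter()
    run(cmd, check=True)
    elapsed = time.perf_counter() - start
    # LibreOffice writes <source stem>.pdf into the output directory.
    return Path(out_dir) / (Path(src).stem + ".pdf"), elapsed
```

Calling `timed_convert("slides.pptx", "/tmp/out")` on the same file with each tool gives comparable numbers; much of the cost is likely process start-up, so timing a batch of files in one invocation is also worth measuring.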

An efficient way to convert document to pdf format

跟風遠走 submitted on 2019-12-18 11:05:42
Question: I have been trying to find an efficient way to convert documents (e.g. doc, docx, ppt, pptx) to pdf. So far I have tried docsplit and oowriter, but both took more than 10 seconds to convert a 1.7 MB pptx file. Can anyone suggest a better way, or improvements to my approach? What I have tried: from subprocess import Popen, PIPE import time def convert(src, dst): d = {'src': src, 'dst': dst} commands = [ '/usr/bin/docsplit pdf --output %(dst)s %(src)s' % d, 'oowriter -

Libreoffice convert-to not working

有些话、适合烂在心里 submitted on 2019-12-17 19:12:53
Question: I'm trying to convert documents from html and txt to pdf and odt, and vice versa, but only odt to pdf seems to work. No other file formats are converted. Here are my commands: libreoffice --headless --convert-to pdf test.html [Not working] libreoffice --headless --convert-to odt test.html [Not working] libreoffice --headless --convert-to pdf test.docx [Not working] libreoffice --headless --convert-to pdf test.odt [Working] Answer 1: This is a known issue in LibreOffice that was fixed in version 5.3.0.
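
Since the answer ties the fix to version 5.3.0, a quick check of the installed version can tell whether an upgrade is the likely remedy. This is a minimal sketch: the helper names are hypothetical, and the version strings in the usage note are illustrative rather than captured output.

```python
import re
import subprocess

# Version named in the answer as containing the fix.
FIXED_IN = (5, 3, 0)


def parse_version(output):
    """Extract the first major.minor.patch triple from --version output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version number found in: %r" % output)
    return tuple(int(group) for group in match.groups())


def has_fix(output):
    """True if the reported version is at least FIXED_IN."""
    return parse_version(output) >= FIXED_IN


def installed_version_output(binary="libreoffice"):
    """Ask the installed binary for its version string."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `has_fix("LibreOffice 5.2.7.2")` is false and `has_fix("LibreOffice 6.4.7.2")` is true; on a real machine, pass `installed_version_output()` instead.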

How does Apache commons IO convert my XML header from UTF-8 to UTF-16?

半城伤御伤魂 submitted on 2019-12-14 02:41:58
Question: I'm using Java 6. I have an XML template, which begins like so: <?xml version="1.0" encoding="UTF-8"?> However, I notice when I parse and output it with the following code (using Apache Commons-io 2.4) … Document doc = null; InputStream in = this.getClass().getClassLoader().getResourceAsStream("my-template.xml"); try { byte[] data = org.apache.commons.io.IOUtils.toByteArray( in ); InputSource src = new InputSource(new StringReader(new String(data))); DocumentBuilderFactory factory =
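
The output half of the Java code is cut off, so the cause of the UTF-16 header cannot be confirmed from the excerpt. As a language-neutral illustration in Python (not the asker's Java API), the sketch below shows two habits that usually keep the declaration stable: give the parser the raw bytes so it reads the declared encoding itself, and name the encoding explicitly when serializing. The template content is made up.

```python
from xml.dom import minidom

# Illustrative template: UTF-8 bytes with a matching declaration.
TEMPLATE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<greeting>caf\xc3\xa9</greeting>"
)


def round_trip(xml_bytes, encoding="utf-8"):
    """Parse from bytes and serialize with an explicit output encoding."""
    # Parsing bytes lets the parser honor the encoding in the declaration,
    # instead of decoding with a platform default beforehand.
    document = minidom.parseString(xml_bytes)
    # With an explicit encoding, toxml() returns bytes and writes a
    # declaration that names that encoding.
    return document.toxml(encoding=encoding)


def text_of_root(xml_bytes):
    """Return the text content of the root element."""
    document = minidom.parseString(xml_bytes)
    return document.documentElement.firstChild.data
```

In the Java code, the analogous moves would be passing the `InputStream` to the parser directly and setting the output encoding on whatever serializer is used, though the truncated excerpt does not show which one that is.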

Convert PDF file to a single HTML file

扶醉桌前 submitted on 2019-12-10 11:12:15
Question: I am trying to convert a PDF document to a single HTML file in Java. Most of the online converters turn one PDF file into multiple HTML files. I want to convert the whole PDF to a single HTML file. Any suggestions? Answer 1: "Any suggestions?" You could always write some code using the JSoup API to write a single document that incorporates the body of each of the multiple HTML files. Combining styles and style sheets (CSS) might be a bit more tricky (especially if the original HTML uses 'id'
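
The merging idea in the answer can be sketched as follows. This uses Python's standard `html.parser` in place of JSoup, so it illustrates the approach, not the answer's API. Wrapping each page in a `<section id="page-N">` is my own assumption, and the sketch ignores CSS merging and HTML comments entirely.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the markup found between <body> and </body>."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append("&#%s;" % name)


def extract_body(html):
    """Return the inner markup of the document's <body>."""
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def merge_pages(pages, title="Merged document"):
    """Combine the bodies of several HTML pages into one document."""
    sections = [
        '<section id="page-%d">\n%s\n</section>' % (number, extract_body(page))
        for number, page in enumerate(pages, 1)
    ]
    return (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\">"
        "<title>%s</title></head>\n<body>\n%s\n</body>\n</html>\n"
        % (title, "\n".join(sections))
    )
```

Duplicate `id` values across pages, which the answer flags as the tricky part, are not resolved here; the per-page `<section>` wrapper only gives style rules something to scope against.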