问题
I've saw some pages that allow user to upload PDF
and returns a DOC
file, like PdfToWord
Is there any way to convert a PDF
file to a DOC/DOCX
file using Python or any Unix command ?
Thanks in advance
回答1:
If you have LibreOffice installed
lowriter --invisible --convert-to doc '/your/file.pdf'
If you want to use Python for this:
import os
import subprocess
for top, dirs, files in os.walk('/my/pdf/folder'):
for filename in files:
if filename.endswith('.pdf'):
abspath = os.path.join(top, filename)
subprocess.call('lowriter --invisible --convert-to doc "{}"'
.format(abspath), shell=True)
回答2:
This is difficult because PDFs are presentation oriented and word documents are content oriented. I have tested both and can recommend the following projects.
- PyPDF2
- PDFMiner
However, you are most definitely going to lose presentational aspects in the conversion.
回答3:
You can use GroupDocs.Conversion Cloud SDK for python without installing any third-party tool or software.
Sample Python code:
# Import module
import groupdocs_conversion_cloud
# Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Create instance of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)
try:
#upload soruce file to storage
filename = 'Sample.pdf'
remote_name = 'Sample.pdf'
output_name= 'sample.docx'
strformat='docx'
request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name,filename)
response_upload = file_api.upload_file(request_upload)
#Convert PDF to Word document
settings = groupdocs_conversion_cloud.ConvertSettings()
settings.file_path =remote_name
settings.format = strformat
settings.output_path = output_name
loadOptions = groupdocs_conversion_cloud.PdfLoadOptions()
loadOptions.hide_pdf_annotations = True
loadOptions.remove_embedded_files = False
loadOptions.flatten_all_fields = True
settings.load_options = loadOptions
convertOptions = groupdocs_conversion_cloud.DocxConvertOptions()
convertOptions.from_page = 1
convertOptions.pages_count = 1
settings.convert_options = convertOptions
.
request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
response = convert_api.convert_document(request)
print("Document converted successfully: " + str(response))
except groupdocs_conversion_cloud.ApiException as e:
print("Exception when calling get_supported_conversion_types: {0}".format(e.message))
I'm developer evangelist at aspose.
来源:https://stackoverflow.com/questions/26358281/convert-pdf-to-doc-python-bash