Convert PDF to .docx with Python

吃可爱长大的小学妹 提交于 2019-12-06 09:39:36

I am not aware of a way to convert a pdf file into a Word file using libreoffice.
However, you can convert from a pdf to a html and then convert the html to a docx.
Firstly, get the commands running on the command line. (The following is on Linux. So you may have to fill in path names to the soffice binary and use a full path for the input file on your OS)

soffice --convert-to html ./my_pdf_file.pdf

then

soffice --convert-to docx:'MS Word 2007 XML' ./my_pdf_file.html

You should end up with:

my_pdf_file.pdf
my_pdf_file.html
my_pdf_file.docx

Now wrap the commands in your subprocess code

This should fix your issues.

import os
import subprocess
path = '/my/pdf/folder'
for files in os.listdir(path):
    for filename in files:
        flipfilename = filename[::-1]
        ext,trash = flipfilename.split('.',1)
        if ext = 'pdf':
           abspath = os.path.join(path, filename)
           subprocess.call('lowriter --invisible --convert-to doc "{}"'
                            .format(abspath), shell=True)

Maybe try subprocess.Popen?

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!