I'm trying to use the python-docx module (pip install python-docx), but it seems very confusing, as in the GitHub repo test sample they are using
Without Installing python-docx
A docx file is basically a zip file with several folders and files within it. In the link below you can find a simple function to extract the text from a docx file without relying on python-docx or lxml, the latter sometimes being hard to install:
http://etienned.github.io/posts/extract-text-from-word-docx-simply/
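For reference, the idea can be sketched with only the standard library. This is a minimal sketch, not a copy of the function from the linked post: the name docx_to_text is made up for illustration, and it assumes the main body text sits in word/document.xml inside the archive, which is the usual layout for Word documents. It reads only paragraph text and ignores headers, footers, footnotes, tabs and manual line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line.

    Hypothetical helper for illustration; `path` may be a filename or a
    binary file-like object, since zipfile.ZipFile accepts both.
    """
    # A .docx is a zip archive; the main body is assumed to be here.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # The visible text of a paragraph is stored in its w:t elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:  # skip empty paragraphs
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Usage would look like print(docx_to_text("example.docx")), where example.docx is a placeholder filename.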