How to extract the URL in hyperlinks from a docx file using Python

清酒与你 2020-12-18 11:21

I've been trying to find out how to get URLs from a docx file using Python, but I couldn't find anything. I've tried python-docx and python-docx2txt, but python-docx only

5 Answers
  •  梦毁少年i
    2020-12-18 11:58

    I'm late to this party, but if you want something that pulls all the links out of .docx files and makes a spreadsheet of them (or returns a list of them), I have a script that might do that for you. It includes both the URL and the linked text, and you can feed it a whole folder if you want.

    https://github.com/Colin-Fredericks/hx-py/blob/master/XML_utilities/GetWordLinks.py

    It uses BeautifulSoup and UnicodeCSV, both of which you can also grab from that same repo. It runs in Python 3, with instructions at the top of the file, and it handles non-ASCII characters. It has only been tested on Mac and Ubuntu so far. Excel does not reliably import Unicode CSVs, though Google Drive does. Offer void() where prohibited.
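
For anyone who only needs the URLs and linked text and would rather not pull in extra dependencies, here is a minimal standard-library sketch of the same idea. It is not the linked script, just an illustration under a few assumptions: the function name `get_docx_links` is made up for this example, it only looks at the main document body (not headers, footers, or footnotes), and it only handles links stored as `w:hyperlink` elements, so links stored as HYPERLINK field codes would be missed.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside a .docx package
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = R_NS + "/hyperlink"

RELS_PART = "word/_rels/document.xml.rels"
DOC_PART = "word/document.xml"


def get_docx_links(docx_file):
    """Return a list of (linked_text, url) tuples from the main document body.

    `docx_file` may be a path or a binary file-like object.
    The function name is hypothetical, chosen for this example.
    """
    with zipfile.ZipFile(docx_file) as archive:
        if RELS_PART not in archive.namelist():
            return []  # no relationships part, so no external links
        rels_root = ET.fromstring(archive.read(RELS_PART))
        doc_root = ET.fromstring(archive.read(DOC_PART))

    # Map relationship ids (e.g. "rId5") to their hyperlink targets.
    targets = {}
    for rel in rels_root.iter(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") == HYPERLINK_TYPE:
            targets[rel.get("Id")] = rel.get("Target")

    links = []
    for hyperlink in doc_root.iter(f"{{{W_NS}}}hyperlink"):
        url = targets.get(hyperlink.get(f"{{{R_NS}}}id"))
        if url is None:
            continue  # internal bookmark link, no external URL
        # The visible text may be split across several w:t runs.
        text = "".join(t.text or "" for t in hyperlink.iter(f"{{{W_NS}}}t"))
        links.append((text, url))
    return links
```

Calling `get_docx_links("report.docx")` (a hypothetical file name) should give back a list of `(text, url)` pairs in document order, which you can then write out with the `csv` module if you want a spreadsheet.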
