问题
I have been looking at mostly the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True
for .xlsx files, so I can not use the xlrd hyperlink_map
function. So I turned to openpyxl, but have also had no luck extracting a hyperlink from an excel file with it. Test code below (the test file contains a simple hyperlink to google with hyperlink text set to "test"):
import openpyxl
wb = openpyxl.load_workbook('testFile.xlsx')
ws = wb.get_sheet_by_name('Sheet1')
r = 0
c = 0
print ws.cell(row = r, column = c). value
print ws.cell(row = r, column = c). hyperlink
print ws.cell(row = r, column = c). hyperlink_rel_id
Output:
test
None
I guess openpyxl does not currently support formatting completely either? Is there some other library I can use to extract hyperlink information from Excel (.xlsx) files?
回答1:
In my experience getting good .xlsx interaction requires moving to IronPython. This lets you work with the Common Language Runtime (clr) and interact directly with excel'
http://ironpython.net/
import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()
wb = excel.Workbooks.Open('testFile.xlsx')
ws = wb.Worksheets['Sheet1']
address = ws.Cells(row, col).Hyperlinks.Item(1).Address
回答2:
This should be possible with openpyxl now:
import openpyxl
wb = openpyxl.load_workbook('yourfile.xlsm')
ws = wb['Sheet1']
print(ws.cell(row=2, column=1).hyperlink.target) # This will fail if there is no
# hyperlink to target
回答3:
FYI, the problem with openpyxl
is an actual bug.
And, yes, xlrd
cannot read the hyperlink without formatting_info
, which is currently not supported for xlsx
.
回答4:
Starting from at least version openpyxl-2.4.0b1 this bug https://bitbucket.org/openpyxl/openpyxl/issue/152/hyperlink-returns-empty-string-instead-of was fixed. Now it's return for cell Hyperlink object:
hl_obj = ws.row(col).hyperlink # getting Hyperlink object for Cell
#hl_obj = ws.cell(row = r, column = c).hyperlink This could be used as well.
if hl_obj:
print(hl_obj.display)
print(hl_obj.target)
print(hl_obj.tooltip) # you can see it when hovering mouse on hyperlink in Excel
print(hl_obj) # to see other stuff if you need
回答5:
A successful solution I've worked with is to install unoconv on the server and implement a method that invokes this command line tool via the subprocess module to convert the file from xlsx to xls since hyperlink_map.get() works with xls.
回答6:
For direct manipulation of Excel files it's also worth looking at the excellent XlWings library.
回答7:
If instead of just .hyperlink, doing .hyperlink.target should work. I was getting a 'None' as well from using just ".hyperlink" on the cell object before that.
来源:https://stackoverflow.com/questions/16676727/extracting-hyperlinks-from-excel-xlsx-with-python