I would like to read an Excel spreadsheet into Python / pandas, but get the formulae instead of the cell results.
For example, if cell A1 is 25 and cell B1 is =A1, I would like to read B1 as the formula =A1 rather than the value 25.
Yes, it is doable. I recently found a package that solves this issue in quite a sophisticated way. It is called portable-spreadsheet (available via pip install portable-spreadsheet). It basically encapsulates xlsxwriter. Simple example:
import portable_spreadsheet as ps
sheet = ps.Spreadsheet.create_new_sheet(5, 5)
# Set values
sheet.iloc[0, 0] = 25 # Set A1
sheet.iloc[1, 0] = sheet.iloc[0, 0] # reference to A1
# Export to Excel
sheet.to_excel('output/sample.xlsx')
Actually, it is doable. You currently have something like this:
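import pandas as pd
# The string "=A1" is written to the file as a formula
data = [1, "=A1", 3, 4]
s = pd.Series(data)
# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    s.to_excel(writer, sheet_name='Sheet1', header=False, index=False)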
What you can do is actually this:
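import pandas as pd
# Wrap the formula text in CONCATENATE so Excel displays "=A1" as a string
data = [1, '=CONCATENATE("=A1", "")', 3, 4]
s = pd.Series(data)
# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    s.to_excel(writer, sheet_name='Sheet1', header=False, index=False)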
The cell still contains a formula, but it will display the text =A1 instead of the value of A1.
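To check what actually ends up in the file, one option is to read it back with openpyxl, which reports the stored formula text without evaluating it. The following is only a sketch under some assumptions: the temporary path is hypothetical, and engine="openpyxl" is chosen here purely so that a single Excel library is needed.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical temporary location, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

data = [1, '=CONCATENATE("=A1", "")', 3, 4]
s = pd.Series(data)

# Assumption: use the openpyxl engine so no other writer library is required
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    s.to_excel(writer, sheet_name="Sheet1", header=False, index=False)

# Read the file back; openpyxl returns formulas as text by default
ws = load_workbook(path)["Sheet1"]
written = [(cell.coordinate, cell.value, cell.data_type) for cell in ws["A"]]
for entry in written:
    print(entry)
```

A data_type of "f" marks a cell that openpyxl treats as a formula, while "n" marks a plain number.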
OpenPyXL provides this capability out of the box: load_workbook returns formulas as strings unless data_only=True is passed. An example:
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename = 'empty_book.xlsx')
sheet_names = wb.sheetnames  # get_sheet_names() is deprecated
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values)
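A self-contained version of the same idea, using the A1 = 25, B1 = =A1 example from the question, might look like the sketch below. The workbook is created on the fly in a temporary directory (a hypothetical path chosen for illustration) so the snippet does not depend on an existing empty_book.xlsx.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical temporary file standing in for a real spreadsheet
path = os.path.join(tempfile.mkdtemp(), "formulas.xlsx")

# Build a tiny workbook: A1 holds a number, B1 holds a formula
wb = Workbook()
ws = wb.active
ws["A1"] = 25
ws["B1"] = "=A1"
wb.save(path)

# data_only=False is the default, so formula cells come back as text
wb = load_workbook(path)
ws = wb[wb.sheetnames[0]]
df = pd.DataFrame(ws.values)
print(df)
```

Here the first column of df holds the number 25 and the second holds the string "=A1". Passing data_only=True to load_workbook instead asks for the cached results, which are only present if the file was last saved by an application that calculates them, such as Excel.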