I am trying to replace a bunch of strings in an .xlsx sheet (~70k rows, 38 columns). I have a list of the strings to be searched and replaced in a file, formatted as below:-
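For reading and writing .xls files with Python, use xlrd and xlwt; see http://www.python-excel.org/. Note that xlwt only writes the old .xls format, which is capped at 65,536 rows per sheet, and xlrd 2.0 and later only reads .xls. For an .xlsx file with ~70k rows you will probably need a library such as openpyxl instead.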
A simple xlrd example:
from xlrd import open_workbook

wb = open_workbook('simple.xls')
# Walk every sheet and print each cell value, row by row
for s in wb.sheets():
    print('Sheet:', s.name)
    for row in range(s.nrows):
        for col in range(s.ncols):
            print(s.cell(row, col).value)
To replace the target text, use a dict that maps each search string to its replacement:
replace = {
    'bird produk': 'bird product',
    'pig': 'pork',
    'ayam': 'chicken',
    # ...
    'kuda': 'horse',
}
A dict gives you O(1) average-case time complexity (as long as hash collisions are rare) when you check membership with 'text' in replace, so each cell costs a single lookup no matter how many replacement strings you have. You can't do asymptotically better than that per lookup.
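As a rough sketch of how that lookup could be applied to the cell values: the helper below is hypothetical (replace_cells is my own name, not part of xlrd or xlwt). It assumes the rows have already been read into plain Python lists and that each search string must match a whole cell value.

```python
def replace_cells(rows, replace):
    """Return a copy of rows with every cell that is a key in replace swapped out.

    Hypothetical helper: assumes rows is a list of lists of cell values
    and that a search string has to match the whole cell.
    """
    return [
        # dict.get is one average-case O(1) lookup; unmatched cells pass through
        [replace.get(cell, cell) for cell in row]
        for row in rows
    ]


replace = {'pig': 'pork', 'ayam': 'chicken', 'kuda': 'horse'}
rows = [['pig', 'ayam', 42.0], ['cow', 'kuda', '']]
print(replace_cells(rows, replace))
```

With 70k rows and 38 columns that is roughly 2.7 million lookups, which should still be fast, since the cost does not depend on the size of the dict.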
Since I don't know what your bunch of strings look like, this answer may be inaccurate or incomplete.
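If the strings turn out to be pieces of longer cell text rather than whole cell values, a plain membership test is not enough. One possible approach, sketched here under that assumption, is to compile all the keys into a single regular expression and still use the dict for the actual substitution. build_replacer is a hypothetical helper name, and the sketch assumes a non-empty dict and string cells only.

```python
import re


def build_replacer(replace):
    """Compile one regex that matches any key of replace (hypothetical helper).

    Assumes replace is non-empty and that only str values are passed in.
    """
    # Longest keys first, so a long key wins over a shorter key it contains
    keys = sorted(replace, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # Each match is swapped through a dict lookup on the matched text
    return lambda text: pattern.sub(lambda m: replace[m.group(0)], text)


replace = {'bird produk': 'bird product', 'pig': 'pork', 'ayam': 'chicken'}
fix = build_replacer(replace)
print(fix('ayam and pig, bird produk'))
```

Be aware that this matches anywhere in the text, so 'pig' would also be replaced inside 'pigeon'. If that matters for your data, the pattern would need word boundaries around the alternation.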