I have a fairly big pandas DataFrame (50 or so headers and a few hundred thousand rows of data), and I'm looking to transfer this data to a database using the
I managed to figure this out in the end.
If you have a pandas DataFrame that you want to write to a database using ceODBC (the module I used), the code is below, with all_data as the DataFrame.
Map the DataFrame values to strings, strip surrounding whitespace, and store each row as a tuple in a list of tuples:
for r in all_data.columns.values:
    all_data[r] = all_data[r].map(str)
    all_data[r] = all_data[r].map(str.strip)
tuples = [tuple(x) for x in all_data.values]
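For comparison, here is a minimal sketch of the same conversion that leaves the original DataFrame untouched and uses DataFrame.itertuples(index=False, name=None) to get plain tuples directly. It assumes pandas is installed; the function name frame_to_string_tuples and the sample data are hypothetical.

```python
import pandas as pd


def frame_to_string_tuples(df: pd.DataFrame) -> list:
    """Return one tuple of stripped strings per row, leaving df itself unchanged."""
    # str() on every cell mirrors map(str): NaN -> 'nan', None -> 'None', NaT -> 'NaT'.
    as_text = df.apply(lambda col: col.map(lambda v: str(v).strip()))
    # index=False drops the index; name=None yields plain tuples instead of namedtuples.
    return list(as_text.itertuples(index=False, name=None))


df = pd.DataFrame({"name": [" alice ", "bob"], "score": [1.5, float("nan")]})
rows = frame_to_string_tuples(df)
```

Because the result is built from a converted copy, all_data keeps its original dtypes in case you need it later.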
In the list of tuples, change every null-value marker (NaN, NaT and None were turned into strings such as 'nan' by the conversion above) into None, which can be passed to the database as a NULL. This was an issue for me; it might not be for you.
string_list = ['NaT', 'nan', 'NaN', 'None']
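def remove_wrong_nulls(x):
    # For each marker string, scan every cell of every row in the global `tuples`
    # and replace matching cells with None (tuples are immutable, so rebuild the row).
    for r in range(len(x)):
        for i, e in enumerate(tuples):
            for j, k in enumerate(e):
                if k == x[r]:
                    temp = list(tuples[i])
                    temp[j] = None
                    tuples[i] = tuple(temp)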
remove_wrong_nulls(string_list)
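The triple loop above rescans the whole list once per marker string. A single pass with a set lookup should give the same result much faster on a few hundred thousand rows; this is a sketch, and replace_null_markers is a hypothetical name. Like the original, it also turns a genuine string value such as 'None' into a NULL.

```python
from typing import Iterable, List, Optional, Tuple

# Same marker strings as string_list above.
NULL_MARKERS = frozenset({"NaT", "nan", "NaN", "None"})


def replace_null_markers(
    rows: Iterable[Tuple[str, ...]], markers: frozenset = NULL_MARKERS
) -> List[Tuple[Optional[str], ...]]:
    """Return new tuples in which every cell equal to a marker string becomes None."""
    return [tuple(None if value in markers else value for value in row) for row in rows]


rows = [("alice", "nan"), ("NaT", "3"), ("bob", "7")]
cleaned = replace_null_markers(rows)
```

In the flow of this answer it would be used as tuples = replace_null_markers(tuples), replacing the call to remove_wrong_nulls.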
Create a connection to the database:
import ceODBC
cnxn = ceODBC.connect('DRIVER={SOMEODBCDRIVER};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
cursor = cnxn.cursor()
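If you would rather not hard-code credentials in the connection string, one option is to assemble it from parts, for example reading the host, user and password from environment variables. This is a sketch: the environment variable names DB_HOST, DB_USER and DB_PASSWORD and the helper build_odbc_connection_string are hypothetical, and values containing ';' would need extra quoting that it does not handle.

```python
import os
from typing import Dict


def build_odbc_connection_string(parts: Dict[str, str]) -> str:
    """Join key/value pairs into 'KEY=value;' form, preserving insertion order."""
    return "".join(f"{key}={value};" for key, value in parts.items())


# Hypothetical environment variable names; the XXXX defaults are the original placeholders.
parts = {
    "DRIVER": "{SOMEODBCDRIVER}",
    "DBCName": os.environ.get("DB_HOST", "XXXXXXXXXXX"),
    "UID": os.environ.get("DB_USER", "XXXXXXX"),
    "PWD": os.environ.get("DB_PASSWORD", "XXXXXXX"),
    "QUIETMODE": "YES",
}
conn_str = build_odbc_connection_string(parts)
```

The resulting string would then be passed as ceODBC.connect(conn_str, autocommit=False) in place of the literal.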
Define a function that splits the list of tuples into chunks of 1000 rows; new_list is the resulting list of chunks. This was necessary because the database I was writing to could not accept a SQL query larger than 1 MB.
def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]
new_list = chunks(tuples, 1000)
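Since the real constraint was the 1 MB query limit rather than the row count, a variant could cap each chunk by an estimated payload size as well as by the number of rows. This is a sketch under a loose assumption: it estimates size as the UTF-8 length of the non-null string values, which ignores SQL and driver overhead, so max_bytes should be set well below the actual limit. chunks_by_size is a hypothetical name.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def chunks_by_size(rows: Iterable[Row], max_rows: int, max_bytes: int) -> Iterator[List[Row]]:
    """Yield consecutive batches of rows, capped by row count and by estimated size."""
    max_rows = max(1, max_rows)
    batch: List[Row] = []
    batch_bytes = 0
    for row in rows:
        # Rough estimate (assumption): UTF-8 length of the non-null string values only.
        row_bytes = sum(len(v.encode("utf-8")) for v in row if v is not None)
        # Start a new batch if adding this row would break either cap.
        # A single oversized row still gets its own batch rather than being dropped.
        if batch and (len(batch) >= max_rows or batch_bytes + row_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch


rows = [("aaaa",), ("bbbb",), ("cc",), (None,)]
batches = list(chunks_by_size(rows, max_rows=1000, max_bytes=6))
```

For example, new_list = list(chunks_by_size(tuples, 1000, 900_000)) would produce chunks that fit the same executemany loop; the 900_000 figure is only an illustrative safety margin.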
Define your query, with one ? placeholder per column:
query = """insert into XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") values(?,?,?)"""
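With 50 or so columns, writing the column list and placeholders by hand is error-prone. The sketch below builds the same shape of statement from a list of column names (for example list(all_data.columns)); build_insert and the table and column names used with it are hypothetical. It double-quotes each column name as in the query above, doubling any embedded double quote, and leaves the table name as given.

```python
from typing import Sequence


def build_insert(table: str, columns: Sequence[str]) -> str:
    """Build an INSERT with quoted column names and one ? placeholder per column."""
    quoted_cols = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"insert into {table}({quoted_cols}) values({placeholders})"


query_example = build_insert("sales", ["region", "amount"])
```

Generating the placeholders from the same column list keeps their count in step with the width of each tuple.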
Loop through new_list, which holds the list of tuples in groups of 1000, and call executemany on each group. Then commit, close the connection, and that's it :)
for chunk in new_list:
    cursor.executemany(query, chunk)
cnxn.commit()
cnxn.close()
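The same loop can be wrapped so that a failure in any chunk rolls back the whole load instead of leaving a partial insert. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for ceODBC so that it runs anywhere; both follow the DB-API (cursor, executemany, commit, rollback) and use ? placeholders, but this has not been verified against ceODBC itself. insert_in_chunks and the demo table are hypothetical.

```python
import sqlite3
from typing import Iterable, Sequence


def insert_in_chunks(cnxn, query: str, batches: Iterable[Sequence[tuple]]) -> None:
    """Run executemany per batch and commit once; roll back everything on any error."""
    cursor = cnxn.cursor()
    try:
        for batch in batches:
            cursor.executemany(query, batch)
        cnxn.commit()
    except Exception:
        cnxn.rollback()
        raise
    finally:
        cursor.close()


# In-memory SQLite database standing in for the real ODBC target.
cnxn = sqlite3.connect(":memory:")
cnxn.execute('create table demo("name" text, "score" text)')
rows = [("alice", "1.5"), ("bob", None), ("carol", "3")]
insert_in_chunks(cnxn, 'insert into demo("name", "score") values(?,?)', [rows[:2], rows[2:]])
count = cnxn.execute("select count(*) from demo").fetchone()[0]
nulls = cnxn.execute('select count(*) from demo where "score" is null').fetchone()[0]
```

Committing once at the end matches autocommit=False in the answer; if a single huge transaction is a problem for your database, committing after each chunk is the alternative, at the cost of possible partial loads.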