how to transform pandas dataframe for insertion via executemany() statement?

前端未结

关注

 3  1865

天命终不由人 2020-12-30 04:01

I have a fairly big pandas dataframe - 50 or so headers and a few hundred thousand rows of data - and I\'m looking to transfer this data to a database using the

3条回答

鱼传尺愫 (楼主)

2020-12-30 04:46
I managed to figure this out in the end. So if you have a Pandas Dataframe which you want to write to a database using ceODBC which is the module I used, the code is:

(with all_data as the dataframe) map dataframe values to string and store each row as a tuple in a list of tuples
```
for r in all_data.columns.values:
    all_data[r] = all_data[r].map(str)
    all_data[r] = all_data[r].map(str.strip)   
tuples = [tuple(x) for x in all_data.values]
```
for the list of tuples, change all null value signifiers - which have been captured as strings in conversion above - into a null type which can be passed to the end database. This was an issue for me, might not be for you.
```
string_list = ['NaT', 'nan', 'NaN', 'None']

def remove_wrong_nulls(x):
    for r in range(len(x)):
        for i,e in enumerate(tuples):
            for j,k in enumerate(e):
                if k == x[r]:
                    temp=list(tuples[i])
                    temp[j]=None
                    tuples[i]=tuple(temp)

remove_wrong_nulls(string_list)
```
create a connection to the database
```
cnxn=ceODBC.connect('DRIVER={SOMEODBCDRIVER};DBCName=XXXXXXXXXXX;UID=XXXXXXX;PWD=XXXXXXX;QUIETMODE=YES;', autocommit=False)
cursor = cnxn.cursor()
```
define a function to turn the list of tuples into a new_list which is a further indexing on the list of tuples, into chunks of 1000. This was necessary for me to pass the data to the database whose SQL Query could not exceed 1MB.
```
def chunks(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

new_list = chunks(tuples, 1000)
```
define your query.
```
query = """insert into XXXXXXXXXXXX("XXXXXXXXXX", "XXXXXXXXX", "XXXXXXXXXXX") values(?,?,?)"""
```
Run through the the new_list containing the list of tuples in groups of 1000 and perform executemany. Follow this by committing and closing the connection and that's it :)
```
for i in range(len(new_list)):
    cursor.executemany(query, new_list[i])
cnxn.commit()
cnxn.close()
```
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...