Read CSV/Excel files from SFTP file, make some changes in those files using Pandas, and save back

痞子三分冷 提交于 2020-05-22 07:47:51

问题


I want to read some CSV/Excel files on a secure SFTP folder, make some changes (fixed changes in each file like remove column 2) in those files, upload them to a Postgre DB and also the upload them to a different SFTP path in Python

What's the best way to this?

I have made a connection to the SFTP using pysftp library and am reading the Excel:

import pysftp
import pandas as pd

myHostname = "*****"
myUsername = "****"
myPassword = "***8"
cnopts =pysftp.CnOpts()
cnopts.hostkeys = None  

sftp=pysftp.Connection(host=myHostname, username=myUsername, 
password=myPassword,cnopts=cnopts)
print ("Connection succesfully stablished ... ")
sftp.chdir('test/test')
#sftp.pwd
a=[]
for i in sftp.listdir_attr():
    with sftp.open(i.filename) as f:
        df=pd.read_csv(f)

How should I proceed with the upload to DB and making those changes to the CSV permanent?


回答1:


You have the download part done.

For the upload part, see How to Transfer Pandas DataFrame to .csv on SFTP using Paramiko Library in Python? – While it's for Paramiko, pysftp Connection.open method behaves identically to Paramiko SFTPClient.open, so the code is the same.

Full code can be like:

with sftp.open("/remote/path/data.csv", "r+", bufsize=32768) as f:
    # Download CSV contents from SFTP to memory
    df = pd.read_csv(f)

    # Modify as you need (just an example)
    df.at[0, 'Name'] = 'changed'

    # Upload the in-memory data back to SFTP
    f.seek(0)
    df.to_csv(f, index=False)
    # Truncate the remote file in case the new version of the contents is smaller
    f.truncate(f.tell())

The above updates the same file. If you want to upload to a different file, use this:

# Download CSV contents from SFTP to memory
with sftp.open("/remote/path/source.csv", "r") as f:
    df = pd.read_csv(f)

# Modify as you need (just an example)
df.at[0, 'Name'] = 'changed'

# Upload the in-memory data back to SFTP
with sftp.open("/remote/path/target.csv", "w", bufsize=32768) as f:
    df.to_csv(f, index=False)

For the purpose of bufsize, see:
Writing to a file on SFTP server opened using pysftp "open" method is slow


Obligatory warning: Do not set cnopts.hostkeys = None, unless you do not care about security. For the correct solution see Verify host key with pysftp.




回答2:


That's several questions in one question :)

I would suggest go with that approach:

  1. Make a local copy of the file (not sure how big it is, no point to shuffle it around between your local machine and sftp server. You cna use get method
  2. Make operations on your data with pandas, then dump it back to csv with to_csv method
  3. load data to the postgree using either pandas.io or pure SQLAlchemy. Check the docs here
  4. Upload the file to the destination you want with put method


来源:https://stackoverflow.com/questions/61817226/read-csv-excel-files-from-sftp-file-make-some-changes-in-those-files-using-pand

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!