Reading a file from a private S3 bucket to a pandas dataframe


猫巷女王i 2020-12-08 10:19

I'm trying to read a CSV file from a private S3 bucket to a pandas dataframe:

df = pandas.read_csv('s3://mybucket/file.csv')

I can read

8 answers

猫巷女王i (original poster)

2020-12-08 10:45

In addition to the other answers: if a custom endpoint is required, you can keep the pd.read_csv('s3://...') syntax by monkey-patching the __init__ method of s3fs.S3FileSystem.

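import s3fs
# Keep a reference to the original constructor so the patch can delegate to it.
s3fsinit = s3fs.S3FileSystem.__init__
def s3fsinit_patched(self, *k, **kw):
    # Inject the custom endpoint into every S3FileSystem instance.
    s3fsinit(self, *k, client_kwargs={'endpoint_url': 'https://yourcustomendpoint'}, **kw)
s3fs.S3FileSystem.__init__ = s3fsinit_patched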
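
One caveat with a patch that hard-codes client_kwargs: a caller who also passes client_kwargs gets a TypeError for the duplicate keyword. The sketch below shows one way to merge the two instead. It runs against FakeFileSystem, a hypothetical stand-in class used only so the example works without s3fs or network access; patch_endpoint is likewise an illustrative helper name, not part of s3fs.

```python
import functools


class FakeFileSystem:
    """Hypothetical stand-in for s3fs.S3FileSystem; it only records its kwargs."""

    def __init__(self, *args, **kwargs):
        self.kwargs = kwargs


def patch_endpoint(cls, endpoint_url):
    """Wrap cls.__init__ so endpoint_url is applied unless the caller sets one."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        # Copy the caller's client_kwargs so their dict is not mutated.
        client_kwargs = dict(kwargs.pop("client_kwargs", None) or {})
        # Only fill in the endpoint when the caller did not provide one.
        client_kwargs.setdefault("endpoint_url", endpoint_url)
        original_init(self, *args, client_kwargs=client_kwargs, **kwargs)

    cls.__init__ = patched_init


patch_endpoint(FakeFileSystem, "https://yourcustomendpoint")

# The caller's own client_kwargs are kept and the endpoint is added to them.
fs = FakeFileSystem(anon=False, client_kwargs={"region_name": "us-east-1"})
print(fs.kwargs)
```

Under the same assumptions, calling patch_endpoint(s3fs.S3FileSystem, 'https://yourcustomendpoint') would apply the wrapper to the real class.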

Or, a more elegant way:

import os
import s3fs

class S3FileSystemPatched(s3fs.S3FileSystem):
    def __init__(self, *k, **kw):
        # Credentials come from environment variables; the endpoint is fixed.
        super().__init__(*k,
                         key=os.environ['aws_access_key_id'],
                         secret=os.environ['aws_secret_access_key'],
                         client_kwargs={'endpoint_url': 'https://yourcustomendpoint'},
                         **kw)
        print('S3FileSystem is patched')

# Replace the class so code that looks up s3fs.S3FileSystem gets the patched version.
s3fs.S3FileSystem = S3FileSystemPatched
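
If patching feels too invasive, pandas 1.2 and later accept a storage_options argument in read_csv and forward it to the underlying filesystem, so the same settings can be passed per call. A minimal sketch, assuming the same lowercase environment variable names as above; build_storage_options and read_private_csv are hypothetical helper names introduced only for illustration.

```python
import os


def build_storage_options(endpoint_url, env=None):
    """Build the dict that pandas forwards to s3fs.S3FileSystem."""
    env = os.environ if env is None else env
    options = {"client_kwargs": {"endpoint_url": endpoint_url}}
    # Add explicit credentials only when both variables are present;
    # otherwise s3fs falls back to its default credential lookup.
    key = env.get("aws_access_key_id")
    secret = env.get("aws_secret_access_key")
    if key and secret:
        options["key"] = key
        options["secret"] = secret
    return options


def read_private_csv(url, endpoint_url):
    """Read a CSV from S3 through a custom endpoint, with no monkey patching."""
    import pandas as pd  # imported lazily; needs pandas >= 1.2 and s3fs installed

    return pd.read_csv(url, storage_options=build_storage_options(endpoint_url))
```

Usage would look like df = read_private_csv('s3://mybucket/file.csv', 'https://yourcustomendpoint').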

Also see: s3fs custom endpoint url
