Reading a file from a private S3 bucket to a pandas dataframe

猫巷女王i 2020-12-08 10:19

I'm trying to read a CSV file from a private S3 bucket to a pandas dataframe:

df = pandas.read_csv('s3://mybucket/file.csv')

I can read

8 answers
  •  孤城傲影
    2020-12-08 10:26

    Updated for Pandas 0.20.1

    Pandas now uses s3fs to handle S3 connections. From the pandas release notes:

    pandas now uses s3fs for handling S3 connections. This shouldn’t break any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto in prior versions of pandas.

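    import os
    
    import pandas as pd
    from s3fs.core import S3FileSystem
    
    # AWS keys are stored in an ini file in the same directory;
    # refer to the boto3 docs for the config settings
    os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'
    
    # anon=False: authenticate with the configured credentials
    s3 = S3FileSystem(anon=False)
    key = 'path/to/your-csv.csv'  # S3 keys use forward slashes, not backslashes
    bucket = 'your-bucket-name'
    
    # Open the object as a binary file handle and let pandas parse it
    with s3.open('{}/{}'.format(bucket, key), mode='rb') as f:
        df = pd.read_csv(f)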
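
As a possible variation on the snippet above: with s3fs installed, `pd.read_csv` can also take an `s3://` URL directly. The sketch below assumes a pandas version that accepts the `storage_options` argument (1.2 or later, which is newer than the 0.20.1 this answer was written for) and that credentials are discoverable by s3fs, for example through the config file set above. The helper names `s3_url` and `read_private_csv` are hypothetical, chosen only for illustration.

```python
def s3_url(bucket, key):
    """Build an s3:// URL from a bucket name and an object key."""
    # S3 keys use forward slashes; strip stray ones so the parts join cleanly.
    return "s3://{}/{}".format(bucket.strip("/"), key.lstrip("/"))


def read_private_csv(bucket, key, **read_csv_kwargs):
    """Read a CSV from a private bucket with the credentials s3fs finds."""
    # Imported here so that s3_url can be used without pandas installed.
    import pandas as pd

    # storage_options is passed through to s3fs; anon=False asks for
    # authenticated (non-anonymous) access. Assumes pandas >= 1.2 and s3fs.
    return pd.read_csv(
        s3_url(bucket, key),
        storage_options={"anon": False},
        **read_csv_kwargs
    )
```

Usage would look like `df = read_private_csv('your-bucket-name', 'path/to/your-csv.csv')`. On older pandas versions without `storage_options`, the `s3.open(...)` approach in the answer is the one to use.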
    
