Load S3 Data into AWS SageMaker Notebook

Asked by 生来不讨喜, 2020-12-01 07:39

I've just started to experiment with AWS SageMaker and would like to load data from an S3 bucket into a pandas DataFrame in my SageMaker Python Jupyter notebook for analysis.

7 Answers
  • 2020-12-01 08:08

    Do make sure the Amazon SageMaker execution role has a policy attached that grants it access to S3. This can be done in IAM.
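
    For illustration, here is a minimal sketch of such a policy, expressed as a Python dict so it could be passed to boto3's IAM calls; the bucket name `my-bucket` is a placeholder you would replace with your own.

```python
# Hedged sketch: a minimal IAM policy granting the SageMaker execution role
# read access to a single bucket. "my-bucket" is a placeholder bucket name.
s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # ListBucket is needed for listing; GetObject for reading files
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-bucket",      # the bucket itself (for ListBucket)
                "arn:aws:s3:::my-bucket/*",    # objects inside it (for GetObject)
            ],
        }
    ],
}
```

    Attaching a managed policy such as `AmazonS3ReadOnlyAccess` to the role in the IAM console achieves the same thing with less typing, at the cost of granting access to all buckets.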

    0 讨论(0)
  • 2020-12-01 08:13
    import boto3
    import pandas as pd
    from sagemaker import get_execution_role
    
    role = get_execution_role()  # the notebook's execution role; it must have S3 access
    bucket = 'my-bucket'
    data_key = 'train.csv'
    data_location = 's3://{}/{}'.format(bucket, data_key)
    
    df = pd.read_csv(data_location)  # pandas reads s3:// paths (s3fs must be installed)
    
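    As a side note, the `s3://{bucket}/{key}` string above is just a URI; a small helper like the following (a sketch, not part of any AWS SDK) shows how such a URI splits back into the bucket and key that boto3 calls expect.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    # netloc is the bucket; path carries a leading slash we strip off
    return parsed.netloc, parsed.path.lstrip('/')

bucket, key = split_s3_uri('s3://my-bucket/train.csv')
# bucket == 'my-bucket', key == 'train.csv'
```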
  • 2020-12-01 08:15

    You could also access your bucket as a file system using s3fs:

    import s3fs
    from PIL import Image  # needed for Image.open below
    
    fs = s3fs.S3FileSystem()
    
    # List the first 5 entries in an accessible bucket
    fs.ls('s3://bucket-name/data/')[:5]
    
    # Open a file directly
    with fs.open('s3://bucket-name/data/image.png') as f:
        display(Image.open(f))
    
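    This works because `fs.open()` returns an ordinary file-like object, and pandas (and PIL) readers accept any such object. A minimal local sketch of the same pattern, with an in-memory buffer standing in for the S3 file:

```python
import io
import pandas as pd

# An in-memory buffer plays the role of the object returned by fs.open();
# pd.read_csv consumes it exactly as it would an S3-backed file handle.
buffer = io.BytesIO(b"a,b\n1,2\n3,4\n")
df = pd.read_csv(buffer)
# df has columns ['a', 'b'] and two rows
```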
  • 2020-12-01 08:16

    This code sample imports a CSV file from S3; it was tested in a SageMaker notebook.

    Use pip or conda to install s3fs first: !pip install s3fs

    import boto3  # AWS SDK for Python
    import pandas as pd
    from sagemaker import get_execution_role
    
    role = get_execution_role()
    
    my_bucket = ''  # declare bucket name
    my_file = 'aa/bb.csv'  # declare file path
    
    data_location = 's3://{}/{}'.format(my_bucket, my_file)
    data = pd.read_csv(data_location)
    data.head(2)
    
  • 2020-12-01 08:19

    If you have a look here, it seems you can specify this in the InputDataConfig. Search the document for "S3DataSource" (ref); the first hit is even in Python, on pages 25-26.

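    For context, a hedged sketch of what that `InputDataConfig` structure looks like when passed to `create_training_job`; the bucket and prefix are placeholders.

```python
# Sketch of the InputDataConfig list passed to the SageMaker
# CreateTrainingJob API; "s3://my-bucket/train/" is a placeholder.
input_data_config = [
    {
        "ChannelName": "train",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",           # treat S3Uri as a key prefix
                "S3Uri": "s3://my-bucket/train/",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "text/csv",
    }
]
```

    Note this configures data for a training job; for interactive notebook analysis, the pandas/s3fs approaches in the other answers are more direct.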
  • 2020-12-01 08:20

    You can also use AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler):

    import awswrangler as wr
    
    df = wr.pandas.read_csv(path="s3://...")

    Note that in releases 1.0 and later of the library, this API moved to wr.s3.read_csv(path="s3://...").
    