I've just started to experiment with AWS SageMaker and would like to load data from an S3 bucket into a pandas dataframe in my SageMaker Python Jupyter notebook for analysis.
Make sure the Amazon SageMaker execution role has a policy attached that grants access to S3. This can be done in IAM.
import boto3
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()  # the notebook's execution role; it must be allowed to read the bucket

bucket = 'my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

# pandas can read s3:// paths directly (requires s3fs to be installed)
df = pd.read_csv(data_location)
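As a side note, the IAM step mentioned above can also be done programmatically. A minimal sketch using boto3, attaching the AWS managed AmazonS3ReadOnlyAccess policy (the role name below is a placeholder):

import boto3

iam = boto3.client('iam')

# attach the AWS managed read-only S3 policy to the notebook's execution role
iam.attach_role_policy(
    RoleName='my-sagemaker-execution-role',  # placeholder: use your role's actual name
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess',
)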
You could also access your bucket as a file system using s3fs:
import s3fs
from PIL import Image  # needed for Image.open below

fs = s3fs.S3FileSystem()

# list the first 5 objects under a prefix in your accessible bucket
fs.ls('s3://bucket-name/data/')[:5]

# open a file directly (display() is available in Jupyter)
with fs.open('s3://bucket-name/data/image.png') as f:
    display(Image.open(f))
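The same file-system handle works for tabular data too. A short sketch reading a CSV through s3fs into pandas (the key data/train.csv is a placeholder):

import pandas as pd
import s3fs

fs = s3fs.S3FileSystem()

# hand the open file object straight to pandas
with fs.open('s3://bucket-name/data/train.csv') as f:
    df = pd.read_csv(f)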
This code sample imports a CSV file from S3; it was tested in a SageMaker notebook.
Use pip or conda to install s3fs first: !pip install s3fs
import boto3  # AWS Python SDK
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()

my_bucket = ''          # declare bucket name
my_file = 'aa/bb.csv'   # declare file path

data_location = 's3://{}/{}'.format(my_bucket, my_file)
data = pd.read_csv(data_location)
data.head(2)
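Since boto3 is imported anyway, you can also skip s3fs entirely and stream the object through boto3. A minimal sketch (the bucket and key are placeholders as above):

import io

import boto3
import pandas as pd

s3 = boto3.client('s3')

# fetch the object and parse the byte stream with pandas
obj = s3.get_object(Bucket='my-bucket', Key='aa/bb.csv')
df = pd.read_csv(io.BytesIO(obj['Body'].read()))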
If you have a look here, it seems you can specify this in the InputDataConfig. Search for "S3DataSource" (ref) in the document; the first hit is even in Python, on page 25/26.
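For context, a minimal sketch of what such an InputDataConfig entry can look like when calling create_training_job via boto3 (bucket name, prefix, and channel name are illustrative placeholders):

import boto3

sagemaker_client = boto3.client('sagemaker')

# one input channel pointing at an S3 prefix
input_data_config = [{
    'ChannelName': 'train',
    'DataSource': {
        'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': 's3://my-bucket/train/',
            'S3DataDistributionType': 'FullyReplicated',
        }
    },
    'ContentType': 'text/csv',
}]
# pass this list as InputDataConfig=input_data_config to create_training_job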
You can also use AWS Data Wrangler https://github.com/awslabs/aws-data-wrangler (install with !pip install awswrangler). In recent releases the pandas reader lives under the s3 module:

import awswrangler as wr
df = wr.s3.read_csv(path="s3://...")

(Older 0.x releases used wr.pandas.read_csv instead.)
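Data Wrangler can also write a dataframe back to S3. A minimal sketch, assuming a hypothetical destination key:

import awswrangler as wr

# write the dataframe back as a single CSV object (path is a placeholder)
wr.s3.to_csv(df=df, path='s3://my-bucket/output/result.csv')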