How to read bucket image from AWS S3 into Sagemaker Jupyter Instance

Submitted by 蓝咒 on 2021-01-29 15:33:29

Question


I am very new to AWS and the cloud environment. I am a machine learning engineer, and I am planning to build a custom CNN in the AWS environment to predict whether a given image contains an iPhone.

What I have done:

Step 1:

I have created an S3 bucket for the iPhone classifier with the folder structure below:

 Iphone_Classifier > Train > Yes_iphone_images > 1000 images
                           > No_iphone_images  > 1000 images

                   > Dev   > Yes_iphone_images > 100 images
                           > No_iphone_images  > 100 images

                   > Test  > 30 random images

Permissions: Block all public access

Step 2:

Then I went to Amazon SageMaker and created a notebook instance with the following settings:

 Name: some-xyz,
 Type: ml.t2.medium
 IAM: created a new IAM role (root access was enabled)
 Others: all left at their defaults

Then the notebook instance was created and opened.

Step 3:

Once I had the instance opened,

1. I chose conda_tensorflow2_p36 as the interpreter.
2. I created a new Jupyter notebook and started.
3. I looked at the image classification examples but was confused; most of them used CSV files, whereas I want to read images from S3 buckets.

Question:

1. What is the simplest way to access the S3 bucket image dataset from a SageMaker Jupyter instance?
2. I need reference code for accessing the S3 bucket images.
3. Is it a good approach to copy the data to the notebook, or is it better to work from the S3 bucket?

What I have tried was:

import boto3
client = boto3.client('s3')

# I tried this one and failed
#path = 's3://iphone/Train/Yes_iphone_images/100.png'

# I tried this one and failed
path = 's3://iphone/Test/10.png'

# I uploaded to the notebook instance an image file and when I try to read it works
#path = 'thiyaga.jpg'
print(path)

import cv2
from matplotlib import pyplot as plt
print(cv2.__version__)
img = cv2.imread(path)
plt.imshow(img)

Answer 1:


If your image is binary-encoded, you could try this:

from io import BytesIO

import boto3
import matplotlib.pyplot as plt

# Define Bucket and Key 
s3_bucket, s3_key = 'YOUR_BUCKET', 'YOUR_IMAGE_KEY'

with BytesIO() as f:
    boto3.client("s3").download_fileobj(Bucket=s3_bucket, Key=s3_key, Fileobj=f)
    f.seek(0)
    img = plt.imread(f, format='png')

Otherwise, the following code works (based on the documentation):

s3 = boto3.resource('s3')

# download_file saves the object to disk and returns None, so read the saved file back
s3.Bucket(s3_bucket).download_file(s3_key, 'local_image.jpg')
img = plt.imread('local_image.jpg')

In both cases, you can visualize the image with plt.imshow(img).

In your path example path = 's3://iphone/Test/10.png', the bucket and key are s3_bucket = 'iphone' and s3_key = 'Test/10.png'.
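The bucket/key split described above can be sketched as a small helper. This is only an illustration: split_s3_uri is a hypothetical name, not part of boto3, and it assumes the path always starts with s3://.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical helper for illustration; it is not a boto3 function.
    """
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the key.
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    return bucket, key


s3_bucket, s3_key = split_s3_uri("s3://iphone/Test/10.png")
print(s3_bucket, s3_key)  # iphone Test/10.png
```

The resulting s3_bucket and s3_key can then be passed to download_fileobj or download_file as in the snippets above.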

Additional Resources: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-download-file.html
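
If you want the whole Train/Dev/Test tree on the notebook rather than one image at a time, the same download call can be looped over a prefix. The following is a rough sketch, not a tested recipe: download_prefix is a hypothetical helper, and client is assumed to be a boto3.client("s3") created with a role that can read the bucket. It uses the list_objects_v2 paginator so that folders with more than 1000 objects are listed completely.

```python
import os


def download_prefix(client, bucket, prefix, dest_dir):
    """Copy every object under `prefix` in `bucket` into `dest_dir`.

    `client` is expected to behave like boto3.client("s3").
    Returns the list of local file paths that were written.
    """
    written = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            # Mirror the key's folder structure under dest_dir.
            local_path = os.path.join(dest_dir, *key.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            client.download_file(bucket, key, local_path)
            written.append(local_path)
    return written
```

With the bucket from the question, a call might look like download_prefix(boto3.client("s3"), "iphone", "Train/", "data"), where "data" is an arbitrary local folder name chosen for this example.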




Answer 2:


I think the most convenient way is to upload your images directly into the storage attached to your notebook. A SageMaker notebook instance comes with a minimum of 5 GB of storage, or more if you specify it when creating the instance. First, compress the whole dataset (folder) into a .tgz file using your shell:

tar -cvzf <name of tarball>.tgz /path/to/source/folder

Then use the upload button of your Jupyter instance to upload it. Next, to untar it, run the command below in a cell of your notebook:

!tar -xzvf <name of tarball>.tgz

At this point you should be able to access your files/folders with ordinary Python syntax, e.g.:

from pathlib import Path

path = Path("./folder_name/")
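
Once the folder is on the notebook, one possible way to walk it is to treat each sub-folder name as the class label, which matches the Yes_iphone_images / No_iphone_images layout in the question. This is a sketch only: list_labeled_images and IMAGE_SUFFIXES are names made up for this example.

```python
from pathlib import Path

# File extensions treated as images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_labeled_images(root):
    """Return (path, label) pairs, using each sub-folder name as the label."""
    pairs = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore loose files next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image_path, class_dir.name))
    return pairs
```

For example, list_labeled_images(path / "Train") would return the training images paired with "Yes_iphone_images" or "No_iphone_images", assuming the extracted folder keeps the structure from the question.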



Answer 3:


An easy way is to use S3FS, which lets you read all the images in a directory. For example, one directory might contain all the images with an iPhone and another all the images without.

import s3fs
fs = s3fs.S3FileSystem()

no_iphone_images_directory = 's3://iphone_images/no_iphone_images'
filenames = fs.ls(no_iphone_images_directory)
for filename in filenames:
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        with fs.open(filename, 'rb') as f:
            image_bytes = f.read()  # Do something with the image


Source: https://stackoverflow.com/questions/63328246/how-to-read-bucket-image-from-aws-s3-into-sagemaker-jupyter-instance
