Reading a JSON file from S3 using Python boto3

灰色年华 2020-12-08 04:41

I have the following JSON in an S3 bucket 'test':

{
  'Details' : "Something"
}

I am using the following code to read this JSON and print the 'Details' value.

5 Answers
  • 2020-12-08 04:48

    The following worked for me.

    # read_s3.py
    import json

    import boto3

    BUCKET = 'MY_S3_BUCKET_NAME'
    FILE_TO_READ = 'FOLDER_PATH/my_file.json'
    client = boto3.client('s3',
                          aws_access_key_id='MY_AWS_KEY_ID',
                          aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY')
    result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
    text = result["Body"].read().decode()
    data = json.loads(text)  # read() returns a plain string; parse it into a dict
    print(data['Details'])   # use your desired JSON key for your value
    

    It is not a good idea to hard-code the AWS access key ID and secret key directly. As a best practice, consider either of the following:

    (1) Read your AWS credentials from a JSON file stored on your local machine:

    import json

    # use a context manager so the file handle is closed promptly
    with open('aws_cred.json') as cred_file:
        credentials = json.load(cred_file)
    client = boto3.client('s3',
                          aws_access_key_id=credentials['MY_AWS_KEY_ID'],
                          aws_secret_access_key=credentials['MY_AWS_SECRET_ACCESS_KEY'])
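    The credentials file itself is just a flat JSON object. A quick local round-trip (a sketch using a temporary file and made-up placeholder values, not real keys) shows the shape `aws_cred.json` is expected to have:

```python
import json
import os
import tempfile

# hypothetical aws_cred.json contents with the same two fields used above
creds_out = {"MY_AWS_KEY_ID": "EXAMPLE_KEY_ID",
             "MY_AWS_SECRET_ACCESS_KEY": "EXAMPLE_SECRET"}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(creds_out, f)

# read it back the same way the snippet above does
with open(f.name) as cred_file:
    credentials = json.load(cred_file)
os.unlink(f.name)  # clean up the temporary file
```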
    

    (2) Read from environment variables (my preferred option for deployment):

    import os
    client = boto3.client('s3',
                           aws_access_key_id=os.environ['MY_AWS_KEY_ID'],
                           aws_secret_access_key=os.environ['MY_AWS_SECRET_ACCESS_KEY']
                         )
    

    Let's prepare a shell script (set_env.sh) that sets the environment variables and then runs our Python script (read_s3.py):

    # set_env.sh
    export MY_AWS_KEY_ID='YOUR_AWS_ACCESS_KEY_ID'
    export MY_AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_ACCESS_KEY'
    # run the python script (shown above) that reads from s3
    python read_s3.py
    

    Now execute the shell script in a terminal as follows:

    sh set_env.sh
    
  • 2020-12-08 04:58

    I was stuck for a bit as the decoding didn't work for me (s3 objects are gzipped).

    Found this discussion which helped me: Python gzip: is there a way to decompress from a string?

    import zlib

    import boto3

    S3_RESOURCE = boto3.resource('s3')

    # bucket and key come from the S3 event that triggered the Lambda
    key = event["Records"][0]["s3"]["object"]["key"]
    bucket_name = event["Records"][0]["s3"]["bucket"]["name"]

    s3_object = S3_RESOURCE.Object(bucket_name, key).get()['Body'].read()

    # 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
    jsonData = zlib.decompress(s3_object, 16 + zlib.MAX_WBITS)
    

    If you print jsonData, you'll see your desired JSON content. If you are testing in AWS itself, be sure to check the CloudWatch logs, as Lambda won't output the full JSON if it's too long.
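    The decompression step can be verified without S3 by round-tripping a small payload through gzip locally; this sketch simulates a gzipped object body:

```python
import gzip
import json
import zlib

# simulate a gzipped S3 object body
payload = json.dumps({"Details": "Something"}).encode()
gzipped = gzip.compress(payload)

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
jsonData = zlib.decompress(gzipped, 16 + zlib.MAX_WBITS)
parsed = json.loads(jsonData)
```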

  • 2020-12-08 05:02

    As mentioned in the comments above, repr has to be removed and the json file has to use double quotes for attributes. Using this file on aws/s3:

    {
      "Details" : "Something"
    }
    

    and the following Python code, it works:

    import boto3
    import json
    
    s3 = boto3.resource('s3')
    
    content_object = s3.Object('test', 'sample_json.txt')
    file_content = content_object.get()['Body'].read().decode('utf-8')
    json_content = json.loads(file_content)
    print(json_content['Details'])
    # >> Something
    
  • 2020-12-08 05:07

    Wanted to add that botocore.response.StreamingBody works well with json.load:

    import json

    import boto3

    s3 = boto3.resource('s3')

    # substitute your own bucket name and object key here
    obj = s3.Object(bucket, key)
    data = json.load(obj.get()['Body'])
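    This works because `json.load` only requires a file-like object with a `read()` method, which `StreamingBody` provides. The same behaviour can be seen locally with `io.BytesIO` standing in for the S3 body:

```python
import io
import json

# io.BytesIO mimics StreamingBody's file-like read() method,
# which is all json.load needs from its argument
body = io.BytesIO(b'{"Details": "Something"}')
data = json.load(body)
```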
    
  • 2020-12-08 05:07

    You can use the code below in an AWS Lambda function to read a JSON file from an S3 bucket and process it with Python.

    import json
    import boto3
    import sys
    import logging
    
    # logging
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    VERSION = 1.0
    
    s3 = boto3.client('s3')
    
    def lambda_handler(event, context):
        bucket = 'my_project_bucket'
        key = 'sample_payload.json'
        
        response = s3.get_object(Bucket=bucket, Key=key)
        content = response['Body']
        jsonObject = json.loads(content.read())
        print(jsonObject)
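    If the function is wired to an S3 trigger rather than using hard-coded names, the bucket and key can be pulled from the event payload instead (as the gzip answer above does). A minimal sketch of that event shape, using the same bucket and key as the handler above:

```python
# minimal sketch of the S3 event structure a Lambda handler receives
# from an S3 trigger; the bucket/key values here are illustrative
event = {
    "Records": [
        {"s3": {"bucket": {"name": "my_project_bucket"},
                "object": {"key": "sample_payload.json"}}}
    ]
}
bucket = event["Records"][0]["s3"]["bucket"]["name"]
key = event["Records"][0]["s3"]["object"]["key"]
```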
    