Retrieving S3 path from payload inside AWS Glue pythonshell job

别来无恙 submitted on 2019-12-11 11:02:14

Question


I have a pythonshell job in AWS Glue that needs to download a file from an S3 path. The S3 path is variable, so it will reach the Glue job as a payload in the start_job_run call, like below:

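import boto3

glue = boto3.client('glue')  # client creation was missing from the snippet

# Glue expects job argument names to be prefixed with '--'
payload = {'--s3_target_file': s3_TARGET_FILE_PATH,
           '--s3_test_file': s3_TEST_FILE_PATH}
job_def = dict(
    JobName=MY_GLUE_PYTHONSHELL_JOB,
    Arguments=payload,
    WorkerType='Standard',
    NumberOfWorkers=2,
)

response = glue.start_job_run(**job_def)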

My question is: how do I retrieve those S3 paths inside the AWS Glue pythonshell job when the payload is passed through boto3? Do we need to write some sort of handler, similar to AWS Lambda?

Please suggest.


Answer 1:


Check the documentation. All you need is here.

You can use the getResolvedOptions as follows:

import sys
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv,
                          ['JOB_NAME',
                           'day_partition_key',
                           'hour_partition_key',
                           'day_partition_value',
                           'hour_partition_value'])
print("The day partition key is: ", args['day_partition_key'])
print("and the day partition value is: ", args['day_partition_value'])
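
Applied to the question, the same approach might look like the sketch below. It assumes the caller passes the arguments as '--s3_target_file' and '--s3_test_file', and that each value is a full URI of the form s3://bucket/key. The helper split_s3_path, the function download_inputs, and the /tmp destination are hypothetical names chosen for illustration; they are not part of the Glue API.

```python
import sys
from urllib.parse import urlparse


def split_s3_path(s3_path):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical helper; assumes the payload carries full s3:// URIs.
    """
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 path: %r" % (s3_path,))
    return parsed.netloc, parsed.path.lstrip("/")


def download_inputs(argv=None):
    """Read the job arguments and download both files (runs inside Glue)."""
    # Imported here because awsglue is only available in the Glue runtime.
    import boto3
    from awsglue.utils import getResolvedOptions

    # Names are given without the '--' prefix used by the caller.
    args = getResolvedOptions(argv or sys.argv,
                              ["s3_target_file", "s3_test_file"])

    s3 = boto3.client("s3")
    local_paths = {}
    for name in ("s3_target_file", "s3_test_file"):
        bucket, key = split_s3_path(args[name])
        local_path = "/tmp/" + name  # assumed local destination
        s3.download_file(bucket, key, local_path)
        local_paths[name] = local_path
    return local_paths
```

No Lambda-style handler is involved: a pythonshell job runs the script from top to bottom, so calling download_inputs() at the end of the job script should be enough.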


Source: https://stackoverflow.com/questions/58044032/retrieving-s3-path-from-payload-inside-aws-glue-pythonshell-job
