Apache Airflow: operator to copy s3 to s3

淺唱寂寞╮ submitted on 2020-05-28 04:40:27

Question


What is the best operator to copy a file from one S3 location to another in Airflow? I already tried S3FileTransformOperator, but it requires either transform_script or select_expression. My requirement is to copy the exact file from source to destination, with no transformation.


Answer 1:


You have two options (both apply even outside Airflow):

  1. Use the AWS CLI: the cp command
    • aws s3 cp <source> <destination>
    • In Airflow this command can be run with BashOperator (on the local machine) or SSHOperator (on a remote machine)
  2. Use the AWS SDK, i.e. boto3
    • Here you'll be using boto3's S3Client
    • Airflow already provides a wrapper over it in the form of S3Hook
    • The copy_object(..) method of S3Client is also exposed by S3Hook under the same name, copy_object(..)
    • You can use S3Hook inside any suitable custom operator, or simply inside a PythonOperator
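Both options can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the helper names build_s3_cp_command and copy_s3_file are hypothetical, the optional hook parameter exists only so the function can be exercised without AWS access, and the S3Hook import path assumes an Airflow 2.x installation with the Amazon provider (Airflow 1.10 exposed it as airflow.hooks.S3_hook).

```python
import shlex


def build_s3_cp_command(source_url, dest_url):
    """Option 1: build the `aws s3 cp` command line for a BashOperator.

    Both arguments are full s3:// URLs. They are shell-quoted so that keys
    containing spaces survive the shell.
    """
    return "aws s3 cp {} {}".format(shlex.quote(source_url), shlex.quote(dest_url))


def copy_s3_file(source_bucket, source_key, dest_bucket, dest_key,
                 aws_conn_id="aws_default", hook=None):
    """Option 2: copy one object as-is through S3Hook.copy_object.

    `hook` is a hypothetical injection point for testing; when it is omitted,
    a real S3Hook is created from the given Airflow connection id.
    """
    if hook is None:
        # Imported lazily so the module still loads where Airflow is absent.
        # Assumed path for Airflow 2.x with the Amazon provider installed.
        from airflow.providers.amazon.aws.hooks.s3 import S3Hook
        hook = S3Hook(aws_conn_id=aws_conn_id)

    hook.copy_object(
        source_bucket_key=source_key,
        dest_bucket_key=dest_key,
        source_bucket_name=source_bucket,
        dest_bucket_name=dest_bucket,
    )
    # Return the destination so downstream tasks can read it from XCom.
    return "s3://{}/{}".format(dest_bucket, dest_key)
```

In a DAG, copy_s3_file could be passed as python_callable to a PythonOperator, with the bucket and key names supplied through op_kwargs. The string returned by build_s3_cp_command could be passed as bash_command to a BashOperator, which assumes the AWS CLI is installed and has credentials on the worker.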



Answer 2:


Use S3CopyObjectOperator:

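# Import path for recent Amazon provider releases on Airflow 2.x (an assumption about
# your setup); Airflow 1.10 used airflow.contrib.operators.s3_copy_object_operator.
from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator

copy_step = S3CopyObjectOperator(
   task_id='copy_step',  # every operator needs a task_id; this name is illustrative
   source_bucket_key='source_file',
   dest_bucket_key='dest_file',
   aws_conn_id='aws_connection_id',
   source_bucket_name='source-bucket',
   dest_bucket_name='dest-bucket'
)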


Source: https://stackoverflow.com/questions/55135735/apache-airflow-operator-to-copy-s3-to-s3
