Simplest tool in AWS for very simple (transform in) ETL?

Submitted by 痞子三分冷 on 2019-12-25 01:11:36

Question


We have numerous files in S3 totaling tens of gigabytes. We need to get them into CSV format; currently the files use delimiters that are not commas. Normally I would do this on a server using sed, but I don't want to have to transfer the files to a server. I want to read directly from S3, translate to CSV line by line, and write the results back to new S3 files.
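One low-setup way to keep the sed-style approach without copying files to a server is to stream each object through the AWS CLI, which can read from and write to S3 via stdin/stdout. A minimal sketch (the bucket and key names are placeholders, and this assumes the pipe delimiter never appears inside field values):

```shell
# Stream an object out of S3, swap pipes for commas, and stream the
# result straight back to a new key -- nothing touches local disk.
aws s3 cp s3://my-bucket/input/data.txt - \
  | sed 's/|/,/g' \
  | aws s3 cp - s3://my-bucket/output/data.csv
```

Note this naive substitution breaks if field values can themselves contain commas or the original delimiter; in that case proper CSV quoting is needed, and the Athena answer below is a safer route.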

Glue appears to be able to do this, but I sense the learning curve and setup for such a simple task would be overkill.

Is there not some easy way to do simple tasks like this, maybe in EMR or some other AWS tool? We use Athena, and I'm wondering if this could be done in an SQL statement using Athena? Thanks


Answer 1:


Yes, that should be very easy, and you don't need any external ETL tool or Glue. Suppose you have a pipe-delimited table named "cust_transaction_pipe", backed by a pipe-delimited file, that you can already query in Athena without any issues. To write that table's data out comma-delimited, just use the query below:

create table cust_transaction_csv 
with (external_location = 's3://YOUR_S3_BUCKET_NAME/cust_tx_csv/',format='TEXTFILE',field_delimiter = ',')
as 
select * from cust_transaction_pipe

Once it completes, check the location you specified: it will contain comma-delimited files. You can specify many other options inside the WITH (). For the complete set of options, see the Athena documentation for CREATE TABLE AS.
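For completeness, the pipe-delimited source table assumed above might be defined with DDL along these lines (the column names and S3 location are illustrative placeholders, not from the original answer):

```sql
-- Hypothetical definition of the pipe-delimited source table;
-- adjust columns and LOCATION to match your actual data.
CREATE EXTERNAL TABLE cust_transaction_pipe (
  tx_id   string,
  cust_id string,
  amount  string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
LOCATION 's3://YOUR_S3_BUCKET_NAME/cust_tx_pipe/';
```

With that table in place, the CTAS query above reads it and materializes the comma-delimited copy at the new external_location.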



Source: https://stackoverflow.com/questions/54114789/simplest-tool-in-aws-for-very-simple-transform-in-etl
