Can I delete data (rows in tables) from Athena?

Submitted by 旧巷老猫 on 2019-12-10 14:54:00

Question


Is it possible to delete data stored in S3 through an Athena query? I have some rows I have to delete from a couple of tables (they point to separate buckets in S3).

I couldn't find a way to do it in the Athena User Guide: https://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf and DELETE FROM isn't supported, but I'm wondering if there is an easier way than trying to find the files in S3 and deleting them.


Answer 1:


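You can use Athena to find the files that contain the rows you want to delete, and then delete those files from S3 yourself. Athena exposes a special pseudo-column, "$path", that returns the S3 location of the file each row was read from.

SELECT DISTINCT "$path" FROM <table> WHERE <condition matching the rows to delete>

To automate this, iterate over the Athena query results, extract each file path, and delete the corresponding object from S3. Keep in mind that deleting a file removes every row stored in it, not only the rows that matched the condition.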
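The automation step could be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the helper names `split_s3_uri` and `delete_paths` are hypothetical, the list of paths is assumed to have already been read from the Athena query results, and `s3_client` is assumed to behave like a boto3 S3 client (only its `delete_objects` call is used).

```python
from collections import defaultdict


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {uri!r}")
    return bucket, key


def delete_paths(s3_client, paths, batch_size=1000):
    """Delete the objects named by "$path" values; return how many keys were submitted.

    s3_client is assumed to be boto3.client("s3") or a compatible object.
    """
    # Group keys per bucket and drop duplicate paths.
    keys_by_bucket = defaultdict(set)
    for path in paths:
        bucket, key = split_s3_uri(path)
        keys_by_bucket[bucket].add(key)

    submitted = 0
    for bucket, keys in keys_by_bucket.items():
        ordered = sorted(keys)
        # delete_objects accepts at most 1000 keys per request.
        for start in range(0, len(ordered), batch_size):
            batch = ordered[start:start + batch_size]
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
            )
            submitted += len(batch)
    return submitted
```

With real credentials this would be called as something like `delete_paths(boto3.client("s3"), paths)`. Because the client is passed in, the logic can be dry-run against a stub before anything is actually deleted.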




Answer 2:


I would just like to add to Dhaval's answer.

Once you know the path of the file that contains the rows you want to delete, you don't have to delete the entire file. You can remove just those rows from the S3 file, which I am assuming is in JSON format.

The process is to download the file that contains those rows, remove the rows from it, and upload it back to S3 under the same key.

This replaces the original object with the modified one (in your case, the file without the deleted rows). Athena reads the data from S3 every time a query runs, so after the upload the deleted rows no longer show up in query results.
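The download, filter, and upload cycle might look like the sketch below. It rests on several assumptions that are not in the answer: the file is uncompressed, UTF-8, and holds one JSON object per line; `drop_rows` and `rewrite_object` are hypothetical helper names; `should_delete` is a caller-supplied predicate; and `s3_client` behaves like a boto3 S3 client (only `get_object` and `put_object` are used).

```python
import json


def drop_rows(data, should_delete):
    """Return (filtered_bytes, removed_count) for newline-delimited JSON bytes."""
    kept = []
    removed = 0
    for line in data.decode("utf-8").split("\n"):
        if not line.strip():
            continue  # skip blank lines, including the one after the final newline
        if should_delete(json.loads(line)):
            removed += 1
        else:
            kept.append(line)
    body = "".join(line + "\n" for line in kept)
    return body.encode("utf-8"), removed


def rewrite_object(s3_client, bucket, key, should_delete):
    """Download one object, drop matching rows, and upload it under the same key.

    Returns the number of rows removed; nothing is uploaded when that is 0.
    """
    original = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    updated, removed = drop_rows(original, should_delete)
    if removed:
        s3_client.put_object(Bucket=bucket, Key=key, Body=updated)
    return removed
```

A call could look like `rewrite_object(boto3.client("s3"), bucket, key, lambda row: row["id"] == 2)`, where the `id` field is only an illustration. The whole object is held in memory here, so very large files would need a streaming variant.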




Answer 3:


I would also like to add that, once you have found the files that need updating, you can have Athena itself write new files without the unwanted rows by running a CTAS (CREATE TABLE AS SELECT) query that filters them out: https://docs.aws.amazon.com/athena/latest/ug/ctas.html

Afterwards you can replace the old files with the new ones created by CTAS. I think this is the simplest way to go.
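As a rough illustration of the CTAS approach, the sketch below only builds the query text. The function name `build_ctas` and all table names, conditions, and locations passed to it are hypothetical. The `format` and `external_location` table properties are standard CTAS options, and the target location is expected to be empty before the query runs. The filter is wrapped in `coalesce(..., false)` so that rows for which the condition evaluates to NULL are kept rather than silently dropped.

```python
def build_ctas(new_table, source_table, delete_condition,
               external_location, data_format="PARQUET"):
    """Build a CTAS statement that copies every row NOT matching delete_condition.

    The arguments are interpolated directly into the SQL text, so this is only
    suitable for trusted, hand-written values.
    """
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (format = '{data_format}', "
        f"external_location = '{external_location}')\n"
        f"AS SELECT * FROM {source_table}\n"
        f"WHERE NOT coalesce(({delete_condition}), false)"
    )


if __name__ == "__main__":
    # Hypothetical table names, condition, and bucket, for illustration only.
    print(build_ctas("events_clean", "events", "user_id = 42",
                     "s3://example-bucket/events_clean/", "JSON"))
```

The resulting string could then be submitted through the Athena console or the API. Swapping the new files in for the old ones remains a separate S3 step, as described above.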



Source: https://stackoverflow.com/questions/48815504/can-i-delete-data-rows-in-tables-from-athena
