S3 - how to get a fast line count of a file? wc -l is too slow

眉间皱痕 submitted on 2019-12-01 22:58:01

Here are two methods that might work for you...

Amazon S3 has a new feature called S3 Select that allows you to query files stored on S3.

You can count the number of records (lines) in a file, and it even works on GZIP-compressed files. Results may vary depending on your file format.
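A minimal sketch of the S3 Select approach with boto3, assuming the object can be read as headerless CSV so that each line is one record. `select_object_content` is the real boto3 S3 client method; the helper names (`build_select_params`, `count_from_events`, `count_lines`) and any bucket/key values are hypothetical. The count arrives as CSV text inside the `Records` events of the response stream.

```python
"""Sketch: count lines of an S3 object with S3 Select (assumes headerless CSV)."""
from typing import Any, Dict, Iterable


def build_select_params(bucket: str, key: str, gzip: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 S3 client's select_object_content.

    Assumption: treating the file as CSV with no header row, one record per line.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": "SELECT COUNT(*) FROM S3Object",
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "NONE"},
            "CompressionType": "GZIP" if gzip else "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


def count_from_events(events: Iterable[Dict[str, Any]]) -> int:
    """Join the 'Records' payload bytes from the event stream and parse the count."""
    chunks = [event["Records"]["Payload"] for event in events if "Records" in event]
    return int(b"".join(chunks).decode("utf-8").strip())


def count_lines(bucket: str, key: str, gzip: bool = False) -> int:
    """Run the query against S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the pure helpers above work without boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_params(bucket, key, gzip))
    return count_from_events(response["Payload"])
```

Usage would look like `count_lines("my-bucket", "logs/big.csv.gz", gzip=True)` (hypothetical names). Because the file is parsed as CSV, quoted fields that span several lines may count as one record, and a final line without a trailing newline may be counted here but not by `wc -l`, which is one way the results can vary by file format.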

Amazon Athena is a similar option that might also be suitable; it can query files stored in Amazon S3.
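A rough sketch of the Athena route with boto3, assuming a table has already been defined over the S3 location so that each line of the file is one row. The table name, database, and output location are hypothetical placeholders; `start_query_execution`, `get_query_execution`, and `get_query_results` are real boto3 Athena client methods. Athena returns the column header as the first result row, so the count is read from the second row.

```python
"""Sketch: count rows of an S3-backed Athena table (table assumed to exist)."""
import time
from typing import Any, Dict


def count_query(table: str) -> str:
    """SQL for the count; `table` is a hypothetical, pre-defined Athena table."""
    return f"SELECT COUNT(*) AS line_count FROM {table}"


def parse_count(results: Dict[str, Any]) -> int:
    """Read the count from a get_query_results response (row 0 is the header)."""
    rows = results["ResultSet"]["Rows"]
    return int(rows[1]["Data"][0]["VarCharValue"])


def athena_count(table: str, database: str, output_location: str,
                 poll_seconds: float = 1.0) -> int:
    """Start the query, poll until it finishes, and return the count."""
    import boto3  # requires boto3 and AWS credentials

    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        QueryString=count_query(table),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    return parse_count(athena.get_query_results(QueryExecutionId=qid))
```

This only gives a line count if the table's row format maps one line to one row and does not skip a header line.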

Yes, Amazon S3 has the Select feature, but keep an eye on the cost of every query you run from the Select tab. For example, here is the pricing as of June 2018 (it may have changed since): S3 Select pricing is based on the size of the input, the output, and the data transferred. Each query costs 0.002 USD per GB scanned, plus 0.0007 USD per GB returned.
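The per-query arithmetic can be sketched as follows, using only the two June 2018 rates quoted above; any other charges (for example, data transfer) are deliberately left out, and current AWS pricing should be checked before relying on these numbers.

```python
# Rates quoted above (June 2018); they may have changed.
SCAN_USD_PER_GB = 0.002
RETURN_USD_PER_GB = 0.0007


def estimate_select_cost(gb_scanned: float, gb_returned: float) -> float:
    """Per-GB portion of one S3 Select query's cost, using the quoted rates only."""
    return gb_scanned * SCAN_USD_PER_GB + gb_returned * RETURN_USD_PER_GB


# A COUNT(*) returns only a few bytes, so the scanned size dominates the cost.
print(f"{estimate_select_cost(10, 0):.4f}")  # 10 GB scanned; prints 0.0200
```

In other words, counting the lines of a 10 GB file this way costs about 0.02 USD per run at those rates, so repeated counts of the same large file add up.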
