Google Cloud Storage: How to get list of new files in bucket/folder using gsutil

Submitted by 怎甘沉沦 on 2019-12-05 17:04:26

Neither gsutil nor the GCS API provides this feature: there is no way to have the server list only objects newer than a given timestamp, or list them sorted by timestamp.

Instead, you could subscribe to new-object events using Cloud Storage's Pub/Sub notifications feature.
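A minimal sketch of the Pub/Sub route, assuming the gcloud and gsutil CLIs are installed and authenticated and that the topic lives in your default project. The bucket, topic and subscription names, and the helper functions setup_new_object_feed and pull_new_objects, are hypothetical. OBJECT_FINALIZE is the event Cloud Storage sends when an object is created (or overwritten).

```shell
# Hypothetical helpers; bucket/topic/subscription names are placeholders.

# One-time setup.
#   $1: bucket name (without gs://)   $2: Pub/Sub topic   $3: pull subscription
setup_new_object_feed() {
  # Create the topic that will receive the notifications.
  gcloud pubsub topics create "$2"
  # Ask Cloud Storage to publish a JSON message for every finalized (new) object.
  gsutil notification create -t "$2" -f json -e OBJECT_FINALIZE "gs://$1"
  # Attach a pull subscription so the events can be read later.
  gcloud pubsub subscriptions create "$3" --topic="$2"
}

# Later: pull and acknowledge up to 10 pending notifications.
# Each message's attributes include objectId, the name of the new object.
#   $1: pull subscription
pull_new_objects() {
  gcloud pubsub subscriptions pull "$1" --auto-ack --limit=10
}
```

Usage: run setup_new_object_feed your-bucket-name new-objects new-objects-sub once, then call pull_new_objects new-objects-sub whenever you want the files added since the last pull. Only objects created after the notification is set up are reported; existing objects are not.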

You could use some bash-fu:

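gsutil ls -l gs://your-bucket-name | grep -v '^ *TOTAL:' | sort -b -k2,2 | tail -n1 | awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }'

Breaking that down:

  • gsutil ls -l gs://your-bucket-name # grab a detailed list of objects in the bucket (size, timestamp, URL)
  • grep -v '^ *TOTAL:' # drop the "TOTAL: ..." summary line that gsutil ls -l prints last
  • sort -b -k2,2 # sort on the timestamp field; ISO 8601 timestamps sort chronologically as plain strings (a numeric -n sort would compare only the leading year)
  • tail -n1 # grab the last row, i.e. the most recent object
  • awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }' # delete the first two columns (size and timestamp) and left-trim the remaining whitespace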
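The same listing can also answer "which files are new since time X" by filtering on the client, since the server cannot do it. A minimal sketch, assuming the SIZE  TIMESTAMP  URL layout of gsutil ls -l and UTC ISO 8601 timestamps (which compare correctly as strings); new_since is a hypothetical helper name, not part of gsutil.

```shell
# Hypothetical helper: read `gsutil ls -l` output on stdin and print the URL of
# every object whose timestamp is at or after the cutoff given as $1,
# e.g. 2019-12-05T00:00:00Z.
new_since() {
  awk -v cutoff="$1" '
    # Skip the trailing "TOTAL:" summary and prefix-only lines (fewer than 3 fields).
    $1 == "TOTAL:" || NF < 3 { next }
    # Neither value looks numeric, so awk compares them as strings.
    $2 >= cutoff {
      # Blank out size and timestamp, then trim the leading separators.
      # Object names with spaces survive, but runs of whitespace collapse to one space.
      $1 = ""; $2 = ""
      sub(/^[ \t]+/, "")
      print
    }'
}
```

Usage: gsutil ls -l gs://your-bucket-name | new_since 2019-12-05T00:00:00Z. Storing the time of the previous run and passing it as the cutoff gives a simple "new since last check" list, at the cost of listing the whole bucket each time.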

Tested with Google Cloud SDK v186.0.0, gsutil v4.28

If by "new files" you mean files that are not yet present in your destination bucket, you can instead use the -n option of gsutil cp, which copies only the files that do not already exist at the destination.

From the documentation of the -n option (https://cloud.google.com/storage/docs/gsutil/commands/cp?hl=ru):

No-clobber. When specified, existing files or objects at the destination will not be overwritten. Any items that are skipped by this option will be reported as being skipped. This option will perform an additional GET request to check if an item exists before attempting to upload the data. This will save retransmitting data, but the additional HTTP requests may make small object transfers slower and more expensive.
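A minimal sketch of that approach; copy_new_files and the bucket/folder names are hypothetical. The wildcard is quoted so that gsutil, not the local shell, expands it; * matches objects directly under the prefix (gsutil's ** would match nested ones too). The top-level -m flag only parallelizes the copies.

```shell
# Hypothetical helper: copy objects from one prefix to another, skipping any
# object that already exists at the destination (-n, no-clobber).
#   $1: source prefix,      e.g. gs://source-bucket/folder
#   $2: destination prefix, e.g. gs://destination-bucket/folder
copy_new_files() {
  gsutil -m cp -n "$1/*" "$2/"
}
```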

The downside of this approach is that it makes an extra check request for every file in your source bucket.
