Question
I have a bucket/folder into which a lot of files are coming in every minute. How can I read only the new files, based on file timestamp?
e.g. list all files with timestamp > my_timestamp
Answer 1:
This is not a feature that gsutil or the GCS API provides, as there is no way to list objects by timestamp.
Instead, you could subscribe to notifications about new objects using the Cloud Pub/Sub notifications feature of GCS.
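As a rough sketch of that setup, a bucket can be told to publish a message to a Pub/Sub topic whenever an object is created, using gsutil notification create. The function name subscribe_new_objects and the bucket and topic names are placeholders for illustration; none of them come from the question.

```shell
# Hypothetical helper: ask GCS to publish a JSON message to a Pub/Sub topic
# each time an object is successfully created (OBJECT_FINALIZE) in the bucket.
# $1 = bucket name (assumed), $2 = Pub/Sub topic name (assumed)
subscribe_new_objects() {
  gsutil notification create -t "$2" -f json -e OBJECT_FINALIZE "gs://$1"
}
```

Something like subscribe_new_objects your-bucket-name new-files-topic would be run once. A subscription attached to that topic then receives one message per new object, so nothing has to be listed or compared by timestamp.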
Answer 2:
You could use some bash-fu:
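gsutil ls -l gs://your-bucket-name | grep -v '^TOTAL:' | sort -k2 | tail -n1 | awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }'
Breaking that down:
gsutil ls -l gs://your-bucket-name
# grab detailed list of objects in bucket
grep -v '^TOTAL:'
# drop the summary line that gsutil prints at the end of the listing
sort -k2
# sort on the date field as plain text; ISO 8601 timestamps order chronologically as strings, whereas a numeric sort would compare only the year
tail -n1
# grab the last row returned, i.e. the most recently updated object
awk 'END {$1=$2=""; sub(/^[ \t]+/, ""); print }'
# delete first two cols (size and date) and ltrim to remove whitespace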
Tested with Google Cloud SDK v186.0.0, gsutil v4.28
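To get closer to what the question asks (everything with timestamp > my_timestamp, not just the latest object), the same gsutil ls -l listing could be filtered instead of sorted. This is only a sketch: newer_than is a made-up helper name, and it assumes the listing rows have the form "size, timestamp, URL" with UTC timestamps like 2018-02-01T00:00:00Z, so that comparing them as strings matches chronological order.

```shell
# Hypothetical helper: read `gsutil ls -l` output on stdin and print the URLs
# of objects whose timestamp is strictly later than the cutoff given in $1.
# Assumes fixed-width ISO 8601 UTC timestamps in the second column.
newer_than() {
  awk -v cutoff="$1" '
    # Keep only object rows (second field starts with a 4-digit year and a dash);
    # this skips the TOTAL summary line and any rows without a timestamp.
    # Appending "" forces a string comparison rather than a numeric one.
    $2 ~ /^[0-9][0-9][0-9][0-9]-/ && ($2 "") > (cutoff "") {
      $1 = $2 = ""          # blank out the size and date columns
      sub(/^[ \t]+/, "")    # trim the leading whitespace left behind
      print
    }'
}
```

Usage would look like: gsutil ls -l gs://your-bucket-name | newer_than 2018-02-01T00:00:00Z. Note that this still lists the whole bucket on every run and filters on the client side.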
Answer 3:
If by "new files" you mean files that are not yet present in your destination bucket, you can alternatively use the -n option of gsutil cp, which copies only those files that are not already present in the destination bucket.
From the documentation (https://cloud.google.com/storage/docs/gsutil/commands/cp?hl=ru):
No-clobber. When specified, existing files or objects at the destination will not be overwritten. Any items that are skipped by this option will be reported as being skipped. This option will perform an additional GET request to check if an item exists before attempting to upload the data. This will save retransmitting data, but the additional HTTP requests may make small object transfers slower and more expensive.
The downside of this approach is that it makes a check request for every file present in your source bucket.
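For illustration, the no-clobber copy could be wrapped like this. The helper name copy_new_only and the source and destination URLs are assumptions made up for the example; only the cp -n option itself comes from the documentation quoted above.

```shell
# Hypothetical helper: copy objects from a source URL to a destination URL,
# skipping anything that already exists at the destination (-n = no-clobber).
# $1 = source URL, may contain gsutil wildcards (assumed)
# $2 = destination URL (assumed)
copy_new_only() {
  gsutil cp -n "$1" "$2"
}
```

For example: copy_new_only 'gs://source-bucket/folder/*' gs://dest-bucket/folder/. The single quotes keep the local shell from trying to expand the wildcard, so gsutil receives it unchanged.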
Source: https://stackoverflow.com/questions/44017463/google-cloud-storage-how-to-get-list-of-new-files-in-bucket-folder-using-gsutil