Question
I'm using gsutil rsync, copying from s3 to gs, and I'm getting the following error after gsutil has gone partway through a bucket:
Caught non-retryable exception while listing s3://[bucket]/: BadRequestException: 400 None
CommandException: Caught non-retryable exception - aborting rsync
This is undesirable behavior, because I can manually copy the other files from s3 to gs without problems, yet this one failure aborts the whole rsync. I can't work around it with the "-C" switch, since this isn't a copy error.
Edit: It appears that if an s3 filename contains a "#", gsutil replaces it with "?versionId=". For example:
S3 filename: Updaet#2_Montgomery Building Permits.xlsx
GS lists in debug output as: Updaet?versionId=2_Montgomery Building Permits.xlsx
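The debug output is consistent with the part after the "#" being sent to S3 as a `versionId` query parameter, which S3 presumably rejects as an invalid version ID, matching the 400 BadRequestException above. The Python sketch below reproduces that mapping. It is not gsutil or boto code: `as_versioned_request_path` is a hypothetical helper, and it assumes the name is split at the last "#" and the remainder is appended as `?versionId=`.

```python
def as_versioned_request_path(object_name):
    """Hypothetical: show how an object name containing '#' ends up looking
    like a versioned request, as in the gsutil debug output."""
    # rpartition splits at the last '#'; if there is none, sep is ''.
    name, sep, version_id = object_name.rpartition('#')
    if not sep or not name or not version_id:
        return object_name  # nothing would be treated as a version ID
    return f'{name}?versionId={version_id}'


print(as_versioned_request_path('Updaet#2_Montgomery Building Permits.xlsx'))
print(as_versioned_request_path('plain_name.xlsx'))
```

The first call yields the same string as the debug output quoted above; a name without "#" is returned unchanged.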
Answer 1:
Can you please provide more details about this failure by running:
gsutil -D rsync your-source your-destination
and then excerpting the HTTP request/response that shows the error? When you do, please redact the Authorization: header.
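If you save the debug output to paste it, a small script can blank out the header first. This is a minimal Python sketch, not part of gsutil: it assumes the header appears as `Authorization:` followed by its value on the same line (or inside a quoted request dump with escaped `\r\n`), and `redact_authorization` and the sample text are hypothetical. Check the result by eye before posting.

```python
import re

# Assumed format: "Authorization:" in any case, then its value up to a line
# break, a literal backslash (as in an escaped "\r\n"), or a quote character.
AUTH_HEADER_RE = re.compile(r"(authorization:[ \t]*)[^\r\n\\'\"]+", re.IGNORECASE)


def redact_authorization(text):
    """Replace the value of every Authorization header in text with REDACTED."""
    return AUTH_HEADER_RE.sub(r"\1REDACTED", text)


# Made-up sample text, for illustration only.
sample = "Authorization: AWS EXAMPLEKEY:examplesignature\nHost: s3.amazonaws.com"
print(redact_authorization(sample))
```

To redact a whole saved log, read the file into a string, pass it through `redact_authorization`, and write the result to a new file.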
If you'd prefer not to post the details of your request on the public forum, you can email them to me at gs-team@google.com
Thanks.
Answer 2:
This same thing happened to me yesterday, and the '#' is indeed the problem.
The issue appears to be in boto, not necessarily in gsutil, though I don't know exactly where the fix belongs. BotoTranslation._StorageUriForObject() calls boto.storage_uri(), which uses VERSION_RE ('(?P<versionless_uri_str>.+)#(?P<version_id>.+)$') to find a version in the uri_str/path. If the object name contains a '#', everything after the last '#' is therefore treated as an S3 version ID. I don't see any current way to escape or encode the '#' so that it isn't treated as a version separator.
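To see the effect concretely, here is a minimal Python sketch that applies the VERSION_RE pattern quoted above. It is not boto's code: `split_version` is a hypothetical helper, `s3://bucket/` is a placeholder bucket, and using `re.match` is an assumption about how the pattern is applied.

```python
import re

# The pattern quoted above as boto's VERSION_RE.
VERSION_RE = re.compile(r'(?P<versionless_uri_str>.+)#(?P<version_id>.+)$')


def split_version(uri_str):
    """Hypothetical helper: return (versionless_uri, version_id or None)."""
    match = VERSION_RE.match(uri_str)
    if match is None:
        return uri_str, None  # no '#' with text on both sides: no version
    return match.group('versionless_uri_str'), match.group('version_id')


print(split_version('s3://bucket/Updaet#2_Montgomery Building Permits.xlsx'))
print(split_version('s3://bucket/a#b#c.txt'))
print(split_version('s3://bucket/plain.txt'))
```

The first name is cut at "Updaet" and the rest becomes the version ID. Because the first group is greedy, a name with several '#' characters is split at the last one, and a name without '#' is left alone.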
Source: https://stackoverflow.com/questions/28582964/gsutil-rsync-gives-a-400-non-retryable-exception-on-s3-bucket