S3 URLs - get bucket name and path

南方客 2020-12-13 23:20

I have a variable that holds an AWS S3 URL:

s3://bucket_name/folder1/folder2/file1.json

I want to get the bucket_name in a variable and re

7 answers
  • 2020-12-13 23:42

    Since it's just a normal URL, you can use urlparse to get all the parts of the URL.

    >>> from urlparse import urlparse
    >>> o = urlparse('s3://bucket_name/folder1/folder2/file1.json', allow_fragments=False)
    >>> o
    ParseResult(scheme='s3', netloc='bucket_name', path='/folder1/folder2/file1.json', params='', query='', fragment='')
    >>> o.netloc
    'bucket_name'
    >>> o.path
    '/folder1/folder2/file1.json'
    

    You may have to strip the leading slash from the key, as another answer in this thread suggests.

    o.path.lstrip('/')
    

    In Python 3, urlparse moved to urllib.parse, so use:

    from urllib.parse import urlparse
    

    Here's a class that takes care of all the details.

    try:
        from urlparse import urlparse
    except ImportError:
        from urllib.parse import urlparse
    
    
    class S3Url(object):
        """
        >>> s = S3Url("s3://bucket/hello/world")
        >>> s.bucket
        'bucket'
        >>> s.key
        'hello/world'
        >>> s.url
        's3://bucket/hello/world'
    
        >>> s = S3Url("s3://bucket/hello/world?qwe1=3#ffffd")
        >>> s.bucket
        'bucket'
        >>> s.key
        'hello/world?qwe1=3#ffffd'
        >>> s.url
        's3://bucket/hello/world?qwe1=3#ffffd'
    
        >>> s = S3Url("s3://bucket/hello/world#foo?bar=2")
        >>> s.key
        'hello/world#foo?bar=2'
        >>> s.url
        's3://bucket/hello/world#foo?bar=2'
        """
    
        def __init__(self, url):
            self._parsed = urlparse(url, allow_fragments=False)
    
        @property
        def bucket(self):
            return self._parsed.netloc
    
        @property
        def key(self):
            if self._parsed.query:
                return self._parsed.path.lstrip('/') + '?' + self._parsed.query
            else:
                return self._parsed.path.lstrip('/')
    
        @property
        def url(self):
            return self._parsed.geturl()
    
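A note on why the answer above passes allow_fragments=False: S3 keys may contain '#', and by default urlparse treats everything after '#' as a URL fragment rather than as part of the path. A minimal sketch, assuming Python 3; the URL is made up for illustration:

```python
from urllib.parse import urlparse

# Hypothetical URL whose key contains a '#'.
url = "s3://bucket/reports/q1#draft.json"

# Default behaviour: the part after '#' is split off as a fragment.
default = urlparse(url)
print(default.path)      # /reports/q1
print(default.fragment)  # draft.json

# With allow_fragments=False the '#' stays in the path.
keep = urlparse(url, allow_fragments=False)
print(keep.path)         # /reports/q1#draft.json
```

A '?' in the key is still split into the query component either way, which is why the class above re-joins path and query when building the key.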
  • 2020-12-13 23:42

    Here it is as a one-liner using regex:

    import re
    
    s3_path = "s3://bucket/path/to/key"
    
    bucket, key = re.match(r"s3:\/\/(.+?)\/(.+)", s3_path).groups()
    
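One caveat with the regex one-liner: re.match returns None when the string does not match, so calling .groups() on the result raises AttributeError for malformed input. A hedged sketch of a guarded variant follows; the helper name split_with_regex and the slightly stricter pattern are assumptions for illustration, not part of the original answer:

```python
import re

# Bucket: everything up to the first '/'; key: the non-empty remainder.
S3_URL = re.compile(r"s3://([^/]+)/(.+)")


def split_with_regex(s3_path):
    """Return (bucket, key), or raise ValueError if the URL does not match."""
    match = S3_URL.fullmatch(s3_path)
    if match is None:
        raise ValueError("not an s3://bucket/key URL: %r" % (s3_path,))
    return match.groups()


bucket, key = split_with_regex("s3://bucket/path/to/key")
print(bucket, key)  # bucket path/to/key
```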
  • 2020-12-13 23:44

    Pretty easy to accomplish with a single line of builtin string methods...

    s3_filepath = "s3://bucket-name/and/some/key.txt"
    bucket, key = s3_filepath.replace("s3://", "").split("/", 1)
    
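Two edge cases of the string-method one-liner are worth knowing: str.replace removes every occurrence of "s3://", not only the prefix, and the tuple unpacking raises ValueError when the URL contains a bucket but no key. A possible variant using str.partition, sketched under the assumption that an empty key is acceptable for bucket-only URLs (the function name split_with_partition is hypothetical):

```python
def split_with_partition(s3_filepath):
    """Split 's3://bucket/key' into (bucket, key) using only str methods."""
    prefix = "s3://"
    if not s3_filepath.startswith(prefix):
        raise ValueError("expected a URL starting with s3://")
    # partition always returns three parts, so a missing key becomes ''.
    bucket, _, key = s3_filepath[len(prefix):].partition("/")
    return bucket, key


print(split_with_partition("s3://bucket-name/and/some/key.txt"))
# ('bucket-name', 'and/some/key.txt')
print(split_with_partition("s3://bucket-name"))
# ('bucket-name', '')
```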
  • 2020-12-13 23:48

    This is a nice project:

    s3path is a pathlib extension for the AWS S3 service

    >>> from s3path import S3Path
    >>> path = S3Path.from_uri('s3://bucket_name/folder1/folder2/file1.json')
    >>> print(path.bucket)
    '/bucket_name'
    >>> print(path.key)
    'folder1/folder2/file1.json'
    >>> print(list(path.key.parents))
    [S3Path('folder1/folder2'), S3Path('folder1'), S3Path('.')]
    
  • 2020-12-13 23:50

    A solution that works without urllib or re (also handles preceding slash):

    def split_s3_path(s3_path):
        path_parts = s3_path.replace("s3://", "").split("/")
        bucket = path_parts.pop(0)
        key = "/".join(path_parts)
        return bucket, key
    

    To run:

    bucket, key = split_s3_path("s3://my-bucket/some_folder/another_folder/my_file.txt")
    

    Returns:

    bucket: my-bucket
    key: some_folder/another_folder/my_file.txt
    
  • 2020-12-13 23:53

    For those who, like me, are trying to use urlparse to extract the key and bucket in order to create an object with boto3, there's one important detail: remove the slash from the beginning of the key.

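    import boto3
    from urlparse import urlparse  # Python 3: from urllib.parse import urlparse

    o = urlparse('s3://bucket_name/folder1/folder2/file1.json')
    bucket = o.netloc
    key = o.path
    client = boto3.client('s3')  # keep a reference to the client
    # Strip the leading slash so the key is 'folder1/...' rather than '/folder1/...'
    client.put_object(Body='test', Bucket=bucket, Key=key.lstrip('/'))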
    

    It took me a while to realize this, because boto3 doesn't raise any exception.

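A likely reason no exception is raised: S3 treats keys as opaque strings, so '/folder1/folder2/file1.json' is a valid key, just a different one from 'folder1/folder2/file1.json'. Also note that lstrip('/') removes all leading slashes, which would alter a key that really does begin with one. A small sketch that drops only the single separator slash, assuming Python 3 (the helper name key_from_url and the second URL are hypothetical, and the query-string handling from the class above is left out for brevity):

```python
from urllib.parse import urlparse


def key_from_url(url):
    """Return the object key, dropping only the slash that follows the bucket."""
    path = urlparse(url, allow_fragments=False).path
    return path[1:] if path.startswith("/") else path


print(key_from_url("s3://bucket_name/folder1/folder2/file1.json"))
# folder1/folder2/file1.json
print(key_from_url("s3://bucket_name//odd"))
# /odd
```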