How does paging work in the list_blobs function in Google Cloud Storage Python Client Library

说谎 2021-02-20 02:35

I want to get a list of all the blobs in a Google Cloud Storage bucket using the Client Library for Python.

According to the documentation I should use the list_bl

  •  时光取名叫无心
    2021-02-20 02:46

    I'm just going to leave this here. I'm not sure whether the libraries have changed in the two years since this answer was posted, but if you're using a prefix, a plain for blob in bucket.list_blobs() loop doesn't work right. Getting blobs and getting prefixes seem to be fundamentally different operations, and combining pages with prefixes is confusing.

    I found a post in a GitHub issue (here). It works for me.

    def list_gcs_directories(bucket, prefix):
        # from
        iterator = bucket.list_blobs(prefix=prefix, delimiter='/')
        prefixes = set()
        for page in iterator.pages:
            print(page, page.prefixes)
            # Accumulate this page's prefixes; otherwise the set stays empty.
            prefixes.update(page.prefixes)
        return prefixes

    A different comment on the same issue suggested this:

    def get_prefixes(bucket):
        iterator = bucket.list_blobs(delimiter="/")
        # Note: _get_next_page_response() is a private method of the iterator.
        response = iterator._get_next_page_response()
        return response['prefixes']

    This one only gives you all the prefixes if your results fit on a single page.
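
To make the page-by-page behaviour concrete, here is a minimal sketch that walks every page and collects blob names and prefixes separately. FakePage and FakeBucket are hypothetical stand-ins invented for illustration, so the example runs without credentials. The only library behaviour assumed is what the snippets above already rely on: list_blobs(prefix=..., delimiter=...) returns an iterator with .pages, and each page can be iterated for blobs and exposes .prefixes.

```python
from types import SimpleNamespace


def list_blobs_and_prefixes(bucket, prefix=None, delimiter="/"):
    """Walk every page of a listing, collecting blob names and sub-prefixes."""
    iterator = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    blob_names = []
    prefixes = set()
    for page in iterator.pages:
        # Blobs that sit directly under `prefix` on this page.
        blob_names.extend(blob.name for blob in page)
        # "Directory" prefixes reported separately on the same page.
        prefixes.update(page.prefixes)
    return blob_names, prefixes


class FakePage(list):
    """Hypothetical stand-in for one page of results."""

    def __init__(self, blobs, prefixes):
        super().__init__(blobs)
        self.prefixes = tuple(prefixes)


class FakeBucket:
    """Hypothetical stand-in for a bucket whose listing spans two pages."""

    def list_blobs(self, prefix=None, delimiter=None):
        pages = [
            FakePage([SimpleNamespace(name="logs/a.txt")], ["logs/2020/"]),
            FakePage([SimpleNamespace(name="logs/b.txt")], ["logs/2021/"]),
        ]
        return SimpleNamespace(pages=iter(pages))


names, dirs = list_blobs_and_prefixes(FakeBucket(), prefix="logs/")
print(names, sorted(dirs))
```

With the real library, the same function should accept a bucket obtained from storage.Client().bucket("my-bucket"), where "my-bucket" is a placeholder name. Because the prefixes are merged across all pages, the result does not depend on the listing fitting on one page.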
