Python Google Drive API - list the entire drive file tree

走了就别回头了 2020-12-10 06:02

I'm building a Python application that uses the Google Drive API. So far development is going well, but I have a problem retrieving the entire Google Drive file tree. I nee…

4 Answers
  • 2020-12-10 06:13

    Stop thinking about Drive as a tree structure. It isn't one. "Folders" are simply labels; for example, a file can have multiple parents.

    To build a representation of a tree in your app, you need to do the following:

    1. Run a Drive List query to retrieve all folders
    2. Iterate over the result array and examine each item's parents property to build an in-memory hierarchy
    3. Run a second Drive List query to get all non-folders (i.e., files)
    4. Place each returned file in your in-memory tree
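    The four steps above can be sketched in Python with google-api-python-client and Drive API v3. This is a minimal sketch, assuming `service` is an authorized client from `googleapiclient.discovery.build("drive", "v3", credentials=...)`; the helper names `list_all`, `build_tree`, `load_drive_tree` and `print_tree` are hypothetical. It uses v3 field names (`name`, `files`, `pageSize`), whereas the older code in the other answers uses the v2 names `title`, `items` and `maxResults`.

```python
from collections import defaultdict

FOLDER_MIME = "application/vnd.google-apps.folder"
ITEM_FIELDS = "nextPageToken, files(id, name, parents, mimeType)"


def list_all(service, query):
    """Run files.list (Drive API v3) and follow nextPageToken until exhausted."""
    items, token = [], None
    while True:
        resp = service.files().list(
            q=query, fields=ITEM_FIELDS, pageSize=1000, pageToken=token
        ).execute()
        items.extend(resp.get("files", []))
        token = resp.get("nextPageToken")
        if not token:
            return items


def build_tree(folders, files):
    """Steps 2 and 4: index every item under each of its parent ids.

    Items without a parents field are grouped under the key None.
    """
    children = defaultdict(list)
    for item in folders + files:
        for parent_id in item.get("parents") or [None]:
            children[parent_id].append(item)
    return children


def load_drive_tree(service):
    # Step 1: all folders. Step 3: all non-folders.
    folders = list_all(service, f"mimeType = '{FOLDER_MIME}' and trashed = false")
    files = list_all(service, f"mimeType != '{FOLDER_MIME}' and trashed = false")
    return build_tree(folders, files)


def print_tree(children, parent_id, depth=0):
    """Print the subtree under parent_id, indenting two spaces per level."""
    for item in sorted(children.get(parent_id, []), key=lambda i: i["name"]):
        print("  " * depth + item["name"])
        if item["mimeType"] == FOLDER_MIME:
            print_tree(children, item["id"], depth + 1)
```

    Items at the top of My Drive list the root folder's real id in `parents`, not the alias `"root"`, so resolve it first with `service.files().get(fileId="root", fields="id").execute()["id"]` and pass that id to `print_tree`. Because the index is keyed by parent id, an item with several parents simply appears under each of them.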

    If you simply want to check if file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.

    If it's unique, just do a Files List query for title='file-A', then do a Files Get on each of its parents and check whether any of them is called 'folder-B'.

    If 'folder-B' can exist under both 'folder-C' and 'folder-D', then it's more complex and you'll need to build the in-memory hierarchy from steps 1 and 2 above.
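    For the unique-name case, the Files List plus Files Get check might look like the sketch below. It assumes an authorized Drive API v3 `service` from google-api-python-client, so it filters on v3's `name` rather than v2's `title`; `quote` and `file_in_folder` are hypothetical helpers. `quote` escapes single quotes (and backslashes) with a backslash, following Drive's documented `\'` form, so a name like "it's" cannot break the query string.

```python
def quote(value):
    """Wrap value in single quotes for a Drive query, escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def file_in_folder(service, file_name, folder_name):
    """True if any non-trashed item named file_name has a parent named folder_name.

    Only meaningful when folder_name is unique in the Drive.
    """
    query = f"name = {quote(file_name)} and trashed = false"
    token = None
    while True:
        resp = service.files().list(
            q=query, fields="nextPageToken, files(id, parents)", pageToken=token
        ).execute()
        for item in resp.get("files", []):
            # One files.get per parent, fetching only the parent's name.
            for parent_id in item.get("parents", []):
                parent = service.files().get(fileId=parent_id, fields="name").execute()
                if parent.get("name") == folder_name:
                    return True
        token = resp.get("nextPageToken")
        if not token:
            return False
```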

    You don't say whether these files and folders are created by your app or by the user through the Google Drive web app. If your app creates them, there is a trick you can use to restrict your searches to a single root. Say you have

    MyDrive/app_root/folder-C/folder-B/file-A
    

    You can make folder-C, folder-B and file-A all children of app_root.

    That way, you can constrain all of your queries to include:

    and 'app_root_id' in parents
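    With that layout, a single paginated query scoped to app_root reaches everything the app created. A minimal sketch, assuming an authorized Drive API v3 `service` from google-api-python-client; `scoped_query` and `list_app_items` are hypothetical helpers. Note that Google moved My Drive to a single-parent model in 2020, so giving an item a second parent may be rejected for new items; check the current Drive documentation before relying on the multi-parent part of this trick.

```python
ITEM_FIELDS = "nextPageToken, files(id, name, parents, mimeType)"


def scoped_query(app_root_id, condition=None):
    """Append the app_root constraint to an optional Drive query condition."""
    scope = f"'{app_root_id}' in parents"
    return f"{condition} and {scope}" if condition else scope


def list_app_items(service, app_root_id, condition=None):
    """Return every item under app_root matching condition, across all pages."""
    items, token = [], None
    while True:
        resp = service.files().list(
            q=scoped_query(app_root_id, condition),
            fields=ITEM_FIELDS,
            pageToken=token,
        ).execute()
        items.extend(resp.get("files", []))
        token = resp.get("nextPageToken")
        if not token:
            return items
```

    For example, `list_app_items(service, app_root_id, "name = 'file-A' and trashed = false")` looks for file-A only among the app's own items.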
    
  • 2020-12-10 06:17

    I agree with @pinoyyid: Google Drive is not a typical tree structure.

    But for printing the folder structure, I would still consider using a tree visualization library (for example, treelib).

    Below is a full solution for printing your Google Drive file system recursively.

    from treelib import Tree
    
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive
    
    FOLDER_MIME = 'application/vnd.google-apps.folder'
    
    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()  # opens a browser for the OAuth consent flow
    drive = GoogleDrive(gauth)
    
    ### Helper functions ###
    def get_children(parent_id):
        """Return every non-trashed item whose parents include parent_id."""
        query = "'%s' in parents and trashed=false" % parent_id
        return drive.ListFile({'q': query}).GetList()
    
    def get_folder_id(parent_id, folder_title):
        """Return the id of the first child of parent_id titled folder_title, or None."""
        for file in get_children(parent_id):
            if file['title'] == folder_title:
                return file['id']
        return None
    
    def add_children_to_tree(tree, file_list, parent_id):
        """Add file_list under parent_id and return the items actually added."""
        added = []
        for file in file_list:
            # An item with several parents would otherwise be added twice,
            # and treelib rejects duplicate node ids.
            if tree.contains(file['id']):
                continue
            tree.create_node(file['title'], file['id'], parent=parent_id)
            print('parent: %s, title: %s, id: %s' % (parent_id, file['title'], file['id']))
            added.append(file)
        return added
    
    ### Recursion over all children ###
    def populate_tree_recursively(tree, parent_id):
        children = add_children_to_tree(tree, get_children(parent_id), parent_id)
        for child in children:
            # Only folders can have children, so skip the extra query for files.
            if child['mimeType'] == FOLDER_MIME:
                populate_tree_recursively(tree, child['id'])
    
    ### Create tree and start populating from root ###
    def main():
        root_folder_title = "your-root-folder"
        root_folder_id = get_folder_id("root", root_folder_title)
        if root_folder_id is None:
            raise SystemExit("Folder %r not found in My Drive" % root_folder_title)
    
        tree = Tree()
        tree.create_node(root_folder_title, root_folder_id)
        populate_tree_recursively(tree, root_folder_id)
        tree.show()
    
    if __name__ == "__main__":
        main()
    
  • 2020-12-10 06:19

    This will never work except for very small trees. You have to rethink your entire algorithm for a cloud app: you have written it like a desktop app, where you own the machine, and it will time out easily. You need to mirror the tree beforehand (task queues and Datastore), not just to avoid timeouts but also to stay within Drive's rate limits, and then keep the mirror in sync somehow (for example, by registering for push notifications). It's not easy at all; I've built a Drive tree viewer before.
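    Keeping a mirror in sync does not require re-walking the tree. A minimal sketch, assuming an authorized Drive API v3 `service` and a mirror stored as a dict mapping file id to metadata: save a start token, then periodically fold in the results of `changes.list`. This polls rather than registering for push; `changes.watch` is the push variant and needs an HTTPS endpoint to receive notifications. `start_token` and `apply_changes` are hypothetical names.

```python
CHANGE_FIELDS = (
    "nextPageToken, newStartPageToken, "
    "changes(fileId, removed, file(name, parents, mimeType, trashed))"
)


def start_token(service):
    """Call before the initial full listing so no change slips through the gap."""
    return service.changes().getStartPageToken().execute()["startPageToken"]


def apply_changes(service, mirror, page_token):
    """Fold every change since page_token into mirror; return the token to store.

    mirror maps file id -> {"name", "parents", "mimeType"}.
    """
    while True:
        resp = service.changes().list(pageToken=page_token, fields=CHANGE_FIELDS).execute()
        for change in resp.get("changes", []):
            file_id = change.get("fileId")
            if file_id is None:  # e.g. a shared-drive change rather than a file change
                continue
            meta = change.get("file")
            if change.get("removed") or meta is None or meta.get("trashed"):
                mirror.pop(file_id, None)
            else:
                mirror[file_id] = {
                    "name": meta.get("name"),
                    "parents": meta.get("parents", []),
                    "mimeType": meta.get("mimeType"),
                }
        if "newStartPageToken" in resp:
            return resp["newStartPageToken"]  # caught up; poll again later
        page_token = resp["nextPageToken"]
```

    Run `apply_changes` on a schedule (a task queue job, cron, and so on) and persist the returned token alongside the mirror; tree requests then read the mirror instead of calling Drive.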

  • 2020-12-10 06:27

    An easy way to check whether a file exists in a specific folder is:

    drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='a file'").execute()

    To walk all folders and files:

    import logging
    import os
    import socket
    import sys
    
    import googleDriveAccess  # third-party client wrapper (see link below)
    
    logging.basicConfig()
    
    FOLDER_TYPE = 'application/vnd.google-apps.folder'
    
    def getlist(ds, q, **kwargs):
      """Run a Drive v2 files.list query, following nextPageToken to merge all pages."""
      result = None
      npt = ''
      while npt is not None:
        if npt:
          kwargs['pageToken'] = npt
        entries = ds.files().list(q=q, **kwargs).execute()
        if result is None:
          result = entries
        else:
          result['items'] += entries['items']
        npt = entries.get('nextPageToken')
      return result
    
    def walk(ds, folderId, folderName, outf, depth):
      """Write this folder, recurse into its subfolders, then list its files."""
      spc = ' ' * depth
      outf.write('%s+%s\n%s  %s\n' % (spc, folderId, spc, folderName))
      # Subfolders first, so the output follows the folder nesting.
      q = "'%s' in parents and mimeType='%s' and trashed=false" % (folderId, FOLDER_TYPE)
      entries = getlist(ds, q, maxResults=200)
      for folder in entries['items']:
        walk(ds, folder['id'], folder['title'], outf, depth + 1)
      # Then every non-folder directly inside this folder.
      q = "'%s' in parents and mimeType!='%s' and trashed=false" % (folderId, FOLDER_TYPE)
      entries = getlist(ds, q, maxResults=200)
      for f in entries['items']:
        outf.write('%s -%s\n%s   %s\n' % (spc, f['id'], spc, f['title']))
    
    def main(basedir):
      da = googleDriveAccess.DAClient(basedir)  # clientId=None, script=False
      with open(os.path.join(basedir, 'hierarchy.txt'), 'w', encoding='utf-8') as f:
        walk(da.drive_service, 'root', 'root', f, 0)
    
    if __name__ == '__main__':
      logging.getLogger().setLevel(logging.INFO)
      try:
        main(os.path.dirname(os.path.abspath(__file__)))
      except socket.gaierror:
        sys.stderr.write('socket.gaierror\n')
    

    This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess
