I'm building a Python application that uses the Google Drive APIs. So far development is going well, but I have a problem retrieving the entire Google Drive file tree. I nee
Stop thinking about Drive as a tree structure. It isn't one. "Folders" are simply labels; e.g., a file can have multiple parents.
In order to build a representation of a tree in your app, you need to do this ...
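As a minimal sketch of such an in-memory representation, assuming every item has already been fetched as a dict with `id` and `parents` keys, you can index children by parent ID. The function name `build_hierarchy` is hypothetical. It accepts `parents` either as plain ID strings (Drive v3 style) or as dicts with an `id` key (Drive v2 `ParentReference` style), and an item with several parents is filed under each of them.

```python
from collections import defaultdict


def build_hierarchy(items):
    """Map each parent ID to the list of child IDs found in `items`.

    Hypothetical helper: `items` is any iterable of dicts with an 'id' key and
    an optional 'parents' list holding ID strings or {'id': ...} dicts.
    """
    children = defaultdict(list)
    for item in items:
        for parent in item.get('parents', []):
            # Accept both v3-style strings and v2-style ParentReference dicts
            parent_id = parent['id'] if isinstance(parent, dict) else parent
            children[parent_id].append(item['id'])
    return dict(children)


items = [
    {'id': 'C', 'title': 'folder-C', 'parents': ['root']},
    {'id': 'D', 'title': 'folder-D', 'parents': ['root']},
    {'id': 'B', 'title': 'folder-B', 'parents': ['C', 'D']},  # two parents
    {'id': 'A', 'title': 'file-A', 'parents': [{'id': 'B'}]},
]
hierarchy = build_hierarchy(items)
print(hierarchy)
# {'root': ['C', 'D'], 'C': ['B'], 'D': ['B'], 'B': ['A']}
```

Note that folder-B appears under both folder-C and folder-D, which is exactly the case a strict tree cannot express.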
If you simply want to check if file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.
If it's unique, just run a Files.list query for title='file-A', then do a Files.get for each of its parents and check whether any of them is called 'folder-B'.
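The unique-name check can be sketched as below. `get_title` is a hypothetical callable you would supply (for example, a wrapper around a Files.get call that returns the folder's `title`); here it is replaced by a plain dict lookup so the sketch runs offline.

```python
def file_in_named_folder(file_item, folder_title, get_title):
    """Return True if any parent of `file_item` is titled `folder_title`.

    Hypothetical helper: `file_item` is a dict with a 'parents' list (ID strings
    or {'id': ...} dicts); `get_title` maps a folder ID to that folder's title.
    """
    for parent in file_item.get('parents', []):
        parent_id = parent['id'] if isinstance(parent, dict) else parent
        # One lookup (e.g. one Files.get request) per parent
        if get_title(parent_id) == folder_title:
            return True
    return False


# Offline stand-in for Files.get: folder ID -> title
titles = {'B': 'folder-B', 'X': 'other-folder'}
file_a = {'id': 'A', 'title': 'file-A', 'parents': ['X', 'B']}
print(file_in_named_folder(file_a, 'folder-B', titles.get))  # True
```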
If 'folder-B' can exist under both 'folder-C' and 'folder-D', then it's more complex and you'll need to build the in-memory hierarchy from steps 1 and 2 above.
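When names repeat, one option is to resolve the whole path against the in-memory hierarchy rather than checking a single parent's name. A minimal sketch, assuming all items are already loaded as dicts with `id`, `title` and `parents`; `find_by_path` is a hypothetical name, and it returns every matching ID because a path of titles need not identify a single item in Drive.

```python
def find_by_path(items, path, root_id='root'):
    """Return the IDs of all items reachable from `root_id` via the titles in `path`.

    Hypothetical helper: `items` are dicts with 'id', 'title' and an optional
    'parents' list of ID strings or {'id': ...} dicts.
    """
    by_id = {item['id']: item for item in items}
    children = {}
    for item in items:
        for parent in item.get('parents', []):
            parent_id = parent['id'] if isinstance(parent, dict) else parent
            children.setdefault(parent_id, []).append(item['id'])

    current = [root_id]
    for title in path.strip('/').split('/'):
        # Step one level down, keeping only children with the wanted title
        matches = [child_id
                   for parent_id in current
                   for child_id in children.get(parent_id, [])
                   if by_id[child_id]['title'] == title]
        current = list(dict.fromkeys(matches))  # drop duplicates, keep order
    return current


items = [
    {'id': 'C', 'title': 'folder-C', 'parents': ['root']},
    {'id': 'D', 'title': 'folder-D', 'parents': ['root']},
    {'id': 'B1', 'title': 'folder-B', 'parents': ['C']},
    {'id': 'B2', 'title': 'folder-B', 'parents': ['D']},
    {'id': 'A', 'title': 'file-A', 'parents': ['B1']},
]
print(find_by_path(items, 'folder-C/folder-B/file-A'))  # ['A']
print(find_by_path(items, 'folder-D/folder-B/file-A'))  # []
```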
You don't say whether these files and folders are being created by your app or by the user with the Google Drive web app. If your app is the creator of these files/folders, there is a trick you can use to restrict your searches to a single root. Say you have
MyDrive/app_root/folder-C/folder-B/file-A
you can make all of folder-C, folder-B and file-A children of app_root
That way you can constrain all of your queries to include
and 'app_root_id' in parents
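A small sketch of building such a scoped query string for the `q` parameter of the Drive v2 `files.list` call (v2 matches on `title`; v3 uses `name` instead). The helper names are hypothetical. Single quotes inside a query literal are escaped with a backslash per the Drive query syntax; escaping backslashes as well is an assumption made here for safety.

```python
def quote_literal(value):
    """Wrap `value` in single quotes for a Drive query, escaping \\ and '."""
    escaped = value.replace('\\', '\\\\').replace("'", "\\'")
    return "'" + escaped + "'"


def scoped_title_query(title, app_root_id):
    """Build a Drive v2 `q` string that only matches items under app_root."""
    return 'title=%s and %s in parents' % (
        quote_literal(title), quote_literal(app_root_id))


print(scoped_title_query("file-A", "app_root_id"))
# title='file-A' and 'app_root_id' in parents
print(scoped_title_query("Bob's notes", "app_root_id"))
# title='Bob\'s notes' and 'app_root_id' in parents
```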
I agree with @pinoyyid: Google Drive is not a typical tree structure.
BUT, for printing the folder structure, I would still consider using a tree visualization library (for example, treelib).
Below is a full solution for printing your Google Drive file system recursively.
```python
from treelib import Tree
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

FOLDER_MIME = 'application/vnd.google-apps.folder'

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

### Helper functions ###
def get_children(root_folder_id):
    # List every non-trashed item whose parents include root_folder_id
    # (GetList() follows all result pages)
    query = "'%s' in parents and trashed=false" % root_folder_id
    return drive.ListFile({'q': query}).GetList()

def get_folder_id(root_folder_id, root_folder_title):
    # Return the ID of the first child titled root_folder_title, or None
    for file in get_children(root_folder_id):
        if file['title'] == root_folder_title:
            return file['id']
    return None

def add_children_to_tree(tree, file_list, parent_id):
    for file in file_list:
        tree.create_node(file['title'], file['id'], parent=parent_id)
        print('parent: %s, title: %s, id: %s' % (parent_id, file['title'], file['id']))

### Recursion over all children ###
def populate_tree_recursively(tree, parent_id):
    children = get_children(parent_id)
    add_children_to_tree(tree, children, parent_id)
    for child in children:
        # Only folders can have children; skipping files saves one API call each
        if child['mimeType'] == FOLDER_MIME:
            populate_tree_recursively(tree, child['id'])

### Create tree and start populating from root ###
def main():
    root_folder_title = "your-root-folder"
    root_folder_id = get_folder_id("root", root_folder_title)
    if root_folder_id is None:
        raise SystemExit("Folder %r not found in My Drive" % root_folder_title)
    tree = Tree()
    tree.create_node(root_folder_title, root_folder_id)
    populate_tree_recursively(tree, root_folder_id)
    tree.show()

if __name__ == "__main__":
    main()
```
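treelib requires node identifiers to be unique within a tree, so a file that has two parents inside the same subtree cannot be added twice under its Drive ID. If you would rather have such an item appear under each of its parents, or want to avoid the dependency, a plain recursive printer over an in-memory parent-to-children map is enough. This is a minimal stdlib-only sketch, not treelib; `render_tree` and its `children`/`titles` inputs are hypothetical, and `on_path` stops infinite recursion if a folder is (directly or indirectly) its own ancestor.

```python
def render_tree(children, titles, node_id, depth=0, on_path=None):
    """Return indented lines for `node_id` and everything below it.

    Hypothetical inputs: `children` maps a parent ID to a list of child IDs and
    `titles` maps an ID to its title. An item with several parents is printed
    under each of them; `on_path` stops cycles on the current branch.
    """
    on_path = (on_path or frozenset()) | {node_id}
    lines = ['  ' * depth + titles.get(node_id, node_id)]
    for child_id in children.get(node_id, []):
        if child_id not in on_path:
            lines.extend(render_tree(children, titles, child_id, depth + 1, on_path))
    return lines


children = {'root': ['C', 'D'], 'C': ['B'], 'D': ['B'], 'B': ['A']}
titles = {'root': 'My Drive', 'C': 'folder-C', 'D': 'folder-D',
          'B': 'folder-B', 'A': 'file-A'}
print('\n'.join(render_tree(children, titles, 'root')))
```

Here folder-B and file-A are printed twice, once under folder-C and once under folder-D.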
This will never work as written except for very small trees. You have to rethink your entire algorithm for a cloud app: you have written it like a desktop app where you own the machine, so it will easily time out. You need to mirror the tree beforehand (task queues and the datastore), not just to avoid timeouts but also to stay under Drive rate limits, and then keep the mirror in sync somehow (register for push notifications, etc.). Not easy at all; I've built a Drive tree viewer before.
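Keeping a mirror in sync generally means applying Drive's change feed to the stored copy instead of re-walking the whole drive. Below is a minimal sketch of only the apply step, assuming change records shaped like the Drive v3 `Change` resource (`fileId`, `removed`, `file`; Drive v2 calls the flag `deleted`). Fetching the changes, persistence, and push-notification setup are left out, and `apply_changes` is a hypothetical name.

```python
def apply_changes(mirror, changes):
    """Update `mirror` (file ID -> file dict) in place from change records.

    Assumes v3-style records: {'fileId': ..., 'removed': bool, 'file': {...}}.
    Items that were removed or moved to the trash are dropped from the mirror;
    anything else replaces the stored copy (covers creates, renames and moves).
    """
    for change in changes:
        file_id = change['fileId']
        item = change.get('file')
        if change.get('removed') or (item or {}).get('trashed'):
            mirror.pop(file_id, None)
        elif item is not None:
            mirror[file_id] = item
    return mirror


mirror = {'A': {'id': 'A', 'name': 'file-A', 'parents': ['B']}}
changes = [
    {'fileId': 'A', 'removed': False,
     'file': {'id': 'A', 'name': 'file-A', 'parents': ['C']}},  # moved
    {'fileId': 'N', 'removed': False,
     'file': {'id': 'N', 'name': 'new', 'parents': ['C']}},     # created
]
apply_changes(mirror, changes)
print(sorted(mirror))  # ['A', 'N']
```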
An easy way to check whether a file exists in a specific folder is: drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='a file'").execute()
To walk all folders and files:
```python
# Python 2 code (it relies on the `unicode` type)
import os
import sys
import socket
import logging

import googleDriveAccess

logging.basicConfig()

FOLDER_TYPE = 'application/vnd.google-apps.folder'

def getlist(ds, q, **kwargs):
    # Follow nextPageToken until every page has been merged into one result dict
    result = None
    npt = ''
    while npt is not None:
        if npt != '':
            kwargs['pageToken'] = npt
        entries = ds.files().list(q=q, **kwargs).execute()
        if result is None:
            result = entries
        else:
            result['items'] += entries['items']
        npt = entries.get('nextPageToken')
    return result

def uenc(u):
    # Encode unicode IDs/titles as UTF-8 bytes before writing to a binary file
    if isinstance(u, unicode):
        return u.encode('utf-8')
    return u

def walk(ds, folderId, folderName, outf, depth):
    spc = ' ' * depth
    outf.write('%s+%s\n%s %s\n' % (spc, uenc(folderId), spc, uenc(folderName)))
    # Recurse into the subfolders first...
    q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
    entries = getlist(ds, q, maxResults=200)
    for folder in entries['items']:
        walk(ds, folder['id'], folder['title'], outf, depth + 1)
    # ...then list the non-folder files in this folder
    q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
    entries = getlist(ds, q, maxResults=200)
    for f in entries['items']:
        outf.write('%s -%s\n%s %s\n' % (spc, uenc(f['id']), spc, uenc(f['title'])))

def main(basedir):
    da = googleDriveAccess.DAClient(basedir)  # clientId=None, script=False
    with open(os.path.join(basedir, 'hierarchy.txt'), 'wb') as f:
        walk(da.drive_service, 'root', u'root', f, 0)

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    try:
        main(os.path.dirname(__file__))
    except socket.gaierror:
        sys.stderr.write('socket.gaierror\n')
```
This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess
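The `getlist` helper above keeps every page in memory before returning. A generator that yields items page by page is one possible alternative. In this sketch `fetch_page` is a hypothetical callable that takes a page token (`None` for the first page) and returns one decoded response dict with an `items` list and, when more pages exist, a `nextPageToken`, which is the shape Drive v2 `files.list` returns. With the real client you would wrap `ds.files().list(...).execute()` and set `pageToken` only when a token is given.

```python
def iter_items(fetch_page):
    """Yield every item across all pages, holding one page in memory at a time.

    `fetch_page(page_token)` is a hypothetical callable returning a response
    dict with an 'items' list and, if more pages exist, a 'nextPageToken'.
    The first call receives None.
    """
    page_token = None
    while True:
        response = fetch_page(page_token)
        for item in response.get('items', []):
            yield item
        page_token = response.get('nextPageToken')
        if not page_token:
            break


# Offline stand-in for files().list(...).execute(): three pages keyed by token
pages = {
    None: {'items': [{'id': '1'}, {'id': '2'}], 'nextPageToken': 't1'},
    't1': {'items': [{'id': '3'}], 'nextPageToken': 't2'},
    't2': {'items': [{'id': '4'}]},
}
print([item['id'] for item in iter_items(pages.get)])  # ['1', '2', '3', '4']
```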