Download a spreadsheet from Google Docs using Python

Submitted by 懵懂的女人 on 2019-11-27 06:07:50

In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:

#!/usr/bin/python

import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="csv"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    import getpass
    import csv

    email = "" # (your email here)
    password = getpass.getpass()
    spreadsheet_id = "" # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    csv_file = gs.download(ss)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)

The gspread library (https://github.com/burnash/gspread) is a newer, simpler way to interact with Google Spreadsheets than the gdata library suggested in the older answers, which is both too low-level and overly complicated.

You will also need to create and download (in JSON format) a Service Account key: https://console.developers.google.com/apis/credentials/serviceaccountkey

Here's an example of how to use it:

import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)

docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"

client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
    filename = docid + '-worksheet' + str(i) + '.csv'
    with open(filename, 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(worksheet.get_all_values())
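
The snippet above opens the output file in 'wb' mode, which is what the csv module expects on Python 2. On Python 3 the csv writer needs a text-mode file opened with newline=''. Here is a minimal sketch of that variant, assuming the rows are a list of lists of strings like the one worksheet.get_all_values() returns; the helper names write_rows_csv and read_rows_csv are made up for illustration:

```python
import csv


def write_rows_csv(rows, filename):
    """Write a list of rows (each a list of strings) to a CSV file.

    Python 3 variant: text mode with newline='' so the csv module
    controls the line endings itself.
    """
    with open(filename, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def read_rows_csv(filename):
    """Read the file back as a list of rows, mainly to check the round trip."""
    with open(filename, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

In the loop above, this would replace the with-block: write_rows_csv(worksheet.get_all_values(), filename).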

You might try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.

Get a separate login token for the spreadsheets service and substitute that for the export. Adding this to the get_spreadsheet code worked for me:

import gdata.spreadsheet.service

def get_spreadsheet(key, gid=0):
    # ...
    spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
    spreadsheets_client.email = gd_client.email
    spreadsheets_client.password = gd_client.password
    spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
    spreadsheets_client.ProgrammaticLogin()

    # ...
    entry = gd_client.GetDocumentListEntry(uri)
    docs_auth_token = gd_client.GetClientLoginToken()
    gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
    gd_client.Export(entry, file_path)
    gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token

Notice I also used Export, as Download seems to give only PDF files.

This no longer works as of gdata 2.0.1.4:

gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())

Instead, you have to do:

gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))

(Jul 2016) Rephrasing with current terminology: "How do I download a Google Sheet in CSV format from Google Drive using Python?". (Google Docs now only refers to the cloud-based word processor/text editor which doesn't provide access to Google Sheets spreadsheets.)

First, all other answers are pretty much outdated or will be, because they use the old GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.

Modern Google API access occurs using API keys (public data) or OAuth2 authorization (authorized data), primarily with the Google APIs Client Libraries, including the one for Python. (And no, you don't have to build an entire auth system just to access the APIs... see the blogpost below.)

To perform the task requested by the OP, you need authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.

Note that there is also a newer Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based requests like exporting, where the Drive API is the correct one to use.

To see an example of exporting a Google Sheet as CSV from Drive, check out this blog post I wrote; to learn more about using Google Sheets with Python, see this answer I wrote for a similar question.
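
As a rough illustration of the file-based export described above, the Drive API v3 exposes an export endpoint over plain HTTPS. The sketch below only builds the request and does not send it. It assumes you already have an OAuth2 access token with a Drive scope (obtaining one is not shown), and build_export_request is a placeholder name, not part of any Google library:

```python
import urllib.parse
import urllib.request

# Drive API v3 export endpoint; the file ID goes in the path.
EXPORT_URL = "https://www.googleapis.com/drive/v3/files/{file_id}/export"


def build_export_request(file_id, access_token, mime_type="text/csv"):
    """Build (but do not send) a request that exports a Sheet as CSV.

    access_token is assumed to be a valid OAuth2 token obtained elsewhere.
    """
    query = urllib.parse.urlencode({"mimeType": mime_type})
    url = EXPORT_URL.format(file_id=urllib.parse.quote(file_id, safe="")) + "?" + query
    headers = {"Authorization": "Bearer " + access_token}
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to urllib.request.urlopen() performs the download; the response body is the CSV data.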

If you're completely new to Google APIs, then you need to take a further step back and review these videos first:

The following code works in my case (Ubuntu 10.04, Python 2.6.5, gdata 2.0.14):

import gdata.docs.service
import gdata.spreadsheet.service
gd_client = gdata.docs.service.DocsService()
gd_client.ClientLogin(email,password)
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin(email,password)
#...
file_path = file_path.strip()+".xls"
docs_token = gd_client.auth_token
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)  
gd_client.auth_token = docs_token

I've simplified @Cameron's answer even further, by removing the unnecessary object orientation. This makes the code smaller and easier to understand. I also edited the url, which might work better.

#!/usr/bin/python
import re, urllib, urllib2

def get_auth_token(email, password):
    url = "https://www.google.com/accounts/ClientLogin"
    params = {
        "Email": email, "Passwd": password,
        "service": 'wise',
        "accountType": "HOSTED_OR_GOOGLE",
        "source": 'Client'
    }
    req = urllib2.Request(url, urllib.urlencode(params))
    return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

def download(spreadsheet, worksheet, email, password, format="csv"):
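    # gid is passed as a query parameter ("&gid="); a "#gid=" fragment would
    # never reach the server, so it could not select the worksheet.
    url_format = 'https://docs.google.com/spreadsheets/d/%s/export?exportFormat=%s&gid=%s'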

    headers = {
        "Authorization": "GoogleLogin auth=" + get_auth_token(email, password),
        "GData-Version": "3.0"
    }
    req = urllib2.Request(url_format % (spreadsheet, format, worksheet), headers=headers)
    return urllib2.urlopen(req)


if __name__ == "__main__":
    import getpass
    import csv

    spreadsheet_id = ""             # (spreadsheet id here)
    worksheet_id = ''               # (gid here)
    email = ""                      # (your email here)
    password = getpass.getpass()

    # Request a file-like object containing the spreadsheet's contents
    csv_file = download(spreadsheet_id, worksheet_id, email, password)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
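
One detail worth checking in export URLs like the one above: anything after # is a URL fragment, which HTTP clients do not send to the server, so a gid placed there cannot select a worksheet. It has to be a query parameter. A small sketch using only the standard library (Python 3; the KEY and gid values are made up, and server_visible_params is an illustrative helper):

```python
from urllib.parse import urlsplit, parse_qs


def server_visible_params(url):
    """Return the query parameters a server would actually receive."""
    return parse_qs(urlsplit(url).query)


base = "https://docs.google.com/spreadsheets/d/KEY/export?exportFormat=csv"

# gid after '#': it ends up in the fragment, not in the query string.
print(server_visible_params(base + "#gid=123"))
# -> {'exportFormat': ['csv']}

# gid after '&': it is part of the query string.
print(server_visible_params(base + "&gid=123"))
# -> {'exportFormat': ['csv'], 'gid': ['123']}
```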

This isn't a complete answer, but Andreas Kahler wrote up an interesting CMS solution using Google Docs + Google App Engine + Python. Not having any experience in the area, I cannot see exactly what portion of the code may be of use to you, but check it out. I know it interfaces with a Google Docs account and plays with files, so I have a feeling you'll recognize what's going on. It should at least point you in the right direction.

Google AppEngine + Google Docs + Some Python = Simple CMS

Gspread is indeed a big improvement over GoogleCL and Gdata (both of which I've used and thankfully phased out in favor of Gspread). I think that this code is even quicker than the earlier answer to get the contents of the sheet:

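import gspread

username = 'sdfsdfsds@gmail.com'
password = 'sdfsdfsadfsdw'
sheetname = "Sheety Sheet"

# Note: gspread.login() relies on ClientLogin, which Google has since shut
# down; newer gspread versions authorize with OAuth2 credentials instead
# (see the service-account example above).
client = gspread.login(username, password)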
spreadsheet = client.open(sheetname)

worksheet = spreadsheet.sheet1
contents = []
for rows in worksheet.get_all_values():
    contents.append(rows)

(Dec 16) Try another library I wrote: pygsheets. It's similar to gspread but uses the Google Sheets API v4. It has an export method for exporting a spreadsheet.

import pygsheets

gc = pygsheets.authorize()

# Open the spreadsheet and then the worksheet
sh = gc.open('my new ssheet')
wks = sh.sheet1

# Export as CSV
wks.export(pygsheets.ExportType.CSV)

(Mar 2019, Python 3) My data is usually not sensitive, and I usually use a table format similar to CSV.

In that case, one can simply publish the sheet to the web and then read it as a CSV file from a server.

(One publishes it using File -> Publish to the web ... -> Sheet 1 -> Comma separated values (.csv) -> Publish.)

import csv
import io
import requests

url = "https://docs.google.com/spreadsheets/d/e/<GOOGLE_ID>/pub?gid=0&single=true&output=csv"  # you can get the whole link in the 'Publish to the web' dialog
r = requests.get(url)
r.encoding = 'utf-8'
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)
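
If pulling in requests is not an option, roughly the same thing can be done with the standard library. This is a sketch under the same assumption that the sheet is published as CSV. The names parse_csv_text and fetch_published_csv are illustrative, and the URL is whatever the 'Publish to the web' dialog gives you:

```python
import csv
import io
import urllib.request


def parse_csv_text(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


def fetch_published_csv(url):
    """Download a published sheet and parse it (performs a network request)."""
    with urllib.request.urlopen(url) as resp:
        return parse_csv_text(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes it easy to check the parsing against a small sample string before pointing it at a real sheet.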