Download/Export Public Google Spreadsheet as TSV from Command Line?

Submitted by 孤街醉人 on 2020-07-17 07:18:25

Question


I have a public (published) Google spreadsheet that I’m trying to download programmatically in TSV form.

In my browser, with a Google login active, for some actual key $key, https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=$key&exportFormat=tsv works and produces a TSV file.

In my shell, however:

  • curl -L "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=$key&exportFormat=tsv" produces a bunch of JavaScript.
  • curl -L "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=$key&exportFormat=csv" also produces a bunch of JavaScript.
  • curl -L "https://docs.google.com/spreadsheet/pub?key=$key&single=true&gid=0&output=csv" works and produces a CSV file.
  • curl -L "https://docs.google.com/spreadsheet/pub?key=$key&single=true&gid=0&output=tsv" produces an error message.

(Attempts to use wget produced similar results.)

How do I make this work? All the Google documentation I’ve been able to find so far is geared towards much more complicated problems than a simple download and format change, and if the solution to my problem is in there somewhere, I haven’t been able to find it yet.


Answer 1:


I found this to be frustratingly undocumented. I'm sure it's documented somewhere... but I never found it.

The premise is that your Google Sheet is published publicly. This is not intuitive for many folks. (Choose File -> Publish to Web...)

When you publish a sheet, you are given a URL like this to copy: https://docs.google.com/spreadsheets/d/1XsfK2TN418FuEstNGG2eI9FmEV-4eY-FnndigHWIhk4/pubhtml

That URL is nicely browsable, but it's not the downloadable CSV I wanted. Through a lengthy combination of searching and trial and error, I came up with this:

curl "https://docs.google.com/spreadsheets/d/1XsfK2TN418FuEstNGG2eI9FmEV-4eY-FnndigHWIhk4/export?gid=0&format=csv"

I find it to be tremendously helpful. I hope somebody comments with a link to the official docs explaining this in more detail.
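As a rough sketch, the URL pattern above can be wrapped in a small shell function. The function name sheet_export_url is made up for illustration, and passing format=tsv is an assumption: this answer only confirms format=csv.

```shell
#!/bin/sh
# Build an export URL for a published sheet, following the pattern above.
# Arguments: spreadsheet ID, gid (defaults to 0), format (defaults to csv).
# NOTE: sheet_export_url is a hypothetical helper, and format=tsv is an
# assumption; only format=csv is confirmed by the answer.
sheet_export_url() {
    id=$1
    gid=${2:-0}
    format=${3:-csv}
    printf 'https://docs.google.com/spreadsheets/d/%s/export?gid=%s&format=%s\n' \
        "$id" "$gid" "$format"
}

# Example (commented out so the snippet makes no network request):
# -f makes curl exit non-zero on an HTTP error, -L follows redirects.
# curl -fL "$(sheet_export_url 1XsfK2TN418FuEstNGG2eI9FmEV-4eY-FnndigHWIhk4 0 tsv)" -o sheet.tsv
```

If the download comes back as HTML instead of delimited text, the sheet is probably not published, or the gid does not exist in that spreadsheet.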




Answer 2:


I can download through the shell in this way:

  1. File => Publish to Web
  2. Choose a sheet and the format you want to download.
  3. Click on Publish
  4. Copy the link
  5. and then use it:

    wget -O ./filename.csv "LINK"
    

    or

    curl -L "LINK" > ./filename.csv
    

In my case it worked as expected.

Also, I think publishing makes all the formats available, so you can choose what to download by changing the last part of the URL, without unpublishing and republishing:

output=tsv
output=csv
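A minimal sketch of that swap, assuming the copied link already contains an output= parameter. The helper name set_output_format is hypothetical.

```shell
#!/bin/sh
# Replace the value of the output= query parameter in a published link.
# Arguments: the published link, the desired format (e.g. tsv or csv).
# NOTE: set_output_format is a hypothetical helper, not part of any tool.
set_output_format() {
    link=$1
    format=$2
    # Match "?output=..." or "&output=..." up to the next "&" and swap the value.
    printf '%s\n' "$link" | sed -E "s/([?&]output=)[^&]*/\1$format/"
}

# Example (commented out so the snippet makes no network request):
# curl -L "$(set_output_format "LINK" tsv)" > ./filename.tsv
```

A link without an output= parameter is returned unchanged, so check the result before downloading.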



Answer 3:


To add to the answer written by @mdahlman: there is a gid=<value> argument that lets you choose which sheet to export (CSV and TSV can hold only one sheet at a time). This is the sheet ID, and you can pick it up from the URL of each sheet.

So, to get a CSV/TSV publish link, do this:

  1. Publish the document to get a URL like https://docs.google.com/spreadsheets/d/e/{key}/pub?output=tsv.

  2. Then, for each sheet (tab) in the spreadsheet:

    1. Click on it.

    2. View its URL in your browser's address bar. It'll end with edit#gid={gid}. That's what you want.

    3. Build your URL from the one in step 1 and the gid from step 2.2: https://docs.google.com/spreadsheets/d/e/{key}/pub?output=tsv&gid={gid}.

GIDs don't go in sequence (0, 1, 2, ...). They are long numbers (9 digits for me) in no apparent order, so they're really more like sheet keys than what one would expect from an "id".

In my document, one of the GIDs was zero. I assume it belongs to the default or first-created sheet. That would explain why gid=0 worked for some people above, yet produced an error for others (those who no longer have a sheet with that GID, possibly because they deleted it).
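The steps above can be sketched in shell as follows. Both function names (gid_from_url and pub_url) are hypothetical, and {key} stands for whatever key appears in your own publish link.

```shell
#!/bin/sh
# Extract the gid from a sheet's address-bar URL (the digits after "gid=").
# Prints nothing if the URL contains no gid.
# NOTE: gid_from_url and pub_url are hypothetical helpers for illustration.
gid_from_url() {
    printf '%s\n' "$1" | sed -n -E 's/.*[#&?]gid=([0-9]+).*/\1/p'
}

# Build the publish URL from step 3.
# Arguments: publish key, gid, format (defaults to tsv).
pub_url() {
    key=$1
    gid=$2
    format=${3:-tsv}
    printf 'https://docs.google.com/spreadsheets/d/e/%s/pub?output=%s&gid=%s\n' \
        "$key" "$format" "$gid"
}

# Example (commented out so the snippet makes no network request):
# gid=$(gid_from_url "URL copied from the address bar")
# curl -L "$(pub_url "{key}" "$gid")" > ./sheet.tsv
```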




Answer 4:


My answer is about how to find the answer.

In the Chrome browser, navigate to your Google document.

In the upper right corner of the browser, go to the three dots -> More tools -> Developer tools.

This will bring up the HTML debugger.

At the top of the debugger window, select Network.

Now, in your document, initiate the download that you're trying to automate.

The debugger will show every web request that is made. The first new one is probably what you want.

You should be able to right-click -> Copy -> Copy link address.

The URL includes an ID. I don't know what it's for, but curl was able to download the doc without it.

Hope it's helpful.




Answer 5:


Private files require OAuth authorization credentials to be downloaded. You can read more about the process on the Google Drive API's Download Files guide.
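For illustration only, here is a rough sketch of what such a request might look like against the Drive API v3 export endpoint. It assumes you have already obtained an OAuth access token by some other means (the ACCESS_TOKEN variable below is a placeholder) and that the file ID is the spreadsheet ID from the /d/{id}/ part of its URL. The helper name drive_export_url is hypothetical. Check the Download Files guide for the authoritative details.

```shell
#!/bin/sh
# Build a Drive API v3 export URL for a Google Sheets file.
# Arguments: file ID, MIME type (defaults to text/tab-separated-values).
# NOTE: drive_export_url is a hypothetical helper; obtaining the token
# is outside the scope of this sketch.
drive_export_url() {
    file_id=$1
    mime=${2:-text/tab-separated-values}
    printf 'https://www.googleapis.com/drive/v3/files/%s/export?mimeType=%s\n' \
        "$file_id" "$mime"
}

# Example (commented out so the snippet makes no network request).
# ACCESS_TOKEN is a placeholder for a valid OAuth 2.0 access token.
# curl -fL -H "Authorization: Bearer $ACCESS_TOKEN" \
#     "$(drive_export_url "FILE_ID")" -o sheet.tsv
```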



Source: https://stackoverflow.com/questions/24255472/download-export-public-google-spreadsheet-as-tsv-from-command-line
