Question
I want to collect data from Twitter over a period of several weeks.
To do so, I use RStudio Server and crontab to automatically run several scripts like the following:
# Load packages and the stored OAuth credential ('cred')
require(ROAuth)
require(twitteR)
require(plyr)
load("twitter_authentication.Rdata")
registerTwitterOAuth(cred)

# Pull up to 15,000 tweets for the hashtag from the last 24 hours
searchResults <- searchTwitter("#hashtag", n = 15000,
                               since = as.character(Sys.Date() - 1),
                               until = as.character(Sys.Date()))
head(searchResults)

# Flatten the list of status objects into a data frame and write it to CSV
tweetsDf <- ldply(searchResults, function(t) t$toDataFrame())
write.csv(tweetsDf, file = paste("tweets_test_", Sys.Date() - 1, ".csv", sep = ""))
On some days there will only be a few tweets (up to 100) per hashtag, so the script runs smoothly. On other days, however, there will be thousands of tweets for a certain hashtag (of course I am not actually using the term "hashtag" but the term I need for my study).
I can add retryOnRateLimit = 10 to searchTwitter. But when I search for multiple hashtags every day, how should I time these queries in crontab?
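For the crontab side, one workable pattern is to stagger the per-hashtag scripts so each run starts in a fresh 15-minute rate-limit window. A sketch (the script paths and times are placeholders, not from the question):

# Hypothetical crontab entries: three per-hashtag R scripts staggered
# 20 minutes apart, so each starts with a fresh 15-minute rate window
0  3 * * * Rscript /home/user/collect_hashtag1.R
20 3 * * * Rscript /home/user/collect_hashtag2.R
40 3 * * * Rscript /home/user/collect_hashtag3.R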
In order to organize these queries, I need to know how many tweets I can collect by running the script once within the 15-minute window. Does anybody know the answer? (According to the Twitter API rate limits, I can make 180 queries per 15-minute window, but how many tweets is that?)
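For reference, the v1.1 search/tweets endpoint returns at most 100 tweets per request, so 180 requests per 15-minute window works out to at most about 18,000 tweets; a single call with n = 15000 can therefore consume up to 150 of those 180 requests on its own. A sketch of the retry flag in context (the hashtag is a placeholder):

require(twitteR)

# search/tweets returns at most 100 tweets per request, and the window
# allows 180 requests, i.e. ~18,000 tweets per 15 minutes. n = 15000 may
# need up to 150 requests, so retryOnRateLimit tells searchTwitter to
# wait and resume (here up to 10 times) if the window is exhausted.
searchResults <- searchTwitter("#hashtag", n = 15000,
                               since = as.character(Sys.Date() - 1),
                               until = as.character(Sys.Date()),
                               retryOnRateLimit = 10)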
Answer 1:
Rather than performing a search every few minutes, you should use the Streaming API.
This will deliver you a real-time feed of all the data flowing through Twitter. You can set a filter for your search term.
There's no "rate limit" as such: you just make a single persistent connection, and Twitter delivers a sample of all the tweets matching your search term.
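The answer doesn't name a package, but in R the streamR package is one common way to consume the Streaming API. A minimal sketch, assuming the same 'cred' ROAuth object from the question works for streaming, with a placeholder file name and timeout:

require(streamR)

# Hold a persistent connection open and append matching tweets to a file;
# timeout is in seconds (here one hour per run, a placeholder value)
filterStream(file.name = "tweets_stream.json",
             track = "#hashtag",
             timeout = 3600,
             oauth = cred)

# Once the stream closes, parse the captured JSON into a data frame
tweetsDf <- parseTweets("tweets_stream.json")

Note that the stream is still capped at roughly 1% of the full firehose, so for very high-volume terms you receive a sample rather than every tweet, as the answer says.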
Source: https://stackoverflow.com/questions/28151307/twitter-api-rate-limit