Question
I want to collect data from Twitter over a period of several weeks.
To do so, I use RStudio Server and crontab to automatically run several scripts like the following:
# Load packages and the stored OAuth credential ('cred')
require(ROAuth)
require(twitteR)
require(plyr)
load("twitter_authentication.Rdata")
registerTwitterOAuth(cred)

# Pull up to 15,000 tweets for the hashtag from the last 24 hours
searchResults <- searchTwitter("#hashtag", n = 15000,
                               since = as.character(Sys.Date() - 1),
                               until = as.character(Sys.Date()))
head(searchResults)

# Flatten the list of status objects into a data frame and write it to CSV
tweetsDf <- ldply(searchResults, function(t) t$toDataFrame())
write.csv(tweetsDf, file = paste("tweets_test_", Sys.Date() - 1, ".csv", sep = ""))
On some days there will only be a few tweets (up to 100) per hashtag, so the script runs smoothly. On other days, however, there will be thousands of tweets for a certain hashtag (of course I am not actually using the term "hashtag" but the term I need for my study).
I can add retryOnRateLimit = 10 to searchTwitter. But when I search for multiple hashtags every day, how should I time these queries in crontab?
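For the crontab side, one workable pattern is to stagger the per-hashtag scripts so each run starts in a fresh 15-minute rate-limit window. A sketch (the script paths and times are placeholders, not from the question):

# Hypothetical crontab entries: three per-hashtag R scripts staggered
# 20 minutes apart, so each starts with a fresh 15-minute rate window
0  3 * * * Rscript /home/user/collect_hashtag1.R
20 3 * * * Rscript /home/user/collect_hashtag2.R
40 3 * * * Rscript /home/user/collect_hashtag3.R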
In order to organize these queries, I need to know how many tweets I can collect by running the script once within the 15-minute window. Does anybody know the answer? (According to the Twitter API rate limits, I can make 180 queries per 15-minute window, but how many tweets is that?)
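For reference, the v1.1 search/tweets endpoint returns at most 100 tweets per request, so 180 requests per 15-minute window works out to at most about 18,000 tweets; a single call with n = 15000 can therefore consume up to 150 of those 180 requests on its own. A sketch of the retry flag in context (the hashtag is a placeholder):

require(twitteR)

# search/tweets returns at most 100 tweets per request, and the window
# allows 180 requests, i.e. ~18,000 tweets per 15 minutes. n = 15000 may
# need up to 150 requests, so retryOnRateLimit tells searchTwitter to
# wait and resume (here up to 10 times) if the window is exhausted.
searchResults <- searchTwitter("#hashtag", n = 15000,
                               since = as.character(Sys.Date() - 1),
                               until = as.character(Sys.Date()),
                               retryOnRateLimit = 10)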
Answer 1:
Rather than performing a search every few minutes, you should use the Streaming API.
This will deliver you a real-time feed of all the data flowing through Twitter. You can set a filter for your search term.
There's no "rate limit" as such: you just make a single persistent connection, and Twitter delivers a sample of all the tweets matching your search term.
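The answer doesn't name a package, but in R the streamR package is one common way to consume the Streaming API. A minimal sketch, assuming the same 'cred' ROAuth object from the question works for streaming, with a placeholder file name and timeout:

require(streamR)

# Hold a persistent connection open and append matching tweets to a file;
# timeout is in seconds (here one hour per run, a placeholder value)
filterStream(file.name = "tweets_stream.json",
             track = "#hashtag",
             timeout = 3600,
             oauth = cred)

# Once the stream closes, parse the captured JSON into a data frame
tweetsDf <- parseTweets("tweets_stream.json")

Note that the stream is still capped at roughly 1% of the full firehose, so for very high-volume terms you receive a sample rather than every tweet, as the answer says.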
Source: https://stackoverflow.com/questions/28151307/twitter-api-rate-limit