Question
Does anyone know the ratio between the number of tweets delivered by the Twitter sample API and the total number of tweets the Twitter servers receive? I am doing some analysis on data read from the sample API and would like to estimate the actual workload handled by Twitter's servers. I have observed that the number of tweets returned by the API varies over time, so I presume it is a percentage-based sample. Any clue is highly appreciated.
Thanks
Answer 1:
When Twitter Spritzer (basically the old-fashioned Streaming API) was launched, it was supposedly about 1-2% of all tweets. Based on my use of the current Streaming API, I'd be surprised if it were any more than 1% right now, and it is possibly less. According to the docs, the "Twitter streaming volume is not constant," but they neglect to mention whether the volume delivered by the API is proportional to the rate of actual tweets.
Answer 2:
The sample stream /statuses/sample does return roughly 1% of all tweets. Twitter samples the tweets by delivering only tweets created within a 10-millisecond window out of the 1,000 milliseconds in every second. If you want more details, you can read my blog post: http://blog.falcondai.com/2013/06/666-and-how-twitter-samples-tweets-in.html
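The window-based sampling described above can be checked against collected data. The following is a minimal sketch, assuming the tweets use Snowflake IDs (issued since late 2010), where the bits above the low 22 hold the creation time in milliseconds since the Snowflake epoch. The function names are hypothetical, and the window start is a parameter because this answer only states the window's width, not where it begins.

```python
# Snowflake epoch in Unix milliseconds; tweet IDs issued before Snowflake
# (late 2010) do not follow this layout, so this applies only to newer IDs.
TWITTER_EPOCH_MS = 1288834974657


def tweet_timestamp_ms(tweet_id: int) -> int:
    """Creation time in Unix milliseconds, decoded from a Snowflake ID."""
    # The low 22 bits hold machine and sequence numbers; the rest is time.
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def millisecond_of_second(tweet_id: int) -> int:
    """Position (0-999) of the creation time within its second."""
    return tweet_timestamp_ms(tweet_id) % 1000


def in_sample_window(tweet_id: int, start_ms: int, width_ms: int = 10) -> bool:
    """True if the tweet was created inside the window [start_ms, start_ms + width_ms).

    start_ms is an assumption supplied by the caller; the modulo handles a
    window that wraps past the end of the second.
    """
    return (millisecond_of_second(tweet_id) - start_ms) % 1000 < width_ms
```

Running millisecond_of_second over the IDs collected from the sample stream and counting the results should show whether the tweets cluster in a narrow band, which is what the description above predicts.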
Answer 3:
On 2 February 2015, Twitter announced its intent to reset the Streaming API sample rate to 1% (it had unintentionally crept higher):
The public Streaming API sample endpoints (aka POST statuses/filter and GET statuses/sample) are intended to be levelled at approximately 1% of the public Tweet volumes at any time.
Due to some past inconsistencies in configuration, there have been periods of time where the volumes of Tweets delivered via the Streaming API may have exceeded these parameters.
This notice is to indicate that over the next couple of weeks, we will be making changes to the public Streaming API to rebalance the volume of Tweets at the 1% capacity that was intended.
This plot shows the effect of the reset on a typical tweet stream.
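For the workload estimate asked about in the question, the 1% figure can be turned into a rough scaling rule. This is only a sketch, assuming the sample rate really is close to 1% during the measurement interval, which (as noted above) has not always been the case. The function name and parameters are hypothetical.

```python
def estimate_total_tweets(sample_count: int, seconds: float,
                          sample_percent: float = 1.0) -> tuple[float, float]:
    """Scale a sample-stream count up to an estimated overall volume.

    sample_count   -- tweets received from the sample stream in the interval
    seconds        -- length of the interval in seconds
    sample_percent -- assumed share of all public tweets in the sample (1% by default)

    Returns (estimated total tweets, estimated tweets per second).
    """
    if seconds <= 0 or sample_percent <= 0:
        raise ValueError("seconds and sample_percent must be positive")
    total = sample_count * 100 / sample_percent
    return total, total / seconds


# Example: 600 sampled tweets in one minute at an assumed 1% rate.
total, per_second = estimate_total_tweets(600, 60)
print(total, per_second)
```

The result is only as good as the assumed rate, so it is worth rerunning the estimate with a range of sample_percent values to see how sensitive the conclusion is.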
Answer 4:
This is something I found at https://brightplanet.com/2013/06/25/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/. I hope you find this useful.
Studies have estimated that using Twitter’s Streaming API users can expect to receive anywhere from 1% of the tweets to over 40% of tweets in near real-time.
The studies they cite are referenced at the bottom of that web page.
Source: https://stackoverflow.com/questions/13055370/how-many-percent-of-the-tweets-does-twitter-sample-api-give