R: Find all replies to a user's tweets from their follower list

半世苍凉 submitted on 2019-12-06 14:30:18

The biggest obstacle would be the time it takes to collect up to 3,200 of the most recent tweets from each of the more than 42 million followers of @realDonaldTrump.

> library(rtweet)
> djt <- lookup_users("realDonaldTrump")
> djt[, c("screen_name", "followers_count", "friends_count", "statuses_count")]
# A tibble: 1 x 4
      screen_name followers_count friends_count statuses_count
            <chr>           <int>         <int>          <int>
1 realDonaldTrump        42793758            45          36398

Twitter limits the number of follower user IDs collected to 75,000 every 15 minutes.

> flw <- get_followers("realDonaldTrump", n = 75000)
> flw
# A tibble: 75,000 x 1
              user_id
                <chr>
 1          928808378
 2 926186565231136768
 3 931237514253426688
 4 930584682701475842
 5 902580952165216256
 6 931236663950372864
 7 931237367024820224
 8 922140807024578560
 9 931235142047211520
10 931235653412708352
# ... with 74,990 more rows

Assuming you have a reliable internet connection and plenty of time, you can use the following code to get all 42 million follower IDs.

flw <- get_followers(
  "realDonaldTrump", n = 42793758, retryonratelimit = TRUE
)
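For a rough sense of how long that call runs, here is a back-of-the-envelope estimate using the 75,000 IDs per 15 minutes limit and the followers_count shown above. It is only a sketch: it treats every rate-limit window as a full 15 minutes, so it slightly overestimates.

```r
followers      <- 42793758  # followers_count reported by lookup_users() above
ids_per_window <- 75000     # follower IDs allowed per rate-limit window
window_min     <- 15        # length of one rate-limit window, in minutes

# Number of 15-minute windows needed to page through every follower ID
windows <- ceiling(followers / ids_per_window)
hours   <- windows * window_min / 60
days    <- hours / 24

cat(sprintf("%d windows, roughly %.1f hours (%.1f days)\n", windows, hours, days))
```

In other words, collecting the IDs alone takes close to six days of wall-clock time.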

Then you'd probably want to construct a for loop that calls get_timeline() for each follower and handles API rate limits. In the example code below, I've made the loop sleep until the rate limit resets after every 56 calls.

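flw_tml <- vector("list", length(flw$user_id))  # one slot per follower
for (i in seq_along(flw$user_id)) {
  # up to 3,200 of the most recent tweets for follower i
  flw_tml[[i]] <- get_timeline(
    flw$user_id[i], n = 3200
  )
  # every 56 followers, wait until the get_timeline rate limit resets
  if (i %% 56 == 0L) {
    rl <- rate_limit("get_timeline")
    Sys.sleep(as.numeric(rl$reset, units = "secs"))
  }
  cat(i, " ")  # progress indicator
}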
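Where the 56 comes from, and what it implies, can be sketched with a little arithmetic. This assumes the v1.1 statuses/user_timeline limits of the time (at most 200 tweets per request and 900 requests per 15-minute window with user authentication); those figures are my assumption and are not stated above. It also assumes every follower has a full 3,200 tweets, so treat the result as an upper bound.

```r
followers           <- 42793758  # followers_count from lookup_users() above
tweets_per_user     <- 3200      # n passed to get_timeline()
tweets_per_request  <- 200       # ASSUMED max tweets per user_timeline request
requests_per_window <- 900       # ASSUMED user-auth requests per 15-minute window
window_min          <- 15

requests_per_user <- tweets_per_user / tweets_per_request           # 16
users_per_window  <- floor(requests_per_window / requests_per_user)  # 56
windows <- ceiling(followers / users_per_window)
years   <- windows * window_min / (60 * 24 * 365)

cat(sprintf("%d followers per window, %d windows, about %.1f years\n",
            users_per_window, windows, years))
```

Under these assumptions the loop would run for more than two decades, which is why the search-based approach is more practical.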

As you can see, this would take a really long time. You'd be better off collecting all the replies posted in the past 6-9 days instead. The code below gets up to 5 million replies to Trump's tweets from the past 9 days. Warning: if that many replies are actually available from the past 9 days (I honestly have no idea), this search would take just under three days to finish.

at_rdt <- search_tweets(
  "to:realdonaldtrump",
  n = 5e6,
  retryonratelimit = TRUE
)
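The "just under three days" figure can be checked the same way. This sketch assumes the standard search limits of the time, 100 tweets per request and 180 requests per 15-minute window (18,000 tweets per window); those limits are my assumption, not something stated above.

```r
target            <- 5e6        # n passed to search_tweets()
tweets_per_window <- 100 * 180  # ASSUMED: 100 tweets/request x 180 requests
window_min        <- 15

windows <- ceiling(target / tweets_per_window)
days    <- windows * window_min / (60 * 24)

cat(sprintf("%d windows, about %.2f days\n", windows, days))
```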