Navigating and scraping with R (rvest)

Submitted by 旧时模样 on 2021-02-20 19:09:10


I am trying to log in to Stack Overflow and then use the search bar to search for the tidyverse package.

The main problem is with the URL I set: it does not give me the form to fill in with my email and password.

So url<-"" doesn't work. I also tried url<-"", which is the URL I get when I click the Log in button, but html_form still can't find the form to fill in with my email and password. This is my code:

    library(rvest)
    (session <- html_session(url))
    (form <- html_form(read_html(url))[[1]])
    (filled_form <- set_values(form,email="",pass="mypassword"))

After that, I would like to search for the term [tidyverse] and start scraping the results.

I think I can manage this second part once the login/password/form problem in the code above is solved.

Any help would be appreciated, guys.


You can set the search term directly in the URL, with no need to log in to Stack Overflow:


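    library(rvest)

    # Fetch the newest questions for a search term and return their titles and links
    getStackQuestions <- function(search) {
      stackoverflow <- read_html(paste0('',search,'?tab=Newest'))
      # Question title links, excluding those that also carry the .mb0 class
      questions <- stackoverflow %>% html_nodes(".question-hyperlink:not(.mb0)")
      question.href <- questions %>% html_attr('href')
      question.text <- questions %>% html_text()
      data.frame(text = question.text, href = paste0("",question.href))
    }

    tidyverse_questions <- getStackQuestions('tidyverse')
    # Show the first few question titles
    head(tidyverse_questions$text)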

    [1] "Python/Pandas equivalent of across and weighted average"
    [2] "Transforming columns based off separate dataframe - R solution"
    [3] "Group by summarize in between dates with dplyr"
    [4] "Transpose complex data.frame with tidyR"
    [5] "Create 1 composite variable derived from different combinations of values of 2nd variable that are separated by specific levels of 3rd variable"
    [6] "extracting a cv.glmnet object from Tune_results"
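
If you want to check what the selector picks up without hitting the site, here is a minimal offline sketch. The HTML snippet, the paths in it and the variable names (sample_html, page, links) are made up for illustration; only the selector ".question-hyperlink:not(.mb0)" comes from the function above, and the real page markup may differ.

```r
library(rvest)

# Hypothetical stand-in for a results page (an assumption, not the real markup),
# so the selector logic can be tried offline.
sample_html <- '
<div id="questions">
  <h3><a class="question-hyperlink" href="/questions/1/first">First question</a></h3>
  <h3><a class="question-hyperlink" href="/questions/2/second">Second question</a></h3>
  <a class="question-hyperlink mb0" href="/questions/3/sidebar">Sidebar link</a>
</div>'

# read_html() treats a string containing markup as literal HTML
page <- read_html(sample_html)

# Same selector as in getStackQuestions(): keep links with the class
# "question-hyperlink" but drop those that also carry "mb0"
links <- html_nodes(page, ".question-hyperlink:not(.mb0)")

questions <- data.frame(
  text = html_text(links),
  href = html_attr(links, "href"),
  stringsAsFactors = FALSE
)

print(questions)
```

The link that also has the mb0 class is left out, so only the two question links end up in the data frame. If the real page changes its class names, this selector is the part that would need adjusting.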

