Web-Scraping with Login and Redirect using R and rvest/httr

Submitted by 江枫思渺然 on 2020-02-23 05:44:11

Question


I would like to scrape information from a webpage. There is a login screen, and once I am logged in, I can access all kinds of pages from which I would like to scrape information (such as a player's last name, found in the element with class .lastName). I am using R and the packages rvest and httr.

The login seems to work, but I have no idea how to get redirected to the page I need to get the information from.

The login form can be accessed on http://kickbase.sky.de/anmelden and the relevant pages have the form http://kickbase.sky.de/spielerprofil/player-name/number, e.g. http://kickbase.sky.de/spielerprofil/nadiem-amiri/1639#.
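The profile-page pattern above can be captured in a small helper so that several players can be requested in a loop. This is only a sketch: the function name profile_url is hypothetical, and it assumes nothing beyond the spielerprofil/player-name/number layout described above (the trailing # in the example link is an empty fragment and is left out here).

```r
# Hypothetical helper: build a player-profile URL following the pattern
# http://kickbase.sky.de/spielerprofil/<player-name>/<number>
profile_url <- function(player_slug, player_id) {
  sprintf("http://kickbase.sky.de/spielerprofil/%s/%d",
          player_slug, as.integer(player_id))
}

profile_url("nadiem-amiri", 1639)
# returns "http://kickbase.sky.de/spielerprofil/nadiem-amiri/1639"
```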

Here is the code I used. Thank you very much for your help.

install.packages("rvest")
install.packages("httr")
library(rvest)
library(httr)

handle <- handle("http://kickbase.sky.de")  # Create handle
path   <- "anmelden" #  Login Path

# fields found in the login form.
login <- list(
  email = "testscrape@gmail.com"
  ,password  = "tester"
  ,redirect_url =  # I want to be redirected to this page and then scrape info from here
    "http://kickbase.sky.de/spielerprofil/nadiem-amiri/1639#"
)

response <- POST(handle = handle, path = path, body = login)

webpage <- read_html(response)
name_data <- html_text(html_nodes(webpage, ".lastName"))
name_data
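Before parsing the login response, it can help to check what the POST actually returned. httr follows HTTP redirects by default, so the url element of a response is the final URL reached; if it is not the player page, the redirect_url field did not take you there. A minimal sketch, where the helper name describe_response is hypothetical:

```r
# Hypothetical helper: summarise an httr response object to see
# where a request ended up and what kind of content came back.
describe_response <- function(resp) {
  list(
    status       = httr::status_code(resp),             # HTTP status code
    final_url    = resp$url,                            # URL after any redirects
    content_type = httr::headers(resp)[["content-type"]] # e.g. HTML vs. JSON
  )
}

# Usage with the response from the code above:
# str(describe_response(response))
```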

Answer 1:


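library(rvest)
library(httr)

url <- "https://kickbase.sky.de/"
page <- html_session(url)  # start a session; cookies persist across requests
                           # (renamed session() in rvest 1.0.0)

# Log in through the site's JSON API. request_POST is an internal rvest
# function (note the :::), not part of the public API, so it may change
# between versions.
page <- rvest:::request_POST(page, url = "https://kickbase.sky.de/api/v1/user/login",
                             body = list("email" = "testscrape@gmail.com",
                                         "password" = "tester",
                                         "redirect_url" = "http://kickbase.sky.de/spielerprofil/nadiem-amiri/1639#"),
                             encode = "json")

# Request the player's news from the API within the same logged-in session
# (jump_to() was renamed session_jump_to() in rvest 1.0.0)
player_page <- jump_to(page, "https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3")

# Decode the response body as UTF-8 text and parse the JSON
data <- jsonlite::fromJSON(content(player_page$response, as = "text", encoding = "UTF-8"))

print(data)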

Please note that the website provides an API, and that is where the data actually comes from: https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3

The variable data holds all the information needed.
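The same flow can be written with httr alone, which avoids calling rvest's internal request_POST. Reusing one httr handle for the login POST and the later GET keeps the session cookies, so the second request should stay authenticated. This is a sketch under assumptions: the two endpoints and the JSON login body are taken from the answer above and may no longer exist, and the function names kickbase_login and kickbase_news are hypothetical.

```r
# Sketch: log in and fetch player news with httr only.
# Assumes the endpoints used in the answer above still behave as described.
# kickbase_login() and kickbase_news() are hypothetical helper names.

kickbase_login <- function(email, password,
                           base = "https://kickbase.sky.de") {
  h <- httr::handle(base)  # one handle = one cookie jar for this host
  resp <- httr::POST(handle = h, path = "api/v1/user/login",
                     body = list(email = email, password = password),
                     encode = "json")
  httr::stop_for_status(resp)  # fail early if the login was rejected
  h
}

kickbase_news <- function(h, player, skip = 0, limit = 3) {
  # Same handle, so the cookies set by the login request are sent again
  resp <- httr::GET(handle = h, path = "api/v1/news",
                    query = list(skip = skip, player = player, limit = limit))
  httr::stop_for_status(resp)
  # Parse the JSON body into R lists / data frames
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}
```

Usage would look like h <- kickbase_login("testscrape@gmail.com", "tester") followed by data <- kickbase_news(h, player = 1639), which requests the same URL as the answer's jump_to() call.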



Source: https://stackoverflow.com/questions/53835737/web-scraping-with-login-and-redirect-using-r-and-rvest-httr
