Question
I am trying to use R to scrape a website:
http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234
It has several fields with lots of information. I am only interested in the URL above the "site do candidato" field. In this example, the URL I want is: "http://vanderlansenador111.com.br"
The problem is that there is no visible HTML, so I don't think rvest is helpful (at least, I don't know how to use it here). Is there a way to scrape it without using Selenium? (I have never used RSelenium and had some problems trying to run it.)
Any pointers in the right direction would be much appreciated.
Answer 1:
Don't waste your time with Selenium. Use your browser's Developer Tools (the Network tab) to find the XHR request the page makes to load its data: http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234
Then just call jsonlite::fromJSON() on that URL:
str(jsonlite::fromJSON("http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234"))
The str() output is large and complete. You should be able to find what you need there.
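A minimal sketch of this approach end to end, assuming the REST route always mirrors the page route the way the example pair above does (same year, state, election id and candidate id, reordered). That mapping is inferred from this single example, not from any TSE documentation, and the helper names `rest_url_from_page()`, `find_urls()` and `candidate_sites()` are hypothetical. Rather than guessing which JSON field holds "site do candidato", `find_urls()` returns every value that starts with http:// or https://, named by its path in the flattened JSON, so you can pick out the campaign site by eye.

```r
# Sketch only: helper names are hypothetical, and the page-to-REST mapping
# below is inferred from the single example URL pair in this thread.

# Turn a candidate page URL
#   ".../divulga/#/candidato/<year>/<election>/<state>/<candidate>"
# into the REST URL the page fetches via XHR
#   ".../divulga/rest/v1/candidatura/buscar/<year>/<state>/<election>/candidato/<candidate>"
rest_url_from_page <- function(page_url) {
  pattern <- "^(https?://[^/]+)/divulga/#/candidato/([0-9]+)/([0-9]+)/([A-Z]{2})/([0-9]+)$"
  if (!grepl(pattern, page_url)) {
    stop("URL does not look like a candidate page: ", page_url)
  }
  # parts[1] is the full match; parts[2:6] are host, year, election, state, candidate
  parts <- regmatches(page_url, regexec(pattern, page_url))[[1]]
  paste0(parts[2], "/divulga/rest/v1/candidatura/buscar/",
         parts[3], "/", parts[5], "/", parts[4], "/candidato/", parts[6])
}

# Flatten a parsed JSON structure (nested lists / data frames from
# jsonlite::fromJSON) and keep only values that look like http(s) URLs.
# Names are kept so you can see which field each URL came from.
find_urls <- function(x) {
  flat <- unlist(x)
  if (is.null(flat)) return(character(0))
  values <- as.character(flat)   # as.character() drops names, so restore them
  names(values) <- names(flat)
  values[!is.na(values) & grepl("^https?://", values)]
}

# Fetch the JSON behind a candidate page and list the URLs it contains.
# Needs the jsonlite package and network access; nothing runs until called.
candidate_sites <- function(page_url) {
  json <- jsonlite::fromJSON(rest_url_from_page(page_url))
  find_urls(json)
}
```

Usage (requires jsonlite and network access; the endpoint dates from this 2018 thread and may no longer respond the same way): `candidate_sites("http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234")`. The result is a named character vector, and the names tell you which JSON field each URL came from.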
Answer 2:
Selenium is a good choice for this. An alternative is PhantomJS; there is a good tutorial on the process over at DataCamp (though it is not as clean a solution as Selenium):
https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r
Source: https://stackoverflow.com/questions/52013411/web-scraping-with-r-no-html-visible