Web scraping with R - no HTML visible

Question

I am trying to use R to scrape a website:

http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234

It has several fields with lots of information. I am only interested in the URL shown above the field "site do candidato". In this example, the URL I want is: "http://vanderlansenador111.com.br"

The problem is that there is no visible HTML, so I don't think rvest will help (at least, I don't know how to use it here). Is there a way to scrape it without Selenium? I have never used RSelenium and had some problems trying to run it.

Pointers in any direction are much appreciated.

Answer 1:

Don't waste your time with Selenium. Open your browser's Developer Tools, watch the network traffic while the page loads, and find the XHR request that fetches the data: http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234

and just use jsonlite::fromJSON():

str(jsonlite::fromJSON("http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234"))

The str() output is large & complete. You should be able to find what you need there.
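To apply this to more than one candidate, the page URL can be mapped to the REST URL. The following is a minimal sketch in base R, assuming the pattern seen in the single URL pair above holds for other candidates (this is inferred, not a documented API). The helper names build_rest_url and fetch_candidate are hypothetical.

```r
# Map a public candidate page URL to the REST endpoint.
# ASSUMPTION: the layout is inferred from the one URL pair in this thread.
build_rest_url <- function(page_url) {
  # Page URL layout: .../#/candidato/<year>/<election>/<state>/<candidate>
  parts <- regmatches(
    page_url,
    regexec("#/candidato/([^/]+)/([^/]+)/([^/]+)/([^/]+)$", page_url)
  )[[1]]
  # parts holds the full match plus the four captured groups
  if (length(parts) != 5) stop("Unexpected page URL format: ", page_url)
  # REST layout swaps the order: <year>/<state>/<election>/candidato/<candidate>
  sprintf(
    "http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/%s/%s/%s/candidato/%s",
    parts[2], parts[4], parts[3], parts[5]
  )
}

# Fetch and parse the JSON for one candidate (needs network and jsonlite).
fetch_candidate <- function(page_url) {
  jsonlite::fromJSON(build_rest_url(page_url))
}

# Usage (not run here):
# page <- "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234"
# candidate <- fetch_candidate(page)
# str(candidate, max.level = 1)  # list the top-level fields, then look for the site URL
```

The field that holds the candidate's site is not named here on purpose; inspect the str() output to find it.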

Answer 2:

Selenium is a good choice for this. An alternative is PhantomJS; there is a good tutorial on the process at DataCamp (though it is not as clean a solution as Selenium):

https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r

Source: https://stackoverflow.com/questions/52013411/web-scraping-with-r-no-html-visible

Tags

web-scraping
