Scraping location data in rvest

落爺英雄遲暮 submitted on 2020-01-24 04:20:26

Question


I'm trying to scrape latitude/longitude data from a list of URLs using rvest. Each URL has an embedded Google map showing a specific location, but the URLs themselves don't reveal the query that is sent to the Maps API.

When looking at the page source, I see that the part I'm after is here:

<script type="text/javascript" src="http://maps.google.com/maps/api/js?sensor=false">
</script>
<script type="text/javascript">
function initialize() {
var myLatlng = new google.maps.LatLng(43.805170,-70.722084);
var myOptions = {
  zoom: 16,
  center: myLatlng,
  mapTypeId: google.maps.MapTypeId.SATELLITE
}
var map = new google.maps.Map(document.getElementById("map_canvas"), myOptions);

var marker = new google.maps.Marker({
    position: myLatlng, 
    map: map,
    title:"F.E. Wood & Sons - Natural Energy"
});   

Now, if I can just get the line that has the LatLng(....) input, I can use some string parsing operations to derive the latitude and longitude values for all of the URLs.

I've written the following code to get my data:

require(rvest)
require(magrittr)
fetchLatLong<-function(url){
  url<-as.character(url)
  solNum<-html(url)%>%
    html_nodes("#map_canvas")%>%
    html_attr("script")
}

(The "#map_canvas" selector was found using SelectorGadget; you can view the entire source here.)

I'm having a hard time getting this to return what I'm after. I've tried many nodes and combinations of nodes, to no avail. I've also experimented with PhantomJS, but the content I need isn't JavaScript-rendered HTML: it's the input to the API call, which is written directly into the page source (or at least appears to be, to my amateur eye).

Does anyone have any advice?


Answer 1:


This seems to work:

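library(rvest)
library(magrittr)
library(stringr)

# html() is deprecated in current rvest; read_html() is its replacement
pg <- read_html("http://biomassmagazine.com/plants/view/2285")

pg %>% 
  html_nodes("div.pad20 > script") %>%   # <script> tags directly under div.pad20
  extract2(2) %>%                        # the second one contains initialize()
  html_text() %>% 
  str_match_all("LatLng\\(([[:digit:]\\.\\-]+),([[:digit:]\\.\\-]+)") %>% 
  extract2(1) %>%                        # match matrix for the single input string
  extract(2:3) -> lat_lng                # elements 2 and 3: the two capture groups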

lat_lng

## [1] "43.805170"  "-70.722084"
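
Since the question mentions a whole list of URLs, it may help to separate the string parsing from the page fetching. The following is only a sketch in base R, not part of the original answer: `parse_lat_lng` is a hypothetical helper name, and the sample strings stand in for script text that would normally come from something like html_text(html_nodes(pg, "script")). It assumes each page contains at most one LatLng(...) call of interest and returns NA when none is found.

```r
# Hypothetical helper (not from the original answer): extract the first
# LatLng(lat, lng) pair from JavaScript source text.
parse_lat_lng <- function(js_text) {
  # Accept several strings (e.g. one per <script> node) and join them
  js_text <- paste(js_text, collapse = "\n")
  pattern <- "LatLng\\([[:space:]]*(-?[0-9.]+)[[:space:]]*,[[:space:]]*(-?[0-9.]+)[[:space:]]*\\)"
  m <- regmatches(js_text, regexec(pattern, js_text))[[1]]
  # m is empty when nothing matched; otherwise: full match, lat, lng
  if (length(m) < 3) {
    return(c(lat = NA_real_, lng = NA_real_))
  }
  c(lat = as.numeric(m[2]), lng = as.numeric(m[3]))
}

# Sample input copied from the page source quoted in the question
js <- "var myLatlng = new google.maps.LatLng(43.805170,-70.722084);"
coords <- parse_lat_lng(js)

# Illustrative stand-in for script text collected from several pages
pages <- list(a = js, b = "no map on this page")
coords_table <- do.call(rbind, lapply(pages, parse_lat_lng))
```

Matching against the text of every script node, rather than picking the second script by position, should be less sensitive to layout differences between pages, at the cost of relying on the first LatLng(...) call being the right one.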


Source: https://stackoverflow.com/questions/30721124/scraping-location-data-in-rvest
