Question
For example, I want to extract the price (top right) and the space details (Accommodates: 2, Bathrooms: 1, etc.) from https://www.airbnb.com/rooms/12949270?guests=1&s=_JaPbz-J
Here is my code for the price:
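library(RSelenium)
library(XML)
# remDr is an already opened RSelenium remoteDriver session; url holds the listing URL above
remDr$navigate(url)
doc <- htmlParse(remDr$getPageSource()[[1]])
var <- remDr$findElement('id','details')
# vartxt was never defined in the snippet as posted; taking the element's outerHTML is an assumed fix
vartxt <- var$getElementAttribute('outerHTML')[[1]]
varxml <- htmlTreeParse(vartxt, useInternalNodes=T)
Price <- xpathApply(varxml,"//div[@class='book-it__price-amount h3 text-special pull-left']",xmlValue)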
But it returns an empty list. Maybe this happened because the class 'book-it__price-amount h3 text-special pull-left' is not the top-level class? If so, how do I correct that? If not, where did I make a mistake?
Answer 1:
The code below works for me. As for sites that forbid scraping: in general, if scraping is not allowed, you take a risk when you use the data for commercial purposes or send GET requests on a regular basis. So it depends on how you are going to use it.
library(RCurl)
library(XML)
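# Fetch the raw page HTML (ssl.verifypeer = FALSE skips certificate verification)
url <- getURL("https://www.airbnb.cz/rooms/12949270?guests=1&s=_JaPbz-J", ssl.verifypeer = FALSE)
url2 <- htmlParse(url)
# Price shown in the booking box
Price <- xpathSApply(url2, "//div[@class='book-it__price-amount h3 text-special pull-left']", xmlValue)
# Detail columns, which should include entries such as "Accommodates" and "Bathrooms"
conditions <- xpathSApply(url2, "//div[@class='col-md-6']", xmlValue)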
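If you want to check the XPath expressions without hitting the site, here is a minimal sketch that runs them against an inline HTML string. The markup is a made-up stand-in (an assumption, not the real Airbnb page), and the contains() variant is only a suggestion in case the class list on the live page is not exactly the string you typed, since @class='...' compares the whole attribute value.

```r
library(XML)

# Hypothetical markup standing in for the listing page; the real page may differ.
html <- '<html><body>
<div id="details">
<div class="book-it__price-amount h3 text-special pull-left">$45</div>
<div class="col-md-6">Accommodates: 2</div>
<div class="col-md-6">Bathrooms: 1</div>
</div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Exact match: the whole class attribute must equal this string.
price_exact <- xpathSApply(
  doc,
  "//div[@class='book-it__price-amount h3 text-special pull-left']",
  xmlValue
)

# Looser match: any div whose class attribute contains the given substring.
price_loose <- xpathSApply(
  doc,
  "//div[contains(@class, 'book-it__price-amount')]",
  xmlValue
)

# Detail entries, one per matching div.
space <- xpathSApply(doc, "//div[@class='col-md-6']", xmlValue)

print(price_exact)
print(price_loose)
print(space)
```

If the exact-match query comes back empty on the real page while the contains() one does not, the class string is probably the culprit rather than the nesting of the div.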
Source: https://stackoverflow.com/questions/37675626/web-scraping-with-xpathsapply-getting-xmlvalue