Web scraping with xpathSApply: getting xmlValue

Question

For example, I want to extract the price (top right) and "The space" details (Accommodates: 2, Bathrooms: 1, etc.) from https://www.airbnb.com/rooms/12949270?guests=1&s=_JaPbz-J

Here is my code for price:

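library(RSelenium)
library(XML)

# remDr is an already-open RSelenium remote driver; url is the listing URL above
remDr$navigate(url)
doc <- htmlParse(remDr$getPageSource()[[1]])
var <- remDr$findElement('id', 'details')

# assumption: vartxt holds the HTML of that element, e.g.
# vartxt <- var$getElementAttribute("outerHTML")[[1]]
varxml <- htmlTreeParse(vartxt, useInternalNodes = TRUE)
Price <- xpathApply(varxml, "//div[@class='book-it__price-amount h3 text-special pull-left']", xmlValue)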

But it returns an empty list. Maybe this happens because the class 'book-it__price-amount h3 text-special pull-left' is not the outer (parent) class? If so, how do I correct that? If not, where did I make a mistake?
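A small offline check can help narrow this down. In XPath, `@class='...'` compares the whole attribute string, so it only matches when the page's class attribute is identical character for character (same class names, same order, same spacing). The sketch below uses a made-up HTML fragment (an assumption, not the real Airbnb markup) and illustrative variable names (`html`, `doc`, `exact`, `token`) to contrast an exact match with a token match built from the XPath 1.0 functions `contains()`, `concat()` and `normalize-space()`.

```r
library(XML)

# Hypothetical markup: two price divs whose class lists differ slightly.
html <- '<html><body>
<div class="book-it__price-amount h3 text-special pull-left">$85</div>
<div class="book-it__price-amount  h3 pull-left">$90</div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Exact match: the whole class attribute must equal this string.
exact <- xpathSApply(doc,
  "//div[@class='book-it__price-amount h3 text-special pull-left']",
  xmlValue)

# Token match: the class list only has to contain this one class name.
token <- xpathSApply(doc,
  "//div[contains(concat(' ', normalize-space(@class), ' '), ' book-it__price-amount ')]",
  xmlValue)

print(exact)
print(token)
```

If the exact query comes back empty on the live page while the token query finds the node, the class attribute differs from the string being matched. If both are empty, the element may simply not be in the HTML that was parsed, for example because it lies outside the element whose HTML was extracted.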

Answer 1:

The code below works for me. As for sites that forbid scraping: in general, if scraping is not allowed, you take a risk if you use the data for commercial purposes or send GET requests on a regular basis. So it depends on how you are going to use it.

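library(RCurl)
library(XML)

# Download the raw HTML (ssl.verifypeer = FALSE skips certificate verification)
html <- getURL("https://www.airbnb.cz/rooms/12949270?guests=1&s=_JaPbz-J", ssl.verifypeer = FALSE)
doc <- htmlParse(html, asText = TRUE)
# Price: the div whose class attribute is exactly this string
Price <- xpathSApply(doc, "//div[@class='book-it__price-amount h3 text-special pull-left']", xmlValue)
# "The space" details: text of every div with class col-md-6
conditions <- xpathSApply(doc, "//div[@class='col-md-6']", xmlValue)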
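`xpathSApply` returns the `col-md-6` blocks as raw text. Assuming a block reads like the question's example ("Accommodates: 2", "Bathrooms: 1", ...), a minimal base-R sketch for turning one block into a named vector could look like this. The input string and the "Bedrooms" label are hypothetical, and the live page text may be laid out differently.

```r
# Hypothetical text of one "col-md-6" block; real labels and spacing may differ.
raw <- "\n  Accommodates: 2\n  Bathrooms: 1\n  Bedrooms: 1\n"

# Split into lines, trim whitespace, keep only "Label: value" lines.
rows <- trimws(unlist(strsplit(raw, "\n", fixed = TRUE)))
rows <- rows[grepl(":", rows, fixed = TRUE)]

# Split each line at its first colon into a label and a value.
labels <- trimws(sub(":.*$", "", rows))
values <- trimws(sub("^[^:]*:", "", rows))
space  <- setNames(values, labels)

print(space)
```

To apply this to the scraped results, the same steps could be wrapped in a function and run over `conditions` with `lapply()`.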

Source: https://stackoverflow.com/questions/37675626/web-scraping-with-xpathsapply-getting-xmlvalue

Tags

web-scraping
