Scraping a dynamic ecommerce page with infinite scroll

Question

I'm using rvest in R to do some scraping. I know some HTML and CSS.

I want to get the price of every product listed at this URL:

http://www.linio.com.co/tecnologia/celulares-telefonia-gps/

New items load as you scroll down the page.

What I've done so far:

library(rvest)

# read_html() replaces html(), which later rvest releases deprecated
Linio_Celulares <- read_html("http://www.linio.com.co/celulares-telefonia-gps/")

Linio_Celulares %>%
  html_nodes(".product-itm-price-new") %>%
  html_text()

This gives me what I need, but only for the first 25 elements (the ones loaded by default).

 [1] "$ 1.999.900" "$ 1.999.900" "$ 1.999.900" "$ 2.299.900" "$ 2.279.900"
 [6] "$ 2.279.900" "$ 1.159.900" "$ 1.749.900" "$ 1.879.900" "$ 189.900"
[11] "$ 2.299.900" "$ 2.499.900" "$ 2.499.900" "$ 2.799.000" "$ 529.900"
[16] "$ 2.699.900" "$ 2.149.900" "$ 189.900"   "$ 2.549.900" "$ 1.395.900"
[21] "$ 249.900"   "$ 41.900"    "$ 319.900"   "$ 149.900"

Question: How can I get all the elements of this dynamic section?

I guess I could scroll the page by hand until all elements are loaded and then read it with rvest, but that seems like a lot of work (I'm planning to do this for several sections). There should be a programmatic workaround.

Answer 1:

As @nrussell suggested, you can use RSelenium to programmatically scroll down the page before getting the source code.

You could for example do:

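library(RSelenium)
library(rvest)

# start RSelenium
# note: checkForServer() and startServer() are defunct in recent RSelenium
# releases, where rsDriver() is the suggested replacement
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()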

#navigate to your page
remDr$navigate("http://www.linio.com.co/tecnologia/celulares-telefonia-gps/")

# scroll down 5 times, waiting for the page to load each time
for (i in 1:5) {
  remDr$executeScript(paste("scroll(0,", i * 10000, ");"))
  Sys.sleep(3)
}

# get the page HTML
page_source <- remDr$getPageSource()

# parse it (read_html() replaces the deprecated html())
read_html(page_source[[1]]) %>%
  html_nodes(".product-itm-price-new") %>%
  html_text()
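
The fixed five scrolls above may stop short on longer listings. One possible variation, sketched here under assumptions, keeps scrolling until the page height stops growing. `scroll_to_bottom`, `pause` and `max_scrolls` are hypothetical names introduced for illustration, and the sketch assumes that `executeScript()` returns the script's value as the first element of a list.

```r
# Hypothetical helper: scroll until document.body.scrollHeight stops changing.
# Assumes remDr$executeScript() returns the script result wrapped in a list.
scroll_to_bottom <- function(remDr, pause = 3, max_scrolls = 50) {
  get_height <- function() {
    remDr$executeScript("return document.body.scrollHeight;")[[1]]
  }

  last_height <- get_height()
  n_scrolls <- 0

  while (n_scrolls < max_scrolls) {
    # jump to the current bottom of the page, then give new items time to load
    remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
    Sys.sleep(pause)
    n_scrolls <- n_scrolls + 1

    new_height <- get_height()
    if (new_height == last_height) break  # nothing new was appended
    last_height <- new_height
  }

  # number of scrolls performed, returned invisibly
  invisible(n_scrolls)
}
```

With the `remDr` session opened above, a call such as `scroll_to_bottom(remDr)` could take the place of the `for` loop before `getPageSource()`. The `max_scrolls` cap is there so a page that never stops loading cannot keep the loop running indefinitely.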

Answer 2:

library(rvest)
url <- "https://www.linio.com.co/c/celulares-y-tablets?page=1"
page <- html_session(url)

html_nodes(page, css = ".price-secondary") %>% html_text()

To get the remaining products, loop over the following pages (https://www.linio.com.co/c/celulares-y-tablets?page=2, then page=3, and so on) and scrape each one the same way.
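
The page loop could be sketched roughly as follows. The helper names (`page_url`, `extract_prices`, `scrape_all_prices`) and the `max_pages` and `pause` parameters are hypothetical. The sketch also assumes that a page past the last one returns no matching price nodes, which has not been verified against the site, and the CSS selector has to match whatever markup the site currently uses.

```r
library(rvest)

# Build the URL for a given page number (the ?page= pattern comes from the answer).
page_url <- function(page, base = "https://www.linio.com.co/c/celulares-y-tablets") {
  paste0(base, "?page=", page)
}

# Extract the price strings from an already parsed document.
extract_prices <- function(doc, css = ".price-secondary") {
  html_text(html_nodes(doc, css), trim = TRUE)
}

# Fetch consecutive pages until one yields no prices or max_pages is reached.
# Assumption: an out-of-range page contains no nodes matching the selector.
scrape_all_prices <- function(max_pages = 20, pause = 1, css = ".price-secondary") {
  prices <- character(0)
  for (p in seq_len(max_pages)) {
    found <- extract_prices(read_html(page_url(p)), css)
    if (length(found) == 0) break
    prices <- c(prices, found)
    Sys.sleep(pause)  # short delay between requests
  }
  prices
}
```

Calling `scrape_all_prices()` would then return a single character vector with the prices from every page visited. Keeping `extract_prices()` separate from the download step makes it possible to try a selector on a saved or hand-written HTML snippet before sending any requests.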

EDIT dated 07/05/2019

The website's elements have changed, hence the new code:

library(rvest)
url <- "https://www.linio.com.co/c/celulares-y-tablets?page=1"
page <- html_session(url)

html_nodes(page, css = ".price-main") %>% html_text()

Source: https://stackoverflow.com/questions/29861117/scraping-a-dynamic-ecommerce-page-with-infinite-scroll

Tags

web-scraping

infinite-scroll

rvest