simulate clicking link on web page

北战南征 submitted on 2019-12-11 04:41:55

Question


I am trying to scrape the webpage below:

http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html

The stock data for each colour/size combination appears only when the colour or size is selected. Is it possible to simulate this in R to get the data?

So far, I have been able to capture the colour and size:

mcolour = toString(xpathSApply(page,'//ul[@class="colour-swatches-list toggle-panel"]//li[@title]',xmlGetAttr,"title"))

size = xpathSApply(page,'//ul[@class="size-swatches-list toggle-panel"]//li[@data-size]',xmlGetAttr,"data-size")

but I am not sure how to capture stock levels per colour/size combination.

Please advise!

============================================================

I could not find new as a method. Am I missing anything?

firefoxClass
Generator for class "firefoxClass":

Class fields:

Name:  exceptionTable     javaWarMes     javaDriver   javaNavigate
Class:         matrix            ANY            ANY            ANY

Class Methods:  
"back", "callSuper", "close", "copy", "export", "field", "findElementByClassName", 
 "findElementByCssSelector", "findElementById", "findElementByLinkText",  "findElementByName", 
 "findElementByPartialLinkText", "findElementByTagName", "findElementByXPath", 
 "findElementsByClassName", "findElementsByCssSelector", "findElementsById", 
 "findElementsByLinkText", "findElementsByName", "findElementsByPartialLinkText", 
 "findElementsByTagName", "findElementsByXPath", "forward", "get", "getCapabilities", 
 "getClass", "getCurrentUrl", "getPageSource", "getRefClass", "getTitle", "getVersion", 
  "import", "initFields", "initialize", "initialize#exceptionClass", "printHtml",   "refresh", 
  "show", "show#envRefClass", "trace", "tryExc", "untrace", "usingMethods"


  Reference Superclasses:  
  "exceptionClass", "envRefClass"

Answer 1:


For a given product ID (pid), which you can scrape from the page, you can get stock availability by querying:

http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288698&quantity=1

You don't even need to set any cookies for that query. It returns a chunk of HTML and JavaScript that is used to set the control on the page. Here's an example of limited stock (currently 2, although I might have just bought all of them by accident):

http://www.houseoffraser.co.uk/on/demandware.store/Sites-hof-Site/default/Product-UpdateQuantityList?pid=165288648&quantity=1

You could get the number in stock by either parsing the availabilityMessage string or the <select> control.
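As a rough sketch of that idea in base R: the URL pattern below is the one quoted above, but the response format is not shown in this thread, so sample_html is a made-up fragment and the helper names stock_url and count_options are hypothetical. It also assumes that the number of <option> entries in the <select> control reflects the purchasable quantity, which you would need to confirm against a real response.

```r
# Build the stock-query URL for a product ID (pattern taken from the answer above).
stock_url <- function(pid, quantity = 1) {
  paste0(
    "http://www.houseoffraser.co.uk/on/demandware.store/",
    "Sites-hof-Site/default/Product-UpdateQuantityList",
    sprintf("?pid=%s&quantity=%d", pid, as.integer(quantity))
  )
}

# Count the <option> tags in a returned HTML fragment.
# ASSUMPTION: one <option> per purchasable unit; verify on a real response.
count_options <- function(html) {
  m <- gregexpr("<option\\b", html, ignore.case = TRUE, perl = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

# Hypothetical fragment standing in for the real response.
sample_html <- paste0(
  '<select name="Quantity">',
  '<option value="1">1</option>',
  '<option value="2">2</option>',
  '</select>'
)

url <- stock_url("165288648")
n_available <- count_options(sample_html)
```

To use it against the live site you would fetch url (for example with readLines or the XML package's htmlParse) and pass the returned text to count_options. A proper HTML parser would be more robust than the regular expression used here.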

The only step I've not worked out is getting the pid values, and how you would map those to the descriptions, but that should all be on the page somewhere if it isn't being downloaded by Ajax requests (which is where the stock data comes from).

You are using the Chrome debugger/inspector, aren't you?




Answer 2:


Here is an example using relenium, which you can easily extend to also query product colours:

require(relenium) # More info: https://github.com/LluisRamon/relenium
require(XML)
firefox <- firefoxClass$new() # init browser
firefox$get("http://www.houseoffraser.co.uk/Eliza+J+3/4+sleeve+ruched+waist+dress/165288648,default,pd.html") # open url
sizes <- xpathSApply(htmlParse(firefox$getPageSource()), "//ul[@class='size-swatches-list toggle-panel']/li/a", xmlValue) # read available sizes

stockMsg <- vector() # init stock message vector
for (size in sizes) { # for each available size
  sizeLink <- firefox$findElementByXPath(sprintf("//ul[@class='size-swatches-list toggle-panel']/li[@data-size='%s']", size)) # focus size link
  sizeLink$click() # click size link
  stockMsg <- c(stockMsg, # and append stock message to stock message vector
                firefox$findElementByXPath("/html/body/div/div[3]/div/div/div[4]/div/div/div/div/form/div[4]/div[4]/div")$getText()
                )
}
setNames(stockMsg, sizes) # name stock msg vector and print it
# 8                       10 
# "in stock"               "in stock" 
# 12                       14 
# "in stock"               "in stock" 
# 16                       18 
# "in stock" "in stock, only 17 left" 
# 20                       22 
# "in stock, only 2 left"  "in stock, only 2 left" 
# 24                       26 
# "Out of stock"           "Out of stock" 
# 28 
# "Out of stock" 
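If numeric stock levels are needed rather than messages, the strings above can be post-processed. The following is a minimal sketch that assumes the messages always follow the three patterns seen in the output ("in stock", "in stock, only N left", "Out of stock"). The function name stock_left is hypothetical, and msgs is a hand-copied subset of the output above.

```r
# Convert stock messages to counts:
#   "in stock, only N left" -> N
#   "Out of stock"          -> 0
#   plain "in stock"        -> NA (the exact quantity is not shown)
stock_left <- function(msg) {
  pattern <- "only ([0-9]+) left"
  out <- rep(NA_integer_, length(msg))
  has_count <- grepl(pattern, msg)
  out[has_count] <- as.integer(
    sub(paste0(".*", pattern, ".*"), "\\1", msg[has_count])
  )
  out[grepl("out of stock", msg, ignore.case = TRUE)] <- 0L
  names(out) <- names(msg)
  out
}

# A few of the messages from the output above, keyed by size.
msgs <- c("16" = "in stock",
          "18" = "in stock, only 17 left",
          "24" = "Out of stock")
counts <- stock_left(msgs)
```

Plain "in stock" is mapped to NA rather than a guessed number, because the page only reveals a count when stock is low.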


Source: https://stackoverflow.com/questions/22121615/simulate-clicking-link-on-web-page
