Simulating clicking on a JavaScript link in Python

Submitted by 一世执手 on 2019-11-29 23:52:14

Question


I am trying to collate reviews of restaurants. urllib2 works fine for the initial page of reviews, but the link that loads the next batch of comments is a JavaScript link. An example page is here, and the code for the "Next 25" link is:

<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$RestRatings$Next','')" class="red" id="ctl00_ContentPlaceHolder1_RestRatings_Next">NEXT 25&gt;&gt; </a>

I have looked at all the previous answers (e.g.), and I have to say I'm none the wiser. Looking at the console in Firebug doesn't offer up a handy link. Could you suggest the best (easiest) way to achieve this?

Edit: With thanks to Seleniumnewbie, this code will print out all the comments from the reviews:

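import re

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()

def getURLinfo(url):
    # Load the first page of reviews in a real browser so the JavaScript runs.
    driver.get(url)
    html = driver.page_source
    next25 = "ctl00_ContentPlaceHolder1_RestRatings_Next"
    soup = BeautifulSoup(html, "html.parser")

    # Keep clicking "NEXT 25>>" for as long as the link is present on the page.
    while soup.find(id=re.compile(next25)):
        driver.find_element(By.ID, next25).click()
        html = html + driver.page_source
        soup = BeautifulSoup(driver.page_source, "html.parser")

    # Parse the accumulated HTML of every page and collect the comment blocks.
    soup = BeautifulSoup(html, "html.parser")
    comment = soup.find_all(id=re.compile("divComment"))

    for entry in comment:
        print(entry.div.contents)  # for comments

    driver.close()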

Answer 1:


Find the element by id="ctl00_ContentPlaceHolder1_RestRatings_Next" and then click it.




Answer 2:


When a user clicks that link, the JavaScript function __doPostBack is called on the client. The link to the other question you provided assumes that this function makes an AJAX call and then places the result in the same page.

However, the review pages you have linked to don't do that. The page does make an AJAX call, but then it reloads the same page. I couldn't capture the AJAX call because the page reloads immediately, but since the page simply reloads with the new comments, I'm fairly sure the call tells the server to move you to the next page.

So, in order to get your next page of comments, you will have to call the same URL that the __doPostBack function is calling and then reload the page you are on. To find this URL, I would de-obfuscate their JavaScript and find the function being called. I believe the actual URL that is called will depend on the parameters passed to that function, so you want to make sure you replicate what it does.
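To make "replicate what it does" more concrete, here is a minimal sketch. It rests on an assumption that is not confirmed in this thread: that the site is a standard ASP.NET WebForms page (the ctl00$... naming suggests so). On such pages __doPostBack conventionally copies its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and submits the page's form back to the server, together with the other hidden fields such as __VIEWSTATE. The helper names below (HiddenFieldParser, parse_postback_href, build_postback_body) are hypothetical and only illustrate how the POST body could be assembled with the standard library.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def parse_postback_href(href):
    """Return (event_target, event_argument) taken from a __doPostBack href."""
    match = re.search(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)", href)
    if match is None:
        raise ValueError("not a __doPostBack link: %r" % href)
    return match.group(1), match.group(2)


def build_postback_body(page_html, href):
    """Build a URL-encoded form body for the postback.

    Assumption: the server expects the page's hidden fields plus
    __EVENTTARGET and __EVENTARGUMENT, as WebForms pages usually do.
    """
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)  # e.g. __VIEWSTATE, if the page has it
    target, argument = parse_postback_href(href)
    fields["__EVENTTARGET"] = target
    fields["__EVENTARGUMENT"] = argument
    return urlencode(fields).encode("ascii")
```

If the assumption holds, the resulting bytes could be sent as the data of a POST request to the same page URL (for example with urllib.request.urlopen(url, data=body)), and the response should be the next page of comments. The site may also require the session cookies from the first request, in which case driving a real browser as in the question's edit is the simpler route.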



Source: https://stackoverflow.com/questions/13436418/simulating-clicking-on-a-javascript-link-in-python
