Empty list for hrefs to achieve pagination through JavaScript onclick functions

老子叫甜甜 submitted on 2020-01-06 04:20:29

Question


My intention is to follow pagination that is driven by JavaScript functions. As an example, take the URL http://events.justdial.com/events/index.php?city=Hyderabad. The pagination links appear at the bottom of that page, and if you inspect the HTML you will see that they are generated by JavaScript functions and have href="#". I am just trying to collect those anchor tags, even though their href is #. The following is my code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class justdialdotcomSpider(BaseSpider):
   name = "justdialdotcom"
   allowed_domains = ["www.justdial.com"]
   start_urls = ["http://events.justdial.com/events/index.php?city=Hyderabad"]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       pagination = hxs.select('//div[@id="main"]/div[@id="content"]/div[@id="pagination"]/a').extract()
       print pagination,">>>>>>>>>>>>>>>>>."

When I run the above code, the result is [], that is, nothing is matched. Can anyone tell me how to paginate through these JavaScript onclick functions, and why the result is empty? I have also noticed something weird in the HTML: for example, one of the pagination links has the anchor tag <a onclick="jdevents.setPageNo(2)" href="#">2</a>, but when I open "View Page Source" in the browser I cannot find any function called jdevents.setPageNo(2). (I expected that if we could see what the function does, we could send the same data as formdata in a request.) I am really confused and unable to get past this.


Answer 1:


If you track the requests the page makes when you click a pagination link, you will find POST requests to the following URL: http://events.justdial.com/events/search.php

POST data:

city:Hyderabad 
cat:0 
area:0 
fromDate: 
toDate: 
subCat:0 
pageNo:2
fetch:events

and the response is in JSON format.
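As a rough illustration of the POST data listed above, the sketch below builds the same fields as a dictionary and form-encodes them using only the standard library. The helper name build_form_data is hypothetical (it is not part of Scrapy or of the site), and the field values are simply the ones shown above; only pageNo changes from page to page.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_form_data(page_no, city="Hyderabad"):
    """Return the form fields for one page of events (hypothetical helper)."""
    return {
        "city": city,
        "cat": "0",
        "area": "0",
        "fromDate": "",
        "toDate": "",
        "subCat": "0",
        "pageNo": str(page_no),  # form values are sent as strings
        "fetch": "events",
    }


# Form-encode the fields the way a browser would send them in the POST body.
body = urlencode(build_form_data(2))
print(body)
```

A dictionary of this shape is what gets passed as formdata to FormRequest in the spider.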

So your code should look like the following:

import json

from scrapy.spider import BaseSpider
from scrapy.http import FormRequest

class justdialdotcomSpider(BaseSpider):
    name = "justdialdotcom"
    allowed_domains = ["justdial.com"]
    start_urls = ["http://events.justdial.com/events/search.php"]


    # Initial request
    def parse(self, response):
        return [FormRequest(url="http://events.justdial.com/events/search.php",
                                        formdata={'fetch': 'area',
                                                  'pageNo': '1',
                                                  'city' : 'Hyderabad',
                                                  'cat' : '0',
                                                  'area' : '0',
                                                  'fromDate': '',
                                                  'toDate' : '',
                                                  'subCat' : '0'
                                                  },
                                        callback=self.area_count
                                        )]


    # Get total count and paginate through events
    def area_count(self, response):
        total_count = 0
        for area in  json.loads(response.body):
            total_count += int(area["count"])

        # Ceiling division, assuming 10 events per page
        pages_count = (total_count + 9) // 10

        page = 1
        while (page <= pages_count):
            yield FormRequest(url="http://events.justdial.com/events/search.php",
                                        formdata={'fetch': 'events',
                                                  'pageNo': str(page),
                                                  'city' : 'Hyderabad',
                                                  'cat' : '0',
                                                  'area' : '0',
                                                  'fromDate': '',
                                                  'toDate' : '',
                                                  'subCat' : '0'
                                                  },
                                        callback=self.parse_events
                                        )
            page += 1


    # Parse the events list and request the details of each event
    def parse_events(self, response):
        events = json.loads(response.body)
        events.pop(0)

        for event_details in events:
            yield FormRequest(url="http://events.justdial.com/events/search.php",
                                        formdata={'fetch': 'event',
                                                  'eventId': str(event_details["id"]),
                                                  },
                                        callback=self.parse_event
                                        )



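    # Parse a single event
    def parse_event(self, response):
        event_details = json.loads(response.body)
        # Product is a placeholder for an Item subclass defined in your
        # project's items.py; import it and fill its fields from
        # event_details here.
        item = Product()
        return [item]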
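The page count in area_count comes from summing the per-area counts and dividing by the page size. Here is a minimal sketch of that arithmetic on its own, assuming 10 events per page (the figure the spider divides by) and a made-up JSON payload whose shape, a list of objects with a "count" field, mirrors what area_count reads. The function name count_pages and the sample data are illustrative only.

```python
import json


def count_pages(area_json, per_page=10):
    """Sum the per-area counts and return the number of pages needed."""
    areas = json.loads(area_json)
    total = sum(int(area["count"]) for area in areas)
    # Ceiling division: 0 events -> 0 pages, 10 -> 1 page, 11 -> 2 pages.
    return (total + per_page - 1) // per_page


# Made-up sample payload, for illustration only.
sample = '[{"count": "7"}, {"count": "14"}]'
print(count_pages(sample))
```

Ceiling division avoids requesting an extra, empty page when the total is an exact multiple of the page size.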


Source: https://stackoverflow.com/questions/10975337/empty-list-for-hrefs-to-achieve-pagination-through-javascript-onclick-functions
