Scrapy crawling not working on ASPX website

栀梦 2020-12-17 05:52

I'm scraping the Madrid Assembly's website, which is built with ASPX, and I have no idea how to simulate clicks on the links that lead to the corresponding politicians. I

3 answers
  •  既然无缘
    2020-12-17 06:34

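    I think Scrapy's FormRequest.from_response could help you a lot here (the regex below may not be the best one for the job, but you'll get the idea). It pre-fills the request with the fields of the page's form, including the hidden ASP.NET ones, so you only need to set __EVENTTARGET yourself. Try something like this: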

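    import scrapy
    from urllib.parse import unquote  # Python 3 home of the old urllib.unquote
    from scrapy.http import FormRequest


    class AsambleaMadrid(scrapy.Spider):
        name = "Asamblea_Madrid"
        start_urls = ['http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx']

        def parse(self, response):
            # Each link fires an ASP.NET postback; capture the first argument
            # of WebForm_PostBackOptions(...), which is the control ID.
            ids_re = r'WebForm_PostBackOptions\(([^,]*)'
            for raw_id in response.css('#moduloBusqueda li a').re(ids_re):
                # Undo any percent-encoding and drop the surrounding quotes.
                target = unquote(raw_id).strip('"')
                formdata = {'__EVENTTARGET': target}
                # dont_click=True submits the form without simulating a click
                # on a submit control.
                request = FormRequest.from_response(response=response,
                                                    formdata=formdata,
                                                    callback=self.takeEachParty,
                                                    dont_click=True)
                yield request

        def takeEachParty(self, response):
            # Print the names listed on the page returned by the postback.
            print(response.css('.listadoVert02 li a::text').getall())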
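
To see what the regex and the unquote step do without running the spider, here is a minimal standalone sketch. The sample href values and the control ID `ctl00$Main$lnkParty` are made up for illustration and are not taken from the real site; `extract_event_targets` is a hypothetical helper name, not part of Scrapy.

```python
import re
from urllib.parse import unquote

# Same pattern as in the spider: capture everything between the opening
# parenthesis of WebForm_PostBackOptions( and the first comma.
IDS_RE = re.compile(r'WebForm_PostBackOptions\(([^,]*)')


def extract_event_targets(text):
    """Return candidate __EVENTTARGET values found in postback links."""
    return [unquote(raw).strip('"') for raw in IDS_RE.findall(text)]


# Hypothetical href values (assumed shape, not copied from the site).
plain = ('javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions('
         '"ctl00$Main$lnkParty", "", true, "", "", false, true))')
encoded = ('javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions('
           '%22ctl00%24Main%24lnkParty%22, "", true, "", "", false, true))')

print(extract_event_targets(plain))
print(extract_event_targets(encoded))
```

Both forms should yield the same control ID, which is the value the spider sends as `__EVENTTARGET`. If the real links use a different JavaScript call, the pattern would need adjusting.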
    
