Scrapy crawling not working on ASPX website


栀梦 2020-12-17 05:52

I'm scraping the Madrid Assembly's website, which is built with ASP.NET (.aspx), and I have no idea how to simulate clicks on the links from which I need to get the corresponding politicians. I

3 answers

既然无缘 (OP)

2020-12-17 06:34

I think Scrapy's FormRequest.from_response could help you a lot (maybe this isn't the best regex for it, but you'll get the idea). Try something like this:

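import scrapy
from urllib.parse import unquote
from scrapy.http import FormRequest


class AsambleaMadrid(scrapy.Spider):
    name = "Asamblea_Madrid"
    start_urls = ['http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx']

    def parse(self, response):
        # Each link's href calls WebForm_PostBackOptions("<control id>", ...);
        # capture that first argument, which is the postback target.
        ids_re = r'WebForm_PostBackOptions\(([^,]*)'
        # Match against the raw href value, so the quotes are real '"' chars
        # rather than the &quot; entities of the serialized <a> element.
        for control_id in response.css('#moduloBusqueda li a::attr(href)').re(ids_re):
            target = unquote(control_id).strip('"')
            # "Clicking" an ASP.NET link means re-submitting the page's form with
            # __EVENTTARGET set to the link's control id; from_response copies the
            # form's hidden fields (__VIEWSTATE etc.) into the request for us.
            formdata = {'__EVENTTARGET': target}
            yield FormRequest.from_response(response=response,
                                            formdata=formdata,
                                            callback=self.takeEachParty,
                                            dont_click=True)  # don't add a submit button's value

    def takeEachParty(self, response):
        print(response.css('.listadoVert02 li a::text').getall())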
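
For anyone wondering what that spider actually sends: an ASP.NET link like these doesn't navigate anywhere. Its href runs JavaScript that puts the control's id into the hidden __EVENTTARGET field and POSTs the page's form back to the server, together with hidden state such as __VIEWSTATE. from_response reproduces that by copying the form's fields and letting formdata override __EVENTTARGET. Below is a stdlib-only sketch that imitates those two steps so you can test the regex offline. The sample HTML, the control id ctl00$Main$lnkGrupo1 and the helper names are made up for illustration, not taken from the Assembly's page, and the real from_response does a lot more (selects, checkboxes, submit buttons, the form's action URL).

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Same pattern as the spider: grab the first argument of
# WebForm_PostBackOptions(...), i.e. the control id to post back.
IDS_RE = re.compile(r'WebForm_PostBackOptions\(([^,]*)')


def postback_target(href):
    """Return the __EVENTTARGET encoded in an ASP.NET postback href, or None."""
    match = IDS_RE.search(href)
    if match is None:
        return None
    # Undo any %-encoding, then drop the surrounding double quotes.
    return unquote(match.group(1)).strip('"')


class HiddenInputs(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        input_type = (attrs.get("type") or "").lower()
        if tag == "input" and input_type == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def postback_formdata(form_html, target):
    """Hypothetical, simplified stand-in for from_response: hidden fields + target."""
    parser = HiddenInputs()
    parser.feed(form_html)
    data = dict(parser.fields)
    data["__EVENTTARGET"] = target          # which control was "clicked"
    data.setdefault("__EVENTARGUMENT", "")  # ASP.NET pages normally post this too
    return data


# Hypothetical sample markup; the real page's ids and values will differ.
form = """
<form method="post" action="RelacionAlfabeticaDiputados.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
  <input type="text" name="q" value="" />
</form>
"""
href = ('javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions('
        '"ctl00$Main$lnkGrupo1", "", true, "", "", false, true))')

target = postback_target(href)
print(target)  # → ctl00$Main$lnkGrupo1
print(postback_formdata(form, target))
```

If the extracted target still has quotes or &quot; around it, check that the regex is being applied to the href value itself rather than to the serialized <a> element.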
