Scrapy + Splash: scraping element inside inner html

Submitted by 情到浓时终转凉″ on 2019-12-01 01:55:42

The problem is that iframe content is not returned as part of the page's HTML. You can either fetch the iframe content directly (by its src), or use the render.json endpoint with the iframes=1 option:

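import parsel
from scrapy_splash import SplashRequest

# ... (inside a spider method)
    yield SplashRequest(url, self.parse_result, endpoint='render.json',
                        args={'html': 1, 'iframes': 1})

def parse_result(self, response):
    # For render.json, the decoded JSON is available as response.data.
    # childFrames[0] is the first iframe on the page.
    iframe_html = response.data['childFrames'][0]['html']
    sel = parsel.Selector(iframe_html)
    item = {
        'my_field': sel.xpath(...),
        # ...
    }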
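If the page has more than one iframe, or iframes nested inside iframes, indexing `childFrames[0]` only reaches the first one. The sketch below walks the whole structure instead. It assumes each frame entry carries its own `html` and `childFrames` keys, as the access pattern above suggests; `iter_frame_html` is a hypothetical helper name, and `sample` is hand-written data shaped like such a response, not real Splash output.

```python
def iter_frame_html(frame):
    """Yield the 'html' of every descendant frame, depth-first."""
    for child in frame.get('childFrames', []):
        html = child.get('html')
        if html:
            yield html
        # Frames can be nested, so descend into this child's own frames.
        yield from iter_frame_html(child)


# Hand-written sample shaped like a render.json response (an assumption,
# not captured Splash output).
sample = {
    'html': '<html><body><iframe src="/a"></iframe></body></html>',
    'childFrames': [
        {'html': '<p>frame a</p>',
         'childFrames': [{'html': '<p>nested</p>', 'childFrames': []}]},
        {'html': '<p>frame b</p>', 'childFrames': []},
    ],
}

frames = list(iter_frame_html(sample))
```

In a spider callback you would pass `response.data` instead of `sample` and build one selector per yielded HTML string.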

The /execute endpoint doesn't support fetching iframe content as of Splash 2.3.3.

An alternative way to deal with the iframe (here response is the main page):

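    urls = response.css('iframe::attr(src)').extract()
    for url in urls:
        # Request each iframe like a normal page; parse_iframe is a
        # placeholder name for whatever callback parses the iframe.
        yield scrapy.Request(response.urljoin(url), callback=self.parse_iframe)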

This way the iframe is parsed as if it were a normal page. At the moment, though, I cannot send the main page's cookies along with the request for the HTML inside the iframe, and that's a problem.
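One detail to watch with the src approach: an iframe's src is often relative, so it has to be resolved against the main page's URL before it can be requested. Here is a minimal standard-library sketch of that step, independent of Scrapy; `IframeSrcCollector`, `iframe_urls`, and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'iframe':
            src = dict(attrs).get('src')
            if src:
                self.srcs.append(src)


def iframe_urls(page_url, page_html):
    """Return absolute URLs for all iframes found in page_html."""
    collector = IframeSrcCollector()
    collector.feed(page_html)
    # urljoin leaves absolute URLs alone and resolves relative ones.
    return [urljoin(page_url, src) for src in collector.srcs]


# Illustrative input only.
page_html = ('<iframe src="/widget/frame.html"></iframe>'
             '<iframe src="https://other.example/x"></iframe>')
urls = iframe_urls('https://example.com/articles/1', page_html)
```

Inside a Scrapy callback, `response.urljoin(src)` does the same resolution against the response's own URL.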
