I'm using Scrapy + Splash to crawl webpages and extract data from Google ad banners and other ads, and I'm having difficulty getting Scrapy to follow the XPath into the iframes.
The problem is that iframe content is not returned as part of the page HTML. You can either fetch the iframe content directly (by its src) or use the render.json endpoint with the iframes=1 option:
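```python
import parsel
from scrapy_splash import SplashRequest

# ...
yield SplashRequest(url, self.parse_result, endpoint='render.json',
                    args={'html': 1, 'iframes': 1})

def parse_result(self, response):
    # render.json returns JSON; with iframes=1 the frames are listed under 'childFrames'
    iframe_html = response.data['childFrames'][0]['html']
    sel = parsel.Selector(text=iframe_html)
    item = {
        'my_field': sel.xpath(...),
        # ...
    }
    yield item
```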
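The snippet above reads only the first child frame (`childFrames[0]`). Ad slots can be nested, with one iframe inside another, so it may help to walk the whole frame tree. A minimal sketch, assuming each entry in `childFrames` is a dict that may carry `url`, `html` and its own `childFrames` list (as render.json returns with html=1 and iframes=1); `iter_frames` and `all_frame_html` are hypothetical helper names:

```python
def iter_frames(frames):
    """Yield every frame in a render.json 'childFrames' list, depth-first."""
    for frame in frames:
        yield frame
        # Assumption: nested frames are listed under the same 'childFrames' key.
        yield from iter_frames(frame.get("childFrames", []))


def all_frame_html(data):
    """Return (url, html) pairs for every nested frame that has non-empty HTML."""
    return [
        (frame.get("url"), frame["html"])
        for frame in iter_frames(data.get("childFrames", []))
        if frame.get("html")
    ]
```

In `parse_result` you could then loop over `all_frame_html(response.data)` and build a `parsel.Selector(text=html)` for each frame instead of indexing `[0]`.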
The /execute endpoint doesn't support fetching iframe content as of Splash 2.3.3.
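For the other option mentioned above, fetching each iframe by its src, you first need absolute iframe URLs from the outer page. A minimal stdlib-only sketch; `IframeSrcCollector` and `iframe_urls` are hypothetical names. It skips non-HTTP srcs such as about:blank, since such frames typically get their content from the parent page's scripts and can't be fetched on their own:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> tag (tag/attr names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def iframe_urls(page_html, base_url):
    """Return absolute http(s) URLs for all iframe srcs found in page_html."""
    collector = IframeSrcCollector()
    collector.feed(page_html)
    collector.close()
    urls = [urljoin(base_url, src) for src in collector.srcs]
    # Drop about:blank, javascript: and similar srcs that can't be requested directly.
    return [u for u in urls if urlparse(u).scheme in ("http", "https")]


# Hypothetical use inside a spider callback:
# for url in iframe_urls(response.text, response.url):
#     yield SplashRequest(url, self.parse_iframe)
```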