Scrapy empty output

天涯浪子 submitted on 2020-07-23 07:34:34

Question


I am trying to use Scrapy to extract data from a page, but every field in the output is empty (None). What is the problem?

spider:

import scrapy


class Ratemds(scrapy.Spider):
    name = 'ratemds'
    allowed_domains = ['ratemds.com']

    # Send a browser-like User-Agent for this spider only.
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747',
    }

    def start_requests(self):
        yield scrapy.Request('https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md-greensboro-nc-us', callback=self.profile)

    def profile(self, response):
        # Each selector returns None when nothing on the page matches.
        item = {
            'url': response.request.url,
            'Image': response.css('.doctor-profile-image::attr(src)').get(),
            'First_and_Last_Name': response.css('h1::text').get(),
        }
        yield item

output:

{'url': 'https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md-greensboro-nc-us', 'Image': None, 'First_and_Last_Name': None}

Answer 1:


The problem is that this website has CAPTCHA protection. When you try to collect information from it, you are redirected to a CAPTCHA challenge page instead of the profile page. That page does not contain the information you are looking for, which is why the selectors return None. To collect information from such a website, you can try the following:

  1. Use scrapy-selenium or Splash to collect the information.
  2. Use a CAPTCHA-solving tool such as death-by-captcha, anticaptcha, or similar.
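
Before trying either option, it may help to confirm that the response really is a challenge page. The following is a minimal sketch, not something from the original answer: the helper name `looks_blocked` and the marker strings are assumptions made for illustration, and the markers should be adjusted after inspecting the page the site actually returns.

```python
from urllib.parse import urlsplit

# Hypothetical marker strings; inspect the real challenge page and adjust.
CAPTCHA_MARKERS = ("captcha", "are you a human", "access denied")


def looks_blocked(requested_url, final_url, body_text, markers=CAPTCHA_MARKERS):
    """Return True if the response looks like a challenge page, not the requested page."""
    # Hint 1: the final URL path differs from the requested one
    # (a trailing slash alone is ignored).
    requested_path = urlsplit(requested_url).path.rstrip("/")
    final_path = urlsplit(final_url).path.rstrip("/")
    redirected = requested_path != final_path

    # Hint 2: the body contains text typical of a challenge page.
    lowered = body_text.lower()
    has_marker = any(marker in lowered for marker in markers)

    return redirected or has_marker
```

Inside the spider's callback this could be called as `looks_blocked(original_url, response.url, response.text)`, where the original URL can be read from `response.meta.get('redirect_urls', [response.url])[0]`. Logging the result makes it clear whether the selectors are wrong or the page itself is not the one requested.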



Answer 2:


It is likely related to the use of the css() method. Consider using xpath() instead.

For example, to extract text:

response.xpath("//td[@class='name']/span/text()").extract()


Source: https://stackoverflow.com/questions/62610604/scrapy-empty-output
