Python Scrapy - mimetype based filter to avoid non-text file downloads
问题 I have a running scrapy project, but it is being bandwidth intensive because it tries to download a lot of binary files (zip, tar, mp3, ..etc). I think the best solution is to filter the requests based on the mimetype (Content-Type:) HTTP header. I looked at the scrapy code and found this setting: DOWNLOADER_HTTPCLIENTFACTORY = 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory' I changed it to: DOWNLOADER_HTTPCLIENTFACTORY = 'myproject.webclients.ScrapyHTTPClientFactory' And played a