Question
I have a Django project with a Scrapy application.
After the user fills in some form fields, I pass the submitted data to the spider and crawl some pages.
Everything is working like a charm and the database is being populated, except for one thing: when the user presses the submit button, the results page is blank, because the spider hasn't finished crawling yet and the data isn't in the database.
How can I know, inside the same Django view that started the spider, that the crawl has finished?
Here is my code:
from django.shortcuts import render

def search_process(request):
    """
    Get data from the user and redirect him to the results page.
    """
    db = get_db()
    process_number = request.POST.get('process_number', '').strip()
    court = request.POST.get('court', '').strip()

    start_crawl(process_number, court)

    process = db.processes.find_one({
        'process_number': process_number,
        'court': court
    })
    context = {
        'process': process,
    }
    return render(request, 'process_result.html', context)
from pydispatch import dispatcher
from scrapy import signals
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor

def start_crawl(process_number, court):
    """
    Starts the crawler.

    Args:
        process_number (str): Process number to be found.
        court (str): Court of the process.
    """
    runner = CrawlerRunner()
    dispatcher.connect(reactor.stop, signal=signals.spider_closed)
    process_info = runner.crawl(ProcessesSpider,
                                process_number=process_number,
                                court=court)
    process_info.addBoth(lambda _: reactor.stop())
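Note that runner.crawl() only schedules the crawl and returns a Twisted Deferred immediately, so search_process keeps running before the spider has finished. One way to make the view block until the crawl is done is to run the crawling work on a worker thread and wait on its future. Below is a minimal stdlib sketch of that wait-for-completion idea; fake_crawl and the sample arguments are hypothetical stand-ins for the real Scrapy run:

```python
import concurrent.futures
import time

def fake_crawl(process_number, court):
    """Hypothetical stand-in for the real Scrapy crawl."""
    time.sleep(0.1)  # simulate network work
    return {'process_number': process_number, 'court': court}

def start_crawl_and_wait(process_number, court, timeout=30):
    """Run the crawl in a worker thread and block until it finishes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fake_crawl, process_number, court)
        # result() blocks until the crawl completes, or raises
        # concurrent.futures.TimeoutError after `timeout` seconds.
        return future.result(timeout=timeout)

result = start_crawl_and_wait('12345', 'TJSP')
```

In practice, a common way to drive Scrapy synchronously from a blocking Django view is the crochet library, whose wait_for decorator blocks on the Deferred that runner.crawl() returns; the sketch above only illustrates the blocking pattern itself.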
Answer 1:
Not sure if my answer will work, but you can try it, and if anyone has a better idea, please share.
In your crawl function, return a boolean value:
def start_crawl(process_number, court):
    ....rest of your code....
    return True

and in your view function:

def search_process(request):
    ...rest of your code...
    crawling = start_crawl(process_number, court)
    if crawling:
        return render(request, 'process_result.html', context)
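Because runner.crawl() only schedules the crawl, returning True from start_crawl does not by itself guarantee that the data is already in the database when the view renders. A fallback, not part of the original answer, is to poll the database for the expected record before rendering. A minimal sketch, assuming a find_one-style callable like the one used in the question:

```python
import time

def wait_for_record(find_one, max_wait=30.0, interval=0.5):
    """Poll find_one() until it returns a document or max_wait expires."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        doc = find_one()
        if doc is not None:
            return doc
        time.sleep(interval)
    return None  # timed out; render a "still crawling" page instead
```

In search_process this would replace the direct lookup, e.g. process = wait_for_record(lambda: db.processes.find_one({'process_number': process_number, 'court': court})).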
Source: https://stackoverflow.com/questions/52158543/django-redirect-to-results-page-after-scrapy-finish