Question
I'm trying to scrape through this site https://www.tahko.com/fi/menovinkit/?ql=tapahtumat. In particular, I'm trying to scrape through the 3 tables on the site.
I've managed this with
tables = response.xpath('//*[@class="table table-stripefd"]')
Then I'd like to get each of the rows for the table, which I did with
rows = tables.xpath('//tr')
The problem is that after scraping and printing some of the data, I noticed that some rows have multiple entries. For example, the event "Tahko vuorijuoksu" appears once on the website, but my scraped data contains 3 instances of it.
Could anyone point out why this is happening?
Answer 1:
When you use the selector like this:
rows = tables.xpath('//tr')
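It selects every tr element in the whole document, not just those inside the current table: a path that starts with // is evaluated from the document root, regardless of which element you call it on. So it returns all 207 tr elements once for each of the 3 table elements, which is why rows show up 3 times.
To get only the tr elements that are descendants of each table, make the path relative: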
rows = tables.xpath('.//tr') # notice the .
It is usually more intuitive to write it like this:
for table in tables:
    rows = table.xpath('tr')  # direct children only; use './/tr' if the rows sit inside thead/tbody
This is only a suggestion though, the previous solution works just fine.
Source: https://stackoverflow.com/questions/64305876/scrapy-repeating-rows