Scrapy repeating rows

Question

I'm trying to scrape this site: https://www.tahko.com/fi/menovinkit/?ql=tapahtumat. In particular, I want to scrape the 3 tables on the page.

I've managed this with

tables = response.xpath('//*[@class="table table-striped"]')

Then I'd like to get the rows of each table, which I did with

rows = tables.xpath('//tr')

The problem is that after scraping and printing out some of the data, I noticed that some rows have multiple entries. For example, the event "Tahko vuorijuoksu" shows up on the website once, but my scraped data contains 3 instances of it.

Could anyone point out why this is happening?

Answer 1:

When you use the selector like this:

rows = tables.xpath('//tr')

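An expression that starts with // is evaluated from the document root, not from the element the selector is called on, so it selects every tr element in the document regardless of the parent table. As a result, it returns all 207 tr elements once for each of the 3 table elements.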

To get only the tr elements inside each table, make the expression relative:

rows = tables.xpath('.//tr') # notice the .

It is usually more intuitive to write it like this:

for table in tables:
    rows = table.xpath('tr')

This is only a suggestion, though; the previous solution works just fine.

Source: https://stackoverflow.com/questions/64305876/scrapy-repeating-rows

Tags

python-3.x

xpath

web-scraping

scrapy
