scrapy and xpath function 'matches' syntax

挽巷 2021-01-06 17:20

I'm running Scrapy 0.20.2.

$ scrapy shell "http://newyork.craigslist.org/ata/"

I would like to make the list of all links to advertiseme

2 Answers
  •  时光取名叫无心
    2021-01-06 17:56

    For this special use case, there is an XPath 1.0 workaround using translate(...):

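    //a[
      contains(@href, '.html')
      and translate(substring-before(@href, '.html'), '0123456789', '') = ''
      and @href != '.html'
      and substring-after(@href, '.html') = '']
    

    The contains(...) check is needed because substring-before(...) and substring-after(...) both return an empty string when .html is absent, so without it every link lacking .html would match. The translate(...) call removes all digits from the name part before the .html extension, so the comparison holds only when that part consists of digits alone. The third condition excludes a bare .html (nothing before the dot), and the last one makes sure .html actually is the file extension, with nothing after it.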
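
    To see how the individual conditions behave, here is a minimal plain-Python sketch that emulates the XPath 1.0 string functions used in the expression. It does not run any XPath; the helper names and the sample hrefs are made up for illustration. It contrasts the translate / substring-before / substring-after conditions with and without a contains(@href, '.html') guard.

```python
DIGITS = "0123456789"


def substring_before(s, sep):
    """Emulate XPath 1.0 substring-before: '' when sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Emulate XPath 1.0 substring-after: '' when sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def strip_digits(s):
    """Emulate translate(s, '0123456789', ''): delete every digit."""
    return s.translate(str.maketrans("", "", DIGITS))


def matches_unguarded(href):
    """The translate / '.html' / substring-after conditions on their own."""
    return (
        strip_digits(substring_before(href, ".html")) == ""
        and href != ".html"
        and substring_after(href, ".html") == ""
    )


def matches_guarded(href):
    """Same conditions, plus the equivalent of contains(@href, '.html')."""
    return ".html" in href and matches_unguarded(href)


if __name__ == "__main__":
    # Hypothetical hrefs, not taken from the actual page.
    samples = ["4261731642.html", ".html", "index.html", "123.html.bak", "/about/"]
    for href in samples:
        print(f"{href!r:20} unguarded={matches_unguarded(href)} "
              f"guarded={matches_guarded(href)}")
```

    The last sample is the interesting one: an href with no .html at all passes the unguarded conditions, because both emulated substring functions return an empty string, and only the guarded variant rejects it.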
