I'm running Scrapy 0.20.2.
$ scrapy shell "http://newyork.craigslist.org/ata/"
I would like to make the list of all links to advertiseme
For this special use case, there is an XPath 1.0 workaround using translate(...):
//a[
translate(substring-before(@href, '.html'), '0123456789', '') = ''
and @href != '.html'
and substring-after(@href, '.html') = '']
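The translate(...) call removes all digits from the name part before the .html extension, so the comparison with '' only succeeds if that name consists of digits alone. The check on the second line makes sure a bare .html is excluded (nothing before the dot), and the last one makes sure .html actually is the file extension (nothing follows it).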
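If you want to see which href values the predicate keeps, here is a rough pure-Python mirror of the three conditions. This is only a sketch for reasoning about the expression, not Scrapy code: the helper names and the sample hrefs are made up for illustration. One thing it brings out is that substring-before returns an empty string when '.html' does not occur at all, so an href without any '.html' also passes all three checks. If the page has such links, adding something like contains(@href, '.html') to the predicate is probably a good idea.

```python
DIGITS = "0123456789"


def substring_before(s, sep):
    """XPath 1.0 substring-before: text before the first sep, '' if sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """XPath 1.0 substring-after: text after the first sep, '' if sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def remove_digits(s):
    """Same effect as translate(s, '0123456789', ''): delete every digit."""
    return s.translate(str.maketrans("", "", DIGITS))


def matches(href):
    """Mirror the three conditions of the XPath predicate for a single href."""
    return (
        remove_digits(substring_before(href, ".html")) == ""
        and href != ".html"
        and substring_after(href, ".html") == ""
    )


# Hypothetical href values, not taken from the real page.
samples = ["4242973984.html", "index.html", ".html", "123.html.bak", "about"]
for href in samples:
    print(repr(href), matches(href))
```

In this sketch only "4242973984.html" and "about" come out as matches; the second one is the edge case mentioned above.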