Notes on Scrapy selectors

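Let's start with the basics that everyone already knows.
I won't go over CSS selector syntax here, because I personally don't like using CSS selectors.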

response.xpath('//table/tr/td[3]/a/@href').extract()
response.xpath('//table/tr/td[3]/a/@href').extract_first()
response.xpath('//div[@id="PN"]/div/font//text()').extract()

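The third expression (with //text()) gets the text of every tag nested under font, but in practice I found it buggy and it did not always return everything; I don't know why. The variant below, with /text(), returns only the text nodes that are direct children of font: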

response.xpath('//div[@id="PN"]/div/font/text()').extract()

So I found another approach: string(.)

response.xpath('//div[@style="font-family:Times New Roman;"]/p/b/font').xpath('string(.)').extract()
response.xpath('normalize-space(//div/table/tr[@bgcolor="#549dc5"]/td/div/font)')

Next, Scrapy's re method, response.xpath().re(). re() must be given a regular expression.

response.xpath('//div[@style="font-family:Times New Roman;"]/p/b/font').xpath('string(.)').re('ITEM|Item.*')

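One more point that is important, though not used often, and worth knowing: everyone knows how to select nodes by an attribute value, but how do you exclude an attribute? In other words, how does XPath select nodes that do not have a given attribute?
This is where not() comes in. To exclude nodes that have a particular attribute, write //tbody/tr[not(@class)]; to exclude nodes that have either of two attributes, write //tbody/tr[not(@class or @id)]. The example below tests an attribute value instead: it keeps every p whose align is not "center", including p elements that have no align attribute at all.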

response.xpath('//div[@style="font-family:Times New Roman;"]/p[not(@align="center")]').xpath('string(.)').extract()

Source: 51CTO

Author: 彩伊

Link: https://blog.csdn.net/weixin_42185136/article/details/100931077
