How can I extract only the text with a Scrapy selector in Python?


[愿得一人] 2020-12-13 14:42

I have this code:

```
site = hxs.select("//h1[@class='state']")
log.msg(str(site[0].extract()), level=log.ERROR)
```

The output is:

5 Answers

爱一瞬间的悲伤 (OP)

2020-12-13 15:15
```
//h1[@class='state']
```
In the XPath above you are selecting the `h1` element whose `class` attribute is `state`, so `extract()` returns the whole element: its tags and everything nested inside it, not just the text.

If you only want the text of the `h1` tag itself, all you have to do is:
```
//h1[@class='state']/text()
```
If you want the text of the `h1` tag as well as the text of all the tags nested inside it, use:
```
//h1[@class='state']//text()
```
So the difference is: `/text()` returns only the text nodes directly inside the tag, while `//text()` returns the text of the tag and of every descendant tag, at any depth.
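To make the `/text()` versus `//text()` distinction concrete without needing Scrapy installed, here is a small sketch using Python's standard-library `xml.etree.ElementTree`. It is an analogue, not Scrapy's API: ElementTree has no `text()` step, so the hypothetical helpers `direct_text` and `all_text` reproduce the two XPath results by hand on an assumed sample `h1`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for the page being scraped.
HTML = "<html><body><h1 class='state'>Texas <span>(TX)</span> USA</h1></body></html>"


def direct_text(elem):
    """Text nodes that are direct children of elem (like XPath `h1/text()`)."""
    # ElementTree stores the text before the first child in .text,
    # and the text after each child in that child's .tail.
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]


def all_text(elem):
    """All text nodes inside elem at any depth (like XPath `h1//text()`)."""
    return list(elem.itertext())


root = ET.fromstring(HTML)
# ElementTree's limited XPath supports this simple attribute predicate.
h1 = root.find(".//h1[@class='state']")

print(direct_text(h1))                # ['Texas ', ' USA']
print(all_text(h1))                   # ['Texas ', '(TX)', ' USA']
print("".join(all_text(h1)).strip())  # Texas (TX) USA
```

Note that `/text()` skips the `(TX)` inside the `<span>`, which is why `//text()` is the one to use when the heading contains nested tags.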

The following code should work for you:
```
# select the h1's own text nodes, join them into one string, and trim whitespace
site = ''.join(hxs.select("//h1[@class='state']/text()").extract()).strip()
```