Nested Selectors in Scrapy

Submitted by ぐ巨炮叔叔 on 2019-12-04 04:11:29

Question


I'm having trouble getting nested Selectors to work as described in the Scrapy documentation (http://doc.scrapy.org/en/latest/topics/selectors.html).

Here's what I have:

sel = Selector(response)
level3fields = sel.xpath('//ul/something/*')

for element in level3fields:
    site = element.xpath('/span').extract()

When I print out "element" in the loop I get <Selector xpath='stuff seen above' data=u'<span class="something">text</span>'>

Now I have two problems:

  1. Within the element there should also be an "a" node (as in <a href), but it doesn't show up in the printout; it only shows up if I extract it directly. Is that just a printing issue, or does the element Selector not hold the "a" node without extraction?

  2. When I print out "site" above, it should show a list with the span nodes. Instead, it prints an empty list.

I tried a combination of changes (from multiple slashes to none, and stars (*) in different places), but none of them brought me any closer.

Essentially, I just want to get a nested Selector which gives me the span-node in the second step (the loop).

Anyone got any tips?


Answer 1:


Regarding your first question, it's just a printing "error": the __repr__ and __str__ methods on Selectors only print the first 40 characters of the data (the element represented as HTML/XML, or its text content). The Selector still holds the whole element, including the "a" node. See https://github.com/scrapy/scrapy/blob/master/scrapy/selector/unified.py#L143

In your loop over level3fields you should use relative XPath expressions. Using /span looks for span elements directly under the root node, which I guess is not what you want.

Try this:

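from scrapy.selector import Selector

sel = Selector(response)  # response as received in your spider callback
level3fields = sel.xpath('//ul/something')

for element in level3fields:
    # The leading dot makes the expression relative to the current element
    site = element.xpath('.//span').extract()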


Source: https://stackoverflow.com/questions/22631819/nested-selectors-in-scrapy
