scrapy get nth-child text of same class

霸气de小男生 submitted on 2020-03-22 06:26:31

Question


I've attached a picture. The problem I'm facing is getting the first element among several elements of the same class. I'm trying to get .adxHeader > .adxExtraInfo (1st one) > .adxExtraInfoPart (1st one) > a::text

I wrote the following code, but it is not working. Any idea?

response.css('div.adxViewContainer div.adxHeader div.adxExtraInfo:nth-child(1) div.adxExtraInfoPart:nth-child(1) a::text').extract_first()

Expected output: الرياض

<div class="adxHeader">
        <h3 itemprop="name"> »  درج داخلي للاجار جديد حي المونسيه</h3>

                            <div class="adxExtraInfo">
                                <div class="adxExtraInfoPart"><a href="/city/الرياض"><i class="fa fa-map-marker"></i> الرياض</a></div>
                                <div class="adxExtraInfoPart"><a href="/users/ابو نوره"><i class="fa fa-user"></i> ابو نوره</a></div>
                            </div>

                            <div class="adxExtraInfo">
                                <div class="adxExtraInfoPart"> قبل  ساعه و 27 دقيقه</div>
                                <div class="adxExtraInfoPart">#20467014</div>
                            </div>
                            <div class="moveLeft">


                                <a href="www.google.com" class="nextad"> &#8592; التالي      </a>
                                          <br />

                            </div>

        </div>

Answer 1:


The <div class="adxExtraInfo"> that you are targeting is not the 1st child of its <div class="adxHeader"> parent. The <h3> is. :nth-child() counts every sibling element regardless of class, so div.adxExtraInfo:nth-child(1) will not match anything in your input:

>>> import scrapy
>>> s = scrapy.Selector(text='''<div class="adxHeader">
...         <h3 itemprop="name"> »  درج داخلي للاجار جديد حي المونسيه</h3>
... 
...                             <div class="adxExtraInfo">
...                                 <div class="adxExtraInfoPart"><a href="/city/الرياض"><i class="fa fa-map-marker"></i> الرياض</a></div>
...                                 <div class="adxExtraInfoPart"><a href="/users/ابو نوره"><i class="fa fa-user"></i> ابو نوره</a></div>
...                             </div>
... 
...                             <div class="adxExtraInfo">
...                                 <div class="adxExtraInfoPart"> قبل  ساعه و 27 دقيقه</div>
...                                 <div class="adxExtraInfoPart">#20467014</div>
...                             </div>
...                             <div class="moveLeft">
... 
... 
...                                 <a href="www.google.com" class="nextad"> &#8592; التالي      </a>
...                                           <br />
... 
...                             </div>
... 
...         </div>''')

>>> s.css('div.adxHeader > div.adxExtraInfo:nth-child(1)').extract()
[]
>>> s.css('div.adxHeader > *:nth-child(1)').extract()
[u'<h3 itemprop="name"> \xbb  \u062f\u0631\u062c \u062f\u0627\u062e\u0644\u064a \u0644\u0644\u0627\u062c\u0627\u0631 \u062c\u062f\u064a\u062f \u062d\u064a \u0627\u0644\u0645\u0648\u0646\u0633\u064a\u0647</h3>']
>>> 

Instead, you can anchor div.adxExtraInfo to the <h3> using the adjacent sibling combinator (+), which selects the <div class="adxExtraInfo"> immediately following the <h3>:

>>> print(
...     s.css('''div.adxHeader
...                 > h3:nth-child(1) + div.adxExtraInfo
...                     div.adxExtraInfoPart:nth-child(1) a::text''').extract_first())
 الرياض
>>> 



Answer 2:


You could use XPath instead of CSS. The parentheses make [1] pick the first match in the whole document:

response.xpath('(//div[@class="adxExtraInfo"])[1]//a/text()').extract_first()



Answer 3:


Using your snippet, this should extract what you want (it also works if you use nth-child(1) instead of first-child):

response.css('.adxExtraInfoPart:first-child > a::text').extract()


Source: https://stackoverflow.com/questions/42816164/scrapy-get-nth-child-text-of-same-class
