XPath to get data that starts with a specific character or string

别说谁变了你拦得住时间么 submitted on 2021-01-28 20:15:51

Question


I need to extract certain text nodes from the following HTML.

<div class="inhalt-links">
    <h2>
        Deutsche Verkehrswacht
        <br>
        Verkehrswacht Dortmund e. V.
        <br>
    </h2>
    <h3>
        Standnummer:&nbsp;
            <span style="font-weight: normal;">4.E08</span>
    </h3>
    <div class="clear"></div>
    <br>
    Benediktinerstraße 82
    <br>
    44287&nbsp;Dortmund
    <br>
    Deutschland
    <br>
    <br>
    Tel.:+49 231 447687
    <br>
    Fax:+49 231 447136
    <br>
    E-Mail:info@verkehrswacht-dortmund.de
    <br>
    <a href="http://www.verkehrswacht-dortmund.de" class="url" target="_blank">www.verkehrswacht-dortmund.de</a>
    <br>
    <div class="social"></div>
    <br>
</div>

To extract Tel.:+49 231 447687, I can use div[@class='inhalt-links']/text()[4]. For the other details (fax, e-mail, website), I just need to change the position index of text(). However, these text nodes sometimes appear at different positions, as in the following markup:

<div class="inhalt-links">
    <h2>
        DEW21
        <br>
    </h2>
    <h3>
        Standnummer:&nbsp;
            <span style="font-weight: normal;">4.B56</span>
    </h3>
    <div class="clear"></div>
    <br>
    Günter-Samtlebe-Platz 1
    <br>
    44135&nbsp;Dortmund
    <br>
    Postfach:104141
    <br>
    44041&nbsp;Dortmund
    <br>
    Deutschland
    <br>
    <br>
    Tel.:+49 231 544-0
    <br>
    Fax:+49 231 544-1130
    <br>
    E-Mail:vertrieb@dew21.de
    <br>
    <a href="http://www.dew21.de" class="url" target="_blank">www.dew21.de</a>
    <br>
    <div class="social"></div>
    <br>
</div>

Here, the XPath div[@class='inhalt-links']/text()[4] selects the text "44041 Dortmund" instead of Tel.:+49 231 544-0. Is there any XPath like "div[@class='inhalt-links']/text[starts with "Tel.:"]" to select the Tel.: element?


Answer 1:


"Is there any XPath like "//div[@class='inhalt-links']/text[starts with "Tel.:"]" to select the Tel.: element?"

Sure, try this:

//div[@class='inhalt-links']/text()[starts-with(normalize-space(), 'Tel.:')]

This XPath returns a text node (rather than an element) whose content, after leading and trailing whitespace is removed*, starts with the keyword Tel.:.
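
As a rough illustration of what the expression selects, the sketch below mimics it in Python using only the standard library. xml.etree.ElementTree cannot evaluate text() or starts-with(), so the direct text nodes of the div are collected by hand (the element's own text plus the tail of each child). The sample markup is a trimmed, well-formed version of the second snippet from the question, and the helper names direct_text_nodes and find_by_label are made up for this example. Trimming stands in for normalize-space() here, which is enough because the labels contain no internal whitespace. With a full XPath 1.0 engine, the expression above can be used as is.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed (XHTML-style) version of the second snippet in the question.
# &#160; replaces &nbsp;, which a plain XML parser does not know.
SAMPLE = """
<div class="inhalt-links">
    <h2>DEW21<br/></h2>
    <div class="clear"></div>
    <br/>
    Günter-Samtlebe-Platz 1
    <br/>
    44135&#160;Dortmund
    <br/>
    Postfach:104141
    <br/>
    44041&#160;Dortmund
    <br/>
    Deutschland
    <br/>
    <br/>
    Tel.:+49 231 544-0
    <br/>
    Fax:+49 231 544-1130
    <br/>
    E-Mail:vertrieb@dew21.de
    <br/>
    <a href="http://www.dew21.de" class="url" target="_blank">www.dew21.de</a>
    <br/>
</div>
"""

XPATH_WS = " \t\r\n"  # the characters XPath 1.0 treats as whitespace


def direct_text_nodes(element):
    """Yield the text nodes that are direct children of `element`."""
    if element.text:
        yield element.text
    for child in element:
        if child.tail:
            yield child.tail


def find_by_label(div, label):
    """Return the first direct text node starting with `label` (trimmed), or None."""
    for text in direct_text_nodes(div):
        trimmed = text.strip(XPATH_WS)
        if trimmed.startswith(label):
            return trimmed
    return None


div = ET.fromstring(SAMPLE)  # the root element is the div itself
tel = find_by_label(div, "Tel.:")
fax = find_by_label(div, "Fax:")
print(tel)  # Tel.:+49 231 544-0
print(fax)  # Fax:+49 231 544-1130
```

Because the match is on the label rather than on the position, the extra Postfach lines in this block no longer shift the result.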


*) For reference, this is what normalize-space() does, more precisely:

The normalize-space function strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space, and returns the resulting string. [Mozilla Developer Network]
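
To make that description concrete, here is a small Python approximation of normalize-space() for a plain string. It is only a sketch: the function name normalize_space is chosen for this example, and it assumes the XPath 1.0 definition of whitespace (space, tab, carriage return, line feed). One consequence worth noting for this page is that a no-break space, which is what &nbsp; produces, is not treated as whitespace.

```python
import re

# XPath 1.0 whitespace is only space, tab, carriage return and line feed.
_XPATH_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    """Approximate XPath 1.0 normalize-space() for a Python string."""
    # Collapse each whitespace run to one space, then drop the (at most one)
    # space left at either end.
    return _XPATH_WS_RUN.sub(" ", value).strip(" ")


raw = "\n    Tel.:+49   231\t544-0\n    "
print(normalize_space(raw))  # Tel.:+49 231 544-0

# U+00A0 (no-break space) is not XPath whitespace, so it is left untouched.
postal = normalize_space("  44041\u00a0Dortmund  ")
print(postal == "44041\u00a0Dortmund")  # True
```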



Source: https://stackoverflow.com/questions/36663142/xpath-to-get-data-starts-with-specific-character-or-string
