Question
I'm trying to make my program check the result of an XPath expression and, if it is null, try a different one. How do I do this? I have tried all the examples on the website, but the ones that use empty single quotes ('') will not compile.
<var-def name="googleResults">
    <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/div/text()">
        <html-to-xml>
            <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
        </html-to-xml>
    </xpath>
</var-def>
<var-def name="productTruth">
    <case>
        <if condition="${googleResults != null}">
            <var name="googleResults"/>
        </if>
        <else>
            <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/text()">
                <html-to-xml>
                    <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
                </html-to-xml>
            </xpath>
        </else>
    </case>
</var-def>
Also, is there any way to manipulate a defined variable to exclude certain parts of strings, such as symbols and numbers?
Answer 1:
I ran into the same problem: the example from the official Web-Harvest user manual does not work because of the pair of empty single quotes ('').
As a workaround I use: variable.toString().length() > 0
Here is your code with that change:
<var-def name="googleResults">
    <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/div/text()">
        <html-to-xml>
            <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
        </html-to-xml>
    </xpath>
</var-def>
<var-def name="productTruth">
    <case>
        <!-- Use the first result only if it is not empty -->
        <if condition="${googleResults.toString().length() > 0}">
            <var name="googleResults"/>
        </if>
        <else>
            <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/text()">
                <html-to-xml>
                    <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
                </html-to-xml>
            </xpath>
        </else>
    </case>
</var-def>
Also, a few notes on your code in general:
1) Downloading the page is the most time- and memory-consuming part of Web-Harvest. If the first XPath does not find the information you want, your configuration downloads the page again by re-running the http request. Save the result of the http request in a variable instead; you can then query it again without repeating the download. This also limits the number of times you hit the source server, which becomes an issue if you have multiple pages to scrape.
<!-- Download and clean the page once -->
<var-def name="pagetext">
    <html-to-xml>
        <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
    </html-to-xml>
</var-def>
<var-def name="googleResults">
    <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/div/text()">
        <var name="pagetext"/>
    </xpath>
</var-def>
<var-def name="productTruth">
    <case>
        <if condition="${googleResults.toString().length() > 0}">
            <var name="googleResults"/>
        </if>
        <else>
            <!-- Fall back to the second XPath, reusing the stored page -->
            <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/text()">
                <var name="pagetext"/>
            </xpath>
        </else>
    </case>
</var-def>
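2) You can avoid the conditional entirely by changing the XPath. The expression below selects every text node beneath each b element, whether it sits directly inside b or inside a nested div:
//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/descendant-or-self::text()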
<var-def name="pagetext">
    <html-to-xml>
        <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
    </html-to-xml>
</var-def>
<var-def name="googleResults">
    <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/descendant-or-self::text()">
        <var name="pagetext"/>
    </xpath>
</var-def>
Answer 2:
You may use normalize-space(.) != '' instead of ${googleResults != null}.
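One way to read this suggestion is to put the test inside the XPath expression itself, where the empty quotes are part of the XPath string rather than of a ${...} script expression. The following is a minimal, untested sketch. It assumes the page has already been stored in a variable named pagetext, as in Answer 1.

```xml
<!-- Sketch only: assumes "pagetext" already holds the cleaned page (see Answer 1). -->
<var-def name="googleResults">
    <!-- The predicate drops text nodes that are empty or contain only whitespace -->
    <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/div/text()[normalize-space(.) != '']">
        <var name="pagetext"/>
    </xpath>
</var-def>
```

With blank text nodes filtered out, a later check such as ${googleResults.toString().length() > 0} should no longer be satisfied by whitespace-only matches.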
To exclude certain parts of strings, such as symbols and numbers, from a defined variable, use starts-with(), ends-with(), matches() or contains(), whichever fits your needs and is supported by your Web-Harvest version.
For example, to check the element <b>dfsdffsnavindfds</b>:
- /b[starts-with(text(), 'd')] -- checks whether the text starts with the character 'd'
- /b[ends-with(text(), 's')] -- checks whether the text ends with the character 's'
- /b[contains(text(), 'navin')] -- checks whether the text contains the string 'navin'
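In a Web-Harvest configuration, predicates like these go inside the expression attribute of an xpath processor. The following is a minimal sketch. It assumes the page was stored in a variable named pagetext as in Answer 1; the variable name asusTitles and the keyword 'ASUS' are made up for illustration.

```xml
<!-- Sketch only: "asusTitles" and the keyword 'ASUS' are illustrative assumptions. -->
<var-def name="asusTitles">
    <!-- Keep only the b elements whose text contains 'ASUS', then take their text nodes -->
    <xpath expression="//div[@id='ires']//ol/li/div//b[contains(., 'ASUS')]//text()">
        <var name="pagetext"/>
    </xpath>
</var-def>
```

contains() and starts-with() are XPath 1.0 functions. ends-with() and matches() were added in XPath 2.0, so they only work if the XPath engine behind your Web-Harvest version supports it.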
For more information, see http://www.w3schools.com/xpath/xpath_functions.asp
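The functions above test a string rather than change it. To actually strip symbols and digits from a variable that is already defined, one option is to call Java's String.replaceAll inside a template expression, in the same way Answer 1 calls toString().length(). This is an untested sketch: it assumes the scripting language in use accepts Java string methods, and the variable name cleanResults is made up for illustration.

```xml
<!-- Sketch only: "cleanResults" is a hypothetical variable name. -->
<var-def name="cleanResults">
    <!-- Remove every character that is not an ASCII letter or a space -->
    <template>${googleResults.toString().replaceAll("[^A-Za-z ]", "")}</template>
</var-def>
```

Adjust the character class to whatever should be kept. For example, [^A-Za-z0-9 ] would keep digits as well.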
Source: https://stackoverflow.com/questions/16331718/webharvest-if-and-null-test