Find the XPath of an element in HTML page content using Java

Submitted by 不打扰是莪最后的温柔 on 2019-12-13 18:19:08

Question


I'm a beginner with XPath expressions.

I have the URL below:

http://www.newark.com/white-rodgers/586-902/contactor-spst-no-12vdc-200a-bracket/dp/35M1913?MER=PPSO_N_P_EverywhereElse_None

It serves an HTML page. In JavaScript, each of the following XPaths returns the same ul element:

  1. //*[@id="moreStock_5257711"]
  2. //*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul
  3. //html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul

Using these XPaths, how should I get the same ul element in Java?

I have tried HtmlCleaner, but it failed for these two XPaths:

"//*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul",
"//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul"

It worked for the XPath "//*[@id='moreStock_5257711']". Below is the code I tried with HtmlCleaner:

package com.test.htmlcleaner.HtmlCleaner;

import java.io.IOException;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Test {
    public static void main(String[] args) {
        try {
            // Configure HtmlCleaner
            HtmlCleaner htmCleaner = new HtmlCleaner();
            CleanerProperties cleanerProperties = htmCleaner.getProperties();
            cleanerProperties.setTranslateSpecialEntities(true);
            cleanerProperties.setTransResCharsToNCR(true);
            cleanerProperties.setOmitComments(true);

            // Fetch the page with Jsoup
            String s = "http://www.newark.com/white-rodgers/586-902/contactor-spst-no-12vdc-200a-bracket/dp/35M1913?MER=PPSO_N_P_EverywhereElse_None";
            Document doc = Jsoup.connect(s)
                    .timeout(30000)
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2")
                    .get();

            // Clean the fetched HTML and evaluate the XPath on the resulting tree
            String pageContent = doc.toString();
            TagNode node = htmCleaner.clean(pageContent);
            Object[] statsNode = node.evaluateXPath("//*[@id='moreStock_5257711']");
            if (statsNode.length > 0) {
                for (int i = 0; i < statsNode.length; i++) {
                    TagNode resultNode = (TagNode) statsNode[i];
                    System.out.println("hi");
                    System.out.println("Element Text : " + resultNode.getText().toString().trim());
                }
            }
        } catch (IOException e) {
            // Fetching the page failed
            e.printStackTrace();
        } catch (XPatherException e) {
            // HtmlCleaner could not evaluate the XPath expression
            e.printStackTrace();
        }
    }
}

I need all of these XPaths to work with a single Java package.

Can anyone suggest how to get all of these XPath expressions working to retrieve the ul element in Java?

Thanks in advance.


Answer 1:


Try to debug the actual HTML DOM tree being created by HtmlCleaner. Use the following code:

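// Requires: import java.io.StringWriter; import org.htmlcleaner.PrettyHtmlSerializer;
String pageContent = doc.toString();
TagNode node = htmCleaner.clean(pageContent);

// Serialize the cleaned tree so the structure HtmlCleaner actually built can be inspected
StringWriter buffer = new StringWriter();
node.serialize(new PrettyHtmlSerializer(cleanerProperties), buffer);

System.out.println(buffer.toString());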

Now, try to apply all the XPaths on this buffer output and see why they don't work.
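
If the printed tree looks right and the two following-sibling expressions still fail, one thing worth checking is whether HtmlCleaner's built-in XPath evaluator supports axes at all; as far as I know it implements only a subset of XPath. The sketch below is not the original poster's code. It uses only the JDK's javax.xml.xpath package on a W3C DOM, which implements XPath 1.0 including the following-sibling axis. The class name XPathAxisSketch, the helper names, and the SAMPLE markup are all hypothetical; SAMPLE is a much shorter stand-in for the real page, so the third expression has fewer div steps than the one in the question. For a real page you would first need a W3C Document, for example from HtmlCleaner's DomSerializer or Jsoup's W3CDom helper (please verify both against the versions you use).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAxisSketch {

    // Hypothetical, simplified stand-in for the product page markup (well-formed XHTML).
    static final String SAMPLE =
          "<html><body><div id=\"priceWrap\"><div><div>"
        + "<a href=\"#\">More stock</a>"
        + "<ul id=\"moreStock_5257711\"><li>10 in stock</li></ul>"
        + "</div></div></div></body></html>";

    // Parse a well-formed XML/XHTML string into a W3C DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate the expression and return the id attribute of the first matching element,
    // or null if nothing matched.
    static String firstMatchId(Document dom, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, dom, XPathConstants.NODESET);
        if (nodes.getLength() == 0) {
            return null;
        }
        return ((Element) nodes.item(0)).getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse(SAMPLE);
        String[] expressions = {
            "//*[@id='moreStock_5257711']",
            "//*[@id='priceWrap']/div[1]/div/a/following-sibling::ul",
            "//html/body/div/div/div/a/following-sibling::ul" // shortened to fit SAMPLE
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + firstMatchId(dom, expression));
        }
    }
}
```

On the sample markup all three expressions resolve to the same ul, so a single package handles every form. Positional paths such as the third one remain fragile on a real page, because a cleaner may insert or drop wrapper elements; the id-based expression is the most robust of the three.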



Source: https://stackoverflow.com/questions/28713685/find-xpath-of-an-element-in-a-html-page-content-using-java
