Need to fetch specific data from an external page

泄露秘密 submitted on 2019-12-11 10:25:47

Question


I am making a cfhttp call and getting the data back.

The response is a complete page like the one below:

<html><title>MyPage</title><head><link rel="stylesheet" href="style.css"></head>
<body>
<table></table>
<table></table>
<table></table>
<table></table>
<table></table>
<table></table>
</body>
</html>

Now the issue: I want only the code that is inside the body tag, with the last table tag removed completely.

I am not sure where to start. [P.S. JSoup is not an option.]

I tried the following, but it did not yield any results:

<cfset objPattern = CreateObject("java","java.util.regex.Pattern").Compile(JavaCast("string","(?i)<table[^>]*>([\w\W](?!<table))+?</table>"))>
<cfset objMatcher = objPattern.Matcher(JavaCast( "string", cfhttp.FileContent ))>
<cfoutput>#objMatcher#</cfoutput>

Answer 1:


As far as convincing the client goes, explain that while regular expressions are great for some jobs, they are really not the best tool for parsing HTML. JSoup is not an external service; it is a pre-built library designed specifically for this task (unlike regular expressions).

JSoup is very simple to use, and working with it is similar to working with JavaScript's DOM. Just add the JSoup jar to your class path (restart if needed) and it is ready to use.
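
One possible way to put the jar on the class path for a single application, assuming ColdFusion 10 or later, is the per-application this.javaSettings setting in Application.cfc. This is only a sketch: the application name and the ./lib/ folder are hypothetical placeholders, not something taken from the question.

```cfml
// Application.cfc (sketch; the name and path below are assumptions)
component {
    this.name = "jsoupDemo"; // hypothetical application name

    // Load the jars found in ./lib/, a hypothetical folder holding the jsoup jar.
    this.javaSettings = {
        loadPaths = [ "./lib/" ],
        reloadOnChange = false
    };
}
```

With a setting like this in place, createObject("java", "org.jsoup.Jsoup") should resolve without editing the server-wide class path.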

"I want the code which is inside the body tag, and also remove the last table tag completely."

Load the HTML content into a Document object and grab the <body> element:

jsoup = createObject("java", "org.jsoup.Jsoup");
doc = jsoup.parse( yourHTMLContentString );
body = doc.body();

Use a selector to grab and remove the last <table> element:

elem = doc.select("table:last-of-type");
elem.remove();

That is it. Now you can print, or do whatever you want with, the <body> element's HTML:

writeOutput( HTMLEditFormat(body.html()) );
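
Put together, the three steps above might look like the sketch below. It assumes the jsoup jar is already on the class path and that the earlier cfhttp call has populated cfhttp.fileContent. It also swaps in a slightly narrower selector: table:last-of-type matches every table that is the last table among its siblings, so on a page with nested tables it can match more than one element, whereas "body > table" followed by last() targets only the final top-level table.

```cfml
<cfscript>
    // Assumes the jsoup jar is on the class path and that the earlier
    // cfhttp call has stored the page source in cfhttp.fileContent.
    jsoup = createObject("java", "org.jsoup.Jsoup");
    doc   = jsoup.parse( javaCast("string", cfhttp.fileContent) );

    // Select only tables that are direct children of <body>, so a table
    // nested inside another element is not picked up by mistake.
    tables = doc.select("body > table");

    // Elements is a java.util.List; guard against a page with no tables.
    if ( !tables.isEmpty() ) {
        tables.last().remove();
    }

    // body().html() returns the inner HTML of <body>, without the <body> tags.
    writeOutput( HTMLEditFormat( doc.body().html() ) );
</cfscript>
```

For the sample page in the question, both selectors remove the same table; the narrower form only matters when tables are nested.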

See the JSoup documentation for more information. In particular, the JSoup Cookbook has some very good examples.



Source: https://stackoverflow.com/questions/27282555/need-to-fetch-the-specific-data-from-external-page
