Question
I am making a cfhttp call and getting the data back.
The response is a complete page like the one below:
<html><title>MyPage</title><head><link rel="stylesheet" href="style.css"></head>
<body>
<table></table>
<table></table>
<table></table>
<table></table>
<table></table>
<table></table>
</body>
</html>
The issue is that I want only the markup inside the body tag, with the last table tag removed completely.
I am not sure where to start. (P.S. JSoup is not an option.)
I tried the following, but it did not yield any results:
<cfset objPattern = CreateObject("java","java.util.regex.Pattern").Compile(JavaCast("string","(?i)<table[^>]*>([\w\W](?!<table))+?</table>"))>
<cfset objMatcher = objPattern.Matcher(JavaCast( "string", cfhttp.FileContent ))>
<cfoutput>#objMatcher#</cfoutput>
Answer 1:
As for convincing the client: explain that while regular expressions are great for some jobs, they are not the best tool for parsing HTML. JSoup is not an external service; it is a pre-built library designed specifically for this task (unlike regular expressions).
JSoup is very simple to use, and works much like JavaScript's DOM. Just add the JSoup jar to your class path (restart if needed) and it is ready to use.
Quoting the question: "I want the code which is inside the body tag, and also remove the last table tag completely."
Load the html content into a Document object and grab the <body> element:
jsoup = createObject("java", "org.jsoup.Jsoup");
doc = jsoup.parse( yourHTMLContentString );
body = doc.body();
Use a selector to grab and remove the last <table> element:
elem = doc.select("table:last-of-type");
elem.remove();
That is it. Now you can print, or do whatever you want, with the <body> element's html:
writeOutput( HTMLEditFormat(body.html()) );
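Putting the steps above together, here is a minimal end-to-end sketch. It assumes the JSoup jar is already on the class path. The hard-coded html string and the id attributes on the tables are hypothetical stand-ins for cfhttp.FileContent. This version is a variant of the selector above: it takes the last element of body.select("table") instead of using table:last-of-type, because with nested tables that selector can match more than one element. Treat it as an illustration, not the only way to do it:

```cfm
<cfscript>
// Hypothetical sample markup standing in for cfhttp.FileContent
html = '<html><head><title>MyPage</title></head><body>'
     & '<table id="t1"></table><table id="t2"></table><table id="t3"></table>'
     & '</body></html>';

// Parse the markup and grab the <body> element
jsoup = createObject("java", "org.jsoup.Jsoup");
doc   = jsoup.parse( html );
body  = doc.body();

// Collect every <table> under <body>, in document order,
// and remove only the final one (if any exist)
tables = body.select("table");
if ( !tables.isEmpty() ) {
    tables.last().remove();
}

// Show the remaining inner HTML of <body>, escaped for display
writeOutput( HTMLEditFormat( body.html() ) );
</cfscript>
```

With the sample string, the output should contain the tables with ids t1 and t2 but not t3. In real use, replace html with cfhttp.FileContent.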
See the JSoup documentation for more information. In particular, the JSoup Cookbook has some very good examples.
Source: https://stackoverflow.com/questions/27282555/need-to-fetch-the-specific-data-from-external-page