ColdFusion, REGEX - Given TEXT, find all items contained in SPANs

Posted by 假装没事ソ on 2020-01-15 03:13:47

Question


I'm looking to learn how to create a REGEX in ColdFusion that will scan through a large block of HTML text and create a list of items.

The items I want are contained between the following tags:

<span class="findme">The Goods</span>

Thanks for any tips to get this going.


Answer 1:


You don't say which version of CF you're using. Since v8 you can use REMatch to get an array of matches:

results = REMatch('(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>', text)

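Use ArrayToList to turn that into a list. Note that REMatch returns each whole match, span tags included, not just the captured subexpression. For older versions, use REFindNoCase to locate each match and Mid() to extract the substrings.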
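
Since the question asks for only the text inside the spans, here is a minimal sketch of one way to reduce each REMatch result to its captured group by reusing the same pattern with REReplace. The sample text and the variable names (matches, items, m) are made up for illustration, and it assumes CF8 or later for REMatch and the array form of cfloop.

```cfml
<!--- Sample input (illustrative only) --->
<cfset text = '<span class="findme">The Goods</span> <span class="other">Skip</span> <span class="findme">More Goods</span>'>
<cfset pattern = '(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>'>

<!--- REMatch returns the whole matches, tags included --->
<cfset matches = REMatch(pattern, text)>

<!--- Replace each whole match with its first subexpression (\1) --->
<cfset items = ArrayNew(1)>
<cfloop array="#matches#" index="m">
    <cfset ArrayAppend(items, REReplace(m, pattern, "\1"))>
</cfloop>

<!--- Comma-delimited list of the inner texts --->
<cfoutput>#ArrayToList(items)#</cfoutput>
```

For the sample text this should print The Goods,More Goods. If the items themselves can contain commas, pass a different delimiter as the second argument to ArrayToList.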

EDIT: To answer your follow-up comment: using REFind to return all matches is fairly involved, because the function only returns the FIRST match. You have to call REFind repeatedly, passing a new start position each time. Ben Forta has written a UDF that does exactly this and will save you some time.

<!---
Returns all the matches of a regular expression within a string.
NOTE: Updated to allow subexpression selection (rather than whole match)

@param regex      Regular expression. (Required)
@param text       String to search. (Required)
@param subexnum   Sub-expression to extract (Optional)
@return Returns a structure.
@author Ben Forta (ben@forta.com)
@version 1, July 15, 2005
--->
<cffunction name="reFindAll" output="true" returnType="struct">
<cfargument name="regex" type="string" required="yes">
<cfargument name="text" type="string" required="yes">
<cfargument name="subexnum" type="numeric" default="1">

<!--- Define local variables --->    
<cfset var results=structNew()>
<cfset var pos=1>
<cfset var subex="">
<cfset var done=false>

<!--- Initialize results structure --->
<cfset results.len=arraynew(1)>
<cfset results.pos=arraynew(1)>

<!--- Loop through text --->
<cfloop condition="not done">

   <!--- Perform search --->
   <cfset subex=reFind(arguments.regex, arguments.text, pos, true)>
   <!--- Anything matched? --->
   <cfif subex.len[1] is 0>
      <!--- Nothing found, outta here --->
      <cfset done=true>
   <cfelse>
      <!--- Got one, add to arrays --->
      <cfset arrayappend(results.len, subex.len[arguments.subexnum])>
      <cfset arrayappend(results.pos, subex.pos[arguments.subexnum])>
      <!--- Reposition start point --->
      <cfset pos=subex.pos[1]+subex.len[1]>
   </cfif>
</cfloop>

<!--- If no matches, add 0 to both arrays --->
<cfif arraylen(results.len) is 0>
   <cfset arrayappend(results.len, 0)>
   <cfset arrayappend(results.pos, 0)>
</cfif>

<!--- and return results --->
<cfreturn results>
</cffunction>

This gives you the start position (pos) and length (len) of each match, so to get each substring, use another loop:

<cfset text = '<span class="findme">The Goods</span><span class="findme">More Goods</span>' />
<cfset pattern = '(?i)<span[^>]+class="findme"[^>]*>(.+?)</span>' />
<cfset results = reFindAll(pattern, text, 2) />
<cfloop index="i" from="1" to="#ArrayLen(results.pos)#">
    <cfoutput>match #i#: #Mid(text, results.pos[i], results.len[i])#<br></cfoutput>
</cfloop>

EDIT: Updated reFindAll with a subexnum argument. Setting it to 2 captures the first subexpression; the default value of 1 captures the entire match.




Answer 2:


Look into making your HTML work with a regular DOM parser and querying it via XPath, instead of hammering it through a regex-based abomination.

  1. To make the HTML input usable, pass it through jTidy (see http://jtidy.riaforge.org/)
  2. Once you have well-formed XML/XHTML, build an XML document from it
    <cfset dom = XmlParse(scrubbedHtml, true)>
  3. Query the XML document using XPath
    <cfset result = XmlSearch(dom, "//span[@class='findme']")>

Done.
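
As a rough sketch of steps 2 and 3, the snippet below parses a hand-written, already well-formed string and collects the text of each matching span. The scrubbedHtml value stands in for whatever the tidy step produces (the jTidy call itself is not shown), and the names nodes, node and items are illustrative assumptions.

```cfml
<!--- Assumption: stand-in for the well-formed output of the tidy step --->
<cfset scrubbedHtml = '<html><body><p><span class="findme">The Goods</span> and <span class="findme">More Goods</span></p></body></html>'>

<!--- Build the XML document (second argument: case-sensitive parsing) --->
<cfset dom = XmlParse(scrubbedHtml, true)>

<!--- XmlSearch returns an array of the matching element nodes --->
<cfset nodes = XmlSearch(dom, "//span[@class='findme']")>

<!--- Collect the text content of each matching span --->
<cfset items = ArrayNew(1)>
<cfloop array="#nodes#" index="node">
    <cfset ArrayAppend(items, node.XmlText)>
</cfloop>

<cfoutput>#ArrayToList(items)#</cfoutput>
```

Note that @class='findme' only matches when the attribute value is exactly findme; a span carrying several classes would need a different predicate.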

EDIT: ColdFusion's XmlSearch() doesn't have great XML namespace support. If you end up producing XHTML instead of the more recommendable XML, use one of the following XPath expressions (note the colon): "//:span[@class='findme']" or "//*:span[@class='findme']".
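
A different way around the namespace issue, not taken from the answer above, is the standard XPath 1.0 function local-name(), which compares the element name without its namespace. The sketch below assumes an XHTML string with the usual default namespace; the xhtml and nodes variables are illustrative.

```cfml
<!--- Assumption: namespaced XHTML, as a tidy step might emit it --->
<cfset xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><span class="findme">The Goods</span></body></html>'>
<cfset dom = XmlParse(xhtml, true)>

<!--- Match on the local element name so the default namespace is ignored --->
<cfset nodes = XmlSearch(dom, "//*[local-name()='span' and @class='findme']")>

<cfoutput>#ArrayLen(nodes)# matching span(s)</cfoutput>
```

This form is more verbose than the colon variants, but it does not depend on how a particular ColdFusion version treats an empty namespace prefix.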

See the jTidy API documentation for a complete overview of what jTidy can do.



Source: https://stackoverflow.com/questions/2414576/coldfusion-regex-given-text-find-all-items-contained-in-spans
