ColdFusion - How to parse and segment out data from an email file

Submitted by ♀尐吖头ヾ on 2019-12-12 13:41:11

Question


I am trying to parse emails that will arrive periodically and extract the data they contain. We plan to set up a scheduled task in CF Admin that runs every minute and uses cfmail to get the email from the mailbox.

The data in the email consists of name, code name, address, description, etc., and will have consistent labels, so we are thinking of looping over the fields or running a find function for each one. Would that be a good start?

Here is an example of email data:

INCIDENT # 12345

LONG TERM SYS# C12345

REPORTED: 08:39:34 05/20/19 Nature: FD NEED Address: 12345 N TEST LN City: Testville

Responding Units: T12

Cross Streets: Intersection of: N Test LN & W TEST LN

Lat= 39.587453 Lon= -86.485021

Comments: This is a test post. Please disregard

Here's a picture of what the data actually looks like: [image not available]

So we would like to extract the following:

  1. INCIDENT
  2. LONG TERM SYS#
  3. REPORTED
  4. Nature
  5. Address
  6. City
  7. Responding Units
  8. Cross Streets
  9. Comments

Any feedback or suggestions would be greatly appreciated!


Answer 1:


Someone posted this but it was apparently deleted. Whoever it was I want to thank you VERY MUCH as it worked perfectly!!!!

Here is the function:

CREATE FUNCTION [tvf-Str-Extract] (@String varchar(max), @Delimiter1 varchar(100), @Delimiter2 varchar(100))
Returns Table
As
Return (

with cte1(N)   as (Select 1 From (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
     cte2(N)   as (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL))
                   From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A),
     cte3(N)   as (Select 1 Union All
                   Select t.N+DataLength(@Delimiter1) From cte2 t
                   Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
     cte4(N,L) as (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,S.N),0)-S.N,8000) From cte3 S)

Select RetSeq = Row_Number() over (Order By N)
      ,RetPos = N
      ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1)
 From  ( Select *,RetVal = Substring(@String, N, L) From cte4 ) A
 Where charindex(@Delimiter2,RetVal)>1
)

And here is the CF code that worked:

<cfquery name="body" datasource="#Application.dsn#">
    Declare @S varchar(max) ='
    INCIDENT  12345

    LONG TERM SYS C12345

    REPORTED: 08:39:34 05/20/19 Nature: FD NEED Address: 12345 N TEST LN City: Testville

    Responding Units: T12

    Cross Streets: Intersection of: N Test LN & W TEST LN

    Lat= 39.587453 Lon= -86.485021

    Comments: This is a test post. Please disregard
    '

    Select Incident = ltrim(rtrim(B.RetVal))
          ,LongTerm = ltrim(rtrim(C.RetVal))
          ,Reported = ltrim(rtrim(D.RetVal))
          ,Nature   = ltrim(rtrim(E.RetVal))
          ,Address  = ltrim(rtrim(F.RetVal))
          ,City     = ltrim(rtrim(G.RetVal))
          ,RespUnit = ltrim(rtrim(H.RetVal))
          ,CrossStr = ltrim(rtrim(I.RetVal))
          ,Comments = ltrim(rtrim(J.RetVal))
     From (values (replace(replace(@S,char(10),''),char(13),' ')) )A(S)
     Outer Apply [dbo].[tvf-Str-Extract](S,'INCIDENT'         ,'LONG TERM'  ) B
     Outer Apply [dbo].[tvf-Str-Extract](S,'LONG TERM SYS'    ,'REPORTED'   ) C
     Outer Apply [dbo].[tvf-Str-Extract](S,'REPORTED:'        ,'Nature'     ) D
     Outer Apply [dbo].[tvf-Str-Extract](S,'Nature:'          ,'Address'    ) E
     Outer Apply [dbo].[tvf-Str-Extract](S,'Address:'         ,'City'       ) F
     Outer Apply [dbo].[tvf-Str-Extract](S,'City:'            ,'Responding ') G
     Outer Apply [dbo].[tvf-Str-Extract](S,'Responding Units:','Cross'      ) H
     Outer Apply [dbo].[tvf-Str-Extract](S,'Cross Streets:'   ,'Lat'        ) I
     Outer Apply [dbo].[tvf-Str-Extract](S+'|||','Comments:'  ,'|||'        ) J
</cfquery>
<cfoutput>
B. #body.Incident#<br>
C. #body.LongTerm#<br>
D. #body.Reported#<br>
</cfoutput>



Answer 2:


SQL tends to have limited string functions, so it isn't the best tool for parsing. If the email content is always in that exact format, you could use either plain string functions or regular expressions to parse it. However, the latter is more flexible.

I suspect the content actually does contain new lines, which would make for simpler parsing. However, if you prefer searching for content in between two labels, regular expressions would do the trick.
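If the body really does contain line breaks, plain string functions may be enough for the fields that sit on their own line. A minimal sketch, assuming (hypothetically) one "Label: value" pair per line; the sample body and the `fields` struct are made up for illustration, and a line that packs several labels together, like the REPORTED line, would still need the label-pair approach:

```cfscript
// Hypothetical sample body: assumes real line breaks between fields.
emailBody = "Responding Units: T12" & chr(10)
          & "Cross Streets: Intersection of: N Test LN & W TEST LN" & chr(10)
          & "Comments: This is a test post. Please disregard";

fields = {};

// Drop carriage returns, then split on line feeds.
// listToArray() skips empty elements by default, so blank lines vanish.
lines = listToArray( replace(emailBody, chr(13), "", "all"), chr(10) );

for (line in lines) {
    // Only treat the line as a field if it has a label before a colon.
    if (find(":", line) > 1) {
        // Text before the first colon is the label, everything after it is the value.
        label = trim( listFirst(line, ":") );
        value = trim( listRest(line, ":") );
        fields[ label ] = value;
    }
}

writeDump( fields );
```

With the three sample lines, `fields` ends up with the keys Responding Units, Cross Streets and Comments. Only the first colon splits a line, so the Cross Streets value keeps its inner "Intersection of:" text.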

Build an array of the label names (only). Loop through the array, grabbing a pair of labels: "current" and "next". Use the two values in a regular expression to extract the text in between them:

label &"\s*[##:=](.*?)"& nextLabel

/* Explanation: */
label        - First label name (example: "Incident")
\s*          - Zero or more whitespace characters 
[##:=]       - Any of these characters: pound sign (doubled to escape it in a CFML string), colon or equal sign 
(.*?)        - Group of zero or more characters (non-greedy) 
nextLabel    - Next label (example: "Long Term Sys")

Use reFindNoCase() to get details about the position and length of matched text. Then use those values in conjunction with mid() to extract the text.

Note: newer versions (ColdFusion 2016+) automagically extract the matched text under a key named MATCH.
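For reference, a rough sketch of what the CF2016+ variant might look like for a single pair of labels. It assumes the MATCH key is an array parallel to POS and LEN; the shortened `emailBody` and the `incident` variable are illustrative only:

```cfscript
// Assumes ColdFusion 2016 or later, where the struct returned by
// reFindNoCase() carries a MATCH array alongside POS and LEN.
// A literal # is written as ## inside a CFML string.
emailBody = "INCIDENT ## 12345 LONG TERM SYS## C12345 REPORTED: 08:39:34 05/20/19";

// Same pattern as described above, built from one pair of labels.
matches = reFindNoCase( "Incident\s*[##:=](.*?)Long Term Sys", emailBody, 1, true );

// match[1] is the whole matched text, match[2] the first capture group.
incident = "";
if (arrayLen(matches.match) >= 2) {
    incident = trim( matches.match[2] );
}

writeOutput( incident );
```

This skips the separate mid() call, since the captured text is already available in the returned struct.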

The newer CF2016+ syntax is slicker, but something along these lines works under CF10:

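emailBody = "INCIDENT ## 12345 ... etc.... ";   // a literal # must be doubled inside a CFML string
labelArray = ["Incident", "Long Term Sys", "Reported", ..., "Comments" ];
results = {};   // struct that collects the extracted label/value pairs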

for (pos = 1; pos <= arrayLen(labelArray); pos++) {

    // get current and next label
    hasNext   = pos < arrayLen(labelArray);
    currLabel = labelArray[ pos ];
    nextLabel = (hasNext ? labelArray[ pos+1 ] : "$");

    // extract label and value
    matches   = reFindNoCase( currLabel &"\s*[##:=](.*?)"& nextLabel, emailBody, 1, true);
    if (arrayLen(matches.len) >= 2) {
        results[ currLabel ] = mid( emailBody, matches.pos[2], matches.len[2]);
    }   
}

writeDump( results );


Source: https://stackoverflow.com/questions/56277400/coldfusion-how-to-parse-and-segment-out-data-from-an-email-file
