问题
I would like to extract a combination of text and numbers from a larger string located within a column within excel.
The constants I have to work with is that each Text string will
•either start with a A, C or S, and •will always be 7 Characters long •the position of he string I would like to extract varies
The code I have been using which has been working efficiently is;
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(r.Text, " ")
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
However today I have learnt that sometimes my data may include scenarios like this;

What I would like is to adapt my code so If the 8th character is "Underscore" and the 1st character of the 7 characters is either S, A or C please extract up until the "Underscore"
Secondly I would like to exclude commons words like "Support" & "Collect" from being extracted.
Finally the 7th letter should be a number
Any ideas around this would be much appreciated.
Thanks
回答1:
try this
ary = Split(Replace(r.Text, "_", " "))
or
ary = Split(Replace(r.Text, "_", " ")," ")
result will be same for both variants
test

update
Do you know how I could leave the result blank if the 7th character returned a letter?
Public Function Xtractor(r As Range) As String
Dim a, ary
ary = Split(Replace(r.Text, "_", " "))
For Each a In ary
If Len(a) = 7 And a Like "[SAC]*" And IsNumeric(Mid(a, 7, 1)) Then
Xtractor = a
Exit Function
End If
Next a
Xtractor = ""
End Function
test

回答2:
Add Microsoft VBScript Regular Expressions 5.5
to project references. Use the following code to test matching and extracting with Xtractor:
Public Function Xtractor(ByVal p_val As String) As String
Xtractor = ""
Dim ary As String, v_re As New VBScript_RegExp_55.RegExp, Matches
v_re.Pattern = "^([SAC][^_]{1,6})_?"
Set Matches = v_re.Execute(p_val)
If Matches.Count > 0 Then Xtractor = Matches(0).SubMatches(0) Else Xtractor = ""
End Function
Sub test_Xtractor(p_cur As Range, p_val As String, p_expected As String)
Dim v_cur As Range, v_res As Range
p_cur.Value = p_val
Set v_cur = p_cur.Offset(columnOffset:=1)
v_cur.FormulaR1C1 = "='" & ThisWorkbook.Name & "'!Xtractor(RC[-1])"
Set v_res = v_cur.Offset(columnOffset:=1)
v_res.FormulaR1C1 = "=RC[-1]=""" & p_expected & """"
Debug.Print p_val; "->"; v_cur.Value; ":"; v_res.Value
End Sub
Sub test()
test_Xtractor ActiveCell, "A612002_MDC_308", "A612002"
test_Xtractor ActiveCell.Offset(1), "B612002_MDC_308", ""
test_Xtractor ActiveCell.Offset(2), "SUTP038_MDC_3", "SUTP038"
test_Xtractor ActiveCell.Offset(3), "KUTP038_MDC_3", ""
End Sub
Choose the workbook and cell for writing test fixture, then run test
from the VBA Editor.
Output in the Immediate window (Ctrl+G):
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
UPD
Isit possible to ammend this code so if the 7th character is a letter to return blank?
Replace line with assign to v_re
by the following:
v_re.Pattern = "^([SAC](?![^_]{5}[A-Z]_?)[^_]{1,6})_?"
v_re.IgnoreCase = True
And add to the test
suite:
test_Xtractor ActiveCell.Offset(4), "SUTP03A_MDC_3", ""
Output:
A612002_MDC_308->A612002:True
B612002_MDC_308->:True
SUTP038_MDC_3->SUTP038:True
KUTP038_MDC_3->:True
SUTP03A_MDC_3->:True
I inserted negative lookahead subrule (?![^_]{5}[A-Z]_?)
to reject SUTP03A_MDC_3
. But pay attention: the rejecting rule is applied exactly to the 7th character. Now v_re.IgnoreCase
set to True
, but if only capitalized characters are allowed, set it to False
. See also Regular Expression Syntax on MSDN.
来源:https://stackoverflow.com/questions/30758254/slight-adaptation-of-a-user-defined-function