Extract pattern from column

不想你离开。 提交于 2019-12-13 01:17:27

问题


I am struggling with a huge Excel sheet (with 200K rows), where I need to extract from a certain column (B) list of all email addresses present in the string.

What I want to achieve:

  1. Extract the email from string
  2. convert (at) to @ and (dot) to .
  3. Save name and email in separate columns

Example of column B:

Shubhomoy Biswas <biswas_shubhomoy777(at)yahoo(dot)com>
Puneet Arora <ar.puneetarora(at)gmail(dot)com>
Anand Upadhyay <001.anand(at)gmail(dot)com>
Rajat Gupta <rajatgupta0889(at)gmail(dot)com>
Sarvesh Sonawane <sarvesh.s(at)suruninfocoresystems.

Although I want to be able to do it on Excel any other Windows-based utility suggestion would be helpful.


回答1:


this can be done assuming they are all in the same format and only 1 email add per cell

=SUBSTITUTE(SUBSTITUTE(MID(B1,FIND("<",B1)+1,LEN(B1)-FIND("<",B1)-1),"(at)","@"),"(dot)",".")




回答2:


Give this a try:

Sub splitter()
   Dim r As Range, v As String

   For Each r In Intersect(Range("B:B"), ActiveSheet.UsedRange)
      v = r.Text
      If v <> "" Then
         ary = Split(v, " <")
         r.Offset(0, 1).Value = ary(0)
         r.Offset(0, 2).Value = Replace(Replace(Replace(ary(1), ">", ""), "(at)", "@"), "(dot)", ".")
      End If
   Next r
End Sub

This sub uses columns C and D for the output. Modify the code to suite your needs.




回答3:


To extract the name, try =TRIM(LEFT(B1,FIND("<",B1)-1)). user3005775's answer works for the email.




回答4:


You can also do this easily a regular expression (you'll need to add a reference to Microsoft VBScript Regular Expressions):

Private Sub ExtractEmailInfo(value As String)

    Dim expr As New RegExp
    Dim result As Object
    Dim user As String
    Dim addr As String

    expr.Pattern = "(.+)(<.+>)"
    Set result = expr.Execute(value)
    If result.Count > 0 Then
        user = result(0).SubMatches(0)
        addr = result(0).SubMatches(1)
        'Strip the < and >
        addr = Mid$(addr, 2, Len(addr) - 2)
        addr = Replace$(addr, "(at)", "@")
        addr = Replace$(addr, "(dot)", ".")
    End If

    Debug.Print user
    Debug.Print addr

End Sub

Replace the Debug.Print calls with whatever you need to do to place them in cells.




回答5:


This does it for 200 K rows in less than 15 seconds:

Option Explicit

Sub extractPattern()
    Dim ws As Worksheet, ur As Range, rng As Range, t As Double
    Dim fr As Long, fc As Long, lr As Long, lc As Long

    Set ws = Application.ThisWorkbook.Worksheets("Sheet1")
    Set ur = ws.UsedRange
    fr = 1
    fc = 1
    lr = ws.Cells(ur.Row + ur.Rows.Count + 1, fc).End(xlUp).Row
    lc = ws.Cells(fr, ur.Column + ur.Columns.Count + 1).End(xlToLeft).Column

    Set rng = ws.Range(ws.Cells(fr, fc), ws.Cells(lr, fc))

    enableXL False
    t = Timer
    rng.TextToColumns Destination:=ws.Cells(fr, lc + 1), DataType:=xlDelimited, _
                      TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, _
                      Space:=True
    With ws.Columns(lc + 3)
        .Replace What:="(at)", Replacement:="@", LookAt:=xlPart
        .Replace What:="(dot)", Replacement:=".", LookAt:=xlPart
        .Replace What:="<", Replacement:=vbNullString, LookAt:=xlPart
        .Replace What:=">", Replacement:=vbNullString, LookAt:=xlPart
    End With
    ws.Range(ws.Cells(fr, lc + 1), ws.Cells(fr, lc + 3)).EntireColumn.AutoFit
    Debug.Print "Total rows: " & lr & ", Duration: " & Timer - t & " seconds"
    enableXL    'Total rows: 200,000, Duration: 14.4296875 seconds
End Sub

Private Sub enableXL(Optional ByVal opt As Boolean = True)
    Application.ScreenUpdating = opt
    Application.EnableEvents = opt
    Application.Calculation = IIf(opt, xlCalculationAutomatic, xlCalculationManual)
End Sub

It places the new data in the first unused column at the end (splits the names as well)



来源:https://stackoverflow.com/questions/32680447/extract-pattern-from-column

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!