Removing Any Non-Specified Characters from an Excel Spreadsheet using a Macro

允我心安 submitted on 2019-12-12 18:35:14

Question


I'm trying to clean up a .CSV file in Excel by getting rid of any non-standard characters. The only characters I care about keeping are A-Z, 0-9, and a few standard punctuation marks. Any other characters, I'd like to delete.

I've gotten the following macro to delete an entire row when it finds a cell which contains any characters I haven't specified, but I'm not sure how to get it to actually delete the character itself.

Sub Replace()
Dim sCharOK As String, s As String
Dim r As Range, rc As Range
Dim j As Long

sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"

Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)

' loop through all the cells with text constant values and deletes the rows with characters not in sCharOK
For Each rc In r
    s = rc.Value
    For j = 1 To Len(s)
        If InStr(sCharOK, Mid(s, j, 1)) = 0 Then
            rc.EntireRow.Delete
            Exit For
        End If
    Next j
Next rc

End Sub

I assume there's a fairly simple way to adapt this code to do that, but I'm not familiar enough with VBA to know how to go about it. Any insights are greatly appreciated!


Answer 1:


Another way would be to use Range.Replace, like this:

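Sub test()
  ' Characters to keep; Chr(1) is added to the list so it is never passed to Replace
  Dim sCharOK As String
  sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®" & Chr(1)
  Dim i As Long
  ' Check every character code 0-255 and remove each one that is not in sCharOK.
  ' Only these 256 characters are covered; Unicode characters outside the ANSI code page are left alone.
  For i = 0 To 255
    If InStr(sCharOK, Chr(i)) = 0 Then
      ' * ? ~ would act as wildcards here, but they are in sCharOK and never reach this line
      ActiveSheet.Cells.Replace What:=Chr(i), Replacement:="", LookAt:=xlPart, MatchCase:=True, SearchFormat:=False, ReplaceFormat:=False
    End If
  Next i
End Sub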
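Range.Replace treats * and ? as wildcards and ~ as the escape character, so passing one of them as What:= does not do a literal replace (a bare * with LookAt:=xlPart would match the entire content of every non-empty cell). The loop above is safe only because all three are in sCharOK. If you shrink that list, a helper like the hypothetical EscapeReplaceWildcards below (not part of the original answer) can prefix those characters with ~ first:

```vba
' Hypothetical helper: prefixes the characters Range.Replace treats specially
' (* and ? are wildcards, ~ is the escape character) with ~ so they match literally
Function EscapeReplaceWildcards(ByVal ch As String) As String
    Select Case ch
        Case "*", "?", "~"
            EscapeReplaceWildcards = "~" & ch
        Case Else
            EscapeReplaceWildcards = ch
    End Select
End Function
```

Inside the loop it would be used as What:=EscapeReplaceWildcards(Chr(i)), leaving the other Replace arguments unchanged.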

EDIT

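Looking at @ryguy72's answer, there is another way if only non-printable characters need to be deleted. Note the difference: the approach in the question would delete characters such as µ²äöüßÉõ because they are not in the allowed list, but this code keeps them. It also assumes that there are no formulas, because writing .Value back replaces them with their results: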

Sub test()
  With ActiveSheet.UsedRange
    .Value = Evaluate("TRIM(CLEAN(" & .Address & "))")
  End With
End Sub

Or run this one-liner directly in the Immediate window:

ActiveSheet.UsedRange.Value = Evaluate("TRIM(CLEAN(" & ActiveSheet.UsedRange.Address & "))")



Answer 2:


You could also use regular expressions, which avoids examining each character in your own loop (although the regex engine still does that internally).

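The regex pattern, explained below, encodes your list of characters (\x20-\x7E is exactly the space plus all printable ASCII letters, digits and punctuation). Because it is a negated character class, it matches everything that is not in the list.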

If speed becomes an issue, you can read the range into a VBA array, process the array, and write it back in one step.

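Option Explicit
Sub ReplaceNonStdChars()
    ' \u2122 is ™ and \xAE is ®. VBA strings are UTF-16, so ™ must be given by its
    ' Unicode code point; \x99 (its Windows-1252 code) would match U+0099 instead.
    Const sPat As String = "[^\x20-\x7E\u2122\xAE]"
    Dim RE As Object
    Dim R As Range, C As Range

    ' SpecialCells raises an error when there are no text constants, so trap that case
    On Error Resume Next
    Set R = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)
    On Error GoTo 0
    If R Is Nothing Then Exit Sub

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True      ' replace every match, not just the first one
        .Pattern = sPat
        For Each C In R
            ' Read .Value (the stored text) rather than .Text (the displayed text)
            C.Value = .Replace(C.Value, "")
        Next C
    End With
End Sub

The ™ entry is written as \u2122 because the cell text is Unicode: ™ is U+2122, and only ® happens to have the same code (0xAE) in Windows-1252 and Unicode.

Explanation of Regex Pattern

[^\x20-\x7E\u2122\xAE]
  • Match any single character NOT present in the list below [^\x20-\x7E\u2122\xAE]
    • A character in the range between these two characters \x20-\x7E
      • The character “ ” which occupies position 0x20 (32 decimal) in the character set \x20
      • The character “~” which occupies position 0x7E (126 decimal) in the character set \x7E
    • The character “™” with Unicode code point U+2122 (8482 decimal) \u2122
    • The character “®” which occupies position 0xAE (174 decimal) in the character set \xAE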

Created with RegexBuddy
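The array approach mentioned above can be sketched as follows. This is an illustration, not part of the original answer: the Sub name ReplaceNonStdCharsArray is hypothetical, and it assumes the sheet holds constants only (as after opening a CSV), because writing the array back through .Value replaces any formulas with their results. It uses \u2122 for ™ since VBA strings are UTF-16.

```vba
Sub ReplaceNonStdCharsArray()
    Const sPat As String = "[^\x20-\x7E\u2122\xAE]"   ' \u2122 is ™, \xAE is ®
    Dim RE As Object, rng As Range, v As Variant
    Dim i As Long, j As Long

    Set RE = CreateObject("VBScript.RegExp")
    RE.Global = True
    RE.Pattern = sPat

    Set rng = Worksheets("features").UsedRange
    ' A single-cell range returns a scalar rather than an array, so handle it separately
    If rng.Cells.CountLarge = 1 Then
        If VarType(rng.Value) = vbString Then rng.Value = RE.Replace(rng.Value, "")
        Exit Sub
    End If

    v = rng.Value                       ' one read: a 2-D, 1-based Variant array
    For i = 1 To UBound(v, 1)
        For j = 1 To UBound(v, 2)
            ' Only strings are cleaned; numbers, dates, blanks and errors are left as they are
            If VarType(v(i, j)) = vbString Then v(i, j) = RE.Replace(v(i, j), "")
        Next j
    Next i
    rng.Value = v                       ' one write back to the sheet
End Sub
```

The speed-up comes from touching the worksheet twice in total instead of once per cell.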




Answer 3:


If it were me, I would use the Replace function on the original string every time I find an invalid character, replacing that character with an empty string. Then I would write the modified string back to the cell. Something like this...

One possible way (tested)

Sub RemoveInvalidCharacters()
Dim sCharOK As String, s As String
Dim r As Range, rc As Range
Dim j As Long
Dim badchar As Boolean

sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"

Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)

' loop through all the cells with text constant values and
' deletes the invalid characters not in sCharOK from each Value property
For Each rc In r
    badchar = False
    s = rc.Value
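    ' Walk backwards: Replace shortens s, so a forward loop would skip the
    ' character that slides into position j after each removal
    For j = Len(s) To 1 Step -1
        If InStr(sCharOK, Mid(s, j, 1)) = 0 Then
            badchar = True
            s = Replace(s, Mid(s, j, 1), "")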
        End If
    Next j
    If badchar Then
        rc.Value = s
    End If
Next rc

End Sub
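Another way to avoid index bookkeeping entirely is to build a new string from the allowed characters instead of removing characters from the string being scanned. A sketch, where KeepAllowedChars and RemoveInvalidCharactersBuild are hypothetical names and the sheet name "features" is taken from the question:

```vba
' Hypothetical helper: returns s with every character not found in sCharOK removed
Function KeepAllowedChars(ByVal s As String, ByVal sCharOK As String) As String
    Dim j As Long
    Dim ch As String, result As String
    For j = 1 To Len(s)
        ch = Mid$(s, j, 1)
        ' vbBinaryCompare keeps the test case-sensitive even under Option Compare Text
        If InStr(1, sCharOK, ch, vbBinaryCompare) > 0 Then result = result & ch
    Next j
    KeepAllowedChars = result
End Function

Sub RemoveInvalidCharactersBuild()
    Dim sCharOK As String, s As String
    Dim r As Range, rc As Range

    sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"

    On Error Resume Next   ' SpecialCells fails when there are no text constants
    Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)
    On Error GoTo 0
    If r Is Nothing Then Exit Sub

    For Each rc In r
        s = KeepAllowedChars(rc.Value, sCharOK)
        If s <> rc.Value Then rc.Value = s   ' write only the cells that actually changed
    Next rc
End Sub
```

Because the input string is never modified while it is being read, every character is checked exactly once.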



Answer 4:


I just had to do this today, literally. The script below worked perfectly fine for me.

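Sub Clean_and_Trim_Cells()
    Dim c As Range
    Dim s As String
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    For Each c In ActiveSheet.UsedRange
        ' Skip error values (assigning them to a String fails) and formulas
        ' (writing .Value would overwrite the formula with its cleaned result)
        If Not IsError(c.Value) And Not c.HasFormula Then
            s = c.Value
            ' Clean removes ASCII control characters 0-31; VBA's Trim removes leading/trailing spaces
            If Trim(Application.Clean(s)) <> s Then
                c.Value = Trim(Application.Clean(s))
            End If
        End If
    Next c
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
End Sub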
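Microsoft's documentation for the worksheet functions notes that CLEAN removes only the 7-bit ASCII control codes 0-31 (not, for example, 127), and that TRIM does not remove the non-breaking space (160), which is common in data pasted from web pages. VBA's own Trim, used above, only strips leading and trailing ordinary spaces. If those characters matter, a pure-VBA variant could look like this sketch; CleanAndTrim is a hypothetical name, and the set of characters it handles is an assumption you may need to extend:

```vba
' Hypothetical helper: drops control codes 0-31 and 127, turns non-breaking
' spaces (160) into ordinary spaces, then strips leading/trailing spaces
Function CleanAndTrim(ByVal s As String) As String
    Dim j As Long, code As Long
    Dim ch As String, result As String
    For j = 1 To Len(s)
        ch = Mid$(s, j, 1)
        code = AscW(ch) And &HFFFF&        ' AscW can return a negative Integer; mask to 0-65535
        If code = 160 Then
            result = result & " "          ' non-breaking space becomes an ordinary space
        ElseIf code >= 32 And code <> 127 Then
            result = result & ch           ' keep everything else that is printable
        End If                             ' codes 0-31 and 127 are dropped
    Next j
    CleanAndTrim = Trim$(result)
End Function
```

In the loop above it would replace Trim(Application.Clean(s)).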


Source: https://stackoverflow.com/questions/45602265/removing-any-non-specified-characters-from-an-excel-spreadsheet-using-a-macro
