问题
I'm trying to clean up a .CSV file in Excel by getting rid of any non-standard characters. The only characters I care about keeping are A-Z, 0-9, and a few standard punctuation marks. Any other characters, I'd like to delete.
I've gotten the following macro to delete an entire row when it finds a cell which contains any characters I haven't specified, but I'm not sure how to get it to actually delete the character itself.
Sub Replace()
Dim sCharOK As String, s As String
Dim r As Range, rc As Range
Dim j As Long
sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"
Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)
' loop through all the cells with text constant values and deletes the rows with characters not in sCharOK
For Each rc In r
s = rc.Value
For j = 1 To Len(s)
If InStr(sCharOK, Mid(s, j, 1)) = 0 Then
rc.EntireRow.Delete
Exit For
End If
Next j
Next rc
End Sub
I assume there's a fairly simple way to adapt this code to that function, but I'm not familiar enough with VBA to really know how to go about doing that. Any insights are greatly appreciated!
回答1:
Another way would be Range.Replace like:
Sub test()
Dim sCharOK As String
sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®" & Chr(1)
Dim i As Long
For i = 0 To 255
If InStr(sCharOK, Chr(i)) = 0 Then
ActiveSheet.Cells.Replace What:=Chr(i), Replacement:="", LookAt:=xlPart, MatchCase:=True, SearchFormat:=False, ReplaceFormat:=False
End If
Next
End Sub
EDIT
looking at @ryguy72 answer also offers another way if only non-printable characters need to be deleted (at the question something like µ²äöüßÉõ will be deleted but this code will not) also assuming that there are no formulas:
Sub test()
With ActiveSheet.UsedRange
.Value = Evaluate("TRIM(CLEAN(" & .Address & "))")
End With
End Sub
Or directly run in Immediate window this one-liner:
ActiveSheet.UsedRange.Value = Evaluate("TRIM(CLEAN(" & ActiveSheet.UsedRange.Address & "))")
回答2:
You could also use regular expressions, thereby avoiding needing to examine each character in a loop. (Although the regex engine has to do that).
The Regex pattern, explained below, contains your list of characters, and the character class used says match everything that is not listed.
If speed becomes an issue, you can use vba arrays to speed things up.
Option Explicit
Sub ReplaceNonStdChars()
Const sPat As String = "[^\x20-\x7E\x99\xAE]"
Dim RE As Object
Dim R As Range, C As Range
Set R = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.Pattern = sPat
For Each C In R
C.Value = .Replace(C.Text, "")
Next C
End With
End Sub
Explanation of Regex Pattern
[^\x20-\x7E\x99\xAE]
[^\x20-\x7E\x99\xAE]
- Match any single character NOT present in the list below
[^\x20-\x7E\x99\xAE]- A character in the range between these two characters
\x20-\x7E- The character “ ” which occupies position 0x20 (32 decimal) in the character set
\x20 - The character “~” which occupies position 0x7E (126 decimal) in the character set
\x7E
- The character “ ” which occupies position 0x20 (32 decimal) in the character set
- The character with position 0x99 (153 decimal) in the character set
\x99 - The character with position 0xAE (174 decimal) in the character set
\xAE
- A character in the range between these two characters
Created with RegexBuddy
回答3:
If it were me, I would use a replace command on the original string every time I find an invalid char, changing that invalid char to null. Then replace the original cell value with the modified string. Something like this...
One possible way (tested)
Sub RemoveInvalidCharacters()
Dim sCharOK As String, s As String
Dim r As Range, rc As Range
Dim j As Long
Dim badchar As Boolean
sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"
Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)
' loop through all the cells with text constant values and
' deletes the invalid characters not in sCharOK from each Value property
For Each rc In r
badchar = False
s = rc.Value
For j = 1 To Len(s)
If InStr(sCharOK, Mid(s, j, 1)) = 0 Then
badchar = True
s = Replace(s, Mid(s, j, 1), "")
End If
Next j
If badchar Then
rc.Value = s
End If
Next rc
End Sub
回答4:
I just had to do this today, literally. The script below worked perfect fine for me.
Sub Clean_and_Trim_Cells()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim s As String
For Each c In ActiveSheet.UsedRange
s = c.Value
If Trim(Application.Clean(s)) <> s Then
s = Trim(Application.Clean(s))
c.Value = s
End If
Next
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
来源:https://stackoverflow.com/questions/45602265/removing-any-non-specified-characters-from-an-excel-spreadsheet-using-a-macro