How to write VBA code to remove and replace UTF-8 characters

狂风中的少年 submitted on 2020-01-03 17:18:04

Question


I have this code, but I still can't replace non-English characters, such as Vietnamese or Thai, in my data with a simple "placeholder".

Sub NonLatin()
Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        s = cell.Value
            For i = 1 To Len(s)
                If Mid(s, i, 1) Like "[!A-Za-z0-9@#$%^&* * ]" Then cell.Value = "placeholder"
            Next
    Next
End Sub

Appreciate your help


Answer 1:


You can replace every character outside a given range, e.g. the ASCII range (the first 128 characters), with a placeholder using the code below:

Option Explicit

Sub Test()

    Dim oCell As Range

    With CreateObject("VBScript.RegExp")
        .Global = True
        ' Matches anything outside U+0000-U+00F7; use \u007F as the upper bound for strict ASCII
        .Pattern = "[^\u0000-\u00F7]"
        For Each oCell In [A1:C4]
            oCell.Value = .Replace(oCell.Value, "*")
        Next
    End With

End Sub



Answer 2:


See this question for details about using Regular Expressions in your VBA code.


Then use regular expressions in a function like this one to process strings. Here I am assuming you want to replace each invalid character with a placeholder, rather than the entire string. If you want to replace the entire string, you don't need to check individual characters: you can use the + or * quantifiers for multiple characters in your regular expression's pattern and test the whole string at once.

Function LatinString(ByVal str As String) As String
    ' After including a reference to "Microsoft VBScript Regular Expressions 5.5"
    ' Set up the regular expressions object
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' This is the pattern of ALLOWED characters.
        ' Note that special characters should be escaped with a backslash, e.g. \$ not $
        .Pattern = "[A-Za-z0-9]"
    End With

    ' Loop through characters in string. Replace disallowed characters with "?"
    Dim i As Long
    For i = 1 To Len(str)
        If Not regEx.Test(Mid(str, i, 1)) Then
            str = Left(str, i - 1) & "?" & Mid(str, i + 1)
        End If
    Next i
    ' Return output
    LatinString = str
End Function

You can use it in your code like this:

Dim cell As Range
For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
    cell.Value = LatinString(cell.Value)
Next
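
For the other case mentioned above, where a whole cell becomes the placeholder as soon as it contains any disallowed character, one Test call on the complete string can replace the per-character loop. The following is only a sketch: the name ReplaceWholeCells, the late-bound CreateObject call (used so that no library reference has to be set), and the allowed set of letters, digits, and spaces are assumptions, not part of the original answer.

```vba
Sub ReplaceWholeCells()
    ' Hypothetical helper: overwrites every cell in column A that contains
    ' at least one character outside the assumed allowed set.
    Dim regEx As Object
    Dim cell As Range

    ' Late binding, so no reference to the RegExp library is required
    Set regEx = CreateObject("VBScript.RegExp")
    ' Anchored pattern with the * quantifier: the WHOLE string must consist
    ' of allowed characters (assumed here: letters, digits, spaces)
    regEx.Pattern = "^[A-Za-z0-9 ]*$"

    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        ' Skip error values such as #N/A, which cannot be converted to text
        If Not IsError(cell.Value) Then
            ' A failed match means at least one disallowed character is present
            If Not regEx.Test(CStr(cell.Value)) Then cell.Value = "placeholder"
        End If
    Next cell
End Sub
```

With this allowed set, a cell containing punctuation or a decimal number is also overwritten, so the character class probably needs extending to whatever the data legitimately contains.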

For a byte-level method that converts a Unicode string to a UTF-8 string without using regular expressions, check out this article.
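
If regular expressions are not an option at all, a plain loop over character codes may also be enough for the replacement itself. The sketch below is not the method from the article referred to above; it simply checks each character with AscW. The function name AsciiOnly and its optional placeholder parameter are hypothetical.

```vba
Function AsciiOnly(ByVal source As String, _
                   Optional ByVal placeholder As String = "?") As String
    ' Hypothetical helper: returns a copy of source in which every
    ' character outside the ASCII range (0-127) is replaced by placeholder.
    Dim i As Long
    Dim code As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(source)
        ch = Mid$(source, i, 1)
        ' AscW returns a signed Integer, so code units above &H7FFF
        ' come back as negative numbers
        code = AscW(ch)
        If code < 0 Or code > 127 Then
            result = result & placeholder
        Else
            result = result & ch
        End If
    Next i

    AsciiOnly = result
End Function
```

It can be called in the same way as LatinString, for example cell.Value = AsciiOnly(cell.Value, "placeholder"). Characters stored as surrogate pairs (some emoji, for instance) occupy two code units, so they produce two placeholders.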



Source: https://stackoverflow.com/questions/45544464/how-to-write-a-vba-code-to-remove-and-replace-utf8-characters
