问题
I need to write a MS Word macro to count occurence of every word within a given document and print out the list like . I did the macro and it works, but it is so sloooow, it takes several hours to get results for a document of 60000 words. Could you please give me some advices/suggestions on how to make the macro run faster?
(I checked a similar question here WORD VBA Count Word Occurrences but still don't get it how to speed up and need my macro to be reviewed). Thank you.
Private Type WordStatData
WordText As String
WordCount As Integer
End Type
Option Base 1
'Check if the word is valid
Private Function IsValidWord(SomeString As String) As Boolean
Dim Retval As Boolean
Retval = True
If Not (InStr(SomeString, " ") = 0) Then Retval = False
If Not (InStr(SomeString, ".") = 0) Then Retval = False
If Not (InStr(SomeString, ",") = 0) Then Retval = False
If Not InStr(SomeString, "0") = 0 Then Retval = False
If Not InStr(SomeString, "1") = 0 Then Retval = False
If Not InStr(SomeString, "2") = 0 Then Retval = False
If Not InStr(SomeString, "3") = 0 Then Retval = False
If Not InStr(SomeString, "4") = 0 Then Retval = False
If Not InStr(SomeString, "5") = 0 Then Retval = False
If Not InStr(SomeString, "6") = 0 Then Retval = False
If Not InStr(SomeString, "7") = 0 Then Retval = False
If Not InStr(SomeString, "8") = 0 Then Retval = False
If Not InStr(SomeString, "9") = 0 Then Retval = False
IsValidWord = Retval
End Function
Private Sub CommandButton1_Click()
SpanishLCID = 3082 'The source text is in Spanish
ListBox1.Clear
Dim WordsTotal As Long
WordsTotal = ActiveDocument.Words.Count
TextBox1.Text = Str(WordsTotal)
Dim Wordfound As Boolean
Dim NewWord As String
Dim MyData() As WordStatData
ReDim Preserve MyData(1)
NewWord = ""
For i = 1 To WordsTotal
NewWord = Trim(StrConv(Trim(ActiveDocument.Words(i)), vbLowerCase, SpanishLCID))
'Check if the word is in the list
If IsValidWord(NewWord) Then
Wordfound = False
For j = 1 To UBound(MyData)
If StrComp(MyData(j).WordText, NewWord) = 0 Then
Wordfound = True: Exit For
End If
Next j
If Wordfound Then
MyData(j).WordCount = MyData(j).WordCount + 1
Else
ReDim Preserve MyData(UBound(MyData) + 1)
MyData(UBound(MyData)).WordText = NewWord
MyData(UBound(MyData)).WordCount = 1
End If
End If
Next i
'Printing out the word list
For i = 1 To UBound(MyData)
ListBox1.AddItem (MyData(i).WordText & "=" & Str(MyData(i).WordCount))
Next i
End Sub
回答1:
Add a reference to the Microsoft Scripting Runtime (Tools -> References...). Then use the following:
Private Sub CommandButton1_Click()
Const SpanishLCID = 3082
Dim dict As New Scripting.Dictionary, word As Variant, fixedWord As String
Dim key As Variant
dict.CompareMode = SpanishLCID
For Each word In ActiveDocument.Words
fixedWord = Trim(StrConv(Trim(word), vbLowerCase, SpanishLCID))
If Not dict.Exists(fixedWord) Then
dict(fixedWord) = 1
Else
dict(fixedWord) = dict(fixedWord) + 1
End If
Next
ListBox1.Clear
For Each key In dict.Keys
ListBox1.AddItem key & "=" & dict(key)
Next
End Sub
NB. Word treats each punctuation symbol or paragraph as a new word. It may be advisable to specify another Dictionary or Collection with the strings that shouldn't be added to the dictionary, and test for those strings using .Exists
before adding to the dictionary.
A more concise version of IsValidWord
without regular expressions:
Function IsValidWord(s As String) As Boolean
Const validChars As String = "abcdefghijklmnopqrstuvwxyz"
Dim i As Integer, char As String * 1
For i = 1 To Len(s)
char = Mid(s, i, 1)
If InStr(1, validChars, char, vbTextCompare) = 0 Then Exit Function
Next
IsValidWord = True
End Function
and using regular expressions (add a reference to Microsoft VBScript Regular Expressions 5.5):
Dim regex As RegExp
Function IsValidWord2(s As String) As Boolean
If regex Is Nothing Then
Set regex = New RegExp
regex.Pattern = "[^a-z]"
regex.IgnoreCase = True
End If
IsValidWord2 = Not regex.Test(s)
End Function
and using regular expressions with replacement:
Function GetValidWord(s As String) As String
'GetValidWord("Introduction.......3") will return "Introduction"
If regex2 Is Nothing Then
Set regex2 = New RegExp
regex2.Pattern = "[^a-z]"
regex2.Global = True
regex2.IgnoreCase = True
End If
GetValidWord = regex2.Replace(s, "")
End Function
and you would use it as follows:
For Each word In ActiveDocument.Words
fixedWord = Trim(StrConv(Trim(word), vbLowerCase, SpanishLCID))
fixedWord = GetValidWord(fixedWord)
If Not dict.Exists(fixedWord) Then
NB: You might combine the language conversion and Trim
into GetValidWord
.
来源:https://stackoverflow.com/questions/33637862/word-occurences-in-vba-how-to-speed-up