Highlight (not delete) repeat sentences or phrases

前端 未结 3 846
故里飘歌
故里飘歌 2020-12-28 10:03

I am getting the impression that this is not possible in word but I figure if you are looking for any 3-4 words that come in the same sequence anywhere in a very long paper

3条回答
  •  滥情空心
    2020-12-28 10:46

    I did not use my own DAWG suggestion, and I am still interested in seeing if someone else has a way to do this, but I was able to come up with this:

    Option Explicit
    
    Sub test()
    Dim ABC As Scripting.Dictionary
    Dim v As Range
    Dim n As Integer
        n = 5
        Set ABC = FindRepeatingWordChains(n, ActiveDocument)
        ' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example).
        ' Loop through this collection to make your selections/highlights/whatever you want to do.
        If Not ABC Is Nothing Then
            For Each v In ABC
                v.Font.Color = wdColorRed
            Next v
        End If
    End Sub
    
    ' This is where the real code begins.
    Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary
    Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary
    Dim sChain As String
    Dim CurWord As Range
    Dim MatchCount As Integer
    Dim i As Integer
    
        MatchCount = 0
    
        For Each CurWord In DocToCheck.Words
            ' Make sure there are enough remaining words in our document to handle a chain of the length specified.
            If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then
                ' Check for non-printing characters in the first/last word of the chain.
                ' This code will read a vbCr, etc. as a word, which is probably not desired.
                ' However, this check does not exclude these 'words' inside the chain, but it can be modified.
                If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _
                    CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _
                    CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _
                    CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then
                    sChain = CurWord
                    For i = 1 To ChainLenth - 1
                        ' Add each word from the current word through the next ChainLength # of words to a temporary string.
                        sChain = sChain & " " & CurWord.Next(wdWord, i)
                    Next i
    
                    ' If we already have our temporary string stored in the dictionary, then we have a match, assign the word range to the returned dictionary.
                    ' If not, then add it to the dictionary and increment our index.
                    If DictWords.Exists(sChain) Then
                        MatchCount = MatchCount + 1
                        DictMatches.Add DocToCheck.Range(CurWord.Start, CurWord.Next(wdWord, ChainLenth - 1).End), MatchCount
                    Else
                        DictWords.Add sChain, sChain
                    End If
                End If
            End If
        Next CurWord
    
        ' If we found any matching results, then return that list, otherwise return nothing (to be caught by the calling function).
        If DictMatches.Count > 0 Then
            Set FindRepeatingWordChains = DictMatches
        Else
            Set FindRepeatingWordChains = Nothing
        End If
    
    End Function
    

    I have tested this on a 258 page document (TheStory.txt) from this source, and it ran in just a few minutes.

    See the test() sub for usage.

    You will need to reference the Microsoft Scripting Runtime to use the Scripting.Dictionary objects. If that is undesirable, small modifications can be made to use Collections instead, but I prefer the Dictionary as it has the useful .Exists() method.

提交回复
热议问题