PowerShell - Find and replace multiple patterns to anonymize file

Submitted by ↘锁芯ラ on 2021-01-29 04:45:56

Question


I need your help. I have a log.txt file containing various data that I have to anonymize. I would like to find all strings matching a set of predefined patterns and replace each of them with another value. The important part is that each new distinct value for the same pattern gets the predefined replacement with a counter incremented by 1 (e.g. "orderID = 123ABC" becomes "orderID = order1" and "orderID=456ABC" becomes "orderID=order2").
There are more than 20 patterns to search for, so it is not possible to put them all on a single line. My idea is:

  1. Define "patterns.txt" file
  2. Define "replace.txt" file ("pattern" value and replacement value)
  3. Search for all "patterns" in the log file; the result will be an array
  4. Find the unique entries in that array
  5. Get the "replacement" value for each unique entry in the array
  6. Replace all occurrences in log.txt. The tricky part is that each new value of the same type (different from the values seen before) needs an incremented number (+1) so that it differs from the earlier ones.

Example of what I have:

requestID>qwerty1-qwerty2-qwerty3</requestID
requestID>12345a-12345b-12345c</requestID
requestID>qwerty1-qwerty2-qwerty3</requestID
requestID>qwerty1-qwerty2-qwerty3</requestID
orderID>012345ABCDE</orderID
orderID>012345ABCDE</orderID
orderID>ABCDE012345</orderID
orderID>ABCDE012345</orderID
keyId>XYZ123</keyId
keyId>ABC987</keyId
keyId>XYZ123</keyId

Desired result:

requestID>Request-1</requestID
requestID>Request-2</requestID
requestID>Request-1</requestID
requestID>Request-1</requestID
orderID>Order-1</orderID
orderID>Order-1</orderID
orderID>Order-2</orderID
orderID>Order-2</orderID
keyId>Key-1</keyId
keyId>Key-2</keyId
keyId>Key-1</keyId

So far I have only managed to find the unique values per type:

$N = "C:\FindAndReplace\input.txt"
$Patterns = "C:\FindAndReplace\pattern.txt"
(Select-String $N -Pattern 'requestID>\w{6}-\w{6}-\w{6}</requestID>').Matches.Value | Sort-Object -Descending -Unique
(Select-String $N -Pattern '<orderID>\w{20}</orderID>').Matches.Value | Sort-Object -Descending -Unique
(Select-String $N -Pattern '<keyId>\w{8}</keyId>').Matches.Value | Sort-Object -Descending -Unique

Thanks in advance for any suggestion on how to progress.


Answer 1:


Your patterns don't match your sample data, so I've corrected them to fit the actual sample data.
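To see the mismatch concretely, here is a small check, assuming the sample lines are written with full angle brackets as in the example log. The variable names are only illustrative.

```powershell
# Sample lines in the form used by the example log
$orderLine   = '<orderID>012345ABCDE</orderID>'
$requestLine = '<requestID>qwerty1-qwerty2-qwerty3</requestID>'

# The question's quantifiers: 20 word characters, and exactly 6 per segment
$orderOld   = $orderLine   -match '<orderID>\w{20}</orderID>'
$requestOld = $requestLine -match 'requestID>\w{6}-\w{6}-\w{6}</requestID>'

# Quantifiers sized to the sample data: 11 characters, and 6 or 7 per segment
$orderNew   = $orderLine   -match '<orderID>(\w{11})</orderID>'
$requestNew = $requestLine -match '<requestID>(\w{6,7}-\w{6,7}-\w{6,7})</requestID>'

"old patterns: order=$orderOld request=$requestOld"
"new patterns: order=$orderNew request=$requestNew"
```

The order value has 11 characters and the first request value has 7 characters per segment, which is why the original quantifiers fail on these lines.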

A simple hash table per type should be enough to keep track of the matches and counts. If we process the log file with a switch statement using the -Regex and -File parameters, we can work on one line at a time. The logic for each type is:

  • Check if the current match already exists among the matches tracked for that type.
    • If not, add it with its replacement value (type-count) and increment the count.
    • If it does exist, use the already defined replacement value.
  • Capture all the output in a variable and then write it out to file when done.
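The lookup-or-assign step in the list above can be sketched as a small helper. This is only an illustration under assumed names: Get-Replacement, Next and Map are hypothetical and are not part of the code that follows.

```powershell
# Hypothetical helper: returns the existing alias for a value,
# or assigns the next numbered alias the first time the value is seen.
function Get-Replacement {
    param(
        [hashtable]$Tracker,  # expected shape: @{ Next = 1; Map = @{} }
        [string]$Prefix,
        [string]$Value
    )
    if (-not $Tracker.Map.ContainsKey($Value)) {
        $Tracker.Map[$Value] = "$Prefix-$($Tracker.Next)"
        $Tracker.Next++
    }
    $Tracker.Map[$Value]
}

$orderTracker = @{ Next = 1; Map = @{} }
$first  = Get-Replacement -Tracker $orderTracker -Prefix 'Order' -Value '012345ABCDE'
$second = Get-Replacement -Tracker $orderTracker -Prefix 'Order' -Value 'ABCDE012345'
$again  = Get-Replacement -Tracker $orderTracker -Prefix 'Order' -Value '012345ABCDE'

"$first, $second, $again"
```

A repeated value gets the alias it was given the first time, and only a new value advances the counter.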

Create the example log file

$log = New-TemporaryFile

@'
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>12345a-12345b-12345c</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<requestID>qwerty1-qwerty2-qwerty3</requestID>
<orderID>012345ABCDE</orderID>
<orderID>012345ABCDE</orderID>
<orderID>ABCDE012345</orderID>
<orderID>ABCDE012345</orderID>
<keyId>XYZ123</keyId>
<keyId>ABC987</keyId>
<keyId>XYZ123</keyId>
'@ | Set-Content $log -Encoding UTF8

Define a "tracker" variable for each type, containing the next count and a hash table of the matches seen so far

$Request = @{
    Count   = 1
    Matches = @{}   # original value -> replacement
}
$Order = @{
    Count   = 1
    Matches = @{}
}
$Key = @{
    Count   = 1
    Matches = @{}
}

Read and process the log file line by line

$output = switch -Regex -File $log {
    '<requestID>(\w{6,7}-\w{6,7}-\w{6,7})</requestID>' {
        $value = $matches[1]
        if(!$Request.Matches.ContainsKey($value))
        {
            # First time this value is seen: assign the next Request-N
            $Request.Matches[$value] = "Request-$($Request.Count)"
            $Request.Count++
        }
        # Escape the value so it is replaced literally, not as a regex
        $_ -replace [regex]::Escape($value),$Request.Matches[$value]
    }
    '<orderID>(\w{11})</orderID>' {
        $value = $matches[1]
        if(!$Order.Matches.ContainsKey($value))
        {
            $Order.Matches[$value] = "Order-$($Order.Count)"
            $Order.Count++
        }
        $_ -replace [regex]::Escape($value),$Order.Matches[$value]
    }
    '<keyId>(\w{6})</keyId>' {
        $value = $matches[1]
        if(!$Key.Matches.ContainsKey($value))
        {
            $Key.Matches[$value] = "Key-$($Key.Count)"
            $Key.Count++
        }
        $_ -replace [regex]::Escape($value),$Key.Matches[$value]
    }
    # Lines that match none of the patterns pass through unchanged
    default {$_}
}

$output | Set-Content $log -Encoding UTF8

The $log file now contains

<requestID>Request-1</requestID>
<requestID>Request-2</requestID>
<requestID>Request-1</requestID>
<requestID>Request-1</requestID>
<orderID>Order-1</orderID>
<orderID>Order-1</orderID>
<orderID>Order-2</orderID>
<orderID>Order-2</orderID>
<keyId>Key-1</keyId>
<keyId>Key-2</keyId>
<keyId>Key-1</keyId>
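Since the question mentions more than 20 patterns, one possible variation is to drive the same logic from a table of rules instead of writing one switch branch per type. The following is a minimal sketch, not a tested replacement for the code above. The $rules layout and the names Prefix, Next and Map are assumptions for illustration, and the sketch handles at most one match per rule per line.

```powershell
# Hypothetical rule table: one entry per pattern, each with its own counter and map
$rules = @(
    @{ Prefix = 'Request'; Pattern = '<requestID>(\w{6,7}-\w{6,7}-\w{6,7})</requestID>'; Next = 1; Map = @{} }
    @{ Prefix = 'Order';   Pattern = '<orderID>(\w{11})</orderID>';                      Next = 1; Map = @{} }
    @{ Prefix = 'Key';     Pattern = '<keyId>(\w{6})</keyId>';                            Next = 1; Map = @{} }
)

# A few in-memory lines standing in for the log file
$lines = @(
    '<orderID>012345ABCDE</orderID>'
    '<keyId>XYZ123</keyId>'
    '<orderID>ABCDE012345</orderID>'
    '<orderID>012345ABCDE</orderID>'
    'a line with nothing to anonymize'
)

$result = foreach ($line in $lines) {
    foreach ($rule in $rules) {
        # Note: [regex]::Match is case-sensitive, unlike -match and switch -Regex
        $m = [regex]::Match($line, $rule.Pattern)
        if (-not $m.Success) { continue }

        $group = $m.Groups[1]
        if (-not $rule.Map.ContainsKey($group.Value)) {
            $rule.Map[$group.Value] = "$($rule.Prefix)-$($rule.Next)"
            $rule.Next++
        }
        # Splice the alias in by position, so no second regex pass is needed
        $line = $line.Substring(0, $group.Index) + $rule.Map[$group.Value] +
                $line.Substring($group.Index + $group.Length)
    }
    $line
}

$result
```

With this layout, adding a pattern means adding one line to $rules. The table could also be loaded from a file such as the "patterns.txt" mentioned in the question.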


Source: https://stackoverflow.com/questions/64740791/powershell-find-and-replace-multiple-patterns-to-anonymize-file