PowerShell - Find and replace multiple patterns on the same line and store the correspondences in a separate file

ε祈祈猫儿з submitted on 2021-01-05 07:32:46

Question


I have a very large file with thousands of lines, some of which are very long and contain various kinds of data. I need to find and replace multiple strings in that file, and several of the strings to be replaced can be on the same line. Each distinct original value should get a replacement with an incrementing counter, while a value that appears again reuses the replacement it already got. In a separate file, $tmp, I need to keep only the unique pairs of original value and corresponding replacement, in case I need to revert to the original values. With the great help of Doug Maurer I arrived at the script below, which does most of this, but I still don't know how to replace the 2nd, 3rd, etc. string on the same line, or how to keep just the unique pairs. Any ideas?
Input:

<requestId>qwerty-qwer12-qwer56</requestId>something here.,. reportId>plmkjh8765FGH4rt6As</msg:reportId
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>qwerty-qwer12-qwer56</requestId>something else.,.reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId>

Desired output:

<requestId>RequestId-1</requestId>something here.,. reportId>Report-1</msg:reportId
<requestId>RequestId-2</requestId>
<requestId>RequestId-1</requestId>something else.,.reportId>Report-2</msg:reportId

Desired output for $tmp:

qwerty-qwer12-qwer56 : RequestId-1
plmkjh8765FGH4rt6As : Report-1
zxcvbn-zxcv12-zxcv56 : RequestId-2
poGd56Hnm9q3Dfer6Jh : Report-2
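Since the point of $tmp is to be able to put the original values back, here is a minimal sketch of that reverse step. It assumes $tmp holds exactly one "original : replacement" pair per line, as shown above, and uses [regex]::Replace with a MatchEvaluator so each label is looked up as a whole. The sample lines come from the desired output; the variable names are illustrative only.

```powershell
# Pairs in the $tmp format shown above (in practice: $pairs = Get-Content $tmp).
$pairs = @(
    'qwerty-qwer12-qwer56 : RequestId-1'
    'plmkjh8765FGH4rt6As : Report-1'
    'zxcvbn-zxcv12-zxcv56 : RequestId-2'
    'poGd56Hnm9q3Dfer6Jh : Report-2'
)

# Reverse lookup: replacement label -> original value.
$reverse = @{}
foreach ($pair in $pairs) {
    $original, $label = $pair -split ' : ', 2
    $reverse[$label] = $original
}

# One alternation of all labels; (?!\d) stops RequestId-1 from matching
# the first part of a longer label such as RequestId-12.
$pattern = '(?:' + (($reverse.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|') + ')(?!\d)'

# Anonymized lines taken from the desired output above.
$anonymized = @(
    '<requestId>RequestId-1</requestId>something here.,. reportId>Report-1</msg:reportId'
    '<requestId>RequestId-2</requestId>'
    '<requestId>RequestId-1</requestId>something else.,.reportId>Report-2</msg:reportId'
)

# Each matched label is replaced by the original value it stands for.
$restored = foreach ($line in $anonymized) {
    [regex]::Replace($line, $pattern, [System.Text.RegularExpressions.MatchEvaluator]{
        param($m)
        $reverse[$m.Value]
    })
}
```

Looking up whole labels through one regex, instead of running one -replace per pair, avoids a shorter label such as RequestId-1 clobbering part of RequestId-12.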

$log = ".\log.txt"   # hypothetical path to the large input file
$tmp = ".\tmp.txt"
@'
Order: Q2we45-Uj87f6-gh65De
reportId>plmkjh8765FGH4rt6As</msg:reportId>
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId>kljsldjslddsdfdsdsdfff <messageId>1234qw-12qw12-123456</msg
<requestId>1234qw-12qw12-123456</requestId>something here.,. reportId>plmkjh8765FGH4rt6As</msg:reportId
<requestId>1234qw-12qw12-123456</requestId>something else.,.reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId> uraaa 123 <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> abcdef ole ole Order: zxcvbn-zxcv12-zxcv56 abracadabra <keyID>poU6Ghk89edfTG78Jk45GrRt23HzW4pl</msgdc
reportId>plmkjh8765FGH4rt6As</msg:reportId>
<requestId>1234qw-12qw12-12qw56</requestId>
keyId>Qwd84lPhjutf7Nmwr56hJndcsjy34imNQwd84lPhjutZ7Nmwr56hJndcsjy34imNPozDr5</
keyId>Qwd84lPhjutf7Nmwr56hJndcsjy34imNQwd84lPhjutZ7Nmwr56hJndcsjy34imNPozDr5</
keyId>Zdjgi76Gho3sQw0ib5Mjk3sDyoq9zmGdZdjgi76Gho3sQw0ib5Mjk3sDyoq9zmGdLkJpQw</
reportId>plmkjh8765FGH4rt6As</msg:reportId>
reportId>plmkjh8765FGH4rt6As</msg:reportId>
reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId>
'@ | Set-Content $log -Encoding UTF8

$requestId = @{
    Count   = 1
    Matches = @()
}
$keyId  = @{
    Count   = 1
    Matches = @()
}
$reportId  = @{
    Count   = 1
    Matches = @()
}

$output = switch -Regex -File $log {
    '(\w{6}-\w{6}-\w{6})' {
        if(!$requestId.matches.($matches.1))
        {
            $req = $requestId.matches += @{$matches.1 = "RequestId-$($requestId.count)"}
            $requestId.count++
            $req.keys | %{ Add-Content $tmp "$_ : $($req.$_)" }
        }
        $_ -replace $matches.1,$requestId.matches.($matches.1)               
    }
    'keyId>(\w{70})</' {
        if(!$keyId.matches.($matches.1))
        {
            $kid = $keyId.matches += @{$matches.1 = "keyId-$($keyId.count)"} 
            $keyId.count++
            $kid.keys | %{ Add-Content $tmp "$_ : $($kid.$_)" }
        }
        $_ -replace $matches.1,$keyId.matches.($matches.1)        
    }
    'reportId>(\w{19})</msg:reportId>' {
        if(!$reportId.matches.($matches.1))
        {
            $repid = $reportId.matches += @{$matches.1 = "Report-$($reportId.count)"}
            $reportId.count++
            $repid.keys | %{ Add-Content $tmp "$_ : $($repid.$_)" }
        }
        $_ -replace $matches.1,$reportId.matches.($matches.1)
    } 
    default {$_}
}

$output | Set-Content $log -Encoding UTF8
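The two problems described in the question can be reproduced on made-up strings (the values below are hypothetical). Inside switch -Regex every matching clause runs and receives the unmodified $_, so a line that matches two patterns is emitted twice, each copy carrying only one replacement. And -match fills $matches with the first occurrence only, so a second, different value of the same pattern on that line is left alone.

```powershell
# 1) switch -Regex: each matching clause sees the ORIGINAL line in $_.
$out = switch -Regex ('a1 b2') {
    'a\d'   { $_ -replace 'a', 'A' }   # emits 'A1 b2'
    'b\d'   { $_ -replace 'b', 'B' }   # emits 'a1 B2'; the 'A' change is lost
    default { $_ }                     # runs only when no other clause matched
}
# $out now holds two lines for a single input line.

# 2) -match captures only the first occurrence of a pattern.
$line = '<id>aaa</id> <id>bbb</id>'
if ($line -match '<id>(\w+)</id>') {
    $line = $line -replace $matches[1], 'ID-1'
}
# $line is now '<id>ID-1</id> <id>bbb</id>': the second value is untouched.
```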

Answer 1:


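Since each line may contain a different mix of data, I'd recommend this approach: test each pattern with its own if statement and update $line in place, so that all the patterns found on a line end up in a single output line.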

$requestId = @{
    Count   = 1
    Matches = @()
}
$keyId  = @{
    Count   = 1
    Matches = @()
}
$reportId  = @{
    Count   = 1
    Matches = @()
}

$log = ".\log.txt"   # hypothetical path to the large input file
$text = Get-Content $log

$tmp = ".\tmp.txt"

$output = foreach($line in $text)
{
    if($line -match '<requestID>(\w{6}-\w{6}-\w{6})</requestID>')
    {
        if(!$requestId.matches.($matches.1))
        {
            $requestId.matches += @{$matches.1 = "RequestId-$($requestId.count)"}
            # write only the new pair, so $tmp gets each pair once
            Add-Content $tmp "$($matches.1) : RequestId-$($requestId.count)"
            $requestId.count++
        }
        $line = $line -replace $matches.1,$requestId.matches.($matches.1)
    }
    if($line -match 'reportId>(\w{19})</msg:reportId>')
    {
        if(!$reportId.matches.($matches.1))
        {
            $reportId.matches += @{$matches.1 = "Report-$($reportId.count)"}
            # write only the new pair, so $tmp gets each pair once
            Add-Content $tmp "$($matches.1) : Report-$($reportId.count)"
            $reportId.count++
        }
        $line = $line -replace $matches.1,$reportId.matches.($matches.1)
    }
    if($line -match 'keyId>(\w{70})</')
    {
        if(!$keyId.matches.($matches.1))
        {
            $keyId.matches += @{$matches.1 = "keyId-$($keyId.count)"}
            # write only the new pair, so $tmp gets each pair once
            Add-Content $tmp "$($matches.1) : keyId-$($keyId.count)"
            $keyId.count++
        }
        $line = $line -replace $matches.1,$keyId.matches.($matches.1)
    }
    $line
}

$output | Set-Content $log -Encoding UTF8
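Both scripts above use -match, which captures only the first occurrence of each pattern per line. A different technique, swapped in here rather than taken from the answer, is [regex]::Replace with a MatchEvaluator script block: the evaluator runs once per match, so every occurrence on a line is handled, and an ordered dictionary records each original value the first time it is seen, which yields the unique pairs for $tmp. This is a minimal sketch assuming PowerShell 5.0 or later; the rule table, the (?i) flag and the lookaround patterns are assumptions adapted from the patterns in the question, and the input is the three-line sample from the question.

```powershell
# Sample input: the three lines from the question (in practice: Get-Content $log).
$lines = @'
<requestId>qwerty-qwer12-qwer56</requestId>something here.,. reportId>plmkjh8765FGH4rt6As</msg:reportId
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>qwerty-qwer12-qwer56</requestId>something else.,.reportId>poGd56Hnm9q3Dfer6Jh</msg:reportId>
'@ -split '\r?\n'

# Assumed rule table: each pattern matches only the ID itself (lookarounds leave
# the tags untouched) and maps to the label prefix used for its replacements.
$rules = [ordered]@{
    '(?i)(?<=<requestId>)\w{6}-\w{6}-\w{6}(?=</requestId>)' = 'RequestId'
    '(?i)(?<=reportId>)\w{19}(?=</msg:reportId)'            = 'Report'
    '(?i)(?<=keyId>)\w{70}(?=</)'                           = 'keyId'
}

# original value -> replacement, kept in order of first appearance.
# A plain OrderedDictionary compares keys case-sensitively, unlike [ordered]@{}.
$map = [System.Collections.Specialized.OrderedDictionary]::new()
# last number handed out per label prefix
$counters = @{}

# Called once per match: reuse the existing replacement or create the next one.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator]{
    param($m)
    if (-not $map.Contains($m.Value)) {
        $counters[$prefix] = 1 + [int]$counters[$prefix]
        $map[$m.Value] = "$prefix-$($counters[$prefix])"
    }
    $map[$m.Value]
}

$output = foreach ($line in $lines) {
    foreach ($pattern in $rules.Keys) {
        $prefix = $rules[$pattern]   # read by $evaluator for each match
        $line = [regex]::Replace($line, $pattern, $evaluator)
    }
    $line
}

# Unique pairs, one per line, in the format wanted for $tmp.
$pairs = foreach ($key in $map.Keys) { "$key : $($map[$key])" }

# To write the files as in the scripts above:
# $output | Set-Content $log -Encoding UTF8
# $pairs  | Set-Content $tmp -Encoding UTF8
```

On the sample, $output matches the desired output (the third line keeps the closing > that its input line has) and $pairs matches the desired $tmp content. Within a line, numbers are handed out in rule order rather than left-to-right position, which happens to coincide here.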


Source: https://stackoverflow.com/questions/65117084/powershell-find-and-replace-multiple-patterns-on-the-same-line-and-store-the-c
