Why is PowerShell so slow (much slower than Python) at large search-and-replace operations?

长情又很酷 2021-02-02 12:41

I have 265 CSV files with over 4 million total records (lines), and I need to do a search and replace across all of them. A snippet of the PowerShell code that does the replacement is below.

5 Answers
  •  無奈伤痛
    2021-02-02 12:50

    Give this PowerShell script a try. It should perform much better, and use far less RAM, because the file is read through a buffered stream instead of being loaded whole.

    # $SearchStr and $ReplaceStr are assumed to be defined earlier
    $reader = [IO.File]::OpenText("C:\input.csv")
    $writer = New-Object System.IO.StreamWriter("C:\output.csv")
    
    while ($reader.Peek() -ge 0) {
        $line = $reader.ReadLine()                       # buffered read, one line at a time
        $line2 = $line -replace $SearchStr, $ReplaceStr
        $writer.WriteLine($line2)
    }
    
    $reader.Close()
    $writer.Close()
    

    This processes one file; test the performance with it, and if it's acceptable, wrap it in a loop over all of the files.
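For comparison with the Python side of the question, here is a minimal Python sketch of the same streaming approach. The function name, paths, and pattern are illustrative, not from the original post:

```python
import re

def replace_in_file(src, dst, pattern, repl):
    """Stream src line by line, apply a regex replacement, write to dst.

    Iterating over the file object is buffered, so memory use stays
    flat no matter how large the CSV is -- the same idea as the
    StreamReader/StreamWriter script above.
    """
    rx = re.compile(pattern)  # compile once, reuse for every line
    with open(src, "r", encoding="utf-8") as reader, \
         open(dst, "w", encoding="utf-8") as writer:
        for line in reader:
            writer.write(rx.sub(repl, line))
```

This is roughly what a fast Python version of the same job looks like; the PowerShell script above follows the same read-replace-write pattern.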

    Alternatively, you can use Get-Content to read a batch of lines into memory, perform the replacement, and write the updated chunk back out through the PowerShell pipeline:

    Get-Content "C:\input.csv" -ReadCount 512 | ForEach-Object {
        # $_ is an array of up to 512 lines; -replace operates element-wise
        $_ -replace $SearchStr, $ReplaceStr
    } | Set-Content "C:\output.csv"
    

    To squeeze out a little more performance you can also compile the regex (-replace uses regular expressions under the hood). Note that Regex.Replace takes a single string, so in the line-by-line script above it would replace the -replace line:

    $re = New-Object Regex $SearchStr, 'Compiled'
    $line2 = $re.Replace($line, $ReplaceStr)
    
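The compile-once idea carries over to the multi-file case as well. Here is a hypothetical Python sketch that loops over a directory of CSVs while reusing one compiled pattern; the directory layout and function name are assumptions, not from the post:

```python
import glob
import os
import re

def replace_in_files(src_dir, dst_dir, pattern, repl):
    """Apply one precompiled regex to every CSV in src_dir.

    Compiling the pattern a single time and reusing it across all
    files (and all lines) is the same optimization as the 'Compiled'
    Regex option above -- with 265 files it avoids re-parsing the
    pattern for each file.
    """
    rx = re.compile(pattern)
    os.makedirs(dst_dir, exist_ok=True)
    for src in glob.glob(os.path.join(src_dir, "*.csv")):
        dst = os.path.join(dst_dir, os.path.basename(src))
        with open(src, "r", encoding="utf-8") as reader, \
             open(dst, "w", encoding="utf-8") as writer:
            for line in reader:
                writer.write(rx.sub(repl, line))
```

Writing to a separate output directory (rather than in place) also means a crash mid-run never corrupts the source files.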
