I have 265 CSV files with over 4 million total records (lines), and need to do a search and replace in all the CSV files. I have a snippet of my PowerShell code below that does
I don't know Python, but it looks like you are doing literal string replacements in the Python script. In Powershell, the -replace
operator is a regular expression search/replace. I would convert the Powershell to using the replace method on the string class (or to answer the original question, I think your Powershell is inefficient).
ForEach ($file in Get-ChildItem C:\temp\csv\*.csv)
{
$content = Get-Content -path $file
# look close, not much changes
$content | foreach {$_.Replace($SearchStr, $ReplaceStr)} | Set-Content $file
}
EDIT Upon further review, I think I see another (perhaps more important) difference in the versions. The Python version appears to be reading the entire file into a single string. The Powershell version on the other hand is reading into an array of strings.
The help on Get-Content
mentions a ReadCount
parameter that can affect the performance. Setting this count to -1 seems to read the entire file into a single array. This will mean that you are passing an array through the pipeline instead of individual strings, but a simple change to the code will deal with that:
# $content is now an array
$content | % { $_ } | % {$_.Replace($SearchStr, $ReplaceStr)} | Set-Content $file
If you want to read the entire file into a single string like the Python version seems to, just call the .NET method directly:
# now you have to make sure to use a FULL RESOLVED PATH
$content = [System.IO.File]::ReadAllText($file.FullName)
$content.Replace($SearchStr, $ReplaceStr) | Set-Content $file
This is not quite as "Powershell-y" since you use the .NET APIs directly instead of the similar cmdlets, but they put the ability in there for times when you need it.