Question
I currently have the following code.
(Get-Content 'file.txt') |
ForEach-Object {$_ -replace '"', ''} |
Set-Content 'file.txt'
This worked when testing, but now I am trying to use it on real data files (13 GB), and this use of Get-Content is causing PowerShell to consume a large amount of RAM, ultimately exhausting all available RAM on the machine.
Is there a better way that I can achieve the same result without the same amount of overhead?
It seems I am doing the opposite of best practice, but I'm not sure what would be cleaner or less RAM-intensive than the above.
Answer 1:
Use a stream to read the file, so it isn't all loaded into memory, and use another stream to write the output. This should perform well and keep memory usage down:
$file = New-Object System.IO.StreamReader -Arg "c:\test\file.txt"
$outstream = [System.IO.StreamWriter] "c:\test\out.txt"
# Compare against $null explicitly: "while ($line = $file.ReadLine())"
# would stop at the first blank line, because "" is falsy in PowerShell.
while ($null -ne ($line = $file.ReadLine())) {
    $s = $line -replace '"', ''
    $outstream.WriteLine($s)
}
$file.Close()
$outstream.Close()
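If the script is interrupted mid-loop, the handles above are left open. A defensive variant (my own sketch, not part of the original answer, using the same paths) wraps the loop in try/finally:

```powershell
$file = New-Object System.IO.StreamReader -Arg "c:\test\file.txt"
$outstream = [System.IO.StreamWriter] "c:\test\out.txt"
try {
    # $null check keeps reading through blank lines
    while ($null -ne ($line = $file.ReadLine())) {
        $outstream.WriteLine(($line -replace '"', ''))
    }
}
finally {
    # Runs even if the loop throws, so the file handles are always released
    $file.Close()
    $outstream.Close()
}
```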
Answer 2:
Your problem isn't caused by Get-Content, but by the fact that you're running the statement in an expression (i.e. in parentheses). Running Get-Content like that is a convenient way of allowing a pipeline to write data back to the same file. However, the downside of this approach is that the entire file is read into memory before the data is passed into the pipeline (otherwise the file would still be open for reading when Set-Content tries to write data back to it).
For processing large files you must remove the parentheses and write the output to a temporary file that you rename afterwards.
Get-Content 'C:\path\to\file.txt' |
ForEach-Object {$_ -replace '"', ''} |
Set-Content 'C:\path\to\temp.txt'
Remove-Item 'C:\path\to\file.txt'
Rename-Item 'C:\path\to\temp.txt' 'file.txt'
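As a minor variation (my own sketch, not part of the original answer), Move-Item -Force can collapse the delete-and-rename pair into a single step:

```powershell
# Assumes the pipeline above has already written C:\path\to\temp.txt.
# -Force lets Move-Item overwrite the existing file.txt in one operation,
# replacing the separate Remove-Item / Rename-Item pair.
Move-Item 'C:\path\to\temp.txt' 'C:\path\to\file.txt' -Force
```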
Doing this avoids the memory exhaustion you observed. The processing can be sped up further by increasing the read count as @mjolinor suggested (cut execution time down to approximately 40% in my tests).
For even better performance, use the approach with a StreamReader and a StreamWriter that @campbell.rw suggested:
$reader = New-Object IO.StreamReader 'C:\path\to\file.txt'
$writer = New-Object IO.StreamWriter 'C:\path\to\temp.txt'
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine().Replace('"', '')
    $writer.WriteLine($line)
}
$reader.Close(); $reader.Dispose()
$writer.Close(); $writer.Dispose()
Remove-Item 'C:\path\to\file.txt'
Rename-Item 'C:\path\to\temp.txt' 'file.txt'
Answer 3:
This should be faster than line-by-line processing, and still keep your memory consumption under control:
Get-Content 'file.txt' -ReadCount 5000 |
    ForEach-Object { $_ -replace '"', '' |
        Add-Content 'newfile.txt' }
Source: https://stackoverflow.com/questions/32336756/alternative-to-get-content