How can I make this PowerShell script parse large files faster?

名媛妹妹  2020-12-01 12:58

I have the following PowerShell script that parses some very large files for ETL purposes. For starters, my test file is ~30 MB. Larger files around 200 MB are

3 Answers
  •  天涯浪人  2020-12-01 13:33

    Your script reads one line at a time (slow!) and stores almost the entire file in memory (big!).
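
    (The original script isn't shown in the question excerpt above, but a hypothetical sketch of the pattern being described, assuming a line-by-line read with results accumulated in an array, looks something like this:)

    # Hypothetical reconstruction, NOT the OP's actual code: Get-Content
    # without -ReadCount sends one string down the pipeline per line, and
    # += on an array copies the whole array on every append.
    $results = @()
    foreach ($line in (Get-Content $path\$infile)) {
        if ($line -match '^\|.+\|.+\|.+') {
            $results += ($line -replace '^\|(.+)\|$', '$1')
        }
    }
    $results | Set-Content $path\$outfile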

    Try this (not tested extensively):

    # Source directory and input/output file names
    $path = "E:\Documents\Projects\ESPS\Dev\DataFiles\DimProductionOrderOperation"
    $infile = "14SEP11_ProdOrderOperations.txt"
    $outfile = "PROCESSED_14SEP11_ProdOrderOperations.txt"
    
    # Lines to read per batch; raise for speed, lower for less memory
    $batch = 1000
    
    # Data lines: at least three pipe-delimited fields
    [regex]$match_regex = '^\|.+\|.+\|.+'
    # Captures everything between the leading and trailing pipes
    [regex]$replace_regex = '^\|(.+)\|$'
    
    # -List stops at the first match, so this returns just the header line
    $header_line = (Select-String -Path $path\$infile -Pattern $match_regex -List).Line
    
    # Escape the header text so it can be excluded as a literal match below
    [regex]$header_regex = [regex]::Escape($header_line)
    
    # Start the output file with the header, minus the outer pipes
    $header_line.Trim('|') | Set-Content $path\$outfile
    
    # -ReadCount emits arrays of $batch lines, and the operators below act
    # on each whole array at once: keep data lines, drop the repeated
    # header, strip the outer pipes, then append the batch to the output.
    Get-Content $path\$infile -ReadCount $batch |
        ForEach {
            $_ -match $match_regex -notmatch $header_regex -replace $replace_regex, '$1' |
                Out-File $path\$outfile -Append
        }
    

    That's a compromise between memory usage and speed. The -match and -replace operators work on arrays, so you can filter and replace an entire array at once without having to foreach through every record. -ReadCount causes the file to be read in chunks of $batch records, so you're basically reading in 1000 records at a time, doing the match and replace on that batch, then appending the result to your output file. Then it goes back for the next 1000 records. Increasing $batch should speed it up, but it will use more memory; adjust it to suit your resources.
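
    To make the array behavior concrete, here's a minimal sketch (hypothetical sample data, not from the question) of how the chained operators filter and transform a whole batch in one pass:

    # A batch as -ReadCount would deliver it: an array of strings
    $lines = '|Plant|Order|Op|', 'garbage line', '|1000|12345|0010|'
    
    # Against an array, -match RETURNS the matching elements (it filters)
    $lines -match '^\|.+\|.+\|.+'
    # -> |Plant|Order|Op|
    #    |1000|12345|0010|
    
    # Operators chain left to right: filter, then strip the outer pipes
    $lines -match '^\|.+\|.+\|.+' -replace '^\|(.+)\|$', '$1'
    # -> Plant|Order|Op
    #    1000|12345|0010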
