PowerShell get number of lines of big (large) file

小鲜肉 2020-12-23 16:35

One way to get the number of lines in a file with PowerShell is this:

PS C:\Users\Pranav\Desktop\PS_Test_Scripts> $a=Get-Content .\sub.ps1


        
8 Answers
  •  借酒劲吻你
    2020-12-23 17:09

    Here's a PowerShell script I cobbled together that demonstrates a few different methods of counting lines in a text file, along with the time and memory each method requires. The results (below) show clear differences between the methods. In my tests, the sweet spot was Get-Content with a ReadCount setting of 100: ReadCount 1000 was slightly faster but used more memory, and the other methods required significantly more time and/or memory.

    #$testFile = 'C:\test_small.csv' # 245 lines, 150 KB
    #$testFile = 'C:\test_medium.csv' # 95,365 lines, 104 MB
    $testFile = 'C:\test_large.csv' # 285,776 lines, 308 MB
    
    # Using an ArrayList because it is faster than a PowerShell array for some operations on large collections.
    $results = New-Object System.Collections.ArrayList
    
    function AddResult {
    param( [string] $sMethod, [string] $iCount )
        $result = New-Object -TypeName PSObject -Property @{
            "Method" = $sMethod
            "Count" = $iCount
            "Elapsed Time" = ((Get-Date) - $dtStart)
            "Memory Total" = [System.Math]::Round((GetMemoryUsage)/1mb, 1)
            "Memory Delta" = [System.Math]::Round(((GetMemoryUsage) - $dMemStart)/1mb, 1)
        }
        [void]$results.Add($result)
        Write-Output "$sMethod : $iCount"
        [System.GC]::Collect()
    }
    
    function GetMemoryUsage {
        # return ((Get-Process -Id $pid).PrivateMemorySize)
        return ([System.GC]::GetTotalMemory($false))
    }
    
    # Get-Content -ReadCount 1
    [System.GC]::Collect()
    $dMemStart = GetMemoryUsage
    $dtStart = Get-Date
    $count = 0
    Get-Content -Path $testFile -ReadCount 1 |% { $count++ }
    AddResult "Get-Content -ReadCount 1" $count
    
    # Get-Content -ReadCount 10,100,1000,0
    # Note: ReadCount = 1 returns a string.  Any other value returns an array of strings.
    # Thus, the Count property only applies when ReadCount is not 1.
    @(10,100,1000,0) |% {
        $dMemStart = GetMemoryUsage
        $dtStart = Get-Date
        $count = 0
        Get-Content -Path $testFile -ReadCount $_ |% { $count += $_.Count }
        AddResult "Get-Content -ReadCount $_" $count
    }
    
    # Get-Content | Measure-Object
    $dMemStart = GetMemoryUsage
    $dtStart = Get-Date
    $count = (Get-Content -Path $testFile -ReadCount 1 | Measure-Object -line).Lines
    AddResult "Get-Content -ReadCount 1 | Measure-Object" $count
    
    # Get-Content.Count
    $dMemStart = GetMemoryUsage
    $dtStart = Get-Date
    $count = (Get-Content -Path $testFile -ReadCount 1).Count
    AddResult "Get-Content.Count" $count
    
    # StreamReader.ReadLine
    $dMemStart = GetMemoryUsage
    $dtStart = Get-Date
    $count = 0
    # Use this constructor to avoid file access errors, like Get-Content does.
    $stream = New-Object -TypeName System.IO.FileStream(
        $testFile,
        [System.IO.FileMode]::Open,
        [System.IO.FileAccess]::Read,
        [System.IO.FileShare]::ReadWrite)
    if ($stream) {
        $reader = New-Object IO.StreamReader $stream
        if ($reader) {
            while(-not ($reader.EndOfStream)) { [void]$reader.ReadLine(); $count++ }
            $reader.Close()
        }
        $stream.Close()
    }
    
    AddResult "StreamReader.ReadLine" $count
    
    $results | Select-Object Method, Count, "Elapsed Time", "Memory Total", "Memory Delta" | Format-Table -AutoSize
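
Each test in the script above repeats the same setup: record memory, record the start time, run the method, call AddResult. As a sketch only, that pattern could be folded into one helper that takes the method as a script block. The name Measure-LineCountMethod and its property names are made up for illustration, it assumes PowerShell 3.0 or later, and it is not the code that produced the results in this answer.

```powershell
# Hypothetical helper (not part of the original script): runs one counting
# method and reports its line count, elapsed time, and managed-memory delta.
function Measure-LineCountMethod {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [scriptblock] $Method
    )
    [System.GC]::Collect()
    $memStart = [System.GC]::GetTotalMemory($false)
    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    $count = & $Method          # the script block must output the line count
    $timer.Stop()
    $memEnd = [System.GC]::GetTotalMemory($false)
    [pscustomobject]@{
        Method        = $Name
        Count         = [long]$count
        ElapsedTime   = $timer.Elapsed
        MemoryDeltaMB = [System.Math]::Round(($memEnd - $memStart) / 1MB, 1)
    }
}

# Usage on a small throwaway file (250 lines)
$demoFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $demoFile -Value (1..250 | ForEach-Object { "row $_" })
$demoResult = Measure-LineCountMethod -Name 'Get-Content -ReadCount 100' -Method {
    $n = 0
    Get-Content -Path $demoFile -ReadCount 100 | ForEach-Object { $n += $_.Count }
    $n
}
$demoResult
Remove-Item -Path $demoFile
```

A Stopwatch is used here instead of subtracting Get-Date values; both give an elapsed TimeSpan, so the choice is a matter of taste.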
    

    Here are the results for a text file containing ~95k lines (104 MB). Memory figures are in MB:

    Method                                    Count Elapsed Time     Memory Total Memory Delta
    ------                                    ----- ------------     ------------ ------------
    Get-Content -ReadCount 1                  95365 00:00:11.1451841         45.8          0.2
    Get-Content -ReadCount 10                 95365 00:00:02.9015023         47.3          1.7
    Get-Content -ReadCount 100                95365 00:00:01.4522507         59.9         14.3
    Get-Content -ReadCount 1000               95365 00:00:01.1539634         75.4         29.7
    Get-Content -ReadCount 0                  95365 00:00:01.3888746          346        300.4
    Get-Content -ReadCount 1 | Measure-Object 95365 00:00:08.6867159         46.2          0.6
    Get-Content.Count                         95365 00:00:03.0574433        465.8        420.1
    StreamReader.ReadLine                     95365 00:00:02.5740262         46.2          0.6
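
For everyday use, the batched Get-Content approach from the table above could be wrapped in a small function. This is a minimal sketch, not the benchmark code itself: the function name Get-LineCount is hypothetical, the default of 100 simply follows the sweet spot reported in this answer, and the parameter syntax assumes PowerShell 3.0 or later.

```powershell
# Hypothetical wrapper around the batched Get-Content technique.
function Get-LineCount {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $ReadCount = 100   # batch size; 100 is the value favored by the tests above
    )
    $total = 0
    # Each pipeline object is a batch of up to $ReadCount lines.
    # @(...) also covers the case where a batch arrives as a single string.
    Get-Content -Path $Path -ReadCount $ReadCount | ForEach-Object { $total += @($_).Count }
    $total
}

# Usage on a throwaway file with 1234 lines
$sample = [System.IO.Path]::GetTempFileName()
Set-Content -Path $sample -Value (1..1234 | ForEach-Object { "line $_" })
$lineCount = Get-LineCount -Path $sample
$lineCount
Remove-Item -Path $sample
```

Larger batch sizes trade memory for speed, so the best value likely depends on the file and the machine.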
    

    Here are the results for a larger file (~285k lines, 308 MB):

    Method                                    Count  Elapsed Time     Memory Total Memory Delta
    ------                                    -----  ------------     ------------ ------------
    Get-Content -ReadCount 1                  285776 00:00:36.2280995         46.3          0.8
    Get-Content -ReadCount 10                 285776 00:00:06.3486006         46.3          0.7
    Get-Content -ReadCount 100                285776 00:00:03.1590055         55.1          9.5
    Get-Content -ReadCount 1000               285776 00:00:02.8381262         88.1         42.4
    Get-Content -ReadCount 0                  285776 00:00:29.4240734        894.5        848.8
    Get-Content -ReadCount 1 | Measure-Object 285776 00:00:32.7905971         46.5          0.9
    Get-Content.Count                         285776 00:00:28.4504388       1219.8       1174.2
    StreamReader.ReadLine                     285776 00:00:20.4495721           46          0.4
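
StreamReader.ReadLine had the smallest memory delta on the larger file, although it was slower than the batched Get-Content runs. When memory matters most, the same technique could be written as a reusable function along these lines. This is a sketch under assumptions: the name Get-LineCountStreamed is hypothetical, the loop tests the ReadLine return value rather than EndOfStream, and this variant has not been timed against the figures above.

```powershell
# Hypothetical function: counts lines one at a time through a StreamReader.
function Get-LineCountStreamed {
    param([Parameter(Mandatory)] [string] $Path)
    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so convert to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    # Same sharing mode as the original script, to avoid file access errors.
    $stream = New-Object System.IO.FileStream -ArgumentList @(
        $fullPath,
        [System.IO.FileMode]::Open,
        [System.IO.FileAccess]::Read,
        [System.IO.FileShare]::ReadWrite)
    try {
        $reader = New-Object System.IO.StreamReader -ArgumentList $stream
        $total = 0
        # ReadLine returns $null at end of file; an empty line is '' and still counts.
        while ($null -ne $reader.ReadLine()) { $total++ }
        $total
    }
    finally {
        # Runs even if reading fails, so the file handle is always released.
        $stream.Dispose()
    }
}

# Usage on a throwaway file with three lines, one of them empty
$streamSample = [System.IO.Path]::GetTempFileName()
Set-Content -Path $streamSample -Value @('first', '', 'third')
$streamedCount = Get-LineCountStreamed -Path $streamSample
$streamedCount
Remove-Item -Path $streamSample
```

The try/finally block is the main difference from the inline version: it closes the stream even when an exception interrupts the loop.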
    
