PowerShell 2 and .NET: Optimize for extremely large hash tables?

情书的邮戳 2021-01-14 12:15

I am dabbling in PowerShell and am completely new to .NET.

I am running a PS script that starts with an empty hash table. The hash table will grow to at least 15,000 to

3 Answers
  •  青春惊慌失措
    2021-01-14 12:44

    I ran some basic tests with Measure-Command on a set of roughly 20,000 random words (the script generates 20KB = 20,480 of them).

    The individual timings are in the code comments below. In summary, growing a hashtable with += is incredibly inefficient :) Each += allocates a new single-entry hashtable and then builds a combined copy of both operands, so the work per word grows with the size of the table. There were some minor differences among options 2 through 5, but in general they all performed about the same.
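
    As a minimal sketch of why option 1 is so slow, the snippet below shows that += on a hashtable does not modify it in place: it produces a new hashtable and rebinds the variable. The variable names are illustrative only and are not part of the benchmark.

    ```powershell
    # Start with a one-entry hashtable and keep a second reference to it.
    $h = @{ a = 1 }
    $original = $h

    # "+=" builds a brand-new hashtable holding the entries of both operands.
    $h += @{ b = 2 }

    # The variable now points at a different object; the old table is untouched.
    $sameObject    = [object]::ReferenceEquals( $original, $h )
    $originalCount = $original.Count
    $newCount      = $h.Count

    "Same object: $sameObject; original count: $originalCount; new count: $newCount"
    ```

    Because every += copies all existing entries, a loop that adds n keys this way does work proportional to n squared, while Add on a single table does not copy anything.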

    If I were to choose, I might lean toward option 5 for its simplicity (just a single Add call per string), but all the alternatives I tested seem viable.

    $chars = [char[]]('a'[0]..'z'[0])
    $words = 1..20KB | foreach {
      $count = Get-Random -Minimum 15 -Maximum 35
      -join (Get-Random $chars -Count $count)
    }
    
    # 1) Original, adding to hashtable with "+=".
    #     TotalSeconds: ~800
    Measure-Command {
      $h = @{}
      $words | foreach { if( $h[$_] -ne $true ) { $h += @{ $_ = $true } } }
    }
    
    # 2) Using sharding among sixteen hashtables.
    #     TotalSeconds: ~3
    Measure-Command {
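      # One hashtable per shard; a word's hash code selects its shard.
      [hashtable[]]$hs = 1..16 | foreach { @{} }
      $words | foreach {
        # GetHashCode() can be negative, so the remainder ranges over -15..15;
        # PowerShell's negative array indexing (counting from the end) keeps the lookup valid.
        $h = $hs[$_.GetHashCode() % 16]
        if( -not $h.ContainsKey( $_ ) ) { $h.Add( $_, $null ) }
      }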
    }
    
    # 3) Using ContainsKey and Add on a single hashtable.
    #     TotalSeconds: ~3
    Measure-Command {
      $h = @{}
      $words | foreach { if( -not $h.ContainsKey( $_ ) ) { $h.Add( $_, $null ) } }
    }
    
    # 4) Using ContainsKey and Add on a hashtable constructed with capacity.
    #     TotalSeconds: ~3
    Measure-Command {
      $h = New-Object Collections.Hashtable( 21KB )
      $words | foreach { if( -not $h.ContainsKey( $_ ) ) { $h.Add( $_, $null ) } }
    }
    
    # 5) Using HashSet and Add.
    #     TotalSeconds: ~3
    Measure-Command {
      $h = New-Object Collections.Generic.HashSet[string]
      $words | foreach { $null = $h.Add( $_ ) }
    }
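
    Option 5 needs no ContainsKey check because HashSet.Add reports whether the value was new. The following is a small sketch of that behaviour, not part of the timed runs; the sample strings and variable names are made up for illustration.

    ```powershell
    # Same constructor form as option 5 above.
    $set = New-Object Collections.Generic.HashSet[string]

    # Add returns $true when the value was inserted, $false when it was already present.
    $firstAdd  = $set.Add( 'alpha' )
    $secondAdd = $set.Add( 'alpha' )

    # The default comparer for HashSet[string] is case-sensitive,
    # whereas a hashtable created with @{} treats string keys case-insensitively.
    $mixedCaseAdd = $set.Add( 'Alpha' )
    $literal      = @{ alpha = $null }
    $literalMatch = $literal.ContainsKey( 'ALPHA' )

    # Membership test and final size.
    $hasAlpha = $set.Contains( 'alpha' )
    $count    = $set.Count
    ```

    The random words in the benchmark are all lowercase, so the difference in case handling should not affect the timings, but it may matter if the real keys can differ only by case.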
    
