Optimize Get-ADUser filter

爷,独闯天下 提交于 2020-12-13 03:40:32

问题


In AD, I'm trying to identify user accounts where the same EmployeeID value is populated in 2 or more records. Below is my piece of code (Credit: I'm using a Show-Progress function defined here) and the Get-ADUser command alone has taken more than 2 hours to fetch all the records. The other steps (2 to 5) have been pretty quick. While I've completed the work, I'm trying to know if this could've been done more efficiently with PowerShell.

Get-ADUser -LDAPFilter "(&(ObjectCategory=Person)(objectclass=user)(employeeid=*))" -Properties $properties -Server $server_AD_GC -ResultPageSize 1000 | 
    # *ISSUE HERE*
    #    The Get-ADUser extract process seems to work very slow.
    #    However, it is important to note that the above command will be retrieving more than 200K records
    # NOTE: I've inferred that employeeid is an indexed attribute and is replicated to GlobalCatalogs and hence have used it in the filter
    Show-Progress -Activity "(1/5) Getting AD Users ..." |
select $selectPropsList -OutVariable results_UsersBaseSet |
Group-Object EmployeeID | 
    Show-Progress -Activity "(2/5) Grouping on EmployeeID ..." | 
? { $_.Count -gt 1 } | 
    Show-Progress -Activity "(3/5) Filtering only dup EmpID records ..." | 
select -Exp Group | 
    Show-Progress -Activity "(4/5) UnGrouping ..." | 
Export-Csv "C:\Users\me\op_GetADUser_w_EmpID_Dupes_EntireForest - $([datetime]::Now.ToString("MM-dd-yyyy_hhmmss")).csv" -NoTypeInformation |
    Show-Progress -Activity "(5/5) Exporting ..." | 
Out-Null

PS: I've also tried to first export all the user accounts to a csv file and then post-process with Excel but I had to frown because of the size of the dataset and it was both time and memory crunching.

Any suggestion is highly appreciated.


回答1:


Since we don't know what is in $properties or $selectPropsList, your question is really only about finding out to which users the same EmployeeID has been issued, right?
By default, Get-ADUser already returns these properties:

DistinguishedName, Enabled, GivenName, Name, ObjectClass, ObjectGUID, SamAccountName, SID, Surname, UserPrincipalName

So all you need extra is the EmployeeID I guess. Trying to collect LOTS of properties does slow down, so keeping this to a bare minimum helps to speed things up.

Next, by using the Show-Progress script you have linked to, you will slow down the execution of the script considerably. Do you really need to have a progress bar? Why not simply write the lines with activity steps directly to the console?

Also, piping everything together doesn't help in the speed department either..

$server_AD_GC    = 'YourServer'
$selectPropsList = 'EmployeeID', 'Name', 'SamAccountName', 'Enabled'
$outFile         = "C:\Users\me\op_GetADUser_w_EmpID_Dupes_EntireForest - $([datetime]::Now.ToString("MM-dd-yyyy_hhmmss")).csv"

Write-Host "Step (1/4) Getting AD Users ..." 
$users = Get-ADUser -Filter "EmployeeID -like '*'" -Properties EmployeeID -Server $server_AD_GC -ResultPageSize 1000

Write-Host "Step (2/4) Grouping on EmployeeID ..."
$dupes = $users | Group-Object -Property EmployeeID | Where-Object { $_.Count -gt 1 }

Write-Host "Step (3/4) Collecting duplicates ..."
$result = foreach ($group in $dupes) {
    $group.Group | Select-Object $selectPropsList
}

Write-Host "Step (4/4) Exporting ..."
$result | Export-Csv -Path $outFile -NoTypeInformation

Write-Host  "All done" -ForegroundColor Green

P.S. Get-ADUser already returns user objects only, so there is no need for the LDAP filter (ObjectCategory=Person)(objectclass=user). Using -Filter "EmployeeID -like '*'" is probably faster




回答2:


This answer complements Theo's helpful answer and focuses on showing progress during the operation:

  • The linked Show-Progress function, which is the latest as of this writing:

    • has an outright bug, in that it doesn't pass pipeline input through (the relevant line is accidentally commented out)

    • is conceptually flawed in that it doesn't use a process block, which means that all pipeline input is collected first, before it is processed - which defeats the idea of a progress bar.

  • Therefore, you Show-Progress calls won't show progress until the previous command in the pipeline has output all of its output. A simple alternative is to break the pipeline into separate commands and to simply emit one progress message before each command, announcing the next stage of processing (rather than per-object progress) as shown in Theo's answer.

  • Generally, there is no way to show the progress of command-internal processing, only the progress of a command's (multi-object) output.

    • The simplest way to do this via a ForEach-Object call in which you call
      Write-Progress, but that comes with two challenges:

      • In order to show a percent-complete progress bar, you need to know how many objects there will be in total, which you must determine ahead of time, because a pipeline cannot know how many objects it will receive; your only option is to collect all output first (or find some other way to count it) and then use the collected output as pipeline input, using the count of objects as the basis for calculating the value to pass to Write-Progress -PerCentComplete.

      • Calling Write-Progress for each object received will result in a significant slowdown of overall processing; a compromise is to only call it for every N objects, as shown in this answer; the approach there could be wrapped in a properly implemented function a la Show-Progress that requires passing the total object count as an argument and performs proper streaming input-object processing (via a process block); that said, the mere act of using PowerShell code for passing input objects through is costly.


Conclusion:

Percent-complete progress displays have two inherent problems:

  • They require you to know the total number of objects to process beforehand (a pipeline has no way of knowing how many objects will pass through it):

    • Either: Collect all objects to process in memory, beforehand, if feasible; the count of elements in the collection can then serve as the basis for the percent-complete calculations. This may not be an option with very large input sets.

    • Or: Perform an extra processing step beforehand that merely counts all objects without actually retrieving them. This may not be practical in terms of the additional processing time added.

  • Object-by-object processing in PowerShell code - either via ForEach-Object or an advanced script/function - is inherently slow.

    • You can mitigate that somewhat by limiting Write-Progress calls to every N objects, as shown in this answer

Overall it's a tradeoff between processing speed and the ability to show percent-complete progress to the end user.



来源:https://stackoverflow.com/questions/65061705/optimize-get-aduser-filter

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!