Powershell csv row column transpose and manipulation

寵の児 提交于 2020-01-11 13:31:31

问题


I'm newbie in Powershell. I tried to process / transpose row-column against a medium size csv based record (around 10000 rows). The original CSV consist of around 10000 rows with 3 columns ("Time","Id","IOT") as below:

"Time","Id","IOT" 
"00:03:56","23","26" 
"00:03:56","24","0" 
"00:03:56","25","0" 
"00:03:56","26","1" 
"00:03:56","27","0" 
"00:03:56","28","0" 
"00:03:56","29","0" 
"00:03:56","30","1953" 
"00:03:56","31","22" 
"00:03:56","32","39" 
"00:03:56","33","8" 
"00:03:56","34","5" 
"00:03:56","35","269" 
"00:03:56","36","5" 
"00:03:56","37","0" 
"00:03:56","38","0" 
"00:03:56","39","0" 
"00:03:56","40","1251" 
"00:03:56","41","103" 
"00:03:56","42","0" 
"00:03:56","43","0" 
"00:03:56","44","0" 
"00:03:56","45","0" 
"00:03:56","46","38" 
"00:03:56","47","14" 
"00:03:56","48","0" 
"00:03:56","49","0" 
"00:03:56","2013","0" 
"00:03:56","2378","0" 
"00:03:56","2380","32" 
"00:03:56","2758","0" 
"00:03:56","3127","0" 
"00:03:56","3128","0" 
"00:09:16","23","22" 
"00:09:16","24","0" 
"00:09:16","25","0" 
"00:09:16","26","2" 
"00:09:16","27","0" 
"00:09:16","28","0" 
"00:09:16","29","21" 
"00:09:16","30","48" 
"00:09:16","31","0" 
"00:09:16","32","4" 
"00:09:16","33","4" 
"00:09:16","34","7" 
"00:09:16","35","382" 
"00:09:16","36","12" 
"00:09:16","37","0" 
"00:09:16","38","0" 
"00:09:16","39","0" 
"00:09:16","40","1882" 
"00:09:16","41","42" 
"00:09:16","42","0" 
"00:09:16","43","3" 
"00:09:16","44","0" 
"00:09:16","45","0" 
"00:09:16","46","24" 
"00:09:16","47","22" 
"00:09:16","48","0" 
"00:09:16","49","0" 
"00:09:16","2013","0" 
"00:09:16","2378","0" 
"00:09:16","2380","19" 
"00:09:16","2758","0" 
"00:09:16","3127","0" 
"00:09:16","3128","0" 
... 
... 
... 

I tried to do the transpose using code based from powershell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be
Basically my powershell code is as below:

$b = @() 
    foreach ($Time in $a.Time | Select -Unique) { 
        $Props = [ordered]@{ Time = $time } 
        foreach ($Id in $a.Id | Select -Unique){ 
            $IOT = ($a.where({ $_.Id -eq $Id -and $_.time -eq $time })).IOT 
            $Props += @{ $Id = $IOT } 
        } 
        $b += New-Object -TypeName PSObject -Property $Props 
    } 
$b | FT -AutoSize 
$b | Out-GridView 

Above code could give me the result as I expected which are all "Id" values will become column headers while all "Time" values will become unique row and "IOT" values as the intersection from "Id" x "Time" as below:

"Time","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","2013","2378","2380","2758","3127","3128" 
"00:03:56","26","0","0","1","0","0","0","1953","22","39","8","5","269","5","0","0","0","1251","103","0","0","0","0","38","14","0","0","0","0","32","0","0","0" 
"00:09:16","22","0","0","2","0","0","21","48","0","4","4","7","382","12","0","0","0","1882","42","0","3","0","0","24","22","0","0","0","0","19","0","0","0" 

While it only involves a few hundreds rows, the result comes out quickly as expected, but the problem now when processing the whole csv file with 10000 rows, the script above 'keep executing' and doesn't seem able to finish for long time (hours) and couldn't spit out any results. So probably if some powershell experts from stackoverflow could help to asses the code above and probably could help to modify to speed up the results?

Many thanks for the advise


回答1:


10000 records is a lot but I don't think it is enough to advise streamreader* and manually parsing the CSV. The biggest thing going against you though is the following line:

$b += New-Object -TypeName PSObject -Property $Props 

What PowerShell is doing here is making a new array and appending that element to it. This is a very memory intensive operation that you are repeating 1000's of times. Better thing to do in this case is use the pipeline to your advantage.

$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID  | Sort-Object {[int]$_}  -Unique

$data | Group-Object Time | ForEach-Object{
    $props = [ordered]@{Time = $_.Name}
    foreach($header in $headers){
        $props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
    }
    [pscustomobject]$props
} |  export-csv d:\temp\testing.csv -NoTypeInformation

$data will be your entire file in memory as an object. Need to get all the $headers that will be the column headers.

Group the data by each Time. Then inside each time object we get the value for every ID. If the ID does not exist during that time then the entry will show as null.

This is not the best way but should be faster than yours. I ran 10000 records in under a minute (51 second average over 3 passes). Will benchmark to show you if I can.

I just ran your code once with my own data and it took 13 minutes. I think it is safe to say that mine performs faster.


Dummy data was made with this logic FYI

1..100 | %{
 $time = get-date -Format "hh:mm:ss"
 sleep -Seconds 1
    1..100 | % {

        [pscustomobject][ordered]@{
            time = $time 
            id = $_
            iot = Get-Random -Minimum 0 -Maximum 7
        } 
    }
} | Export-Csv d:\temp\data.csv -notypeinformation

* Not a stellar example for your case of streamreader. Just pointing it out to show that it is the better way to read large files. Just need to parse string line by line.



来源:https://stackoverflow.com/questions/33906500/powershell-csv-row-column-transpose-and-manipulation

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!