PowerShell to count columns in a file

Submitted by 醉酒当歌 on 2020-07-10 11:07:33

Question


I need to test the integrity of a file before importing it into SQL. Each row of the file should have exactly the same number of columns.

These are "|"-delimited files. I also need to ignore the first line, as it is garbage.

If any row has a different number of columns, I need to write an error message.

I have tried using something like the following with no luck:

$colCnt = "c:\datafeeds\filetoimport.txt"
$file = (Get-Content $colCnt -Delimiter "|") 
$file = $file[1..($file.count - 1)]
Foreach($row in $file){
    $row.Count
}

Counting rows is easy. Counting columns is not. Any suggestions?


Answer 1:


Yep, read the file, skipping the first line. For each line, split it on the pipe and count the results. If the count isn't the same as the previous line's, output an error and stop.

$colCnt = "c:\datafeeds\filetoimport.txt"
[int]$LastSplitCount = 0
# Drop blank lines, then the garbage first line. ReadCount is the original line number that Get-Content attaches to each line.
foreach ($line in (Get-Content $colCnt | ?{$_} | Select -Skip 1)) {
    $count = $line.Split("|").Count
    if (!$LastSplitCount) { $LastSplitCount = $count }   # the first data row sets the expected count
    elseif ($count -ne $LastSplitCount) {
        "Process stopped at line number $($line.ReadCount) for column count mis-match."
        break
    }
}

That should do it, and if it finds a bad column count it will stop and output something like:

Process stopped at line number 5 for column count mis-match.

Edit: Added a Where catch to skip blank lines ( ?{$_} )
Edit2: Ok, if you know what the column count should be then this is even easier.

foreach ($line in (Get-Content $colCnt | ?{$_} | Select -Skip 1)) {
    $count = $line.Split("|").Count
    if ($count -ne 210) {
        "Process stopped at line number $($line.ReadCount), incorrect column count of: $count."
        break
    }
}

If you want it to return all lines that don't have 210 columns, just remove the break and let it run.
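
To see every mismatch at once instead of stopping at the first, one option is to group the rows by their column count. This is only a minimal sketch under the same assumptions (pipe-delimited, garbage first line); the function name Get-ColumnCountSummary and its parameters are hypothetical and not part of the original answer:

```powershell
# Hypothetical helper: summarizes how many rows have each column count.
function Get-ColumnCountSummary {
    param(
        [string[]]$Lines,        # file content, e.g. the output of Get-Content
        [char]$Delimiter = '|'
    )
    $Lines |
        Select-Object -Skip 1 |                        # drop the garbage first line
        Where-Object { $_ } |                          # ignore blank lines
        Group-Object { $_.Split($Delimiter).Count } |  # one group per distinct column count
        Sort-Object Count -Descending |                # most common count first
        Select-Object @{ n = 'Columns'; e = { [int]$_.Name } },
                      @{ n = 'Rows';    e = { $_.Count } }
}
```

A clean file yields a single entry. More than one entry means at least one row disagrees with the rest, for example when called as Get-ColumnCountSummary -Lines (Get-Content $colCnt).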




Answer 2:


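A more generic approach that loops over every file in a folder whose name matches a regex. Note that, as written, it compares each line's length in characters against an expected value ($expValue) rather than counting delimited columns: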

$path = "path\to\folder"
$regex = "regex"
$expValue = 450

$files= Get-ChildItem $path | Where-Object {$_.Name -match $regex}
Foreach( $f in $files) {
    $filename = $f.Name
    echo $filename
    $a = Get-Content $f.FullName;
    $i = 1;
    $e = 0;
    echo "Starting...";
    foreach($line in $a)
    {
        if ($line.length -ne $expValue){
            echo $filename
            $a | Measure-Object -Line
            echo "Long:"
            echo $line.Length;
            echo "Line Nº: "
            echo $i;
            $e = $e + 1;       
        }
        $i = $i+1;
    }
    echo "Finished";
    if ($e -ne 0){
        echo "$e errors found";
    }else{
        echo "No errors"
        echo ""
    }
}
echo "All files examined"
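
The same folder-and-regex loop can be adapted to count delimited columns and emit one object per bad row instead of echoing text. The following is a sketch only; the function name Find-BadRow, its parameters, and the output property names are hypothetical, and it assumes PowerShell 3.0 or later (for -File and [pscustomobject]):

```powershell
# Hypothetical helper: reports rows whose column count differs from $Expected
# in every file directly under $Path whose name matches $NameRegex.
function Find-BadRow {
    param(
        [string]$Path,
        [string]$NameRegex,
        [int]$Expected,
        [char]$Delimiter = '|'
    )
    Get-ChildItem -Path $Path -File |
        Where-Object { $_.Name -match $NameRegex } |
        ForEach-Object {
            $file = $_
            Get-Content -Path $file.FullName |
                # Skip blank lines and line 1; ReadCount is the line number set by Get-Content.
                Where-Object { $_ -and $_.ReadCount -gt 1 } |
                ForEach-Object {
                    $count = $_.Split($Delimiter).Count
                    if ($count -ne $Expected) {
                        [pscustomobject]@{
                            File    = $file.Name
                            Line    = $_.ReadCount
                            Columns = $count
                        }
                    }
                }
        }
}
```

Because the result is a stream of objects, it can be piped to Format-Table or Export-Csv, or simply tested for emptiness before the SQL import.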



Answer 3:


Another possibility:

$colCnt = "c:\datafeeds\filetoimport.txt"

$DataLine = (Get-Content $colCnt -TotalCount 2)[1]
$DelimCount = ([char[]]$DataLine -eq '|').count
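# Escape the pipe (a bare | is regex alternation) and anchor the pattern so only lines with exactly $DelimCount delimiters match.
$MatchString = '^[^|]*' + ('\|[^|]*' * $DelimCount) + '$'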

$test = Select-String -Path $colCnt -Pattern $MatchString -NotMatch |
  where { $_.linenumber -ne 1 }

That will find the number of delimiter characters in the second line, and build a regex pattern that can be used with Select-String.

The -NotMatch switch will make it return any lines that don't match that pattern as MatchInfo objects that will have the filename, line number and content of the problem lines.

Edit: Since the first line is "garbage" you probably don't care if it didn't match so I added a filter to the result to drop that out.
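
The pattern-building idea can be tried on in-memory strings with -notmatch before pointing it at a real file. This is a small sketch that assumes an anchored pattern with an escaped pipe (in a regex, a bare | means alternation, so it has to be written as \|); the sample rows and variable names are made up for illustration:

```powershell
# Sample rows (made up); the first row defines the expected delimiter count.
$rows = 'a|b|c', 'd|e|f', 'g|h', 'i|j|k|l'
$first = $rows[0]
$delimCount = ([char[]]$first -eq '|').Count

# Exactly $delimCount pipes: runs of non-pipe characters separated by
# escaped pipes, anchored at both ends of the line.
$pattern = '^[^|]*' + ('\|[^|]*' * $delimCount) + '$'

# -notmatch applied to an array returns the elements that do not match.
$bad = @($rows -notmatch $pattern)
$bad
```

Here the rows with one pipe and with three pipes are returned, since neither has exactly the two delimiters found in the first row.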



Source: https://stackoverflow.com/questions/23119014/powershell-to-count-columns-in-a-file
