What tools deal with spaces in columnar data well?

Submitted by 瘦欲@ on 2019-12-25 18:34:32

Question


Let's start with an example that I ran into recently:

C:\>net user

User accounts for \\SOMESYSTEM

-------------------------------------------------------------------------------
ASPNET                   user1                    AnotherUser123
Guest                    IUSR_SOMESYSTEM          IWAM_SOMESYSTEM
SUPPORT_12345678         test userrrrrrrrrrrr     test_userrrrrrrrrrrr
The command completed successfully.

In the third row, second column there is a login with a space. This causes many of the tools that separate fields based on white space to treat this field as two fields.

How would you deal with data formatted this way using today's tools?

Here is an example in pure** Windows batch language on the command prompt that I would like to have replicated in other modern cross-platform text processing tool sets:

C:\>cmd /v:on
Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\>echo off

for /f "skip=4 tokens=*" %g in ('net user ^| findstr /v /c:"The command completed successfully."') do (
More? set record=%g
More? echo !record:~0,20!
More? echo !record:~25,20!
More? echo !record:~50,20!
More? )
ASPNET
user1
AnotherUser123
Guest
IUSR_SOMESYSTEM
IWAM_SOMESYSTEM
SUPPORT_12345678
test userrrrrrrrrrrr
test_userrrrrrrrrrrr


echo on
C:\>

** Using delayed variable expansion (cmd /v:on, or setlocal enabledelayedexpansion in a batch file), the for /f command output parser, and variable substring syntax... none of which are well documented except at the wonderful website http://ss64.com/nt/syntax.html

Looking into AWK, I didn't see a way to deal with the 'test userrrrrrrrrrrr' login field without using substr() in a way similar to the variable substring syntax above. Is there another language that makes text wrangling easy and is not write-only like sed?


Answer 1:


PowerShell:

Native user list example, no text matching needed

Get-WmiObject Win32_UserAccount | Format-Table -Property Caption -HideTableHeaders

Or, if you want to use "NET USER":

$out = net user     # Send stdout to $out
$out = $out[4..($out.Length-3)]     # Skip header/tail
# Split each line on runs of two or more spaces and skip empty fields
$out | ForEach-Object { [regex]::Split($_, "\s{2,}") } | where { $_.Length -ne 0 }
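For comparison with the cross-platform tools the question asks about, the same "split on a gap of at least two spaces" idea can be sketched in Python. This is only an illustration: the rows are copied from the sample output in the question, and `split_on_gaps` is a hypothetical helper name. It assumes no login contains two consecutive spaces and that every column is followed by at least two spaces of padding.

```python
import re

# Sample rows copied from the `net user` output in the question.
ROWS = [
    "ASPNET                   user1                    AnotherUser123",
    "Guest                    IUSR_SOMESYSTEM          IWAM_SOMESYSTEM",
    "SUPPORT_12345678         test userrrrrrrrrrrr     test_userrrrrrrrrrrr",
]


def split_on_gaps(line):
    """Split a row on runs of two or more whitespace characters."""
    return [field for field in re.split(r"\s{2,}", line.strip()) if field]


# Flatten all rows into one list of logins, one per entry.
names = [name for row in ROWS for name in split_on_gaps(row)]
print("\n".join(names))
```

Because a single space never counts as a separator here, 'test userrrrrrrrrrrr' stays in one piece.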



Answer 2:


Just query the user accounts directly, using VBScript (or PowerShell, if your system supports it)

strComputer = "."
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
Set colItems = objWMIService.ExecQuery("Select * from Win32_UserAccount",,48)
For Each objItem in colItems
    Wscript.Echo objItem.Name
Next

This will show you a list of users, one per line. If your objective is just to show user names, there is no need to use other tools to process the data.




Answer 3:


Awk isn't so great for that problem because awk is focused on lines as records with a recognizable field separator, while the example file uses fixed-width fields. You could, e.g., try to use a regular expression for the field separator, but that can go wrong. The right way would be to use that fixed width to clean the file up into something easier to work with; awk can do this, but it is inelegant.

Essentially, the example is difficult because it doesn't follow any clear rules. The best approach is a quite general one: write data to files in a well-defined format with a library function, and read files with the complementary library function. The specific language doesn't matter much with this strategy. Not that that helps when you already have a file like the example.
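As a rough sketch of that strategy in Python: slice each line by the fixed width, write the cleaned fields with `csv.writer`, and read them back with `csv.reader`. The width of 25 is an assumption read off the sample output, and `pad_row` and `fixed_width_fields` are hypothetical helper names; the sample rows are rebuilt with `ljust` so the column width is explicit rather than counted by hand.

```python
import csv
import io

COL_WIDTH = 25  # assumption: column width read off the sample output


def pad_row(*names):
    """Rebuild one sample row with each name padded to COL_WIDTH."""
    return "".join(name.ljust(COL_WIDTH) for name in names).rstrip()


raw_lines = [
    pad_row("ASPNET", "user1", "AnotherUser123"),
    pad_row("Guest", "IUSR_SOMESYSTEM", "IWAM_SOMESYSTEM"),
    pad_row("SUPPORT_12345678", "test userrrrrrrrrrrr", "test_userrrrrrrrrrrr"),
]


def fixed_width_fields(line, width=COL_WIDTH):
    """Cut a line into width-sized slices and strip the padding."""
    slices = (line[i:i + width] for i in range(0, len(line), width))
    return [s.strip() for s in slices if s.strip()]


# Write the cleaned rows with a library function ...
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(fixed_width_fields(line) for line in raw_lines)

# ... and read them back with the complementary one.
buffer.seek(0)
rows = list(csv.reader(buffer))
for row in rows:
    print(row)
```

Once the data is in CSV, the embedded space in a login is no longer a parsing problem for whatever tool reads it next.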




Answer 4:


TEST

 printf "
User accounts for \\SOMESYSTEM

-------------------------------------------------------------------------------
ASPNET                   user1                    AnotherUser123
Guest                    IUSR_SOMESYSTEM          IWAM_SOMESYSTEM
SUPPORT_12345678         test userrrrrrrrrrrr     test_userrrrrrrrrrrr
The command completed successfully.
\n" | awk 'BEGIN{
        colWidth=25
       }
       /-----/ {next}
       /^[[:space:]]*$/{next}
       /^User accounts/{next}
       /^The command completed/{next}
       {
        col1=substr($0,1,colWidth)
        col2=substr($0,1+colWidth,colWidth)
        col3=substr($0,1+(colWidth*2),colWidth)
        printf("%s\n%s\n%s\n", col1, col2, col3)
       }' 

There's probably a better way than the 1+(colWidth*2) but I'm out of time for right now.

If you try to execute the code as is, you'll have to remove any leading spaces at the front of each line in the printf statement.
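One way to avoid writing out an offset per column is to step through the line in increments of the column width. The Python sketch below does that and roughly mirrors the awk script's skip rules (it checks line prefixes, where awk matches the dashes anywhere in the line). The sample text is reconstructed from the question with a hypothetical `pad_row` helper, and `logins` is likewise a made-up name.

```python
COL_WIDTH = 25  # same width the awk script uses


def pad_row(*names):
    """Rebuild one sample row with each name padded to COL_WIDTH."""
    return "".join(name.ljust(COL_WIDTH) for name in names).rstrip()


# Reconstruction of the sample `net user` output from the question.
SAMPLE = "\n".join([
    "",
    r"User accounts for \\SOMESYSTEM",
    "",
    "-" * 79,
    pad_row("ASPNET", "user1", "AnotherUser123"),
    pad_row("Guest", "IUSR_SOMESYSTEM", "IWAM_SOMESYSTEM"),
    pad_row("SUPPORT_12345678", "test userrrrrrrrrrrr", "test_userrrrrrrrrrrr"),
    "The command completed successfully.",
    "",
])

SKIP_PREFIXES = ("-----", "User accounts", "The command completed")


def logins(text, width=COL_WIDTH):
    """Yield each login, skipping blank, header, and footer lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith(SKIP_PREFIXES):
            continue
        # Column n starts at n * width, so any number of columns works.
        for start in range(0, len(line), width):
            field = line[start:start + width].strip()
            if field:
                yield field


result = list(logins(SAMPLE))
print("\n".join(result))
```

Stripping each slice also removes the trailing padding that the awk version prints along with each name.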

I hope this helps.




Answer 5:


For this part:

set record=%g
More? echo !record:~0,20!
More? echo !record:~25,20!
More? echo !record:~50,20! 

I would use:

for /f "tokens=1-26 delims= " %a in ("%g") do (
if not "%a" == "" echo %a
if not "%b" == "" echo %b
if not "%c" == "" echo %c
rem ... and so on...
if not "%y" == "" echo %y
if not "%z" == "" echo %z
)

That is if I had to do this using batch. But I wouldn't dare to call this "modern" as per your question.




Answer 6:


Perl is really the best choice for your case, and millions of others. It is very common and the web is rife with examples and documentation. Yes, it is cross-platform, extremely stable, and nearly perfectly consistent across platforms. I say nearly because nothing is perfect, and I doubt that you would encounter an inconsistency in your lifetime.

It is a language interpreter but supports a rich command-line interface as well.



Source: https://stackoverflow.com/questions/7195851/what-tools-deal-with-spaces-in-columnar-data-well
