How to use regex with cut at the command line?

Posted by 爱⌒轻易说出口 on 2020-01-24 10:09:27

Question


I have some output like this from ls -alth:

drwxr-xr-x    5 root    admin   170B Aug  3  2016 ..
drwxr-xr-x    5 root    admin    70B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     3B Aug  3  2016 ..
drwxr-xr-x    5 root    admin     9M Aug  3  2016 ..

Now, I want to parse out the 170B part, which is obviously the size in human readable format. I wanted to do this using cut or sed, because I don't want to use tools that are any more complicated/difficult to use than necessary.

Ideally I want it to be robust enough to handle the B, K or M suffix that comes with the size, and multiply by 1, 1000 and 1000000 respectively. I haven't found a good way to do that, though.

I've tried a few things without really knowing the best approach:

ls -alth | cut -f 5 -d \s+

I was hoping that would work because I'd be able to just delimit it on one or more spaces.

But that doesn't work. How do I supply cut with a regex delimiter? Or is there an easier way to extract only the size of the file from ls -alth?

I'm using CentOS 6.4.


Answer 1:


This answer tackles the question as asked, but consider George Vasiliou's helpful find solution as a potentially superior alternative.

  • cut only supports a single, literal character as the delimiter (-d), so it isn't the right tool to use.

  • For extracting tokens (fields) that are separated with a variable amount of whitespace per line, awk is the best tool, so the solution proposed by George Vasiliou is the simplest one:
    ls -alth | awk '{print $5}'
    extracts the 5th whitespace-separated field ($5), which is the size.

  • Rather than use -h first and then convert the human-readable suffixes (such as B, M, and G) back to plain byte counts (incidentally, the multipliers are powers of 1024, not 1000), simply omit -h from the ls command; ls -l then prints the raw byte counts:
    ls -alt | awk '{print $5}'
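If only the human-readable sizes are available (for example, output that was already captured with -h), the suffixes can still be converted back with awk. The following is only a minimal sketch: the function name to_bytes is made up for illustration, and it assumes single-letter B/K/M/G/T suffixes with 1024-based multipliers.

```shell
#!/bin/sh
# to_bytes: hypothetical helper, not a standard tool.
# Reads one size per line (e.g. 170B, 9M, 1.5K) and prints the byte count.
# Assumes 1024-based multipliers and a single-letter suffix.
to_bytes() {
  awk '{
    unit = substr($1, length($1), 1)        # last character: the suffix, if any
    if (unit ~ /[0-9]/) {                   # bare number: already in bytes
      printf "%.0f\n", $1
      next
    }
    num = substr($1, 1, length($1) - 1)     # numeric part before the suffix
    mult = 1                                # "B" and unknown suffixes count as bytes
    if (unit == "K") mult = 1024
    else if (unit == "M") mult = 1024 ^ 2
    else if (unit == "G") mult = 1024 ^ 3
    else if (unit == "T") mult = 1024 ^ 4
    printf "%.0f\n", num * mult
  }'
}

# Example: convert the sizes from the listing in the question.
printf '%s\n' 170B 70B 3B 9M | to_bytes
```

Because -h output is rounded, the result is only an approximation of the true size; omitting -h keeps the counts exact.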




Answer 2:


As an alternative to the awk solution, which handles whitespace correctly, you can also use the find utility, which can provide results similar to ls.

In fact, find can print the sizes directly, without piping through another tool such as cut or awk.

So, to list just the byte counts you can use:

$ find . -maxdepth 1 -printf '%s\n'
173
3
684

You can print each filename together with its size in bytes:

$ find . -maxdepth 1 -printf '%f-%s\n'
bsd.txt-173
file4-3
shellcolors.sh-684

See man find for the many format directives available under -printf.

Moreover, if you remove the -maxdepth option, find also lists all the files in the subdirectories.
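The -printf approach combines well with sort. A minimal sketch, assuming GNU find (-printf is a GNU extension); the function name list_by_size is hypothetical. It separates size and name with a tab instead of a hyphen, so filenames that contain hyphens stay unambiguous.

```shell
#!/bin/sh
# list_by_size: hypothetical helper name.
# Prints "<bytes><TAB><name>" for regular files directly inside a directory,
# smallest first. Relies on -printf, a GNU find extension.
list_by_size() {
  find "${1:-.}" -maxdepth 1 -type f -printf '%s\t%f\n' | sort -n
}

# Example: list the files in the current directory by size.
list_by_size .
```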

One more alternative is the du utility, which can report sizes in human-readable format:

$ du -a -b -h -d1
1.9M    ./appsfiles
173     ./bsd.txt
3       ./file4
684     ./shellcolors.sh

-a : include files as well as directories. Remove this option to get directory sizes only.
-b : report the apparent (real) size of each file in bytes. Without this option, du reports the disk space the file occupies (e.g. a 3 kB file actually occupies 4K on disk).
-h : human-readable sizes
-d1 : limit the depth to 1 level

You can further parse the results of du with |cut -f1 (du separates the size from the path with a tab, which is cut's default delimiter) or with |awk '{print $1}'
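Put together, extracting only the size column from du could look like the sketch below. It assumes GNU du (for -b and -d1) and drops -h so the output is raw bytes; the function name du_sizes is hypothetical.

```shell
#!/bin/sh
# du_sizes: hypothetical helper name.
# Prints only the size column of du for a directory and its direct children.
# -b (apparent size in bytes) and -d1 are GNU du options.
du_sizes() {
  du -a -b -d1 "${1:-.}" | cut -f1    # du puts a tab between size and path
}

# Example: sizes of everything directly under the current directory.
du_sizes .
```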




Answer 3:


I was getting annoyed with having to look up awk(ward) syntax and wrote my own:

https://www.npmjs.com/package/cutr

Install with

npm i -g cutr
ls --full-time | cutr -d ' +' -f 6-

or run with something like

ls --full-time | npx cutr -d ' +' -f 6-

Your command could be

ls -alth | cutr -f 5 -d '\s+'
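For comparison, a similar effect is possible with standard tools alone by squeezing each run of spaces into a single space with tr -s before handing the line to cut. This is only a sketch: the function name field_n is hypothetical, and it handles spaces only (not tabs).

```shell
#!/bin/sh
# field_n: hypothetical helper name.
# Prints field N of each input line, treating any run of spaces as one delimiter.
# Assumes lines do not start with a space (a leading space would make field 1 empty).
field_n() {
  tr -s ' ' | cut -d ' ' -f "$1"
}

# Example: the size column of a long listing.
ls -alth | field_n 5
```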


Source: https://stackoverflow.com/questions/43312360/how-to-use-regex-with-cut-at-the-command-line
