Question
I encountered a problem in my program. I have a list of files and I sort them with this code to find out the 10 most frequent file types in the list.
find $DIR -type f | file -b $SAVEFILES | cut -c1-40 | sort -n | uniq -c | sort -nr | head -10
My output looks like this
168 HTML document, ASCII text
114 C source, ASCII text
102 ASCII text
33 ASCII text, with very long lines
30 HTML document, UTF-8 Unicode text, with
26 HTML document, ASCII text, with very lon
21 C source, UTF-8 Unicode text
20 LaTeX document, UTF-8 Unicode text, with
15 SVG Scalable Vector Graphics image
12 LaTeX document, ASCII text, with very lo
What I want to do is access the counts in front of the file types and replace them with #. I can do that with a for loop, but first I have to access them somehow.
The expected output is something like this:
__HTML document, ASCII text : ################
__C source, ASCII text : ###########
__ASCII text : ##########
__ASCII text, with very long lines : ########
__HTML document, UTF-8 Unicode text, with : #######
__HTML document, ASCII text, with very lon: ####
__C source, UTF-8 Unicode text : ####
__LaTeX document, UTF-8 Unicode text, with: ###
__SVG Scalable Vector Graphics image : #
__LaTeX document, ASCII text, with very lo: #
EDIT: The # characters in my example do not represent the exact numbers. The first line should have 168 #, the second 114 #, and so on.
Answer 1:
Append this:
| while read -r n text; do printf "__%s%$((48-${#text}))s: " "$text"; for ((i=0;i<$n;i++)); do printf "%s" "#"; done; echo; done
Change 48 according to your needs.
Output with your input:
__HTML document, ASCII text : ########################################################################################################################################################################
__C source, ASCII text : ##################################################################################################################
__ASCII text : ######################################################################################################
__ASCII text, with very long lines : #################################
__HTML document, UTF-8 Unicode text, with : ##############################
__HTML document, ASCII text, with very lon : ##########################
__C source, UTF-8 Unicode text : #####################
__LaTeX document, UTF-8 Unicode text, with : ####################
__SVG Scalable Vector Graphics image : ###############
__LaTeX document, ASCII text, with very lo : ############
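The inner for loop can also be avoided by padding an empty string to the wanted width and translating the spaces into #. A minimal sketch of that idea, assuming a shell whose printf accepts a * field width (bash does); the function name bars and its width argument are made up for illustration:

```shell
#!/usr/bin/env bash
# bars: hypothetical helper. Reads "count text" lines (as printed by uniq -c)
# and prints "__text : ###...". $1 is the label width (default 40).
bars() {
    local width=${1:-40} n text
    while read -r n text; do
        # %-*s left-justifies the label in a field of $width characters
        printf '__%-*s: ' "$width" "$text"
        # pad an empty string to $n spaces, then turn every space into '#'
        printf '%*s' "$n" '' | tr ' ' '#'
        echo
    done
}
```

It would be appended in the same place, for example ... | sort -nr | head -10 | bars 48.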
Answer 2:
A shell loop is never the right way to manipulate text; see "Why is using a shell loop to process text considered bad practice?".
You can do what you asked for with this awk command:
$ awk '{printf "%-40s: %s\n", substr($0,9), gensub(/ /,"#","g",sprintf("%*s",$1,""))}' file
HTML document, ASCII text : ########################################################################################################################################################################
C source, ASCII text : ##################################################################################################################
ASCII text : ######################################################################################################
ASCII text, with very long lines : #################################
HTML document, UTF-8 Unicode text, with : ##############################
HTML document, ASCII text, with very lon: ##########################
C source, UTF-8 Unicode text : #####################
LaTeX document, UTF-8 Unicode text, with: ####################
SVG Scalable Vector Graphics image : ###############
LaTeX document, ASCII text, with very lo: ############
But the right way to do this is to get rid of everything from cut on and just do something like this:
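find "$DIR" -type f | file -b "$SAVEFILES" |
awk '
    # count each description, truncated to 40 characters
    { cnt[substr($0,1,40)]++ }
    END {
        # visit the array in descending order of its values (the counts)
        PROCINFO["sorted_in"] = "@val_num_desc"
        for (type in cnt) {
            printf "%-*s: %s\n", 40, type, gensub(/ /,"#","g",sprintf("%*s",cnt[type],""))
            # stop after the 10 most frequent types
            if (++n == 10) {
                break
            }
        }
    }
'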
The above uses GNU awk for sorted_in and gensub(), and the second one is untested since you only provided sample input for the last part, printing the "#"s.
Answer 3:
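Where GNU awk is not available, the same counting can be sketched with POSIX awk plus sort and head, so that sorted_in and gensub() are not needed. This is only an illustration: the function name type_histogram is hypothetical, it expects one file -b description per line on standard input, and it assumes the descriptions contain no tab characters.

```shell
#!/bin/sh
# type_histogram: hypothetical helper, not taken from the answers above.
type_histogram() {
    # 1) count each description, truncated to 40 characters
    awk '{ cnt[substr($0, 1, 40)]++ }
         END { for (t in cnt) printf "%d\t%s\n", cnt[t], t }' |
    # 2) highest counts first, keep the top 10
    sort -t "$(printf '\t')" -k1,1nr |
    head -n 10 |
    # 3) print the label and one "#" per counted file
    awk -F '\t' '{
        bar = ""
        for (i = 0; i < $1; i++) bar = bar "#"
        printf "%-40s: %s\n", $2, bar
    }'
}
```

The sorting is done by the external sort here, so the order of entries with equal counts depends on sort rather than on awk.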
The Perl approach; append:
| perl -lpE 's/\s*(\d+)\s(.*)/sprintf "__%-40s: %s", $2, "#"x$1/e'
Output:
__HTML document, ASCII text : ########################################################################################################################################################################
__C source, ASCII text : ##################################################################################################################
__ASCII text : ######################################################################################################
__ASCII text, with very long lines : #################################
__HTML document, UTF-8 Unicode text, with : ##############################
__HTML document, ASCII text, with very lon: ##########################
__C source, UTF-8 Unicode text : #####################
__LaTeX document, UTF-8 Unicode text, with: ####################
__SVG Scalable Vector Graphics image : ###############
__LaTeX document, ASCII text, with very lo: ############
Following @Ed's approach, just using Perl (the splice length is 10 so that ten types are printed):
find "$DIR" -type f | file -b "$SAVEFILES" |\
perl -lnE '$s{substr$_,0,40}++;}{printf"__%-40s: %s\n",$_,"#"x$s{$_}for(splice@{[sort{$s{$b}<=>$s{$a}}keys%s]},0,10)'
Readable:
perl -lnE '
    # count each description, truncated to 40 characters
    $seen{ substr $_, 0, 40 }++;
    END {
        # sort the types by descending count and print the first 10
        printf "__%-40s: %s\n", $_, "#" x $seen{$_}
            for ( splice @{[ sort { $seen{$b} <=> $seen{$a} } keys %seen ]}, 0, 10 )
    }'
PS: Note that the file utility only tests the files named in $SAVEFILES and does not read file names from its standard input, so the find ... | part of find ... | file -b $SAVEFILES is pointless.
Source: https://stackoverflow.com/questions/43028856/how-to-access-the-prefix-when-using-uniq-c
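One way to make find actually drive file is to let find pass the names as arguments. A small sketch, assuming a find that supports -exec ... {} + (specified by POSIX); the function name describe_files is made up for illustration:

```shell
#!/bin/sh
# describe_files: hypothetical helper. Prints one "file -b" description
# for every regular file below the directory given as $1.
describe_files() {
    # find hands the matching paths to file as arguments, in batches
    find "$1" -type f -exec file -b {} +
}
```

The rest of the pipeline stays as before, for example describe_files "$DIR" | cut -c1-40 | sort | uniq -c | sort -nr | head -10.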