find directories but exclude list where directories have a space in name

Submitted by 五迷三道 on 2019-12-11 06:52:36

Question


I have a process that audits files from one day to the next on a large file system. I want to exclude some directories from consideration by using a list of directories to exclude. I can do that just fine, but I'm having trouble if an exclude directory has a space in the name.

For simplicity's sake, I'm only going to list four sub-directories, but in reality there are many more directories I want to search vs exclude. There's also the chance that a new directory gets added and I want to automatically include new directories, hence the exclude list vs using an include list.

base_dir/
├── sub_dir1
├── sub_dir2
├── sub dir3
└── sub_dir4

I have a shell script and an exclude list

$ cat exclude.txt
sub_dir2
sub dir3

The shell script uses find and printf along with awk and sort to get a list of directories to audit.

$ find ./base_dir -maxdepth 1 -type d $(printf "! -iname %s " $(cat exclude.txt)) | awk -F/ '{print $NF}' | sort
sub_dir1
sub dir3
sub_dir4

As you can probably guess and see above, this works except that it's not ignoring sub dir3. I've tried a few combinations of double quotes inside exclude list and using %q vs %s vs %a, but can't seem to find the correct combination.

My desired output is

sub_dir1
sub_dir4

I realize I could do something like:

find ./base_dir -maxdepth 1 -type d \
    ! -iname "sub dir3" $(printf "! -iname %s " $(cat exclude.txt)) \
    | awk -F/ '{print $NF}' | sort

and get my expected output, but I want to only use the exclude.txt list.

EDIT: After reading some replies I tried using an array and thought that would work, but now it's even less clear to me why this option fails. printf appears to produce a string that would work if I typed it directly on the command line, but running it as a one-liner still gives me errors.

$ cat exclude.txt
base_dir
sub_dir2
"sub dir3"

$ mapfile -t exclude < exclude.txt

$ printf "! -iname %s " "${exclude[@]}"
! -iname base_dir ! -iname sub_dir2 ! -iname "sub dir3"

$ find ./base_dir -maxdepth 1 -type d $(printf "! -iname %s " "${exclude[@]}")
find: paths must precede expression: dir3"

$ find ./base_dir -maxdepth 1 -type d ! -iname base_dir ! -iname sub_dir2 ! -iname "sub dir3"
./base_dir/sub_dir1
./base_dir/sub_dir4
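
The "paths must precede expression" error can be reproduced without find. The sketch below assumes the same three-entry array as above (hard-coded here rather than read from exclude.txt) and captures the words the shell produces from the unquoted command substitution. Quote characters that come out of an expansion are ordinary data, not shell syntax, so the entry still splits at the space.

```shell
#!/bin/bash
# Same array the question builds from the quoted exclude.txt (hard-coded for the demo).
exclude=(base_dir sub_dir2 '"sub dir3"')

# Unquoted command substitution: the result is split on whitespace, and the
# quote characters inside it are NOT re-interpreted as quoting.
words=( $(printf '! -iname %s ' "${exclude[@]}") )

# Show each resulting word in angle brackets.
printf '<%s>\n' "${words[@]}"
echo "word count: ${#words[@]}"
```

The last two words are `"sub` and `dir3"`, so find receives -iname with the pattern `"sub` followed by a stray argument `dir3"`, which is what it complains about.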

Answer 1:


You could read the exclude file into a Bash array and then craft a find command like this:

mapfile -t exclude < exclude.txt
# -mindepth 1       excludes the starting directory itself
# -regextype egrep  means alternation "|" does not have to be escaped
# -printf '%f\n'    prints just the filename without leading directories
# (Comments cannot follow a line-continuation backslash, so they are listed here.)
find ./base_dir \
    -mindepth 1 \
    -type d \
    -regextype egrep \
    ! -iregex ".*/($(IFS='|'; echo "${exclude[*]}"))" \
    -printf '%f\n'

resulting in

sub_dir1
sub_dir4

For your example input, the -iregex test expands like this:

$ IFS='|'
$ echo "${exclude[*]}"
sub_dir2|sub dir3

so the regular expression for paths to exclude becomes

.*/(sub_dir2|sub dir3)

The change to IFS is limited to the command substitution.
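
A quick way to check that claim is sketched below, with the array hard-coded instead of read from exclude.txt; the variable names saved_ifs, pattern and ifs_state are only for illustration. The assignment to IFS happens inside the subshell created by `$( ... )`, so the outer shell's value should be untouched afterwards.

```shell
#!/bin/bash
# Hard-coded stand-in for the array read from exclude.txt.
exclude=('sub_dir2' 'sub dir3')

saved_ifs=$IFS

# "${exclude[*]}" joins the elements with the first character of IFS,
# which is '|' only inside this command substitution.
pattern=".*/($(IFS='|'; echo "${exclude[*]}"))"
printf '%s\n' "$pattern"

# Compare the outer IFS with the value saved before the substitution.
if [ "$IFS" = "$saved_ifs" ]; then ifs_state=unchanged; else ifs_state=changed; fi
echo "IFS is $ifs_state"
```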

The limitation is that if the directories to be excluded contain characters that are special in regular expressions, you have to escape them, which can get messy. If you wanted to escape, for example, pipes, you could use

echo "${exclude[*]//|/\\|}"

in the command substitution, resulting in

sub_dir2|sub dir3|has\|pipe

where the directory has|pipe with a | in its name has its pipe properly escaped.
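
If more than pipes need escaping, one possible generalisation is to run each name through sed before joining. This is only a sketch: the helper name escape_ere and the sample names 'has|pipe' and 'v1.0 (old)' are made up for illustration, and it assumes names contain no newlines.

```shell
#!/bin/bash
# Hypothetical helper: put a backslash in front of every character that is
# special in an extended regular expression.
escape_ere() {
    printf '%s\n' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g'
}

# Sample names, hard-coded instead of read from exclude.txt.
exclude=('sub_dir2' 'sub dir3' 'has|pipe' 'v1.0 (old)')

# Escape each name separately so the joining "|" stays unescaped.
escaped=()
for name in "${exclude[@]}"; do
    escaped+=( "$(escape_ere "$name")" )
done

pattern=".*/($(IFS='|'; echo "${escaped[*]}"))"
printf '%s\n' "$pattern"
```

The resulting pattern can be passed to -iregex in place of the unescaped one, with -regextype egrep as before.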




Answer 2:


edited to include new info, in case it's useful later

Don't embed printf/cat. The shell's parser is working against you: the output of a command substitution is split on whitespace and its quotes are not re-parsed. Stack the exclusion filters with paste -s into a tempfile to build your command dynamically, then execute it.

$: find ./base_dir
./base_dir
./base_dir/sub dir1
./base_dir/sub dir3
./base_dir/sub_dir1
./base_dir/sub_dir3

$: tmpfile=/tmp/xFinder
$: printf "find ./base_dir -maxdepth 1 -type d ! -iname base_dir " > $tmpfile
$: { sed -E 's/^(.*)/! -iname \"\1\"/' exclude.txt; 
     printf " | xargs -I R basename R "; } | paste -s >> $tmpfile
$: cat $tmpfile
find ./base_dir -maxdepth 1 -type d ! -iname base_dir ! -iname "sub_dir1"    ! -iname "sub dir3"     ! -iname "sub_dir4"      | xargs -I R basename R

The xargs call to basename strips the path info, and ! -iname base_dir keeps it out of the find output as a dir of its own.

$: . $tmpfile
./base_dir
./base_dir/sub dir1
./base_dir/sub_dir3

Apologies for the earlier incomplete version.
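
As an alternative to sourcing a generated file, the same per-name filters can be kept in a Bash array, where each element stays one word no matter what it contains. The sketch below is not part of the answer above: it builds a scratch copy of the question's layout under mktemp so it can be run anywhere, and the names work, filters and result are only for illustration. It assumes GNU find (for -printf) and Bash 4 or later (for mapfile).

```shell
#!/bin/bash
# Scratch copy of the question's layout (hypothetical location under mktemp).
work=$(mktemp -d)
mkdir -p "$work/base_dir/sub_dir1" "$work/base_dir/sub_dir2" \
         "$work/base_dir/sub dir3" "$work/base_dir/sub_dir4"
printf '%s\n' 'sub_dir2' 'sub dir3' > "$work/exclude.txt"

# One array element per line, so "sub dir3" stays a single entry.
mapfile -t exclude < "$work/exclude.txt"

# Build the find arguments as an array: three words per excluded name.
filters=()
for name in "${exclude[@]}"; do
    filters+=( ! -iname "$name" )
done

# Quoted "${filters[@]}" expands to exactly the words stored above,
# with no further splitting.
result=$(find "$work/base_dir" -mindepth 1 -maxdepth 1 -type d \
             "${filters[@]}" -printf '%f\n' | sort)
printf '%s\n' "$result"

rm -rf "$work"
```

Note that -iname still treats each entry as a glob pattern, so a name containing `*`, `?` or `[` would match more than itself.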




Answer 3:


Since you only want to look at a single directory level, without recursion, you can use a for loop with wildcards:

$ find base_dir/
base_dir/
base_dir/sub_dir2
base_dir/sub_dir1
base_dir/sub_dir4
base_dir/sub dir3

$ cat exclude.txt 
sub_dir2
sub dir3

$ cat script.sh 
#!/bin/bash
for dir in base_dir/*
do
  ! [ -d "$dir" ] || 
    grep -qFx -- "$(basename -- "$dir")" exclude.txt &&
    continue
  echo "$dir" # or do something else
done

$ ./script.sh 
base_dir/sub_dir1
base_dir/sub_dir4
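
The grep options in the loop are what make names with spaces safe: -F treats the name as a literal string rather than a regex, -x requires the whole line to match, and -q suppresses output. A small sketch of that behaviour, using a temporary list file and a hypothetical helper named check:

```shell
#!/bin/bash
# Temporary stand-in for exclude.txt.
list=$(mktemp)
printf '%s\n' 'sub_dir2' 'sub dir3' > "$list"

# Hypothetical helper: report whether a name appears as a whole line in the list.
check() {
    if grep -qFx -- "$1" "$list"; then echo excluded; else echo kept; fi
}

exact=$(check 'sub dir3')    # whole-line literal match
partial=$(check 'sub')       # only part of a line, rejected by -x
dotted=$(check 'sub.dir3')   # "." is literal under -F, so it does not match the space
printf '%s\n' "sub dir3: $exact" "sub: $partial" "sub.dir3: $dotted"

rm -f "$list"
```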


Source: https://stackoverflow.com/questions/51307638/find-directories-but-exclude-list-where-directories-have-a-space-in-name
