Bash Directory Sorting Issue - Removing Duplicate Lines?

Submitted by 本秂侑毒 on 2019-12-12 02:56:28

Question


I'm using this command to merge multiple identical directories and to remove duplicate lines from each of the corresponding files:

for f in app1/*; do 
   bn="$(basename "$f")"
   sort -u "$f" "app2/$bn" > "app/$bn"
done

Is there a way to edit this so that it also checks the lines across all of the files and removes duplicates that appear in more than one file? I do need to keep the existing file structure with individual files.

The end result is a directory of about 300 text files totaling no more than 30 MB.

Example:

**Directory app1**
*1.txt*       
a
b
c

*2.txt*
d
e
f

**Directory app2**
*1.txt*
a
b
c
g

*2.txt*
a
b
c
d
e
f

**Results in Directory app**
*1.txt*
a
b
c
g

*2.txt*
a
b
c
d
e
f

Desired Result in Directory app Should Be:
*1.txt*
a
b
c
g

*2.txt*
d
e
f

As you can see, it's not removing the duplicate "a b c" lines from 2.txt when they also appear in 1.txt. Each line should appear only once across the whole directory, with all duplicates removed.


Answer 1:


This should probably be done with perl -i:

perl -i -n -e 'print unless $h{$_};++$h{$_}' app1/*

On some systems this seems to create .bak backup files in app1 (despite the man page saying a bare -i makes no backup); after verifying the result, you can remove them with rm app1/*.bak.




Answer 2:


As you can see it's not removing the duplicate "A B C" lines from 2.txt when it's also found in 1.txt. All lines in each file should remain unique and all duplicates should be removed.

You can accomplish this by applying 7171u's answer to your other question, "Unix Bash Remove Duplicate Lines From Directory Files?", to the result of your command above (after changing the tmp/* in that script to app/*, which should be trivial).
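The linked script isn't reproduced here, but the same cross-file dedup can be sketched with a single awk pass over the merged files (an illustrative approach, not necessarily 7171u's exact script); the seen[] array persists across input files, so a line printed from 1.txt is suppressed when it reappears in 2.txt:

```shell
# Scratch setup with the already-merged files from the question
cd "$(mktemp -d)"
mkdir -p app
printf 'a\nb\nc\ng\n'       > app/1.txt
printf 'a\nb\nc\nd\ne\nf\n' > app/2.txt

# One awk process over every file: seen[] persists across files.
# FNR==1 truncates the per-file output, so a file whose lines are
# all duplicates ends up empty rather than untouched.
awk 'FNR == 1    { out = FILENAME ".new"; printf "" > out }
     !seen[$0]++ { print > out }' app/*

# Replace the originals with the deduplicated versions
for f in app/*.new; do mv "$f" "${f%.new}"; done
```

Note that awk keeps each redirection target open for the life of the process, so with 300 files a POSIX awk may hit an open-file limit; gawk manages this transparently.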



Source: https://stackoverflow.com/questions/34022822/bash-directory-sorting-issue-removing-duplicate-lines
