I have a bunch of uploaded .root files on my laptop, but I need just specific ones

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-13 02:41:54

Question


I have a directory with 10000 .root files (named like hists11524_blinded.root or hists9899_blinded.root) and need to run some macros on them for my data analysis. However, I only need 4000 of those files in the directory. The list of needed runs (these 4000 numbers) is in the file thebest.txt, which is in the same directory as the histograms.

Before running the macros, I want to delete the files that are not needed for processing, using the information in that .txt file.

This is what the thebest.txt file looks like:

   09769 
   09772 
   09773 
   09776 
   09777 
   09781 
   09782  
   09785  
   09786  
   09789  
   09790
   09793
    ...

My guess is to use this command:

-comm -2 -3 <(ls) <(sort thebest) | tail +2 | xargs -p rm

I get 2 errors:

tail: invalid option -- 'p'
sort: cannot read: No such file or directory 

The file thebest.txt contains only 5-digit numbers like 09999 or 11256, while the directory contains files with names like hists9999_blinded.root or hists11256_blinded.root.

The number of digits differs between the two lists; that is the main issue.


Answer 1:


One option is to remove the leading 0s from the numbers so that they match the file names. To avoid matching substrings, you can prepend and append the corresponding file name parts (in your case the number sits in the middle of the file name).

As it is not clear if the leading spaces in the sample file thebest.txt are intentional or only a formatting issue, leading spaces will be removed as well.

As deleting the wrong files may lead to data loss, you may also consider processing only the matching files instead of deleting the non-matching ones.

# remove leading/trailing spaces and leading zeros, then prepend/append file name parts
# (-n with the p flag prints only lines that contain a number, so blank lines yield no pattern)
sed -n 's/^ *0*\([1-9][0-9]*\) *$/hists\1_blinded.root/p' thebest.txt > thebestfiles.txt

# get matching files and process (process_matching stands for your own command)
find . -name 'hists*_blinded.root' | grep -F -f thebestfiles.txt | xargs process_matching

# or get non-matching files and remove
find . -name 'hists*_blinded.root' | grep -F -v -f thebestfiles.txt | xargs rm
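To check the name conversion on its own before touching any real data, a small sketch along these lines may help. It assumes a throwaway directory created with mktemp and a three-line sample that imitates thebest.txt (both are made up for illustration). It uses a sed expression anchored at both ends of the line, so that trailing spaces, which appear in the sample listing in the question, do not end up in the generated file names.

```shell
#!/bin/sh
# Hypothetical sample input imitating thebest.txt:
# leading spaces, leading zeros, and trailing spaces on some lines.
workdir=$(mktemp -d)
printf '   09769 \n   09772  \n   11256\n' > "$workdir/thebest.txt"

# Strip surrounding spaces and leading zeros, then wrap the number
# in the fixed file name parts. With -n and the p flag, only lines
# that actually contain a number are written to the output.
sed -n 's/^ *0*\([1-9][0-9]*\) *$/hists\1_blinded.root/p' \
    "$workdir/thebest.txt" > "$workdir/thebestfiles.txt"

# Show the generated file names, one per line.
cat "$workdir/thebestfiles.txt"
```

If the printed names look like the files in your directory (for example hists9769_blinded.root for the entry 09769), the same expression can be applied to the real thebest.txt.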

The find command searches recursively in the current directory. If you want to exclude subdirectories you can use -maxdepth 1. To avoid processing directory names you might also add -type f.
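Because removing files cannot be undone, it may be worth writing the candidates to a list and reviewing it first. The following is only a sketch under stated assumptions: the sandbox directory, the sample file names, and the name to_delete.txt are invented for illustration, and -maxdepth is an extension provided by GNU and BSD find rather than part of POSIX.

```shell
#!/bin/sh
# Hypothetical sandbox so the sketch can be tried without touching real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch hists9769_blinded.root hists9770_blinded.root \
      hists19769_blinded.root hists11256_blinded.root
mkdir sub && touch sub/hists9999_blinded.root

# File names to keep, in the form produced by the sed step of the answer.
printf 'hists9769_blinded.root\nhists11256_blinded.root\n' > thebestfiles.txt

# Dry run: collect regular files in this directory only (no subdirectories)
# whose names do not contain any of the fixed strings in thebestfiles.txt.
find . -maxdepth 1 -type f -name 'hists*_blinded.root' \
    | grep -F -v -f thebestfiles.txt | sort > to_delete.txt

# Review the list before deleting anything.
cat to_delete.txt

# Once the list looks right, the removal itself could be:
#   xargs rm < to_delete.txt
```

In this sandbox, hists19769_blinded.root ends up in the list even though it contains the digits 9769, because the pattern includes the hists prefix, and the file in sub/ is ignored because of -maxdepth 1.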



Source: https://stackoverflow.com/questions/56853464/i-have-a-bunch-of-uploaded-root-files-on-my-laptop-but-i-need-just-specific-on
