Question
I'm trying to filter through the historical content of a file in my git repository. There is a line in some of the files that contains the string 'BEAM:A_BOOK', and in the 7th comma separated value of this line is a value I want to retrieve for further processing. I think, ideally, I'd end up with something like a dictionary with the SHA-1 hash of the commit, and this A_BOOK value for the past versions of this file.
Example of the first few lines of a file. Note that the value I'd hope to retrieve from this version of the file is '56.0':
# Date: 2018-12-21 01:49:16.888
PV,SELECTED,TIMESTAMP,STATUS,SEVERITY,VALUE_TYPE,VALUE,READBACK,READBACK_VALUE,DELTA,READ_ONLY
REA_EXP:LINE,0,1544047322.881066957,NO_ALARM,NONE,enum,"JENSA~[UDF;AT-TPC;GPL;JENSA]",,"---",,true
REA_BTS19:BEAM:OPTICSFILE,0,1541798820.065952460,NO_ALARM,NONE,string,"BTS19_test3.data",,"---",,true
REA_BTS19:BEAM:A_BOOK,0,1545322510.562031883,NO_ALARM,NONE,double,"56.0",,"---",,true
Ultimately, I'll extend this to retrieve a couple values and do some math to perform more complicated filtering. More background: we store the Atomic Mass and Charge values for ion beams we deliver for nuclear physics experiments in text files under version control. These text files act as our 'save sets', and are filled with more than this mass and charge info, as they also include machine values we would restore if we wanted to run that beam again. My goal is to filter these files by the Charge:Mass ratio of the beams we ran with them.
So far, this seems to get me most of my information:
git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) | grep RFQ-JENSA_Setpoint.snp
Which spits out something like this:
16eca44985214b790eb6ca8241ad86728b4fd3ae:RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1531323944.085330133,NO_ALARM,NONE,double,"2.0",,"---",,true
6e585c905444f25e18edfe1eeb32ced2de72ed7c:RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1531323944.085330133,NO_ALARM,NONE,double,"2.0",,"---",,true
bc202d5f21f9829fa3701ca636657ee1b0a73e25:RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1531323944.085330133,NO_ALARM,NONE,double,"2.0",,"---",,true
etc...
However, I'd like to see something like:
<hash>:<Retrieved A_BOOK Value>
Or, based on the output I just showed, I'd hope to see something like this:
16eca44985214b790eb6ca8241ad86728b4fd3ae:2.0
6e585c905444f25e18edfe1eeb32ced2de72ed7c:2.0
bc202d5f21f9829fa3701ca636657ee1b0a73e25:2.0
etc...
And eventually include some math to show something more meaningful:
<hash>:<Retrieved Q_BOOK Value>/<Retrieved A_BOOK Value>
Is there a better way to go about this? What's a good way to retrieve this information?
Thank you!
Answer 1:
Given that you're interested in a particular file within each revision, consider adding -- <pathspec> to the git grep invocation. That is, instead of:
git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) | grep RFQ-JENSA_Setpoint.snp
you could start with:
git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) -- RFQ-JENSA_Setpoint.snp
You will still get the lines, but faster, since git grep can skip all the files that don't have RFQ-JENSA_Setpoint.snp in their names. (Note that a <pathspec> is not the same as a regular expression: if you really wanted to allow any character, e.g., RFQ-JENSA_SetpointXsnp and RFQ-JENSA_SetpointYsnp as file names, you'd have to use -- 'RFQ-JENSA_Setpoint?snp' here. I'm guessing your second grep was overly permissive. REs are more expressive in general than path globs, but for this particular case, even if you really did mean "any character", glob has ? to allow that.)
Complicating matters, you may find that in a large repository, $(git rev-list --all) produces enough strings to overflow argv limits. (What the argv limits are on your system is not something I can guess.) In that case, you may need to pipe git rev-list --all through xargs:
git rev-list --all | xargs -I % git grep 'BTS19:BEAM:A_BOOK' % -- RFQ-JENSA_Setpoint.snp
Annoyingly, this spawns one separate git grep for each revision, which will slow you right back down. (If you have a BSD-style xargs you can use -J instead of -I; or consider the GNU parallel command.)
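One way to keep the batching without one git grep per revision is to let xargs append the revisions and have a small sh -c wrapper place them before the --. This is only a sketch: grep_history is a hypothetical helper name, and it assumes a POSIX sh and an xargs that appends its input at the end of the command line (the default behavior).

```shell
# Hypothetical helper: grep_history <pattern> <path>
# Searches <path> in every revision reachable from any ref, running one
# `git grep` per batch of revisions rather than one per revision.
grep_history() {
    git rev-list --all |
        # Inside the wrapper: $1 is the pattern, $2 is the path, and the
        # remaining arguments are the revisions that xargs appended.
        xargs sh -c 'pattern=$1; path=$2; shift 2
                     git grep -e "$pattern" "$@" -- "$path"' sh "$1" "$2"
}

# Usage, run from inside the repository:
#   grep_history 'BTS19:BEAM:A_BOOK' RFQ-JENSA_Setpoint.snp
```

Since git grep exits with status 1 when a batch contains no match, xargs may report a non-zero exit status even though the other batches printed results.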
To break these up and extract the 7th comma-separated value, consider replacing the first : with , (so that the hash becomes field 1 and the value becomes field 8) and using awk:
... | sed 's/:/,/' | awk -F, '{print $1 ":" $8}'
although if you need proper CSV quote handling, a separate tool is probably more appropriate. (Given your example this would print <hash>:"2.0", too, with the quotes.)
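If the quotes are unwanted, awk can strip them, and the same idea extends to the ratio mentioned in the question. The following is a minimal sketch, assuming that no field before the value contains a comma inside quotes, and that the charge is stored on a line whose PV name ends in :Q_BOOK (the full name is not given in the question, so that part is a guess). extract_value and charge_mass_ratio are hypothetical helper names.

```shell
# Reads `git grep` output (<hash>:<file>:<PV>,...,"<value>",...) on stdin
# and prints <hash>:<value> with the double quotes removed.
extract_value() {
    sed 's/:/,/' | awk -F, '{ v = $8; gsub(/"/, "", v); print $1 ":" v }'
}

# Reads `git grep` output containing both A_BOOK and Q_BOOK lines and
# prints <hash>:<Q/A> for every revision that has both values.
charge_mass_ratio() {
    sed 's/:/,/' | awk -F, '
        { v = $8; gsub(/"/, "", v) }                  # unquoted value
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }  # keep input order
        $2 ~ /:A_BOOK$/ { a[$1] = v }
        $2 ~ /:Q_BOOK$/ { q[$1] = v }
        END {
            for (i = 1; i <= n; i++) {
                h = order[i]
                # Skip revisions missing a value or with a zero mass.
                if ((h in a) && (h in q) && a[h] + 0 != 0)
                    print h ":" q[h] / a[h]
            }
        }'
}

# Usage, run from inside the repository:
#   git grep -e 'BEAM:A_BOOK' -e 'BEAM:Q_BOOK' $(git rev-list --all) \
#       -- RFQ-JENSA_Setpoint.snp | charge_mass_ratio
```

Passing two -e patterns makes git grep print lines matching either one, so a single pass over the history supplies both values for each revision.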
Source: https://stackoverflow.com/questions/53951431/searching-and-handling-git-objects