Question
I'm developing a git post-receive hook in Python. Data is supplied on stdin with lines similar to
ef4d4037f8568e386629457d4d960915a85da2ae 61a4033ccf9159ae69f951f709d9c987d3c9f580 refs/heads/master
The first hash is the old-ref, the second the new-ref and the third column is the reference being updated.
I want to split this into 3 variables, whilst also validating input. How do I validate the branch name?
I am currently using the following regular expression
^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$
This doesn't accept all possible branch names, as set out by man git-check-ref-format. For example, it excludes a branch by the name of build-master, which is valid.
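A minimal sketch of how the current pattern behaves in Python, to make the problem concrete. The helper name `parse_line` and the constants are made up for illustration; only the regular expression itself is taken from above.

```python
import re

# The pattern from the question: two 40-character hashes, then a branch name
# restricted to ASCII letters and digits.
CURRENT_PATTERN = re.compile(
    r"^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$"
)

OLD = "ef4d4037f8568e386629457d4d960915a85da2ae"
NEW = "61a4033ccf9159ae69f951f709d9c987d3c9f580"


def parse_line(line):
    """Return (old_ref, new_ref, branch), or None if the line does not match."""
    match = CURRENT_PATTERN.match(line.rstrip("\n"))
    return match.groups() if match else None


print(parse_line(OLD + " " + NEW + " refs/heads/master"))
# The hyphen is not in the character class, so this valid branch is rejected.
print(parse_line(OLD + " " + NEW + " refs/heads/build-master"))
```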
Bonus marks
I actually want to exclude any branch that starts with "build-". Can this be done in the same regex?
Tests
Given the great answers below, I wrote some tests, which can be found at https://github.com/alexchamberlain/githooks/blob/master/miscellaneous/git-branch-re-test.py.
Status: All the regexes below are failing to compile. This could indicate there's a problem with my script or incompatible syntaxes.
Answer 1:
Let's dissect the various rules and build regex parts from them:
1. They can include slash / for hierarchical (directory) grouping, but no slash-separated component can begin with a dot . or end with the sequence .lock.

    # must not contain /.
    (?!.*/\.)
    # must not end with .lock
    (?<!\.lock)$

2. They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted. If the --allow-onelevel option is used, this rule is waived.

    .+/.+  # may get more precise later

3. They cannot have two consecutive dots .. anywhere.

    (?!.*\.\.)

4. They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.

    [^\000-\037\177 ~^:]+  # pattern for allowed characters

5. They cannot have question-mark ?, asterisk *, or open bracket [ anywhere. See the --refspec-pattern option below for an exception to this rule.

    [^\000-\037\177 ~^:?*[]+  # new pattern for allowed characters

6. They cannot begin or end with a slash / or contain multiple consecutive slashes (see the --normalize option below for an exception to this rule).

    ^(?!/)
    (?<!/)$
    (?!.*//)

7. They cannot end with a dot ..

    (?<!\.)$

8. They cannot contain a sequence @{.

    (?!.*@\{)

9. They cannot be the single character @.

    (?!@$)

10. They cannot contain a \.

    (?!.*\\)

Piecing it all together we arrive at the following monstrosity:
^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!@$)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
And if you want to exclude those that start with build- then just add another lookahead:
^(?!build-)(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!@$)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
This can be optimized a bit as well by conflating a few things that look for common patterns:
^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock|[/.])$
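Since the hook in the question is written in Python, here is a minimal sketch of how the pieces above might be adapted to Python's `re` module. Three things are adaptations rather than part of the regex above. The lookahead sits in front of `refs/heads/`, so that a branch starting with a dot is still caught as `/.`. The `build-` exclusion is written as `refs/heads/build-`. The combined lookbehind `(?<!\.lock|[/.])` is split in two, because `re` only accepts fixed-width lookbehinds. The names `HOOK_LINE` and `parse_hook_line` are made up for illustration, and the sketch is not guaranteed to agree with git in every corner case.

```python
import re

HOOK_LINE = re.compile(
    # Old and new object names.
    r"([0-9a-f]{40}) ([0-9a-f]{40}) "
    # Lookahead over the whole ref: no build- branches, no "/." or "..",
    # no "//", no "@{", no backslash.
    r"(?!refs/heads/build-|.*(?:[/.]\.|//|@\{|\\))"
    # Branch name built from the allowed characters only (captured).
    r"refs/heads/([^\000-\037\177 ~^:?*\[]+)"
    # Must not end with ".lock", "/" or "."; one fixed-width lookbehind each.
    r"(?<!\.lock)(?<![/.])"
)


def parse_hook_line(line):
    """Return (old_ref, new_ref, branch), or None if the line is rejected."""
    match = HOOK_LINE.fullmatch(line.rstrip("\n"))
    return match.groups() if match else None


old = "ef4d4037f8568e386629457d4d960915a85da2ae"
new = "61a4033ccf9159ae69f951f709d9c987d3c9f580"
for branch in ["master", "feature/login", "build-master", "topic.lock"]:
    print(branch, "->", parse_hook_line(old + " " + new + " refs/heads/" + branch))
```

Using `fullmatch` instead of `^...$` avoids a Python-specific surprise: `$` also matches just before a trailing newline.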
Answer 2:
git check-ref-format <ref> with subprocess.Popen is a possibility:
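import subprocess
ref = "refs/heads/master"  # full ref name, e.g. the third column of the hook input
process = subprocess.Popen(["git", "check-ref-format", ref])
# Exit status 0 means the ref name is well formed; non-zero means it is not.
exit_status = process.wait()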
Advantages:
- if the algorithm ever changes, the check will update automatically
- you are sure to get it right, which is way harder with a monster Regex
Disadvantages:
- slower, because it spawns a subprocess. But premature optimization is the root of all evil.
- requires Git as a binary dependency. But in the case of a hook it will always be there.
pygit2, which uses C bindings to libgit2, would be an even better possibility if check-ref-format is exposed there, as it would be faster than Popen, but I haven't found it.
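A minimal sketch of how this could be wrapped for the hook, assuming `git` is on the PATH. The function names `is_valid_ref` and `is_pushable_branch` are hypothetical, and the `build-` check is done in plain Python rather than by git.

```python
import subprocess


def is_valid_ref(ref):
    """Return True if `git check-ref-format` accepts the full ref name."""
    result = subprocess.run(
        ["git", "check-ref-format", ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # git exits with 0 for a well-formed ref name and non-zero otherwise.
    return result.returncode == 0


def is_pushable_branch(ref):
    """Accept refs/heads/<name> unless <name> starts with "build-"."""
    prefix = "refs/heads/"
    if not ref.startswith(prefix):
        return False
    if ref[len(prefix):].startswith("build-"):
        return False
    return is_valid_ref(ref)
```

Passing the arguments as a list avoids the shell, so an odd branch name cannot be interpreted as a shell command.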
Answer 3:
There's no need to write monstrosities in Perl. Just use /x:
# RegExp rules based on git-check-ref-format
my $valid_ref_name = qr%
    ^
    (?!
        # begins with
        /|                  # (from #6)   cannot begin with /
        # contains
        .*(?:
            [/.]\.|         # (from #1,3) cannot contain /. or ..
            //|             # (from #6)   cannot contain multiple consecutive slashes
            @\{|            # (from #8)   cannot contain a sequence @{
            \\              # (from #9)   cannot contain a \
        )
    )
    # (from #2) (waiving this rule; too strict)
    [^\000-\037\177 ~^:?*[]+   # (from #4-5) valid character rules
    # ends with
    (?<!\.lock)             # (from #1)   cannot end with .lock
    (?<![/.])               # (from #6-7) cannot end with / or .
    $
%x;

# Print each candidate name followed by 1 if it matches, or nothing if it does not.
foreach my $branch (qw(
    master
    .master
    build/master
    ref/HEAD/blah
    /HEAD/blah
    HEAD/blah/
    master.lock
    head/@{block}
    master.
    build//master
    build\master
    build\\master
    ),
    'master blaster',
) {
    print "$branch --> ".($branch =~ $valid_ref_name)."\n";
}
Joey++ for some of the code, though I made some corrections.
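Python has the same facility as Perl's /x through the `re.VERBOSE` flag. The following is a rough sketch of the idea, not a line-for-line port: the leading-dot check and the `\000-\037` control-character range follow the git-check-ref-format rules, the two lookbehinds are kept separate because Python's `re` wants fixed-width lookbehinds, and the names `VALID_BRANCH` and `is_valid_branch` are invented for the example.

```python
import re

VALID_BRANCH = re.compile(
    r"""
    ^
    (?!
        [/.]                    # cannot begin with / or . (leading-dot check added here)
      |
        .*(?:
            [/.]\.              # cannot contain /. or ..
          | //                  # cannot contain multiple consecutive slashes
          | @\{                 # cannot contain the sequence @{
          | \\                  # cannot contain a backslash
        )
    )
    [^\000-\037\177 ~^:?*\[]+   # allowed characters; the space inside the class is kept
    (?<!\.lock)                 # cannot end with .lock
    (?<![/.])                   # cannot end with / or .
    \Z
    """,
    re.VERBOSE,
)


def is_valid_branch(name):
    """Return True if the bare branch name passes the checks above."""
    return VALID_BRANCH.match(name) is not None


for name in ["master", ".master", "build/master", "master.lock", "master blaster"]:
    print(name, "-->", is_valid_branch(name))
```

In verbose mode Python ignores whitespace and `#` comments in the pattern, except inside a character class, which is why the literal space in `[^... ~^:?*\[]` still counts.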
Answer 4:
Taking the rules directly from the linked page, the following regular expression should match only valid branch names in refs/heads not starting with "build-":
refs/heads/(?!\.)(?!build-)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+)(?<!\.)(?<!\.lock)
This starts with refs/heads as yours does.
Then (?!\.) checks that the branch does not start with a dot, and (?!build-) checks that the next 6 characters are not build-.
The entire group (((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+) matches the branch name.
(?!\.\.) checks that there are no instances of two periods in a row, and (?!@{) checks that the branch does not contain @{.
Then [^\cA-\cZ ~^:?*[\\] matches any of the allowed characters by excluding control characters \cA-\cZ and all of the rest of the characters that are specifically forbidden.
Finally, (?<!\.) makes sure that the branch name did not end with a period and (?<!\.lock) checks that it did not end with .lock.
To similarly match valid branch names in arbitrary folders, you can use
(?!\.)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+)(/(?!\.)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+))*?/(?!\.)(?!build-)(((?!\.\.)(?!@{)[^\cA-\cZ ~^:?*[\\])+)(?<!\.)(?<!\.lock)
This applies basically the same rules to each piece of the branch name, but only checks that the last one does not start with build-
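The per-component idea can be sketched in Python as follows. This is an adaptation under stated assumptions, not the regex above. Python's `re` has no `\cA` escapes, so the class uses `\x00-\x1f\x7f` as in the git-check-ref-format rules. The slash is excluded from the component class so that components are unambiguous. Unlike the folder regex above, a single component with no slash is accepted. `COMPONENT`, `BRANCH_NAME` and `is_allowed_branch` are hypothetical names.

```python
import re

# One slash-separated component: no leading dot, no ".." or "@{" inside,
# only allowed characters, and no trailing "." or ".lock".
COMPONENT = (
    r"(?!\.)"
    r"(?:(?!\.\.)(?!@\{)[^\x00-\x1f\x7f ~^:?*\[\\/])+"
    r"(?<!\.)(?<!\.lock)"
)

# Any number of leading components, then a final component that must not
# start with "build-".
BRANCH_NAME = re.compile(r"(?:" + COMPONENT + r"/)*(?!build-)" + COMPONENT)


def is_allowed_branch(name):
    """Return True if every component is valid and the last is not build-*."""
    return BRANCH_NAME.fullmatch(name) is not None


for name in ["master", "feature/login", "build-master", "feature/build-x", "build-tools/x"]:
    print(name, "->", is_allowed_branch(name))
```

Only the last component is tested against `build-`, so `build-tools/x` passes while `feature/build-x` does not, matching the description above.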
Answer 5:
For anyone coming to this question looking for the PCRE regular expression to match a valid Git branch name, it is the following:
^(?!/|.*([/.]\.|//|@\{|\\\\))[^\000-\037\177 ~^:?*\[]+(?<!\.lock|[/.])$
This is an amended version of the regular expression written by Joey. In this version, however, a slash is not required (it is for matching branchName rather than refs/heads/branchName).
Please refer to his correct answer to this question. He provides a full breakdown of each part of the regex, and how it relates to each requirement specified on the git-check-ref-format(1) manual pages.
Source: https://stackoverflow.com/questions/12093748/how-do-i-check-for-valid-git-branch-names