Regex to match SSH url parts

Submitted by ぃ、小莉子 on 2019-11-27 08:41:40

Question


Given the following SSH URLs:

git@github.com:james/example
git@github.com:007/example
git@github.com:22/james/example
git@github.com:22/007/example

How can I pull the following:

{user}@{host}:{optional port}{path (user/repo)}

As the examples show, one of the usernames (007) is numeric and is NOT a port. I can't figure out how to work around that. The port isn't always present in the URL, either.

My current regex is:

^(?P<user>[^@]+)@(?P<host>[^:\s]+)?:(?:(?P<port>\d{1,5})\/)?(?P<path>[^\\].*)$

Not sure what else to try.
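The snippet below is a quick way to see the problem. It uses Python because the pattern already relies on Python-style (?P<name>...) groups. The optional port group is tried first, and \d{1,5} matches "007". The path group then accepts "example" on its own, so nothing forces the engine to backtrack.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(
    r"^(?P<user>[^@]+)@(?P<host>[^:\s]+)?:(?:(?P<port>\d{1,5})\/)?(?P<path>[^\\].*)$"
)

m = ORIGINAL.match("git@github.com:007/example")
# "007/" satisfies the optional port group, and [^\\].* accepts "example"
# without requiring a slash, so 007 is misread as a port.
print(m.group("port"), m.group("path"))  # 007 example
```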


Answer 1:


Lazy quantifiers to the rescue!

This seems to work well and handles the optional port:

^
(?P<user>.*?)@
(?P<host>.*?):
(?:(?P<port>.*?)/)?
(?P<path>.*?/.*?)
$

The line breaks are not part of the pattern, because it is written with the /x (extended) modifier enabled, which ignores whitespace. If you are not using /x, join everything into a single line.
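For Python users, here is a minimal sketch of the same pattern using the re module. It assumes that re.VERBOSE is an acceptable stand-in for /x. The name SSH_URL is only for illustration.

```python
import re

# Same lazy-quantifier pattern as above; re.VERBOSE ignores the line
# breaks and indentation, just like the /x modifier.
SSH_URL = re.compile(
    r"""
    ^
    (?P<user>.*?)@
    (?P<host>.*?):
    (?:(?P<port>.*?)/)?
    (?P<path>.*?/.*?)
    $
    """,
    re.VERBOSE,
)

urls = [
    "git@github.com:james/example",
    "git@github.com:007/example",
    "git@github.com:22/james/example",
    "git@github.com:22/007/example",
]

for url in urls:
    # groupdict() maps each named group to its text; port is None if absent.
    print(SSH_URL.match(url).groupdict())
```

Take git@github.com:007/example as an example. The optional port group first tries "007/". That leaves only "example" for the path group, which requires a slash, so the engine backtracks and skips the port group entirely. Note that this port group is not restricted to digits.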

https://regex101.com/r/wdE30O/5


Thank you @Jan for the optimizations.
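The following is a hypothetical variant that does not come from either answer. It keeps the question's digit-only port group and borrows this answer's key idea: the path must contain a slash. The sketch assumes the path is exactly owner/repo (two segments, no whitespace) and makes the host mandatory.

```python
import re

# Hypothetical variant: digit-only port as in the question, plus a path
# that must be exactly two non-empty segments separated by one slash.
STRICT_SSH_URL = re.compile(
    r"^(?P<user>[^@]+)@(?P<host>[^:\s]+):"
    r"(?:(?P<port>\d{1,5})/)?"
    r"(?P<path>[^/\s]+/[^/\s]+)$"
)

for url in [
    "git@github.com:james/example",
    "git@github.com:007/example",
    "git@github.com:22/james/example",
    "git@github.com:22/007/example",
]:
    m = STRICT_SSH_URL.match(url)
    print(m.group("port"), m.group("path"))
```

Because the path needs two segments, "007/example" can only be split as path = "007/example". When a leading number is followed by a two-part path, that number is taken as the port.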




Answer 2:


If you're using Python, you could write your own parser, for example with the parsimonious library:

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

data = """git@github.com:james/example
git@github.com:007/example
git@github.com:22/james/example
git@github.com:22/007/example"""

class GitVisitor(NodeVisitor):
    grammar = Grammar(
        r"""
        expr        = user at domain colon rest

        user        = word+
        domain      = ~"[^:]+"
        rest        = (port path) / path

        path        = word slash word
        port        = digits slash

        slash       = "/"
        colon       = ":"
        at          = "@"
        digits      = ~"\d+"
        word        = ~"\w+"

        """)

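    def generic_visit(self, node, visited_children):
        # Fallback for rules without a visit_* method: pass the visited
        # children up as a list, or return the node itself for leaves.
        return visited_children or node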

    def visit_user(self, node, visited_children):
        return {"user": node.text}

    def visit_domain(self, node, visited_children):
        return {"domain": node.text}

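    def visit_rest(self, node, visited_children):
        # rest is an ordered choice: (port path) is tried first, and a plain
        # path is the fallback. Only the branch that matched is a child.
        child = visited_children[0]
        if isinstance(child, list):
            # First branch matched: [port, path]
            return {"port": child[0], "path": child[1]}
        else:
            # Second branch matched: the path string from visit_path
            return {"path": child}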

    def visit_path(self, node, visited_children):
        return node.text

    def visit_port(self, node, visited_children):
        digits, _ = visited_children
        return digits.text

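    def visit_expr(self, node, visited_children):
        # Merge the dicts from visit_user, visit_domain and visit_rest;
        # the literal "@" and ":" children come back as plain nodes and are skipped.
        out = {}
        for child in visited_children:
            if isinstance(child, dict):
                out.update(child)
        return out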

gv = GitVisitor()
for line in data.split("\n"):
    result = gv.parse(line)
    print(result)

Running this prints:

{'user': 'git', 'domain': 'github.com', 'path': 'james/example'}
{'user': 'git', 'domain': 'github.com', 'path': '007/example'}
{'user': 'git', 'domain': 'github.com', 'port': '22', 'path': 'james/example'}
{'user': 'git', 'domain': 'github.com', 'port': '22', 'path': '007/example'}

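A parser copes well with the kind of ambiguity you have here. parsimonious builds PEG parsers, and the ordered choice rest = (port path) / path first tries a port followed by a two-part path. If that fails, it falls back to treating everything after the colon as the path.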
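If adding the parsimonious dependency is not an option, the same ordered choice can be hand-coded in plain Python. The parse_ssh_url function below is a hypothetical helper, not part of any library. It assumes the path is exactly owner/repo, optionally preceded by a numeric port.

```python
def parse_ssh_url(url):
    """Hypothetical helper: split user@host:[port/]owner/repo by hand.

    Mirrors the grammar's ordered choice rest = (port path) / path.
    It first tries "port/owner/repo", then falls back to "owner/repo".
    """
    user, _, remainder = url.partition("@")
    host, _, rest = remainder.partition(":")
    parts = rest.split("/")
    if not user or not host or not all(parts):
        raise ValueError(f"unrecognised SSH URL: {url!r}")

    result = {"user": user, "domain": host}
    if len(parts) == 3 and parts[0].isdigit():
        # First alternative: a numeric port followed by a two-part path.
        result["port"] = parts[0]
        result["path"] = "/".join(parts[1:])
    elif len(parts) == 2:
        # Fallback alternative: everything after the colon is the path.
        result["path"] = rest
    else:
        raise ValueError(f"unrecognised SSH URL: {url!r}")
    return result


for line in [
    "git@github.com:james/example",
    "git@github.com:007/example",
    "git@github.com:22/james/example",
    "git@github.com:22/007/example",
]:
    print(parse_ssh_url(line))
```

With the four example URLs, this should produce the same dictionaries as the parser above.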



Source: https://stackoverflow.com/questions/57698369/regex-to-match-ssh-url-parts
