How can I normalize/collapse paths or URLs in Python in OS independent way?

时光怂恿深爱的人放手 提交于 2019-12-01 18:15:18
sorin

Here is how to do it

>>> import urlparse
>>> urlparse.urljoin("ftp://domain.com/a/b/c/d/", "../..")
'ftp://domain.com/a/b/'
>>> urlparse.urljoin("ftp://domain.com/a/b/c/d/e.txt", "../..")
'ftp://domain.com/a/b/'    

Remember that urljoin consider a path/directory all until the last / - after this is the filename, if any.

Also, do not add a leading / to the second parameter, otherwise you will not get the expected result.

os.path module is platform dependent but for file paths using only slashes but not-URLs you could use posixpath,normpath.

obskyr

Neither urljoin nor posixpath.normpath do the job properly. urljoin forces you to join with something, and doesn't handle absolute paths or excessive ..s correctly. posixpath.normpath collapses multiple slashes and removes trailing slashes, both of which are things URLs shouldn't do.


The following function resolves URLs completely, handling both .s and ..s, in a correct way according to RFC 3986.

try:
    # Python 3
    from urllib.parse import urlsplit, urlunsplit
except ImportError:
    # Python 2
    from urlparse import urlsplit, urlunsplit

def resolve_url(url):
    parts = list(urlsplit(url))
    segments = parts[2].split('/')
    segments = [segment + '/' for segment in segments[:-1]] + [segments[-1]]
    resolved = []
    for segment in segments:
        if segment in ('../', '..'):
            if resolved[1:]:
                resolved.pop()
        elif segment not in ('./', '.'):
            resolved.append(segment)
    parts[2] = ''.join(resolved)
    return urlunsplit(parts)

You can then call it on a complete URL as following.

>>> resolve_url("http://example.com/dir/../../thing/.")
'http://example.com/thing/'

For more information on the considerations that have to be made when resolving URLs, see a similar answer I wrote earlier on the subject.

adopted from os module " - os.path is one of the modules posixpath, or ntpath", in your case explicitly using posixpath.

   >>> import posixpath
    >>> posixpath.normpath("/a/b/../c")
    '/a/c'
    >>> 
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!