How can I normalize/collapse paths or URLs in Python in OS independent way?

I tried to use os.normpath in order to convert http://example.com/a/b/c/../ to http://example.com/a/b/ but it doesn't work on Windows because it does convert the slash to backslash.

sorin

Here is how to do it

>>> import urlparse
>>> urlparse.urljoin("ftp://domain.com/a/b/c/d/", "../..")
'ftp://domain.com/a/b/'
>>> urlparse.urljoin("ftp://domain.com/a/b/c/d/e.txt", "../..")
'ftp://domain.com/a/b/'

Remember that urljoin consider a path/directory all until the last / - after this is the filename, if any.

Also, do not add a leading / to the second parameter, otherwise you will not get the expected result.

os.path module is platform dependent but for file paths using only slashes but not-URLs you could use posixpath,normpath.

obskyr

Neither urljoin nor posixpath.normpath do the job properly. urljoin forces you to join with something, and doesn't handle absolute paths or excessive ..s correctly. posixpath.normpath collapses multiple slashes and removes trailing slashes, both of which are things URLs shouldn't do.

The following function resolves URLs completely, handling both .s and ..s, in a correct way according to RFC 3986.

try:
    # Python 3
    from urllib.parse import urlsplit, urlunsplit
except ImportError:
    # Python 2
    from urlparse import urlsplit, urlunsplit

def resolve_url(url):
    parts = list(urlsplit(url))
    segments = parts[2].split('/')
    segments = [segment + '/' for segment in segments[:-1]] + [segments[-1]]
    resolved = []
    for segment in segments:
        if segment in ('../', '..'):
            if resolved[1:]:
                resolved.pop()
        elif segment not in ('./', '.'):
            resolved.append(segment)
    parts[2] = ''.join(resolved)
    return urlunsplit(parts)

You can then call it on a complete URL as following.

>>> resolve_url("http://example.com/dir/../../thing/.")
'http://example.com/thing/'

For more information on the considerations that have to be made when resolving URLs, see a similar answer I wrote earlier on the subject.

adopted from os module " - os.path is one of the modules posixpath, or ntpath", in your case explicitly using posixpath.

   >>> import posixpath
    >>> posixpath.normpath("/a/b/../c")
    '/a/c'
    >>>

来源：https://stackoverflow.com/questions/2131290/how-can-i-normalize-collapse-paths-or-urls-in-python-in-os-independent-way

标签

python

url

path

normalize