Question
A subsystem that I have no control over insists on providing filesystem paths in the form of a URI. Is there a Python module or function that can convert such a URI into the form the filesystem expects, in a platform-independent manner?
Answer 1:
The urlparse module (Python 2; it is urllib.parse in Python 3) provides the path from the URI:
import os, urlparse
p = urlparse.urlparse('file://C:/test/doc.txt')
finalPath = os.path.abspath(os.path.join(p.netloc, p.path))
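To see why the snippet above has to join netloc and path, it may help to print the parsed components. The following is only a sketch, and it assumes the Python 3 name urllib.parse rather than the Python 2 urlparse module used above. It compares the two-slash form from the answer with the three-slash form.

```python
from urllib.parse import urlparse

# Two-slash form from the answer: "C:" is parsed as the authority (netloc).
two_slash = urlparse('file://C:/test/doc.txt')

# Three-slash form: the authority is empty and the drive stays in the path.
three_slash = urlparse('file:///C:/test/doc.txt')

print(repr(two_slash.netloc), repr(two_slash.path))      # 'C:' '/test/doc.txt'
print(repr(three_slash.netloc), repr(three_slash.path))  # '' '/C:/test/doc.txt'
```

With the two-slash form the drive letter is no longer part of the path, which is why the answer glues the two pieces back together.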
Answer 2:
For future readers: the solution from @Jakob Bowyer doesn't decode percent-encoded characters in the URL (for example, %20 is left as is instead of becoming a space). After a bit of digging I found this solution:
>>> import urllib, urlparse
>>> urllib.url2pathname(urlparse.urlparse('file:///home/user/some%20file.txt').path)
'/home/user/some file.txt'
EDIT:
Here's what I ended up using:
>>> import urllib
>>> urllib.unquote('file:///home/user/some%20file.txt')[7:]
'/home/user/some file.txt'
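The snippets above are Python 2. A rough Python 3 equivalent of the first one might look like the sketch below, assuming urllib.request.url2pathname and urllib.parse.urlparse as the Python 3 homes of those calls. The helper name uri_to_local_path and the scheme check are hypothetical additions, not part of the answer.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def uri_to_local_path(file_uri):
    """Return the percent-decoded path of a file URI in the local OS form."""
    parsed = urlparse(file_uri)
    if parsed.scheme != 'file':
        raise ValueError('not a file URI: {!r}'.format(file_uri))
    # url2pathname decodes %XX escapes and, on Windows, converts the
    # URL-style path to the native form.
    return url2pathname(parsed.path)


print(uri_to_local_path('file:///home/user/some%20file.txt'))
```

On a POSIX system this should print /home/user/some file.txt; the result on Windows depends on how url2pathname maps the path there.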
Answer 3:
To convert a file URI to a path with Python (specific to Python 3; I can make a Python 2 version if someone really wants it):
1. Parse the URI with urllib.parse.urlparse.
2. Unquote the path component of the parsed URI with urllib.parse.unquote.
3. Then:
   a. If the path is a Windows path and starts with /: strip the first character of the unquoted path component (the path component of file:///C:/some/file.txt is /C:/some/file.txt, which pathlib.PureWindowsPath does not interpret as equivalent to C:\some\file.txt).
   b. Otherwise, just use the unquoted path component as is.
Here is a function that does this:
import pathlib
import urllib.parse  # "import urllib" alone does not guarantee urllib.parse is loaded


def file_uri_to_path(file_uri, path_class=pathlib.PurePath):
    """
    This function returns a pathlib.PurePath object for the supplied file URI.

    :param str file_uri: The file URI ...
    :param class path_class: The type of path in the file_uri. By default it uses
        the system specific path pathlib.PurePath, to force a specific type of path
        pass pathlib.PureWindowsPath or pathlib.PurePosixPath
    :returns: the pathlib.PurePath object
    :rtype: pathlib.PurePath
    """
    # Instantiating path_class resolves PurePath to the current platform's flavour.
    windows_path = isinstance(path_class(), pathlib.PureWindowsPath)
    file_uri_parsed = urllib.parse.urlparse(file_uri)
    file_uri_path_unquoted = urllib.parse.unquote(file_uri_parsed.path)
    if windows_path and file_uri_path_unquoted.startswith("/"):
        # Drop the leading slash so that "/C:/dir" becomes "C:/dir".
        result = path_class(file_uri_path_unquoted[1:])
    else:
        result = path_class(file_uri_path_unquoted)
    if not result.is_absolute():
        raise ValueError("Invalid file uri {} : resulting path {} not absolute".format(
            file_uri, result))
    return result
Usage examples (run on Linux):
>>> file_uri_to_path("file:///etc/hosts")
PurePosixPath('/etc/hosts')
>>> file_uri_to_path("file:///etc/hosts", pathlib.PurePosixPath)
PurePosixPath('/etc/hosts')
>>> file_uri_to_path("file:///C:/Program Files/Steam/", pathlib.PureWindowsPath)
PureWindowsPath('C:/Program Files/Steam')
>>> file_uri_to_path("file:/proc/cpuinfo", pathlib.PurePosixPath)
PurePosixPath('/proc/cpuinfo')
>>> file_uri_to_path("file:c:/system32/etc/hosts", pathlib.PureWindowsPath)
PureWindowsPath('c:/system32/etc/hosts')
This function works for Windows and POSIX file URIs, and it handles file URIs without an authority section. It does NOT, however, validate the URI's authority, so the following is not honoured:
IETF RFC 8089: The "file" URI Scheme / 2. Syntax
The "host" is the fully qualified domain name of the system on which the file is accessible. This allows a client on another system to know that it cannot access the file system, or perhaps that it needs to use some other local mechanism to access the file.
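Because the function ignores the authority, a caller that cares about it could check the host before converting. The sketch below is one possible pre-check and is not part of the original answer: the helper name check_file_uri_host and the default set of accepted hosts (an empty authority or localhost) are assumptions.

```python
import urllib.parse


def check_file_uri_host(file_uri, allowed_hosts=("", "localhost")):
    """Raise ValueError if the URI names a host outside allowed_hosts.

    Hypothetical helper: an empty authority or "localhost" is treated as
    the local machine; anything else is rejected.
    """
    parsed = urllib.parse.urlparse(file_uri)
    if parsed.scheme != "file":
        raise ValueError("Not a file URI: {}".format(file_uri))
    host = parsed.netloc.lower()
    if host not in allowed_hosts:
        raise ValueError(
            "File URI {} refers to host {!r}".format(file_uri, host))
    return file_uri


# Accepted: no authority, empty authority, or localhost.
check_file_uri_host("file:/proc/cpuinfo")
check_file_uri_host("file:///etc/hosts")
check_file_uri_host("file://localhost/etc/hosts")
```

A check like this would also reject file://C:/test/doc.txt, where the drive letter has been parsed as the authority.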
Validation (pytest) for the function:
import os
import pathlib

import pytest

# Assumes file_uri_to_path (defined above) is in the same module or imported.


def validate(file_uri, expected_windows_path, expected_posix_path):
    if expected_windows_path is not None:
        expected_windows_path_object = pathlib.PureWindowsPath(expected_windows_path)
    if expected_posix_path is not None:
        expected_posix_path_object = pathlib.PurePosixPath(expected_posix_path)

    if expected_windows_path is not None:
        # The default path class is only Windows-flavoured when running on Windows.
        if os.name == "nt":
            assert file_uri_to_path(file_uri) == expected_windows_path_object
        assert file_uri_to_path(file_uri, pathlib.PureWindowsPath) == expected_windows_path_object

    if expected_posix_path is not None:
        if os.name != "nt":
            assert file_uri_to_path(file_uri) == expected_posix_path_object
        assert file_uri_to_path(file_uri, pathlib.PurePosixPath) == expected_posix_path_object


def test_some_paths():
    validate(pathlib.PureWindowsPath(r"C:\Windows\System32\Drivers\etc\hosts").as_uri(),
             expected_windows_path=r"C:\Windows\System32\Drivers\etc\hosts",
             expected_posix_path=r"/C:/Windows/System32/Drivers/etc/hosts")

    validate(pathlib.PurePosixPath(r"/C:/Windows/System32/Drivers/etc/hosts").as_uri(),
             expected_windows_path=r"C:\Windows\System32\Drivers\etc\hosts",
             expected_posix_path=r"/C:/Windows/System32/Drivers/etc/hosts")

    validate(pathlib.PureWindowsPath(r"C:\some dir\some file").as_uri(),
             expected_windows_path=r"C:\some dir\some file",
             expected_posix_path=r"/C:/some dir/some file")

    validate(pathlib.PurePosixPath(r"/C:/some dir/some file").as_uri(),
             expected_windows_path=r"C:\some dir\some file",
             expected_posix_path=r"/C:/some dir/some file")


def test_invalid_url():
    with pytest.raises(ValueError) as excinfo:
        validate(r"file://C:/test/doc.txt",
                 expected_windows_path=r"test\doc.txt",
                 expected_posix_path=r"/test/doc.txt")
    # Checked after the with block so the assertion actually runs;
    # the text matches the message raised by file_uri_to_path.
    assert "not absolute" in str(excinfo.value)


def test_escaped():
    validate(r"file:///home/user/some%20file.txt",
             expected_windows_path=None,
             expected_posix_path=r"/home/user/some file.txt")
    validate(r"file:///C:/some%20dir/some%20file.txt",
             expected_windows_path=r"C:\some dir\some file.txt",
             expected_posix_path=r"/C:/some dir/some file.txt")


def test_no_authority():
    validate(r"file:c:/path/to/file",
             expected_windows_path=r"c:\path\to\file",
             expected_posix_path=None)
    validate(r"file:/path/to/file",
             expected_windows_path=None,
             expected_posix_path=r"/path/to/file")
This contribution is licensed (in addition to any other licenses which may apply) under the Zero-Clause BSD License (0BSD):
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
To the extent possible under law, Iwan Aucamp has waived all copyright and related or neighboring rights to this stackexchange contribution. This work is published from: Norway.
Answer 4:
The solution from @colton7909 is mostly correct and helped me get to this answer, but it has some import errors with Python 3. I also think this is a better way to deal with the 'file://' part of the URL than simply chopping off the first 7 characters. So I feel this is the most idiomatic way to do this using the standard library:
import urllib.parse
url_data = urllib.parse.urlparse('file:///home/user/some%20file.txt')
path = urllib.parse.unquote(url_data.path)
This example should produce the string '/home/user/some file.txt'.
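To illustrate why parsing tends to be more robust than slicing off a fixed prefix, the sketch below runs both approaches on a file URI without an authority section (file:/proc/cpuinfo, a form that RFC 8089 permits). The variable names are illustrative only.

```python
import urllib.parse

uri = 'file:/proc/cpuinfo'

# Slicing assumes the URI always starts with the 7 characters 'file://'.
sliced = urllib.parse.unquote(uri)[7:]

# Parsing splits on the scheme, so the single-slash form is handled too.
parsed = urllib.parse.unquote(urllib.parse.urlparse(uri).path)

print(sliced)  # roc/cpuinfo
print(parsed)  # /proc/cpuinfo
```

The sliced result silently loses the start of the path, while the parsed result keeps it intact.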
Source: https://stackoverflow.com/questions/5977576/is-there-a-convenient-way-to-map-a-file-uri-to-os-path