Pathlib 'normalizes' UNC paths with “$”

Submitted by 感情迁移 on 2021-02-16 14:23:51

Question


On Python 3.8, I'm trying to use pathlib to append a path component to a UNC path that points at a remote computer's C drive.
It's weirdly inconsistent.
For example:

>>> from pathlib import Path
>>> remote = Path("\\\\remote\\", "C$\\Some\\Path")
>>> remote
WindowsPath('//remote//C$/Some/Path')

>>> remote2 = Path(remote, "More")
>>> remote2
WindowsPath('/remote/C$/Some/Path/More')

Notice how the initial // is turned into /?
Put the initial path in one line though, and everything is fine:

>>> remote = Path("\\\\remote\\C$\\Some\\Path")
>>> remote
WindowsPath('//remote/C$/Some/Path')

>>> remote2 = Path(remote, "more")
>>> remote2
WindowsPath('//remote/C$/Some/Path/more')

This works as a workaround, but I suspect I'm misunderstanding how it's supposed to work or doing it wrong.
Anyone got a clue what's happening?


Answer 1:


tl;dr: give pathlib the entire UNC share (\\\\host\\share) as a single unit. pathlib has special-case handling for UNC paths, but it needs exactly this prefix in order to recognise a path as UNC. You can't use pathlib's facilities to manage host and share separately; doing so makes pathlib blow a gasket.
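
If the host and the share arrive as separate strings, one way to follow this advice is to glue them together with plain string operations before pathlib gets involved. The sketch below assumes a hypothetical helper named unc_path (it is not part of pathlib) and uses PureWindowsPath so that it behaves the same on any operating system:

```python
from pathlib import PureWindowsPath


def unc_path(host, share, *parts):
    # Hypothetical helper, not part of pathlib: assemble the complete
    # UNC prefix (host plus share) as ONE string before pathlib sees it.
    host = host.strip("\\/")
    share = share.strip("\\/")
    anchor = "\\\\" + host + "\\" + share + "\\"
    return PureWindowsPath(anchor, *parts)


remote = unc_path("remote", "C$", "Some", "Path")
print(remote)              # \\remote\C$\Some\Path
print(repr(remote.drive))  # '\\\\remote\\C$'
```

Stripping stray separators from host and share first means that an input such as "\\\\remote\\" cannot reintroduce the doubled separator that caused the problem in the question.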

The Path constructor normalises (deduplicates) path separators:

>>> from pathlib import PurePosixPath as PPP, PureWindowsPath as PWP
>>> PPP('///foo//bar////qux')
PurePosixPath('/foo/bar/qux')
>>> PWP('///foo//bar////qux')
PureWindowsPath('/foo/bar/qux')

PureWindowsPath has a special case for paths recognised as UNC, that is //host/share... which avoids collapsing leading separators.
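
A quick way to check whether a path has been recognised as UNC is to look at its drive, root and parts. A minimal sketch, using the well-formed path from the question (PureWindowsPath is used only so that the check can run on any platform):

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("\\\\remote\\C$\\Some\\Path")

# For a recognised UNC path, host and share together form the "drive".
print(repr(p.drive))  # '\\\\remote\\C$'
print(repr(p.root))   # '\\'
print(p.parts[1:])    # ('Some', 'Path')
print(p.as_posix())   # //remote/C$/Some/Path
```

An empty drive on a path that was meant to be UNC is a sign that the prefix was not recognised and the leading separators have been collapsed.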

However, your initial concatenation puts it in a weird funk because it creates a path of the form //host//share.... That path then gets converted back to a string when it is passed to the constructor, at which point it no longer matches the UNC form and all the separators get collapsed:

>>> PWP("\\\\remote\\", "C$\\Some\\Path")
PureWindowsPath('//remote//C$/Some/Path')
>>> str(PWP("\\\\remote\\", "C$\\Some\\Path"))
'\\\\remote\\\\C$\\Some\\Path'
>>> PWP(str(PWP("\\\\remote\\", "C$\\Some\\Path")))
PureWindowsPath('/remote/C$/Some/Path')

The issue seems to be specifically the presence of a trailing separator on a UNC-looking path. I don't know if it's a bug or if it's matching some other UNC-style (but not UNC) special case:

>>> PWP("//remote")
PureWindowsPath('/remote')
>>> PWP("//remote/")
PureWindowsPath('//remote//') # this one is weird, the trailing separator gets doubled which breaks everything
>>> PWP("//remote/foo")
PureWindowsPath('//remote/foo/')
>>> PWP("//remote//foo")
PureWindowsPath('/remote/foo')

These behaviours don't really seem to be documented. The pathlib documentation specifically notes that it collapses path separators, and it has a few UNC examples showing that the leading separators are not collapsed there, but I don't really know what exactly is supposed to happen. Either way, it only seems to handle UNC paths somewhat properly if the first two segments are kept together as a single "drive" unit; that the host-and-share prefix is treated as a drive is specifically documented.

Of note: using joinpath or the / operator doesn't seem to trigger a re-normalisation. Your path remains improper (because the second path separator between host and share stays doubled), but it doesn't get completely collapsed.
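
For comparison, when the base path is built from the complete \\\\host\\share prefix, joining should be unproblematic whichever way it is done. A small sketch (the file name file.txt is made up for illustration):

```python
from pathlib import PureWindowsPath

base = PureWindowsPath("\\\\remote\\C$\\Some\\Path")

via_operator = base / "More"
via_joinpath = base.joinpath("More", "file.txt")
via_constructor = PureWindowsPath(base, "More")

for candidate in (via_operator, via_joinpath, via_constructor):
    # The drive should still be the full host-plus-share prefix.
    print(candidate.as_posix(), candidate.drive == base.drive)
```

All three results keep the same drive as the base path, so the choice between them is a matter of style once the prefix is well-formed.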



Source: https://stackoverflow.com/questions/60074886/pathlib-normalizes-unc-paths-with
