Windows directory that will never contain non-ASCII characters for temp file?

余生长醉 submitted on 2020-01-06 14:27:37

Question


With MinGW 7.3.0 on Windows, Hunspell can't load dictionary files from locations whose paths contain non-ASCII characters, because of Windows limitations. I've tried everything[1] and am now resorting to copying the file to a path without non-ASCII characters before giving it to Hunspell. What is a good location to copy it to?

[1]

  1. On Windows, std::fstream::open() needs a wchar_t path overload to open such paths correctly, which MinGW does not implement
  2. std::filesystem could solve this, but it is only available from GCC 8 onward
  3. Hunspell insists on opening the files itself; it is not possible to pass the file contents to it as strings
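The workaround described in the question can be sketched roughly as follows: if the dictionary path is pure ASCII, hand it to Hunspell as-is; otherwise copy the file into a directory whose path is ASCII and hand Hunspell the narrow path of the copy. This is a minimal sketch under several assumptions: paths are held as std::wstring, the target directory is chosen elsewhere (that is the open question here), and the names `to_narrow_ascii` and `copy_for_hunspell` are hypothetical. The copy uses the Win32 CopyFileW() and is compiled on Windows only; std::optional needs C++17.

```cpp
#include <optional>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a wide path to a narrow one if (and only if)
// every character is 7-bit ASCII. Such a narrow path can be opened with a
// plain fopen(), whatever the system's ANSI code page is.
std::optional<std::string> to_narrow_ascii(const std::wstring& path) {
    std::string out;
    out.reserve(path.size());
    for (wchar_t c : path) {
        if (static_cast<unsigned long>(c) > 0x7F) return std::nullopt;
        out += static_cast<char>(c);
    }
    return out;
}

#ifdef _WIN32
// Hypothetical, Windows only: returns a narrow path Hunspell can open,
// copying `src` into `ascii_dir` (expected to end with a backslash) first
// if `src` itself is not ASCII.
std::optional<std::string> copy_for_hunspell(const std::wstring& src,
                                             const std::wstring& ascii_dir,
                                             const std::wstring& file_name) {
    if (auto narrow = to_narrow_ascii(src)) return narrow;  // no copy needed
    std::wstring dst = ascii_dir + file_name;
    if (!CopyFileW(src.c_str(), dst.c_str(), FALSE))       // FALSE: overwrite
        return std::nullopt;
    return to_narrow_ascii(dst);  // nullopt if dir or file name is not ASCII
}
#endif
```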

Answer 1:


The "natural" fit would be to use the user's chosen temporary directory (or a subdirectory thereof); see %temp% or GetTempPath(). However, it defaults to a path that contains the user name, which can itself contain non-ASCII characters (e.g. c:\users\Ø¥Ć¼\AppData\Local\Temp), or the user may have set it to something else altogether, which is equally arbitrary with respect to the character set.
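One way to act on this is to query the temporary directory with GetTempPathW() and use it only if it happens to be pure ASCII, falling back otherwise to an application-owned directory such as one under C:\ProgramData. A minimal sketch under those assumptions: the fallback directory is supplied by the caller, `choose_copy_dir` and `temp_dir_or` are hypothetical names, and only the selection logic is portable; the Win32 call is compiled on Windows only.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// True if every character is 7-bit ASCII.
bool is_ascii(const std::wstring& s) {
    for (wchar_t c : s)
        if (static_cast<unsigned long>(c) > 0x7F) return false;
    return true;
}

// Hypothetical selection rule: keep the user's temp directory if it is
// non-empty and pure ASCII, otherwise use the caller-supplied fallback.
std::wstring choose_copy_dir(const std::wstring& temp_dir,
                             const std::wstring& fallback_dir) {
    if (!temp_dir.empty() && is_ascii(temp_dir)) return temp_dir;
    return fallback_dir;
}

#ifdef _WIN32
// Hypothetical, Windows only: ask Windows for the temp directory (what
// %TEMP% usually resolves to) and apply the rule above.
std::wstring temp_dir_or(const std::wstring& fallback_dir) {
    wchar_t buf[MAX_PATH + 1];
    DWORD n = GetTempPathW(MAX_PATH + 1, buf);
    if (n == 0 || n > MAX_PATH) return fallback_dir;  // failure / too small
    return choose_copy_dir(std::wstring(buf, n), fallback_dir);
}
#endif
```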

So you are most likely best off with one of the following:

a) Choose a directory that does not contain off-limits characters from the get-go. For example, a directory of your own choosing underneath C:\ProgramData (e.g. named after the application) that you know does not contain non-ASCII characters.

b) Let the user decide where to put these files, and make sure it is only possible to enter a path that consists solely of allowed characters.

c) Pass the "short path name" to Hunspell, which should not contain non-ASCII characters because short names exist for compatibility with the naming restrictions of the FAT file system. For example, the short path name for c:\temp\Ø¥Ć¼ is c:\temp\571D~1.

You can see the short names for directories using cmd.exe /c dir /x:

C:\temp>dir /x
...    
19.07.2019  15:30    <DIR>                       .
19.07.2019  15:30    <DIR>                       ..
19.07.2019  15:30    <DIR>          571D~1       Ø¥Ć¼

I don't know how to invoke the GetShortPathName Win32 API from MinGW, but I would assume that it is possible.
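For what it's worth, GetShortPathNameW() is declared in <windows.h>, which MinGW ships, so it can be called directly. The sketch below is a minimal illustration under two assumptions: the file already exists (the call fails for paths that don't), and 8.3 name generation may be disabled on the volume, in which case the call succeeds but returns the long name, so the result is checked for non-ASCII characters again. `accept_if_ascii` and `ascii_short_path` are hypothetical names; only the check is portable, and std::optional needs C++17.

```cpp
#include <optional>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// True if every character is 7-bit ASCII.
bool is_ascii(const std::wstring& s) {
    for (wchar_t c : s)
        if (static_cast<unsigned long>(c) > 0x7F) return false;
    return true;
}

// Portable part: accept a candidate short path only if it is pure ASCII.
std::optional<std::wstring> accept_if_ascii(std::wstring candidate) {
    if (candidate.empty() || !is_ascii(candidate)) return std::nullopt;
    return candidate;
}

#ifdef _WIN32
// Hypothetical, Windows only: returns the 8.3 short path, or nullopt if the
// call fails or no ASCII short name exists (e.g. 8.3 names are disabled).
std::optional<std::wstring> ascii_short_path(const std::wstring& long_path) {
    // First call with no buffer: returns the required size including the NUL.
    DWORD needed = GetShortPathNameW(long_path.c_str(), nullptr, 0);
    if (needed == 0) return std::nullopt;  // e.g. file not found
    std::vector<wchar_t> buf(needed);
    // Second call: on success returns the length without the NUL.
    DWORD len = GetShortPathNameW(long_path.c_str(), buf.data(), needed);
    if (len == 0 || len >= needed) return std::nullopt;
    return accept_if_ascii(std::wstring(buf.data(), len));
}
#endif
```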

Also make sure to review the MSDN page for this function for trade-offs; for example, short names are not supported everywhere (e.g. on SMB shares).




Answer 2:


From this bug tracker:

In WIN32 environment, use UTF-8 encoded paths started with the long path prefix \\?\ to handle system-independent character encoding and very long path names (without the long path prefix Hunspell will use fopen() with system-dependent character encoding instead of _wfopen()).

So the actual solution seems to be:

  1. Call GetFullPathNameW to normalize the path. This is required because paths with the long path prefix \\?\ are passed to the NT API unchanged.
  2. Prepend L"\\\\?\\" to the normalized path (backslashes doubled because of C string literal requirements).
  3. For a UNC path, you have to use the "UNC" device directly, i.e. L"\\\\server\\share" becomes L"\\\\?\\UNC\\server\\share" (thanks eryksun).
  4. Encode the path in UTF-8, e.g. using WideCharToMultiByte() with CP_UTF8.
  5. Pass the final UTF-8 encoded path to Hunspell.
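The steps above can be sketched as follows. This is a minimal sketch, not reference code from Hunspell or Microsoft: the helper names `add_long_path_prefix` and `hunspell_path` are hypothetical, the prefixing (steps 2 and 3) is plain string handling and therefore portable, and the GetFullPathNameW/WideCharToMultiByte calls (steps 1 and 4) are compiled on Windows only. The resulting narrow string is what would be passed to Hunspell in step 5, assumed here to be the Hunspell constructor that takes the .aff and .dic paths as const char*.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Steps 2 and 3: prefix an already-normalized absolute path.
//   C:\dir\file       ->  \\?\C:\dir\file
//   \\server\share\f  ->  \\?\UNC\server\share\f
// Paths that already carry the \\?\ prefix are returned unchanged.
std::wstring add_long_path_prefix(const std::wstring& full) {
    const std::wstring prefix = L"\\\\?\\";
    if (full.compare(0, prefix.size(), prefix) == 0) return full;
    if (full.compare(0, 2, L"\\\\") == 0)  // UNC path: drop the leading "\\"
        return prefix + L"UNC\\" + full.substr(2);
    return prefix + full;
}

#ifdef _WIN32
// Steps 1 and 4: normalize, prefix, and encode as UTF-8.
// Returns an empty string on failure.
std::string hunspell_path(const std::wstring& path) {
    DWORD needed = GetFullPathNameW(path.c_str(), 0, nullptr, nullptr);
    if (needed == 0) return {};
    std::wstring full(needed, L'\0');
    DWORD len = GetFullPathNameW(path.c_str(), needed, &full[0], nullptr);
    if (len == 0 || len >= needed) return {};
    full.resize(len);

    const std::wstring prefixed = add_long_path_prefix(full);
    const int wlen = static_cast<int>(prefixed.size());
    int bytes = WideCharToMultiByte(CP_UTF8, 0, prefixed.c_str(), wlen,
                                    nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return {};
    std::string utf8(bytes, '\0');
    WideCharToMultiByte(CP_UTF8, 0, prefixed.c_str(), wlen,
                        &utf8[0], bytes, nullptr, nullptr);
    return utf8;
}
#endif
```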



Answer 3:


It looks like C:\Windows\Temp is still a valid path you can write to yourself.
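Whether a given process may actually write there ultimately depends on the directory's access control settings, so instead of assuming it, a program can probe the directory by creating and deleting a small file. Because C:\Windows\Temp is pure ASCII, a plain narrow std::ofstream is enough even with MinGW. A minimal sketch; `can_create_file_in` and the probe file name are hypothetical, and a real program would use a unique name.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical probe: try to create, write and then delete a small file in
// `dir`. `dir` must be pure ASCII so the narrow calls below work on MinGW.
bool can_create_file_in(const std::string& dir) {
    std::string probe = dir;
    if (!probe.empty() && probe.back() != '\\' && probe.back() != '/')
        probe += '/';  // Windows accepts '/' as a separator as well
    probe += "hunspell_probe.tmp";  // hypothetical file name

    std::ofstream out(probe, std::ios::binary | std::ios::trunc);
    if (!out) return false;  // could not create the file
    out << "probe";
    const bool ok = static_cast<bool>(out);
    out.close();                // close first: Windows cannot delete open files
    std::remove(probe.c_str());
    return ok;
}
```

On Windows this would be called as can_create_file_in("C:\\Windows\\Temp").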



Source: https://stackoverflow.com/questions/57112274/windows-directory-that-will-never-contain-non-ascii-characters-for-temp-file
