Translate a URL to a valid file name and back to a URL

Submitted by 佐手、 on 2019-12-10 16:41:48

Question


I need to store some information that is unique for each site my users access. (It is actually a thumbnail of the site the user has looked at.)
This thumbnail (jpeg file) needs to have a name indicating which site it represents so that it can be viewed later on.

Can you recommend a simple translation from a URL to a valid file name and back?

Example: www.ibm.com could be mapped to www_ibm_com.

I am not sure that this will always work with all valid URLs; in some cases URLs have very complex query strings.

Is there a good regex or C# library that can be used?

Thanks in advance and be happy.


Answer 1:


Firstly, it's worth pointing out that "." is perfectly legal in file names but "/" isn't, so while the example you quote doesn't need translating, "www.ibm.com/path1/file1.jpg" would.

A simple string.Replace would be the best solution here, assuming you can find a character that's legal in a file name but illegal in a URL.

Taking "§" as an example of such a character (it may in fact be legal in a URL, so check before relying on it), you've got:

string fileName = url.Replace("/", "§");

to translate to a file name and:

string url = fileName.Replace("§", "/");

to translate back.
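A minimal C# sketch of that round trip, assuming "§" as the substitute exactly as in the answer (a placeholder choice, not a verified-safe character). The class and method names are hypothetical. Because the mapping is only reversible if "§" never occurs in the URL itself, the sketch rejects such URLs rather than silently producing an ambiguous name.

```csharp
using System;

// Hypothetical helper that swaps "/" for a substitute character and back.
public static class SlashSubstitution
{
    // Placeholder substitute taken from the answer; assumed never to occur in stored URLs.
    private const string Substitute = "§";

    public static string ToFileName(string url)
    {
        // If the URL already contained the substitute, ToUrl could not tell
        // which occurrences were originally slashes, so reject it.
        if (url.Contains(Substitute))
        {
            throw new ArgumentException("URL already contains the substitute character.", nameof(url));
        }
        return url.Replace("/", Substitute);
    }

    public static string ToUrl(string fileName) => fileName.Replace(Substitute, "/");
}
```

For example, SlashSubstitution.ToFileName("www.ibm.com/path1/file1.jpg") returns "www.ibm.com§path1§file1.jpg". Only "/" is handled here; on Windows a URL such as "http://..." still contains ":", which is not allowed in file names there.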

This page on URL Encoding lists which characters are valid, invalid and unsafe (valid but with special meaning) in URLs. Characters in the "top half" of the ISO-Latin set, 80-FF hex (128-255 decimal), are not legal in URLs but might be OK in file names.

You will need to do this for each character in the URL that is in the set of invalid file name characters. You can get that set from Path.GetInvalidFileNameChars().
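Finding a separate unused substitute for every invalid character is the hard part, so the sketch below swaps in a different technique: an escape sequence, similar in spirit to URL percent-encoding. Each character returned by Path.GetInvalidFileNameChars(), plus the escape character "%" itself, is written as "%" followed by four hex digits; every other character is copied unchanged. Escaping "%" too is what makes decoding unambiguous. The class and method names are hypothetical, and since GetInvalidFileNameChars() differs between Windows and other platforms, a name produced on one OS may not be valid on another (decoding works the same everywhere).

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper: reversible escaping of characters that are not allowed in file names.
public static class FileNameEscaper
{
    private const char Escape = '%';

    // Characters the current platform forbids in file names (the set differs by OS).
    private static readonly HashSet<char> Invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string Encode(string url)
    {
        var sb = new StringBuilder(url.Length);
        foreach (char c in url)
        {
            if (c == Escape || Invalid.Contains(c))
            {
                // Write the UTF-16 code unit as "%" plus 4 uppercase hex digits, e.g. '/' -> "%002F".
                sb.Append(Escape).Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static string Decode(string fileName)
    {
        var sb = new StringBuilder(fileName.Length);
        for (int i = 0; i < fileName.Length; i++)
        {
            if (fileName[i] == Escape)
            {
                // The 4 characters after the escape are the hex code of the original character
                // (Substring throws if the name is truncated or malformed).
                string hex = fileName.Substring(i + 1, 4);
                sb.Append((char)int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture));
                i += 4;
            }
            else
            {
                sb.Append(fileName[i]);
            }
        }
        return sb.ToString();
    }
}
```

For example, Encode("www.ibm.com/path1/file1.jpg") gives "www.ibm.com%002Fpath1%002Ffile1.jpg" on any platform, because "/" is invalid everywhere. Two limits remain: file systems cap the length of a single name (commonly around 255 characters or bytes), which long query strings can exceed, and on case-insensitive file systems such as the Windows default, URLs that differ only in letter case still map to the same file.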

UPDATE

If you can't find suitable character pairs, another solution is a lookup table: one column holds the URL, the other the generated file name. As long as each generated name is unique (a GUID would do), you can look up in either direction to get from one to the other.
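A minimal in-memory sketch of that two-way lookup, assuming the table only needs to live as long as the process; a real application would persist it (for example in a database or a file), which is not shown. The class and method names are hypothetical. Guid.NewGuid().ToString("N") yields 32 lowercase hex digits, which are valid in file names on every platform.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical two-way map between URLs and generated thumbnail file names.
public sealed class ThumbnailNameTable
{
    private readonly Dictionary<string, string> _fileNameByUrl = new Dictionary<string, string>();
    private readonly Dictionary<string, string> _urlByFileName = new Dictionary<string, string>();

    // Returns the existing file name for the URL, or generates and records a new one.
    public string GetOrAddFileName(string url)
    {
        if (_fileNameByUrl.TryGetValue(url, out string existing))
        {
            return existing;
        }
        // A GUID is unique for practical purposes and contains only hex digits.
        string fileName = Guid.NewGuid().ToString("N") + ".jpg";
        _fileNameByUrl[url] = fileName;
        _urlByFileName[fileName] = url;
        return fileName;
    }

    // Reverse lookup: returns false if the file name is not in the table.
    public bool TryGetUrl(string fileName, out string url) =>
        _urlByFileName.TryGetValue(fileName, out url);
}
```

The trade-off is that the file name itself no longer shows which site it represents; the table is needed to find out. In exchange, URL length and special characters stop mattering.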




Answer 2:


www.ibm.com is actually a valid filename. More problematic are slashes. So if the URL contains subdirectories, you'll need to translate the slashes.

The main problem then is possible collisions. For example, if slashes are replaced with underscores, both ibm.com/path1_path2 and ibm.com/path1/path2 translate to the same file name.
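The collision is easy to reproduce with the simplest possible mapping, replacing every "/" with "_". This is shown only to illustrate the problem; the class and method names are hypothetical.

```csharp
// Hypothetical, deliberately naive mapping: every "/" becomes "_".
public static class NaiveMapping
{
    // Lossy because "_" is also legal in URLs, so the mapping cannot be reversed:
    // ToFileName("ibm.com/path1_path2") -> "ibm.com_path1_path2"
    // ToFileName("ibm.com/path1/path2") -> "ibm.com_path1_path2"
    public static string ToFileName(string url) => url.Replace("/", "_");
}
```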

I like ChrisF's suggestion of finding a character that is legal in file names but not in URLs, although I don't know offhand which character, if any, that would be.

If you don't find such a character, then you may need to stick with an unlikely character instead.



Source: https://stackoverflow.com/questions/4423200/translate-url-to-a-valid-file-name-and-back-to-url
