System.Uri drops Unicode RLM (Right-to-Left Mark; U+200F) character in .NET 4.5+

Submitted by 為{幸葍}努か on 2021-02-08 13:47:35

Question


using System;

namespace UnicodeRlm
{
    class Program
    {
        static void Main(string[] args)
        {
            var uri = new Uri(
                "https://example.com/attachments/The title is \"مفتاح معايير الويب!‏\" in Arabic.pdf");
            Console.WriteLine(uri.AbsolutePath);
            Console.WriteLine(uri.AbsolutePath.Length);
        }
    }
}

Under .NET 4.0, this produces

/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%E2%80%8F%22%20in%20Arabic.pdf
168

Under .NET 4.5+, this produces

/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%22%20in%20Arabic.pdf
159

.NET 4.5 drops the %E2%80%8F part, which is the RLM character:

...!%E2%80%8F%22%20in%20Arabic.pdf
...!%22%20in%20Arabic.pdf
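
As a quick sanity check of the observation above, the sketch below percent-encodes the UTF-8 bytes of U+200F by hand. The class and method names (RlmEncoding, PercentEncodeUtf8) are made up for illustration and are not part of System.Uri. The nine-character result accounts for the difference between the two reported lengths.

```csharp
using System;
using System.Linq;
using System.Text;

static class RlmEncoding
{
    // Hypothetical helper: percent-encodes every UTF-8 byte of the input.
    // It does not reproduce System.Uri's escaping rules; it only shows
    // what the escaped form of a single character looks like.
    public static string PercentEncodeUtf8(string text)
    {
        return string.Concat(
            Encoding.UTF8.GetBytes(text).Select(b => "%" + b.ToString("X2")));
    }

    static void Main()
    {
        string escapedRlm = PercentEncodeUtf8("\u200F"); // Right-to-Left Mark
        Console.WriteLine(escapedRlm);        // %E2%80%8F
        Console.WriteLine(escapedRlm.Length); // 9
        Console.WriteLine(168 - 159);         // 9, the gap between the two outputs
    }
}
```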

My hypothesis is that this happens because System.Uri escaping now follows RFC 3986, but my RFC-fu and Unicode-fu are failing me: I can't tell whether the RFC requires the RLM to be dropped, or whether the RLM character is even placed correctly in the original string.

I'm not entirely sure whether this is the correct behavior standards-wise, but it certainly doesn't work for me: under .NET 4.5 I cannot download a file with an RLM character in its name using either WebClient or HttpWebRequest.

Is there any way to work around this quirk?


Answer 1:


In .NET 4.5, Internationalized Resource Identifier (IRI) support was enabled by default. When targeting .NET 4.7.2, the right-to-left mark seems to be honored again, which could indicate that the earlier behavior was a bug.

If the project needs to target .NET 4.5, the method ToggleIDNIRISupport in this post can help to work around the issue.

Call the method like this:

ToggleIDNIRISupport(false);

A URI constructed after this method call retains the right-to-left mark.
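
The body of the referenced method is not reproduced here, so the following is only a rough sketch of how such a toggle might look, not the code from that post. It assumes that the .NET Framework implementation of System.Uri keeps its IRI and IDN switches in private static fields named s_IriParsing and s_IdnScope; those names are implementation details that may differ or be absent on other runtimes, in which case the helper simply returns false. The class and method names (UriIriToggle, Toggle) are hypothetical.

```csharp
using System;
using System.Reflection;

static class UriIriToggle
{
    // Hypothetical helper, not a public API. Relies on private fields of
    // System.Uri (assumed names: "s_IriParsing", "s_IdnScope").
    // Returns true only if both fields were found and updated.
    public static bool Toggle(bool enable)
    {
        // Assumption: parsing one URI first lets System.Uri finish any lazy
        // configuration, so it does not overwrite the values set below.
        Uri.IsWellFormedUriString("http://example.com/\u00FC", UriKind.Absolute);

        const BindingFlags flags = BindingFlags.Static | BindingFlags.NonPublic;
        FieldInfo iriParsing = typeof(Uri).GetField("s_IriParsing", flags);
        FieldInfo idnScope = typeof(Uri).GetField("s_IdnScope", flags);

        // Bail out if the runtime does not have the expected fields.
        if (iriParsing == null || idnScope == null) return false;
        if (iriParsing.FieldType != typeof(bool) || !idnScope.FieldType.IsEnum) return false;

        iriParsing.SetValue(null, enable);
        // Resolve the enum member by name so the sketch does not depend on
        // the numeric values of UriIdnScope.
        idnScope.SetValue(null, Enum.Parse(idnScope.FieldType, enable ? "All" : "None"));
        return true;
    }
}
```

With a helper like this, UriIriToggle.Toggle(false) would be called once at startup, before any Uri that needs the RLM preserved is constructed. Because it changes process-wide state through reflection, it is best treated as a last resort and checked against the exact framework version in use.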



Source: https://stackoverflow.com/questions/65805812/system-uri-drops-unicode-rlm-right-to-left-mark-u200f-character-in-net-4-5
