C# Encode connection string from win1251 to utf8 and back

Submitted by 孤街浪徒 on 2019-12-11 07:31:33

Question


I'm trying to work around a problem with connection string encoding in the Firebird .NET provider, version >= 5.6.0.0 (the current version is 5.8.0.0). The full description of the problem (if you are interested) is here, but I think I can explain it briefly. My system default encoding is win1251, and my connection string contains a parameter called "DbPath" with the value

   "F:\\Рабочая\\БД\\2.14.1\\January_2017\\MYDB.IB" 

When I pass this connection string to the Firebird .NET provider, it takes the "DbPath" parameter from the connection string and gets the bytes of its value using Encoding.UTF8. This is how it looks in their code:

protected virtual void SendAttachToBuffer(DatabaseParameterBuffer dpb, string database)
{
    XdrStream.Write(IscCodes.op_attach);
    XdrStream.Write(0);
    if (!string.IsNullOrEmpty(Password))
    {
      dpb.Append(IscCodes.isc_dpb_password, Password);
    }

    //database is DbPath
    XdrStream.WriteBuffer(Encoding.UTF8.GetBytes(database)); 

    XdrStream.WriteBuffer(dpb.ToArray());
}

As you can see, they don't convert the encoding from win1251 to UTF-8; they just get the bytes using Encoding.UTF8.GetBytes().
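To make the difference between the two byte representations concrete, here is a minimal sketch that prints the bytes of a short Cyrillic fragment of the path in both encodings. The class and method names are made up for illustration, and the `RegisterProvider` call is an assumption about the runtime: it is needed on .NET Core / .NET 5+ to get legacy code pages such as windows-1251, while the classic .NET Framework has them built in.

```csharp
using System;
using System.Text;

static class EncodingBytesDemo
{
    // Hypothetical helper: returns the bytes of `text` in the named encoding as hex.
    public static string HexBytes(string text, string encodingName)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before GetEncoding("windows-1251") works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] bytes = Encoding.GetEncoding(encodingName).GetBytes(text);
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        // The same two characters give different byte sequences per encoding.
        Console.WriteLine(HexBytes("БД", "utf-8"));        // D0-91-D0-94
        Console.WriteLine(HexBytes("БД", "windows-1251")); // C1-C4
    }
}
```

A .NET string is always UTF-16 internally, so there is no "win1251 string" to convert first: an encoding only comes into play when the string is turned into bytes or bytes are turned back into a string.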

Later in their code I see that they get a string back using the current encoding (Encoding.Default):

public string GetString(byte[] buffer, int index, int count)
{
  //_encoding is Encoding.Default == win1251
  return _encoding.GetString(buffer, index, count);
}

The result of these lines of code is that I get an I/O exception, because my DbPath becomes

"F:\\Рабочая\\БД\\2.14.1\\January_2017\\MYDB.IB" 

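The mismatch described above (bytes produced with UTF-8, later read back with windows-1251) can be reproduced in isolation. This is only a sketch of the effect, not the provider's actual code path; the class name, the method name and the shortened path are made up for illustration, and the `RegisterProvider` call assumes .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Hypothetical helper: encodes with UTF-8, then decodes the same bytes
    // with windows-1251, imitating the encode/decode mismatch.
    public static string DecodeUtf8BytesAsWin1251(string text)
    {
        // Assumption: .NET Core / .NET 5+, where code page 1251 must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1251 = Encoding.GetEncoding("windows-1251");
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return win1251.GetString(utf8Bytes);
    }

    static void Main()
    {
        string path = @"F:\Рабочая\БД\MYDB.IB";
        string garbled = DecodeUtf8BytesAsWin1251(path);

        Console.WriteLine(garbled);
        // The ASCII parts survive; every Cyrillic letter turns into two characters.
        Console.WriteLine(garbled == path);              // False
        Console.WriteLine(garbled.Length - path.Length); // 9
    }
}
```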
So the first thing I tried was to convert my connection string to UTF-8 using this code:

 private static string Win1251ToUTF8(string source)
 {
   Encoding utf8 = Encoding.GetEncoding("utf-8");
   Encoding win1251 = Encoding.GetEncoding("windows-1251");
   byte[] win1251Bytes = win1251.GetBytes(source);
   byte[] utf8bytes = Encoding.Convert(win1251, utf8, win1251Bytes);
   source = utf8.GetString(utf8bytes);
   return source;
   //Actually I'm not sure that I'm converting Encoding correctly

 }

But it had no effect. I've tried many variants with Encoding.Convert, but I haven't found a solution yet. Can someone please tell me what I'm doing wrong and how I can solve the problem? Regards.


Answer 1:


I recommend trying the following code; maybe it helps. Create a new C# Windows Forms application and put a big multiline TextBox "textBox1" and a button "button1" on it. In the button click handler, put this code:

    // ----- The work -------------------------------------------------
    string source = "F:\\Рабочая\\БД\\2.14.1\\January_2017\\MYDB.IB";
    Encoding utf8 = Encoding.UTF8;
    Encoding unicode = Encoding.Unicode;
    Encoding win1251 = Encoding.GetEncoding("windows-1251");
    byte[] utf8Bytes = utf8.GetBytes(source);
    byte[] win1251Bytes = win1251.GetBytes(source);
    byte[] utf8ofwinBytes = Encoding.Convert(win1251, utf8, win1251Bytes);
    string unicodefromutf8 = utf8.GetString(utf8Bytes);
    string unicodefromwin1251 = win1251.GetString(win1251Bytes);

    // ----- The show -------------------------------------------------

    textBox1.Text = "";

    textBox1.Text += "Literal Unicode source" + Environment.NewLine;
    textBox1.Text += source + Environment.NewLine + Environment.NewLine;

    string s1 = "";
    textBox1.Text += "UTF8" + Environment.NewLine;
    for (int i = 0; i < utf8Bytes.Length; i++)
    {
        s1 += utf8Bytes[i].ToString() + ", ";
    }
    textBox1.Text += s1 + Environment.NewLine + Environment.NewLine;

    s1 = "";
    textBox1.Text += "WIN 1251" + Environment.NewLine;
    for (int i = 0; i < win1251Bytes.Length; i++)
    {
        s1 += win1251Bytes[i].ToString() + ", ";
    }
    textBox1.Text += s1 + Environment.NewLine + Environment.NewLine;

    s1 = "";
    textBox1.Text += "UTF8 of WIN 1251" + Environment.NewLine;
    for (int i = 0; i < utf8ofwinBytes.Length; i++)
    {
        s1 += utf8ofwinBytes[i].ToString() + ", ";
    }
    textBox1.Text += s1 + Environment.NewLine + Environment.NewLine;


    textBox1.Text += "Unicode string of UTF8 bytes" + Environment.NewLine;
    textBox1.Text += unicodefromutf8 + Environment.NewLine + Environment.NewLine;

    textBox1.Text += "Unicode string of WIN 1251 bytes" + Environment.NewLine;
    textBox1.Text += unicodefromwin1251 + Environment.NewLine + Environment.NewLine;

Run it, click the button, and you will see that all the converting and encoding is done as it should be.

You asked for a way to convert Unicode to UTF8 to WIN1251 to UTF8 to Unicode; here it is.

Your misunderstanding may be:

source = utf8.GetString(utf8bytes);
return source;

This converts the created UTF-8 byte array back to a Unicode string. So you return a Unicode string, not a UTF-8 byte sequence of your win-1251 string. In fact, you return exactly the same string you passed in.
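A small sketch can confirm that the round trip is a no-op. `Win1251ToUtf8String` repeats the steps of the question's function, and `ToUtf8Bytes` is a hypothetical alternative that keeps the bytes instead of decoding them again. The names and the shortened path are assumptions for illustration, and the `RegisterProvider` call assumes .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Same steps as the question's Win1251ToUTF8: the final GetString call
    // turns the UTF-8 bytes back into an ordinary .NET (UTF-16) string.
    public static string Win1251ToUtf8String(string source)
    {
        // Assumption: .NET Core / .NET 5+, where code page 1251 must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding win1251 = Encoding.GetEncoding("windows-1251");
        byte[] win1251Bytes = win1251.GetBytes(source);
        byte[] utf8Bytes = Encoding.Convert(win1251, Encoding.UTF8, win1251Bytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    // Hypothetical alternative: stop at the byte array, which is the only
    // place where "UTF-8" is actually visible.
    public static byte[] ToUtf8Bytes(string source) => Encoding.UTF8.GetBytes(source);

    static void Main()
    {
        string path = @"F:\Рабочая\БД\MYDB.IB";
        Console.WriteLine(Win1251ToUtf8String(path) == path); // True
        Console.WriteLine(ToUtf8Bytes(path).Length);          // 30 (12 ASCII + 9 * 2)
    }
}
```

The round trip only returns the original string because every character in the path exists in windows-1251; a character outside that code page would be replaced by a fallback character at the GetBytes step.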

You have to pass the (properly zero-terminated) UTF-8 byte sequence to the .NET provider.




Answer 2:


Use Encoding.Convert to convert charsets:

Encoding utf8 = Encoding.UTF8;
Encoding win = Encoding.GetEncoding("windows-1251");
byte[] winBytes = win.GetBytes(source);
byte[] utfBytes = Encoding.Convert(win, utf8, winBytes);
string result = utf8.GetString(utfBytes);


Source: https://stackoverflow.com/questions/42884025/c-sharp-encode-connection-string-from-win1251-to-utf8-and-back
