Question
In my program I'm reading data from an HTML file. Some of the text in this file contains non-ASCII characters, and these come back garbled, as if the UTF-8 bytes had been decoded with a single-byte code page:
Michèle --> MichÃ¨le
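To make the symptom concrete, here is a minimal sketch that reproduces this kind of garbling outside of OleDb. It assumes the driver decodes the file's UTF-8 bytes as ISO-8859-1 (an assumption; the driver may well use Windows-1252 or the system ANSI code page instead). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are not from the original post.
public static class MojibakeDemo
{
    // Simulates the suspected misread: UTF-8 bytes decoded as ISO-8859-1.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8);
    }

    public static void Main()
    {
        string original = "Mich\u00E8le"; // "Michèle"

        // In UTF-8 the character è is stored as the two bytes C3 A8.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(original)));
        // prints 4D-69-63-68-C3-A8-6C-65

        // Decoding each of those bytes as its own character gives Ã (U+00C3) and ¨ (U+00A8).
        string misread = MisreadUtf8AsLatin1(original);
        Console.WriteLine(misread == "Mich\u00C3\u00A8le"); // prints True
    }
}
```

The byte C3 (decimal 195) is what shows up as Ã, so any character in the range U+00C0 to U+00FF will appear as Ã followed by one other character.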
I'm using the following code to retrieve the data from the file:
string ConnectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"HTML import;CharacterSet=UNICODE;HDR=NO;IMEX=1;MaxScanRows=0;TypeGuessRows=0\";", fileName);
using (OleDbConnection cn = new OleDbConnection(ConnectionString))
{
    cn.Open();
    using (OleDbCommand cm = cn.CreateCommand())
    {
        string pageName = "bestelformulier";
        cm.CommandText = string.Format("Select * from [{0}]", pageName);
        cm.CommandType = CommandType.Text;
        var da = new OleDbDataAdapter(cm);
        var ds = new DataSet();
        da.Fill(ds);
        var dt = ds.Tables[0];
        ProcessRows(dt);
    }
}
I've searched Google for a way to tell OleDb to use Unicode when reading the file, but I haven't found a solution yet (the extended property CharacterSet=UNICODE doesn't seem to work). Can anyone here help me out?
Thanks, Jurjen.
20 Mar 2012, 15:41
I have not found a solution, but I created a small function to restore the string to its original state, so that's what I'm using now for all string fields after reading the data.
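using System.Linq;
using System.Text;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class StringExtensions
{
    private static readonly Encoding iso = Encoding.GetEncoding("ISO-8859-1");

    // Turns the misread string back into its raw bytes, then decodes those bytes as UTF-8.
    public static string RepairUTF8(this string value)
    {
        byte[] bytes = iso.GetBytes(value);
        // 195 (0xC3) is the UTF-8 lead byte for U+00C0..U+00FF, e.g. è = C3 A8
        if (bytes.Any(o => o == 195))
        {
            return Encoding.UTF8.GetString(bytes);
        }
        return value;
    }
}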
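The check for byte 195 only catches characters in the range U+00C0 to U+00FF, and it can also misfire on a correctly read string that happens to contain Ã. A possible variation, sketched below, is to let a strict UTF-8 decoder decide: if the bytes are not well-formed UTF-8, the string is left alone. This is only a sketch under the same assumption as the function above (the text was misread as ISO-8859-1). If the driver actually uses Windows-1252, characters whose UTF-8 bytes fall in 0x80 to 0x9F will not round-trip this way. The names RepairUtf8Strict and RepairStringColumns are hypothetical.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text;

// Hypothetical helper class; none of these names come from the original post.
public static class Utf8RepairSketch
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // throwOnInvalidBytes: true makes GetString throw instead of silently
    // substituting U+FFFD when the bytes are not well-formed UTF-8.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static string RepairUtf8Strict(this string value)
    {
        if (string.IsNullOrEmpty(value)) return value;

        // Pure ASCII needs no repair; characters above U+00FF cannot be the
        // result of a Latin-1 misread, so such strings are left untouched.
        if (value.All(c => c < 0x80) || value.Any(c => c > 0xFF)) return value;

        byte[] bytes = Latin1.GetBytes(value);
        try
        {
            return StrictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so the string was most likely read correctly.
            return value;
        }
    }

    // Applies the repair to every string cell of a table, in place.
    public static void RepairStringColumns(DataTable table)
    {
        foreach (DataColumn column in table.Columns)
        {
            if (column.DataType != typeof(string) || column.ReadOnly) continue;
            foreach (DataRow row in table.Rows)
            {
                if (row[column] is string text)
                {
                    row[column] = text.RepairUtf8Strict();
                }
            }
        }
    }
}
```

With this in place, calling Utf8RepairSketch.RepairStringColumns(dt) right before ProcessRows(dt) would cover all string fields in one pass. A string that was already read correctly, such as "Michèle", is returned unchanged because the lone byte E8 is not valid UTF-8.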
Source: https://stackoverflow.com/questions/9773538/oledbconnection-string-and-unicode