Question
I have data with Chinese characters in both the field names and the values. I imported it from an .xls file into Access 2007 and then exported it to ODBC. When I read it into R with RODBC, the field names come through correctly, but every Chinese character in the data is shown as ?.
I have read the RODBC manual, which says:
If it is possible to set the DBMS or ODBC driver to communicate in the character set of the R session then this should be done. For example, MySQL can set the communication character set via SQL, e.g. SET NAMES 'utf8'.
I guess this is the problem, but how can I provide this command to MySQL via RODBC? Thanks!
Answer 1:
I'm not familiar with ODBC and RODBC, but my reading of the above snippet of documentation is that SET NAMES 'utf8'; is part of MySQL's SQL dialect, so you run it as you would any other SQL statement that you might use to retrieve data from your database.
Something like (not tested):
sqlQuery(myChannel, query = "SET NAMES 'utf8';")
where myChannel is the connection handle returned by odbcConnect().
Is there a reason you are using RODBC over the RMySQL package? I've had good experience using RMySQL for extensive data processing and retrieval of complex sets of data all from within R.
Update:
There is some evidence that, at least at one point, SET NAMES was deactivated in the MySQL ODBC driver. If you are confident you can read the characters via direct access to the database (via mysql or one of MySQL's GUI front ends), then you could try to replicate what SET NAMES does. The following is from the MySQL manual:
A SET NAMES 'x' statement is equivalent to these three statements:
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;
You could try executing those three SQL statements in place of SET NAMES and see if that works.
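As an untested sketch of that idea, the three statements can be sent one at a time through sqlQuery(). The helper names charset_statements and set_connection_charset, and the DSN "mydb" in the commented usage, are made up for illustration; whether the ODBC driver honours these statements is exactly what you would be testing.

```r
# Build the three statements the MySQL manual lists as equivalent to SET NAMES.
# `charset` is a MySQL character set name such as "utf8".
charset_statements <- function(charset) {
  # Accept only a plain identifier so nothing else can be spliced into the SQL.
  stopifnot(is.character(charset), length(charset) == 1,
            grepl("^[A-Za-z0-9_]+$", charset))
  sprintf("SET %s = %s;",
          c("character_set_client",
            "character_set_results",
            "character_set_connection"),
          charset)
}

# Send each statement over an open RODBC channel (hypothetical helper).
set_connection_charset <- function(channel, charset = "utf8") {
  for (stmt in charset_statements(charset)) {
    RODBC::sqlQuery(channel, stmt)
  }
  invisible(channel)
}

# Usage, assuming a DSN called "mydb" exists on your machine:
# ch <- RODBC::odbcConnect("mydb")
# set_connection_charset(ch, "utf8")
# RODBC::sqlQuery(ch, "SHOW VARIABLES LIKE 'character_set%';")  # inspect the result
# RODBC::odbcClose(ch)
```

The SHOW VARIABLES query at the end is a way to see whether the settings actually took effect on that connection.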
The same manual also documents SET CHARACTER SET, which can be used in the same way as SET NAMES:
SET CHARACTER SET charset_name
SET CHARACTER SET is similar to SET NAMES but sets character_set_connection and collation_connection to character_set_database and collation_database. A SET CHARACTER SET x statement is equivalent to these three statements:
SET character_set_client = x;
SET character_set_results = x;
SET collation_connection = @@collation_database;
Setting collation_connection also sets character_set_connection to the character set associated with the collation (equivalent to executing SET character_set_connection = @@character_set_database). It is not necessary to set character_set_connection explicitly.
You could try using SET CHARACTER SET 'utf8' instead.
Finally, what character set / locale are you running in? It looks like you are on Windows; is this a UTF-8 locale? I also note some confusion in your question. You say you imported your data into MS Access and then exported it to ODBC. Do you mean you exported it to MySQL? I thought ODBC was a connection driver that allows communication with and between a range of databases, not something you could "export to".
Is your data really in MySQL? Could you not connect to MS Access via RODBC and read the data from there?
If the data are in MySQL, try using the RMySQL package to connect to the database and read the data.
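A minimal, untested sketch of the RMySQL route might look like the following. The function name read_mysql_table is invented for this example, and the database, table and credentials in the commented usage are placeholders you would replace with your own.

```r
# Read one table over RMySQL with the connection character set set to utf8.
# All arguments are placeholders supplied by the caller.
read_mysql_table <- function(dbname, table, username, password,
                             host = "localhost") {
  con <- DBI::dbConnect(RMySQL::MySQL(),
                        dbname = dbname,
                        username = username,
                        password = password,
                        host = host)
  # Close the connection when the function exits, even on error.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Ask MySQL to talk to this client in utf8 before fetching anything.
  DBI::dbGetQuery(con, "SET NAMES utf8")

  DBI::dbReadTable(con, table)
}

# Usage with made-up names:
# dat <- read_mysql_table("mydb", "mytable", "me", "secret")
# head(dat)
```

This sends SET NAMES through the native MySQL client rather than through an ODBC driver, so it sidesteps the question of whether the driver passes the statement on.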
Answer 2:
I just found the cure. Don't know if I can post.
1. Set up the MySQL database to be UTF-8 based.
2. Set up the ODBC DSN and do NOT set the "character set" option.
3. In R, connect with DBMSencoding set to UTF-8:
ch <- odbcConnect("mydb", DBMSencoding = "UTF-8")
That's it.
Source: https://stackoverflow.com/questions/4653430/how-to-set-charset-for-mysql-in-rodbc