Writing Unicode from R to SQL Server


Question


I'm trying to write Unicode strings from R to SQL, and then use that SQL table to power a Power BI dashboard. Unfortunately, the Unicode characters only seem to work when I load the table back into R, and not when I view the table in SSMS or Power BI.

require(odbc)
require(DBI)
require(dplyr)
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=R9-0KY02L01\\SQLEXPRESS;Database=Test;trusted_connection=yes;")
testData <- data_frame(Characters = "❤")
dbWriteTable(con,"TestUnicode",testData,overwrite=TRUE)
result <- dbReadTable(con, "TestUnicode")
result$Characters

Successfully yields:

> result$Characters
[1] "❤"

However, when I pull that table in SSMS:

SELECT * FROM TestUnicode

I get two different characters:

Characters
~~~~~~~~~~
â¤

Those characters are also what appear in Power BI. How do I correctly pull the heart character outside of R?


Answer 1:


It turns out this is a bug somewhere in R/DBI/the odbc driver. The issue is that R stores strings as UTF-8, while SQL Server stores NVARCHAR data as UTF-16LE. On top of that, when dbWriteTable creates a table it defaults to a VARCHAR column for string data, which cannot hold Unicode characters at all. You therefore need to do both of the following:

  1. Change the column in the R data frame from a character column to a list column of UTF-16LE raw bytes.
  2. When calling dbWriteTable, specify the field type as NVARCHAR(MAX).

This still seems like something that DBI or the odbc package should handle automatically, though.

require(odbc)
require(DBI)

# This function takes a string vector and turns it into a list of raw UTF-16LE bytes. 
# These will be needed to load into SQL Server
convertToUTF16 <- function(s){
  lapply(s, function(x) unlist(iconv(x,from="UTF-8",to="UTF-16LE",toRaw=TRUE)))
}

# create a connection to a sql table
connectionString <- "[YOUR CONNECTION STRING]"
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = connectionString)

# our example data
testData <- data.frame(ID = c(1,2,3), Char = c("I", "❤","Apples"), stringsAsFactors=FALSE)

# we adjust the column with the UTF-8 strings to instead be a list column of UTF-16LE bytes
testData$Char <- convertToUTF16(testData$Char)

# write the table to the database, specifying the field type
dbWriteTable(con, 
             "UnicodeExample", 
             testData, 
             append=TRUE, 
             field.types = c(Char = "NVARCHAR(MAX)"))

dbDisconnect(con)
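
As a quick check (my addition, not part of the original answer), you can read the table back over the same connection before the dbDisconnect(con) call above; the driver converts the NVARCHAR data back into UTF-8 strings on the way in, so the heart character should survive the round trip:

readBack <- dbGetQuery(con, "SELECT * FROM UnicodeExample")
readBack$Char
# expect the original strings back, including "❤"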



Answer 2:


Inspired by the previous answer and by the GitHub issue r-dbi/DBI#215 ("Storing unicode characters in SQL Server").

This follows the field.types = c(Char = "NVARCHAR(MAX)") idea, but builds the vector of column types automatically and computes an explicit maximum width, because NVARCHAR(MAX) triggered the "dbReadTable/dbGetQuery returns Invalid Descriptor Index" error (r-dbi/odbc#112):


vector_nvarchar <- c(Filter(Negate(is.null),
  lapply(testData, function(x) {
    if (is.character(x)) {
      c(names(x),
        paste0("NVARCHAR(",
               # NVARCHAR(MAX) gave the "Invalid Descriptor Index" error on
               # SQL Server (https://github.com/r-dbi/odbc/issues/112),
               # so we compute an explicit maximum width instead.
               max(
                 nchar(
                   # nchar() is not reliable on UTF-8 input (see help(nchar)),
                   # so convert to ASCII with substitution before counting.
                   iconv(Filter(Negate(is.null), x), "UTF-8", "ASCII", sub = "x")
                 ),
                 na.rm = TRUE
               ),
               ")"))
    }
  })
))

con <- DBI::dbConnect(odbc::odbc(), .connection_string = xxxxt, encoding = "UTF-8")

DBI::dbWriteTable(con, "UnicodeExample", testData, overwrite = TRUE, append = FALSE, field.types = vector_nvarchar)

DBI::dbGetQuery(con, iconv('select * from UnicodeExample'))
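
For reference (my illustration, not part of the original answer), with the testData frame from the first answer (before its Char column is converted to raw bytes) the computed types come out roughly as a one-element named list; the width 6 is the length of "Apples", the longest value in that column:

str(vector_nvarchar)
# List of 1
#  $ Char: chr "NVARCHAR(6)"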



Answer 3:


Inspired by the previous answers, I also looked for an automated way to write data frames to SQL Server. I cannot confirm the nvarchar(max) errors, so I ended up with these functions:

library(rlist)  # provides list.cbind()

# Convert every character column of a data frame into a list column of
# raw UTF-16LE byte vectors, leaving all other columns untouched.
convertToUTF16_df <- function(df){
  output <- cbind(df[sapply(df, typeof) != "character"]
    , list.cbind(apply(df[sapply(df, typeof) == "character"], 2, function(x){
      return(lapply(x, function(y) unlist(iconv(y, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))))
    }))

  )[colnames(df)]

  return(output)
}

# Build a field.types list that maps every character column to nvarchar(max).
field_types <- function(df){

  output <- list()
  output[colnames(df)[sapply(df, typeof) == "character"]] <- "nvarchar(max)"

  return(output)
}

DBI::dbWriteTable(odbc_connect
                  , name = SQL("database.schema.table")
                  , value = convertToUTF16_df(df)
                  , overwrite = TRUE
                  , row.names = FALSE
                  , field.types = field_types(df)
)
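
As a quick illustration (my addition; example_df is a hypothetical data frame), field_types() assigns an explicit SQL type only to the character columns and leaves the remaining columns to dbWriteTable's defaults:

example_df <- data.frame(ID = 1:2, Char = c("I", "❤"), stringsAsFactors = FALSE)
field_types(example_df)
# $Char
# [1] "nvarchar(max)"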


Source: https://stackoverflow.com/questions/48105277/writing-unicode-from-r-to-sql-server
