Question
I have an external table with the following DDL:
CREATE EXTERNAL TABLE `table_1`(
`name` string COMMENT 'from deserializer',
`desc1` string COMMENT 'from deserializer',
`desc2` string COMMENT 'from deserializer'
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'='|',
'skip.header.line.count'='1')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://temp_loc/temp_csv/'
TBLPROPERTIES (
'classification'='csv',
'compressionType'='none',
'typeOfData'='file')
The CSV files that this table reads are UTF-16 LE encoded. When I query the table in Athena, special characters are displayed as question marks in the output. Is there any way to set the encoding in Athena, or otherwise fix this?
Answer 1:
The solution, as Philipp Johannis mentions in a comment, is to set the serialization.encoding
table property to "UTF-16LE". As far as I can see, LazySimpleSerde
uses java.nio.charset.Charset.forName, so any encoding/charset name accepted by Java should work.
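As a minimal sketch of the fix described above, the property could be set on the existing table from the question (`table_1`) with an `ALTER TABLE ... SET TBLPROPERTIES` statement. This assumes the table definition is otherwise left unchanged. The observation about `Charset.forName` concerns LazySimpleSerde, so whether the SerDe declared in the question's DDL honors the property is an assumption that should be checked by re-running a query afterwards.

```sql
-- Sketch only: set the encoding the SerDe should use when decoding the files.
-- 'UTF-16LE' is a charset name accepted by java.nio.charset.Charset.forName.
ALTER TABLE table_1 SET TBLPROPERTIES ('serialization.encoding' = 'UTF-16LE');
```

The same key/value pair could also be added to the `TBLPROPERTIES` clause of the `CREATE EXTERNAL TABLE` statement when the table is recreated.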
Source: https://stackoverflow.com/questions/64166953/athena-displays-special-characters-as