Apache Avro: map uses CharSequence as key

别跟我提以往 2020-12-05 15:42

I am using Apache Avro.

My schema has a map type:

{"name": "MyData",
  "type" :  {"type": "map",
              "values":{
                        


        
6 Answers
  • 2020-12-05 16:24

    Apparently, by default, Avro uses CharSequence. I found a way to configure it to convert to String:

    From Avro 1.6.0 onward, there is an option to have Avro always perform the conversion to String. There are a couple of ways to achieve this. The first is to set the avro.java.string property in the schema to String:

             { "type": "string", "avro.java.string": "String" }
    

    I have not tested this.

  • 2020-12-05 16:29

    Apparently, there is a workaround for this problem in Avro 1.6. You specify the string type in your project's POM file:

      <stringType>String</stringType>
    

    This is mentioned in issue AVRO-803 ... though the plugin's web documentation doesn't reflect this.

  • 2020-12-05 16:29

    A quick solution (the value type could be other objects; here the values are CharSequence as well):

    import java.util.HashMap;
    import java.util.Map;

    // Copies a CharSequence-keyed map into a new map with String keys and values.
    Map<String, String> convertToStringMap(Map<CharSequence, CharSequence> map) {
        if (map == null) {
            return null;
        }
        Map<String, String> result = new HashMap<>();
        // Iterate over entries so each key needs no second lookup.
        for (Map.Entry<CharSequence, CharSequence> entry : map.entrySet()) {
            result.put(entry.getKey().toString(), entry.getValue().toString());
        }
        return result;
    }
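Since the answer notes that the value type could be other objects, here is a minimal generic sketch of the same idea that converts only the keys and leaves the values untouched. The class name `StringKeyMaps` and the method name `withStringKeys` are made up for illustration, and a `StringBuilder` key stands in for whatever CharSequence implementation Avro hands back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StringKeyMaps {
    /** Copies the map, converting each key to String and keeping values as they are. */
    static <V> Map<String, V> withStringKeys(Map<? extends CharSequence, ? extends V> source) {
        if (source == null) {
            return null;
        }
        // LinkedHashMap keeps the iteration order of the source map.
        Map<String, V> result = new LinkedHashMap<>();
        for (Map.Entry<? extends CharSequence, ? extends V> entry : source.entrySet()) {
            result.put(entry.getKey().toString(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // StringBuilder is only a stand-in for a non-String CharSequence key.
        Map<CharSequence, Integer> raw = new LinkedHashMap<>();
        raw.put(new StringBuilder("alice"), 30);

        Map<String, Integer> converted = withStringKeys(raw);
        System.out.println(converted.get("alice")); // prints 30
    }
}
```

The copy costs one pass over the map, so it is probably best done once, right after reading the record, rather than on every lookup.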
    
  • 2020-12-05 16:36

    I think explicitly converting the String to Utf8 will work: wrap the key, "some_key" -> new Utf8("some_key"), and use that as your key for the map.
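To see why wrapping the key helps, here is a small sketch that runs without Avro on the classpath. `FakeUtf8` is a hypothetical stand-in for Avro's `Utf8`, assumed only to be a CharSequence whose `equals` matches other instances of its own class; `decodedMap` is likewise an invented helper that mimics a map read back from a record.

```java
import java.util.HashMap;
import java.util.Map;

class Utf8KeyLookupSketch {
    /** Hypothetical stand-in for Avro's Utf8: equal only to other FakeUtf8 instances. */
    static final class FakeUtf8 implements CharSequence {
        private final String text;
        FakeUtf8(String text) { this.text = text; }
        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text; }
        @Override public boolean equals(Object other) {
            return other instanceof FakeUtf8 && ((FakeUtf8) other).text.equals(text);
        }
        @Override public int hashCode() { return text.hashCode(); }
    }

    /** Mimics a decoded map whose keys are not java.lang.String. */
    static Map<CharSequence, Integer> decodedMap() {
        Map<CharSequence, Integer> map = new HashMap<>();
        map.put(new FakeUtf8("some_key"), 1);
        return map;
    }

    public static void main(String[] args) {
        Map<CharSequence, Integer> map = decodedMap();
        // A String never equals a FakeUtf8, so this lookup misses.
        System.out.println(map.get("some_key"));               // prints null
        // Wrapping the key in the same class as the stored keys finds the entry.
        System.out.println(map.get(new FakeUtf8("some_key"))); // prints 1
    }
}
```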

  • 2020-12-05 16:39

    This JIRA discussion is relevant. The main reason CharSequence is still used is backward compatibility.

    As Charles Forsythe pointed out, a workaround has been added for cases where String is necessary: set the string property in the schema.

     { "type": "string", "avro.java.string": "String" }
    

    The default type here is their own Utf8 class. In addition to manual specification and the pom.xml setting, there is even an avro-tools compile option for it, the -string option:

    java -jar avro-tools-1.7.5.jar compile -string schema /path/to/schema .
    
  • 2020-12-05 16:39

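    Regardless of whether it's possible to force Avro to use a String, using CharSequence directly as a map key is a poor design because CharSequence isn't Comparable<CharSequence> and doesn't even specify equality of two identical sequences: the interface leaves equals and hashCode undefined across implementations, so a String and another CharSequence with the same characters do not match as keys. I suggest filing this as a bug against Avro.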
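The equality point can be checked with the standard library alone. In this sketch a `StringBuilder` stands in for any non-String CharSequence, and `BY_CONTENT` is an illustrative comparator chosen here, not something Avro provides.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class CharSequenceEquality {
    /** Orders by characters; CharSequence itself defines no ordering, so this is our own choice. */
    static final Comparator<CharSequence> BY_CONTENT =
            (a, b) -> a.toString().compareTo(b.toString());

    /** equals() across implementations: String only equals another String. */
    static boolean equalsAcrossTypes() {
        return "id".equals(new StringBuilder("id"));
    }

    /** contentEquals() compares characters regardless of the implementation. */
    static boolean sameContent() {
        return "id".contentEquals(new StringBuilder("id"));
    }

    /** A sorted map with an explicit comparator matches keys by content. */
    static String sortedLookup() {
        Map<CharSequence, String> map = new TreeMap<>(BY_CONTENT);
        map.put(new StringBuilder("id"), "42");
        return map.get("id");
    }

    public static void main(String[] args) {
        System.out.println(equalsAcrossTypes()); // prints false
        System.out.println(sameContent());       // prints true
        System.out.println(sortedLookup());      // prints 42
    }
}
```

Supplying a comparator works around the missing Comparable, but it only helps for sorted maps; hash-based maps still depend on each key class's own equals and hashCode.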
