How to compare two StructType sharing same contents?

Submitted by 北城以北 on 2020-12-13 03:30:24

Question


It seems that StructType preserves field order, so two StructTypes containing the same StructFields in a different order are not considered equal.

For example:

import org.apache.spark.sql.types._

val st1 = StructType(
  StructField("ii",StringType,true) ::
  StructField("i",StringType,true) :: Nil)

val st2 = StructType(
  StructField("i",StringType,true) ::
  StructField("ii",StringType,true) :: Nil)

println(st1 == st2)

returns false even though they both have StructField("i",StringType,true) and StructField("ii",StringType,true), just in different order.

I need a test that can say that these two are equivalent because for my purpose, these two are not different.

val schema1 = StructType(StructField("A",ArrayType(st1,true),true) :: Nil)

val schema2 = StructType(StructField("A",ArrayType(st2,true),true) :: Nil)

val final_schema = StructType((schema1 ++ schema2).distinct)

final_schema should contain only one StructField named A, but distinct treats the two StructTypes as different, so I end up with two StructFields named A. So my question is: is there a way to compare two StructTypes based on their contents rather than on field order?

EDIT:

After further investigation: since StructType is basically a Seq[StructField], I can use an order-insensitive comparison that works for any Seq, but I am still looking for the most efficient way to compare nested StructTypes.
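
As a minimal sketch of the flat, Seq-style comparison mentioned in the edit (not a full solution), the fields can be compared as sets. The helper name sameFlatFields is hypothetical, and the sketch assumes that field names are unique within each struct. It does not look inside nested StructTypes, so nested fields are still compared in order.

```scala
import org.apache.spark.sql.types._

val st1 = StructType(
  StructField("ii", StringType, true) ::
  StructField("i", StringType, true) :: Nil)

val st2 = StructType(
  StructField("i", StringType, true) ::
  StructField("ii", StringType, true) :: Nil)

// Hypothetical helper: order-insensitive comparison for flat structs only.
// Assumes field names are unique within each struct.
def sameFlatFields(a: StructType, b: StructType): Boolean =
  a.fields.length == b.fields.length && a.fields.toSet == b.fields.toSet

println(st1 == st2)               // false: == is order-sensitive
println(sameFlatFields(st1, st2)) // true: same fields, different order
```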


Answer 1:


This can probably be cleaned up, but it works and handles nested StructType:

def isEqual(struct1: StructType, struct2: StructType): Boolean = {
  struct1.headOption match {
    case Some(field) => {
      if(field.dataType.typeName != "struct") {
        // Plain field: an identical field must exist in struct2.
        struct2.find(_ == field) match {
         case Some(matchedField) => isEqual(StructType(struct1.filterNot(_ == field)), StructType(struct2.filterNot(_ == field)))
         case None => false
        }
      } else {
        // Nested struct: match on name and nullability, then compare the contents recursively.
        struct2.find(x => x.name == field.name && x.nullable == field.nullable && x.dataType.typeName == "struct") match {
          case Some(matchedField) =>
            isEqual(field.dataType.asInstanceOf[StructType], matchedField.dataType.asInstanceOf[StructType]) &&
              // Remove matchedField (not field) from struct2: the two are not == when their nested order differs.
              isEqual(StructType(struct1.filterNot(_ == field)), StructType(struct2.filterNot(_ == matchedField)))
          case None => false
        }
      }
    }
    // struct1 is exhausted: the structs are equal only if struct2 is exhausted too.
    case None => struct2.isEmpty
  }
}

val st1 = StructType(
StructField("ii",StringType,true) :: 
StructField("i",StringType,true) :: 
StructField("iii", StructType(StructField("iv", StringType, true) :: Nil), true) :: Nil)

val st2 = StructType(
StructField("i",StringType,true) :: 
StructField("ii",StringType,true) :: 
StructField("iii", StructType(StructField("v", StringType, true) :: Nil), true) :: Nil)

isEqual(st1, st2) // false: the nested field names differ ("iv" vs "v")

It could also use a little more love to become tail recursive, too.
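
An alternative to the pairwise search above, sketched here under assumptions, is to normalize both schemas first and then use the ordinary ==. The helpers normalize and sameContents are hypothetical names, not part of Spark. The sketch sorts struct fields by name at every level, recursing through ArrayType and MapType as well, and assumes field names are unique within a struct. Field metadata still takes part in the comparison, because StructField equality includes it.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: recursively sort struct fields by name so that
// field order no longer affects equality.
def normalize(dt: DataType): DataType = dt match {
  case st: StructType =>
    StructType(st.fields.map(f => f.copy(dataType = normalize(f.dataType))).sortBy(_.name))
  case ArrayType(elementType, containsNull) =>
    ArrayType(normalize(elementType), containsNull)
  case MapType(keyType, valueType, valueContainsNull) =>
    MapType(normalize(keyType), normalize(valueType), valueContainsNull)
  case other => other
}

// Hypothetical helper: order-insensitive equality via normalization.
def sameContents(a: StructType, b: StructType): Boolean =
  normalize(a) == normalize(b)

val st1 = StructType(
  StructField("ii", StringType, true) ::
  StructField("i", StringType, true) :: Nil)
val st2 = StructType(
  StructField("i", StringType, true) ::
  StructField("ii", StringType, true) :: Nil)

val schema1 = StructType(StructField("A", ArrayType(st1, true), true) :: Nil)
val schema2 = StructType(StructField("A", ArrayType(st2, true), true) :: Nil)

println(sameContents(schema1, schema2)) // true

// Deduplicate fields on their normalized form, as in the question's final_schema.
val finalSchema = StructType(
  (schema1.fields ++ schema2.fields)
    .map(f => f.copy(dataType = normalize(f.dataType)))
    .distinct)

println(finalSchema.fields.length) // 1
```

Because normalization is a single pass per schema, this avoids rebuilding a StructType on every recursive step. The resulting schema has its fields in sorted order, which may or may not be acceptable for a given use.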




Answer 2:


I compare schemas as follows:

assert(structType1 == structType2, "not equal schemas")

Spark's own code also compares StructTypes using ==.

You can check out TableScanSuite.scala under org.apache.spark.sql.sources:

https://github.com/apache/spark/blob/8b7d4f842fdc90b8d1c37080bdd9b5e1d070f5c0/sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala#L249

assert(expectedSchema == spark.table("tableWithSchema").schema)

I hope it helps



Source: https://stackoverflow.com/questions/41372002/how-to-compare-two-structtype-sharing-same-contents
