String similarity with OR condition in MinHash Spark ML
Question

I have two datasets. The first is a large reference dataset; for each record in the second dataset, I want to find the best match in the first one using the MinHash algorithm.

val dataset1 =
+-------------+----------+------+------+-----------------------+
|           x'|        y'|    a'|    b'|   dataString(x'+y'+a')|
+-------------+----------+------+------+-----------------------+
|         John|     Smith| 55649| 28200|       John|Smith|55649|
|         Emma|   Morales| 78439| 34200|     Emma|Morales|78439|
|        Janet|  Alvarado| 89488| 29103|   Janet|Alvarado|89488|
|