Jaro-Winkler score calculation in Apache Spark
Question: We need to implement a Jaro-Winkler distance calculation across the strings in an Apache Spark Dataset. We are new to Spark, and searching the web did not turn up much, so any guidance would be appreciated. We thought of using flatMap but realized it would not help. We then tried a couple of nested foreach loops but could not figure out how to proceed, because each string has to be compared against all the others, as in the dataset below:
RowFactory.create(0, "Hi I heard about Spark"), RowFactory
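The per-pair score itself does not depend on Spark, so one way to start is a plain Java function that Spark can call later. Below is a minimal sketch of the textbook Jaro-Winkler similarity. It assumes the usual prefix scale p = 0.1 and a maximum common-prefix length of 4, with no boost threshold. The class name `JaroWinkler` is hypothetical. A "distance" is often defined as 1 minus this similarity.

```java
/**
 * Minimal Jaro-Winkler similarity sketch: 0.0 = nothing in common, 1.0 = identical.
 * Assumption: prefix scale p = 0.1, max prefix length 4, boost applied unconditionally.
 */
final class JaroWinkler {
    private static final double PREFIX_SCALE = 0.1;
    private static final int MAX_PREFIX = 4;

    static double jaro(String s1, String s2) {
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 && len2 == 0) return 1.0;
        // Two equal characters only count as a match if they are this close to each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int start = Math.max(0, i - window);
            int end = Math.min(len2 - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Walk both sets of matched characters in order and count positions that disagree.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0; // transpositions = half the out-of-order matches
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    static double similarity(String s1, String s2) {
        double j = jaro(s1, s2);
        // Winkler boost: reward a shared prefix of up to MAX_PREFIX characters.
        int prefix = 0;
        int limit = Math.min(MAX_PREFIX, Math.min(s1.length(), s2.length()));
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * PREFIX_SCALE * (1.0 - j);
    }
}
```

For example, `JaroWinkler.similarity("MARTHA", "MARHTA")` is about 0.961. To compare every row against every other row, one option is to register the function as a UDF with `spark.udf().register("jaroWinkler", (UDF2<String, String, Double>) JaroWinkler::similarity, DataTypes.DoubleType)`. You can then self-join the Dataset with `crossJoin` and keep only pairs where the left `id` is less than the right `id`, so that each pair is scored once. The column names `id` and `sentence` are assumptions, because the dataset above is cut off. A cross join produces n² rows, so this approach gets expensive quickly on large inputs.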