Filter a DataFrame using values from an external file

Submitted by 百般思念 on 2019-12-12 05:39:53

Question


I want to filter my DataFrame using values from an external file. This is how I currently filter it:

val Insert=Append_Ot.filter(col("Name2").equalTo("brazil") || col("Name2").equalTo("france") || col("Name2").equalTo("algeria")|| col("Name2").equalTo("tunisia") || col("Name2").equalTo("egypte")  )

The number of countries I want to filter on changes, so I created this external file:

 1  brazil
 2  france
 3  algeria
 4  tunisia
 5  egypte

I want to create a UDF that filters my DataFrame using this file.

Thank you


Answer 1:


You need to build a Seq of the values you want to filter on from the file. Something that looks like this:

val l = List("Brasil", "Algeria", "Tunisia", "Egypt")

You can read the file with the textFile method of SparkContext. Suppose your file contains:

1 Algeria
2 Tunisia
3 Brasil
4 Egypt

You can use:

val countries = sc.textFile("hdfs://namenode/user/cloudera/file").map(_.split(" ")(1)).collect

which will give you:

countries : Array[String] = Array(Algeria, Tunisia, Brasil, Egypt)
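
Splitting on a single space works for the file shown in this answer, but the listing in the question has leading spaces and two spaces between the number and the name, in which case index 1 picks up the wrong field. A minimal sketch of a more tolerant parser in plain Scala follows; parseCountries is a hypothetical helper name, not part of Spark, and it assumes the country name is always the second whitespace-separated field.

```scala
// Hypothetical helper (not a Spark API): extracts the country name
// from lines such as " 1  brazil" or "1 Algeria".
def parseCountries(lines: Seq[String]): Seq[String] =
  lines
    .map(_.trim)            // drop leading and trailing whitespace
    .filter(_.nonEmpty)     // skip blank lines
    .map(_.split("\\s+"))   // split on runs of whitespace
    .collect { case Array(_, name, _*) => name } // keep the second field if present

// Sample lines in the format shown in the question
val sample = Seq(" 1  brazil", " 2  france", "", " 3  algeria")
val parsed = parseCountries(sample)
```

The same function could be applied to the result of sc.textFile(...).collect, since the list of countries is small enough to live on the driver.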

Then use the isin function on your Name2 column:

val Insert = Append_Ot.where($"Name2".isin(countries: _*))
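
Putting the steps together, here is a rough end-to-end sketch written in spark-shell style. Everything specific in it is an assumption made for illustration: the local master, the temporary file standing in for the HDFS file, the sample rows standing in for Append_Ot, and the column name Name1. Note that isin compares strings case-sensitively, so the sketch lowercases both sides; drop that step if your data already matches the file exactly.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("filter-by-country-file")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Assumption: a temporary local file stands in for the external file on HDFS.
val tmp = Files.createTempFile("countries", ".txt")
Files.write(tmp, " 1  brazil\n 2  france\n 3  egypt\n".getBytes(StandardCharsets.UTF_8))

// Read the lines to the driver, then keep the second whitespace-separated field.
val countries: Array[String] = spark.read.textFile(tmp.toUri.toString)
  .collect()
  .map(_.trim.split("\\s+"))
  .filter(_.length > 1)
  .map(fields => fields(1).toLowerCase)

// Assumption: sample rows standing in for Append_Ot.
val appendOt = Seq(("a", "brazil"), ("b", "spain"), ("c", "Egypt")).toDF("Name1", "Name2")

// Keep only rows whose Name2 (compared in lowercase) appears in the file.
val result = appendOt.where(lower(col("Name2")).isin(countries: _*))
result.show()
```

No UDF is needed for this: isin is a built-in column function, so Spark can optimize the filter, and changing the list of countries only requires editing the file.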


Source: https://stackoverflow.com/questions/45078066/filter-dataframe-from-external-file
