How to replace null values with a specific value in Dataframe using spark in Java?

抹茶落季 2020-12-05 14:00

I am trying to improve the accuracy of a logistic regression model implemented in Spark using Java. To do this, I'm trying to replace null or invalid values in a column.

4 answers
  • 2020-12-05 14:23

    To replace NULL values with a given string, I used the fill function that Spark exposes to Java through DataFrame.na(). It takes the replacement value and a (Scala) sequence of column names. Here is how I implemented it:

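    import java.util.*;
    import scala.collection.JavaConverters;
    import scala.collection.Seq;
    // fill(String, Seq<String>) needs a Scala Seq, so convert the Java list of column names
    List<String> colList = new ArrayList<String>();
    colList.add(cols[i]);
    Seq<String> colSeq = JavaConverters.asScalaIteratorConverter(colList.iterator()).asScala().toSeq();
    data = data.na().fill(word, colSeq); // replace nulls in those columns with `word`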
    
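A minimal, self-contained sketch of the same idea, assuming Spark 2.x or 3.x with spark-sql on the classpath and a local master. The class name FillSelectedColumns, the method fillNulls, and the Name/Place sample data are hypothetical. From Java, the fill(String, String[]) overload of DataFrameNaFunctions avoids the Scala Seq conversion entirely:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FillSelectedColumns {

    // Builds a tiny hypothetical DataFrame, replaces nulls in the listed
    // columns with `word`, and returns the collected rows.
    public static List<Row> fillNulls(String word, String[] cols) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("fill-selected-columns")
                .getOrCreate();
        try {
            StructType schema = new StructType()
                    .add("Name", DataTypes.StringType)
                    .add("Place", DataTypes.StringType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Alice", null),
                    RowFactory.create(null, "Paris"));
            Dataset<Row> data = spark.createDataFrame(rows, schema);

            // The String[] overload needs no Scala Seq conversion.
            data = data.na().fill(word, cols);
            return data.collectAsList();
        } finally {
            spark.stop();
        }
    }

    public static void main(String[] args) {
        // Only "Name" is listed, so the null in "Place" is left untouched.
        for (Row r : fillNulls("unknown", new String[] {"Name"})) {
            System.out.println(r);
        }
    }
}
```

Only the listed columns are touched, so the null in Place survives the call.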
  • 2020-12-05 14:35

    You can use the .na.fill function, which is defined in org.apache.spark.sql.DataFrameNaFunctions.

    Basically the function you need is: def fill(value: String, cols: Seq[String]): DataFrame

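    You choose the columns and the value that should replace their nulls. Note that this String overload only affects string columns; the numeric overloads, such as fill(value: Double, cols: Seq[String]), also replace NaN.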

    In your case it will be something like:

    val df2 = df.na.fill("a", Seq("Name"))
                .na.fill("a2", Seq("Place"))
    
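Since the question asks about Java, here is a hedged Java translation of the chained Scala call above, assuming the same hypothetical Name and Place columns and a local Spark 2.x/3.x session; the class and method names are illustrative only. Each na() call returns a fresh DataFrameNaFunctions, so the two fill calls chain just as they do in Scala:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ChainedFill {

    // Fills nulls in Name with "a" and nulls in Place with "a2" (the values
    // from the Scala example), then returns the collected rows.
    public static List<Row> fillPerColumn() {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("chained-fill")
                .getOrCreate();
        try {
            StructType schema = new StructType()
                    .add("Name", DataTypes.StringType)
                    .add("Place", DataTypes.StringType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Alice", null),
                    RowFactory.create(null, "Paris"),
                    RowFactory.create(null, null));
            Dataset<Row> df = spark.createDataFrame(rows, schema);

            // Each fill call targets one column with its own replacement value.
            Dataset<Row> df2 = df.na().fill("a", new String[] {"Name"})
                                 .na().fill("a2", new String[] {"Place"});
            return df2.collectAsList();
        } finally {
            spark.stop();
        }
    }

    public static void main(String[] args) {
        fillPerColumn().forEach(System.out::println);
    }
}
```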
  • 2020-12-05 14:42

    You can use DataFrame.na.fill() to replace nulls with a given value. To update several columns at once, pass a map from column name to replacement value:

    val map = Map("Name" -> "a", "Place" -> "a2")
    
    df.na.fill(map).show()
    

    But if you also want to replace bad (invalid) records, you need to identify them first. You can do this with a regular expression, for example using the rlike function (like only supports SQL wildcard patterns, not regular expressions).

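A minimal Java sketch of the validate-then-fill approach described above. It assumes a hypothetical rule that a valid Name contains only letters and spaces (the regex ^[A-Za-z ]+$); values failing that rule are turned into null with when(...) and rlike(...), and then a java.util.Map of per-column replacements is passed to na().fill(...), the Java counterpart of the Scala Map example. The class, method, and sample data are hypothetical:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ValidateThenFill {

    // Hypothetical validity rule: a Name may contain only letters and spaces.
    static final String VALID_NAME = "^[A-Za-z ]+$";

    public static List<Row> cleanAndFill() {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("validate-then-fill")
                .getOrCreate();
        try {
            StructType schema = new StructType()
                    .add("Name", DataTypes.StringType)
                    .add("Place", DataTypes.StringType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Alice", null),
                    RowFactory.create("1234", "Paris"),
                    RowFactory.create(null, "Rome"));
            Dataset<Row> data = spark.createDataFrame(rows, schema);

            // Step 1: keep Name only where it matches the rule. when() without
            // otherwise() yields null for invalid values (and for existing nulls).
            data = data.withColumn("Name",
                    when(col("Name").rlike(VALID_NAME), col("Name")));

            // Step 2: replace nulls per column in one call via a Java Map.
            Map<String, Object> replacements = new HashMap<>();
            replacements.put("Name", "a");
            replacements.put("Place", "a2");
            data = data.na().fill(replacements);
            return data.collectAsList();
        } finally {
            spark.stop();
        }
    }

    public static void main(String[] args) {
        cleanAndFill().forEach(System.out::println);
    }
}
```

Turning invalid values into null first means a single fill call then handles both missing and invalid entries.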
  • 2020-12-05 14:43

    You'll want to use the fill(String value, String[] columns) method, which you reach through your DataFrame's na() method. It replaces null values in the given list of columns with the value you specify.

    So, if you already know the value you want to replace nulls with:

    String[] colNames = {"Name"};
    dataframe = dataframe.na().fill("a", colNames);
    

    You can do the same for the rest of your columns.

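One caveat when repeating this for other columns: the String overload ignores non-string columns, so a numeric column needs a numeric replacement value. A minimal sketch, assuming a hypothetical Double column named Score and a local Spark 2.x/3.x session, showing that the fill(double, String[]) overload replaces both null and NaN:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class NumericFill {

    // Replaces null and NaN in the hypothetical Score column with 0.0.
    public static List<Row> fillScores() {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("numeric-fill")
                .getOrCreate();
        try {
            StructType schema = new StructType()
                    .add("Name", DataTypes.StringType)
                    .add("Score", DataTypes.DoubleType);
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Alice", 1.5),
                    RowFactory.create("Bob", null),
                    RowFactory.create("Carol", Double.NaN));
            Dataset<Row> data = spark.createDataFrame(rows, schema);

            // The double overload handles both null and NaN in numeric columns.
            data = data.na().fill(0.0, new String[] {"Score"});
            return data.collectAsList();
        } finally {
            spark.stop();
        }
    }

    public static void main(String[] args) {
        fillScores().forEach(System.out::println);
    }
}
```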