How to replace null values with a specific value in a DataFrame using Spark in Java?


I am trying to improve the accuracy of a logistic regression algorithm implemented in Spark using Java. To do this, I'm trying to replace null or invalid values present in a column.


4 answers

天命终不由人

2020-12-05 14:23
To replace null values with a given string, I used the fill function from Spark's Java API. It takes the replacement value and a sequence of column names. Here is how I implemented it:
```
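import java.util.ArrayList;
import java.util.List;
import scala.collection.Seq; // Scala 2.12 builds of Spark
List<String> colList = new ArrayList<String>();
colList.add(cols[i]); // cols[i]: name of the column to fill
Seq<String> colSeq = scala.collection.JavaConverters.asScalaIteratorConverter(colList.iterator()).asScala().toSeq();
data = data.na().fill(word, colSeq); // word: replacement string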
```
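The Java API of DataFrameNaFunctions also has a fill(String value, String[] cols) overload, so the Scala Seq conversion can probably be skipped by turning the list into an array. A minimal sketch, assuming spark-sql is on the classpath, a local SparkSession, and a hypothetical single string column named Name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FillWithArray {

    // Fills nulls in the listed string columns; the List is converted to a
    // String[] so no Scala Seq conversion is needed.
    static Dataset<Row> fillColumns(Dataset<Row> data, String word, List<String> colList) {
        return data.na().fill(word, colList.toArray(new String[0]));
    }

    // Demo on a tiny in-memory DataFrame with a hypothetical "Name" column.
    static List<String> run() {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("fill-array-demo")
                .getOrCreate();
        StructType schema = new StructType().add("Name", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("Bob"),
                RowFactory.create((Object) null));
        Dataset<Row> data = spark.createDataFrame(rows, schema);

        List<String> colList = new ArrayList<>();
        colList.add("Name");

        List<String> result = new ArrayList<>();
        for (Row r : fillColumns(data, "a", colList).collectAsList()) {
            result.add(r.getString(0));
        }
        spark.stop();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [Bob, a]
    }
}
```

The array overload keeps the code free of scala.collection types, which also sidesteps differences between Scala 2.12 and 2.13 builds of Spark.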
忘掉有多难

2020-12-05 14:35
You can use the .na.fill function (fill is a method of org.apache.spark.sql.DataFrameNaFunctions).

The overload you need is: def fill(value: String, cols: Seq[String]): DataFrame

You choose the columns and the value that replaces null in them. This string overload only affects string columns; the numeric overloads of fill also replace NaN.

In your case it will be something like:
```
val df2 = df.na.fill("a", Seq("Name"))
            .na.fill("a2", Seq("Place"))
```
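Since the question asks about Java, the same chained call could look like the sketch below. It assumes hypothetical Name and Place string columns, a local SparkSession, and spark-sql on the classpath. In Java, na() needs parentheses and the columns are passed as a String[] rather than a Scala Seq:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ChainedFill {

    // Java counterpart of df.na.fill("a", Seq("Name")).na.fill("a2", Seq("Place")).
    static Dataset<Row> fillNamePlace(Dataset<Row> df) {
        return df.na().fill("a", new String[] {"Name"})
                 .na().fill("a2", new String[] {"Place"});
    }

    // Runs the helper on a small in-memory DataFrame and returns "Name|Place" strings.
    static List<String> run() {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("chained-fill-demo")
                .getOrCreate();
        StructType schema = new StructType()
                .add("Name", DataTypes.StringType)
                .add("Place", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("x", null),
                RowFactory.create(null, "y"),
                RowFactory.create(null, null));

        List<String> result = new ArrayList<>();
        for (Row r : fillNamePlace(spark.createDataFrame(rows, schema)).collectAsList()) {
            result.add(r.getString(0) + "|" + r.getString(1));
        }
        spark.stop();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [x|a2, a|y, a|a2]
    }
}
```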
清歌不尽

2020-12-05 14:42
You can use DataFrame.na.fill() to replace nulls with a given value. To update several columns at once, pass a map from column name to replacement value:
```
val map = Map("Name" -> "a", "Place" -> "a2")

df.na.fill(map).show()
```
But if you also want to replace bad records, you need to identify them first. You can do this by matching against a regular expression with the rlike function (like only supports SQL LIKE patterns such as %abc%).
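A minimal Java sketch of both ideas, assuming hypothetical Name and Place string columns and a hypothetical validity rule for Place (letters and spaces only). Values that fail the rlike check are turned into nulls with when(...), and then every null is filled in one call through the Java fill(Map<String, Object>) overload. It assumes spark-sql is on the classpath:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ValidateAndFill {

    // Hypothetical rule: a Place is valid only if it consists of letters and spaces.
    static final String VALID_PLACE = "^[A-Za-z ]+$";

    static Dataset<Row> cleanAndFill(Dataset<Row> df) {
        // when(...) without otherwise(...) yields null for non-matching rows,
        // so invalid Place values become null (existing nulls stay null).
        Dataset<Row> marked = df.withColumn("Place",
                when(col("Place").rlike(VALID_PLACE), col("Place")));

        // Per-column replacement values, the Java counterpart of the Scala Map.
        Map<String, Object> replacements = new HashMap<>();
        replacements.put("Name", "a");
        replacements.put("Place", "a2");
        return marked.na().fill(replacements);
    }

    static List<String> run() {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("validate-fill-demo")
                .getOrCreate();
        StructType schema = new StructType()
                .add("Name", DataTypes.StringType)
                .add("Place", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("Ann", "Paris"),
                RowFactory.create(null, "123??"),
                RowFactory.create("Bo", null));

        List<String> result = new ArrayList<>();
        for (Row r : cleanAndFill(spark.createDataFrame(rows, schema)).collectAsList()) {
            result.add(r.getString(0) + "|" + r.getString(1));
        }
        spark.stop();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [Ann|Paris, a|a2, Bo|a2]
    }
}
```

Turning bad values into nulls first means a single fill call handles both missing and invalid entries.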
青春惊慌失措

2020-12-05 14:43
You'll want to use the fill(String value, String[] columns) method of the DataFrameNaFunctions object returned by your DataFrame's na() method. It replaces null values in the given string columns with the value you specify.

So if you already know the value you want to replace null with:
```
String[] colNames = {"Name"};
dataframe = dataframe.na().fill("a", colNames);
```
You can do the same for the rest of your columns.
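One caveat for logistic regression: features are usually numeric, and fill(String, String[]) skips columns that are not strings. A sketch, assuming a hypothetical numeric Age column and using 0.0 purely as a placeholder, relies on the fill(double, String[]) overload, which replaces both null and NaN. It assumes spark-sql is on the classpath:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class NumericFill {

    // String columns need a string value; numeric columns need a numeric one.
    static Dataset<Row> fillMixed(Dataset<Row> df) {
        return df.na().fill("a", new String[] {"Name"})
                 .na().fill(0.0, new String[] {"Age"}); // also replaces NaN
    }

    static List<String> run() {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("numeric-fill-demo")
                .getOrCreate();
        StructType schema = new StructType()
                .add("Name", DataTypes.StringType)
                .add("Age", DataTypes.DoubleType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("x", 1.5),
                RowFactory.create(null, null),
                RowFactory.create("y", Double.NaN));

        List<String> result = new ArrayList<>();
        for (Row r : fillMixed(spark.createDataFrame(rows, schema)).collectAsList()) {
            result.add(r.getString(0) + "|" + r.getDouble(1));
        }
        spark.stop();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [x|1.5, a|0.0, y|0.0]
    }
}
```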