Extract last non-missing value in row with data.table

囚心锁ツ 2021-01-04 04:03

I have a data.table of factor columns, and I want to pull out the label of the last non-missing value in each row. It's kind of a typical max.col situation, bu

5 answers

甜味超标 (original poster)

2021-01-04 04:38
We convert the 'data.frame' to a 'data.table' and create a row id column (setDT(df1, keep.rownames=TRUE)), then reshape from 'wide' to 'long' format with melt. Grouped by 'rn', if there is no NA in the 'value' column we take its last element (value[.N]); otherwise we take the element just before the first NA. This assumes that missing values occur only at the end of a row. The result comes back in the 'V1' column, which we extract ($V1).
```
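melt(setDT(df1, keep.rownames=TRUE), id.vars='rn')[,    # wide to long, grouped by row id
     if(!any(is.na(value))) value[.N]                   # no NA: take the last value
     else value[which(is.na(value))[1]-1], by = rn]$V1  # else: the value before the first NA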
#[1] "u" "q" "w" "h" "r" "t" "e" "t"
```
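To try the snippet above without the original data, here is a minimal sketch on a small hypothetical data set. The column names a, b, c and their values are assumptions made up for illustration; they are not from the question. It also keeps the intermediate long table so the per-row groups can be inspected.

```r
library(data.table)

# Hypothetical toy data (NOT from the original question): three factor
# columns where missing values only appear at the end of a row.
df1 <- data.frame(
  a = factor(c("x", "y", "z")),
  b = factor(c("p", NA,  "q")),
  c = factor(c("m", NA,  NA))
)

# setDT() converts by reference; keep.rownames = TRUE adds the row names
# as a character column called 'rn'.
long <- melt(setDT(df1, keep.rownames = TRUE), id.vars = "rn")

# One group per original row: take the last value when nothing is missing,
# otherwise the value just before the first NA.
res <- long[,
  if (!anyNA(value)) value[.N]
  else value[which(is.na(value))[1] - 1],
  by = rn]$V1

as.character(res)
# "m" "y" "q"
```

Inspecting long[rn == "2"] shows the three cells of the second row stacked in column order, which is what makes value[.N] and the "before the first NA" lookup work.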
If the data is already a data.table:
```
dat[, rn := 1:.N]  # create the 'rn' column
melt(dat, id.vars='rn')[,  # melt from wide to long format
     if(!any(is.na(value))) value[.N] 
     else value[which(is.na(value))[1]-1], by =  rn]$V1
#[1] "u" "q" "w" "h" "r" "t" "e" "t"
```
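Here is another option: count the non-missing values in each row (colInd), then use that count as the index of the column to extract.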
```
dat[, colInd := sum(!is.na(.SD)), by=1:nrow(dat)][
   , as.character(.SD[[.BY[[1]]]]), by=colInd]
```
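The column-index idea can be sketched step by step as follows. This is a variant under stated assumptions, not the answer's exact code: the toy data and the names cols and last are hypothetical, rowSums is swapped in for the per-row sum, and the result is assigned with := so that it stays aligned with the original rows (a grouped query returns its rows in group order rather than row order).

```r
library(data.table)

# Hypothetical toy data (NOT from the original question), trailing NAs only.
dat <- data.table(
  a = factor(c("x", "y", "z")),
  b = factor(c("p", NA,  "q")),
  c = factor(c("m", NA,  NA))
)
cols <- c("a", "b", "c")  # hypothetical: the columns to scan, in order

# Step 1: number of non-missing cells per row. With trailing NAs only, this
# is also the position of the last non-missing column.
dat[, colInd := rowSums(!is.na(.SD)), .SDcols = cols]

# Step 2: within each colInd group, pull the column at that position.
# .BY[[1]] holds the group's colInd value. Assigning with := keeps the
# result aligned with the original rows.
dat[, last := as.character(.SD[[.BY[[1]]]]), by = colInd, .SDcols = cols]

dat$last
# "m" "y" "q"
```

Restricting .SD with .SDcols keeps helper columns such as a row id out of the count. A row that is entirely NA would produce an index of 0, which this sketch does not handle.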
Or, as @Frank mentioned in the comments, we can use na.rm=TRUE in melt to make it more compact:
```
melt(dat[, r := .I], id="r", na.rm=TRUE)[, value[.N], by=r]
```
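One practical difference between the compact version and the first approach shows up when a row has a missing value in the middle. The sketch below uses hypothetical toy data (not from the question) to compare the two. The names res_narm and res_first are made up for illustration.

```r
library(data.table)

# Hypothetical toy data (NOT from the original question). Row 3 has an NA
# in the middle: z, NA, q.
dat <- data.table(
  a = factor(c("x", "y", "z")),
  b = factor(c("p", NA,  NA)),
  c = factor(c("m", NA,  "q"))
)
dat[, r := .I]  # row id

# Compact version: drop NA cells while melting, then take the last value
# that remains in each row group.
res_narm <- melt(dat, id.vars = "r", na.rm = TRUE)[
  , .(last = as.character(value[.N])), keyby = r]$last

# First approach: the value just before the first NA in each row group.
res_first <- as.character(
  melt(dat, id.vars = "r")[,
    if (!anyNA(value)) value[.N]
    else value[which(is.na(value))[1] - 1],
    by = r]$V1
)

res_narm   # "m" "y" "q"
res_first  # "m" "y" "z"
```

With na.rm=TRUE the last remaining value is the last non-missing one even when NAs are interspersed. Note that a row consisting only of NA contributes no cells after melting, so it would be absent from the grouped result.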