Using lapply to create new columns based on old columns

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-05 09:34:31

Question


My data looks as follows:

DF <- structure(list(No_Adjusted_Gross_Income = c(183454, 241199, 249506
), NoR_from_1_to_5000 = c(1035373, 4272260, 1124098), NoR_from_5000_to_10000 = c(319540, 
4826042, 1959866)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))
val <- c(2500.5, 7500)
vn <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000
1:                   183454            1035373                 319540
2:                   241199            4272260                4826042
3:                   249506            1124098                1959866

I would like to create new columns from columns 2 and 3, multiplying each by the corresponding value in val and naming the results with vn. I tried to do it as follows:

DF[,2:3] <- lapply(DF[,2:3], vn := val*DF[,2:3])

But this does not work.
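The attempt fails for two reasons. := only has a meaning inside the j argument of a data.table's [ call, and lapply() expects a function as its second argument. The following minimal sketch is not from the original thread; it shows that lapply() can still be used, but inside j:

- .SDcols = 2:3 restricts .SD to the two NoR columns.
- lapply() returns one multiplied vector per position.
- (vn) := adds those vectors as new columns by reference.

It assumes the columns named in .SDcols appear in the same order as val and vn. The data are rebuilt with data.table() so the snippet runs on its own.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
DF <- data.table(
  No_Adjusted_Gross_Income = c(183454, 241199, 249506),
  NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
  NoR_from_5000_to_10000   = c(319540, 4826042, 1959866)
)
val <- c(2500.5, 7500)
vn  <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

# .SD holds columns 2 and 3 only; multiply the i-th of them by val[i]
# and add the results by reference under the names in vn
DF[, (vn) := lapply(seq_along(val), function(i) .SD[[i]] * val[i]),
   .SDcols = 2:3]
DF
```

Using Map(`*`, .SD, val) in place of the lapply() call pairs each column with its multiplier in the same way.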

Desired output, which can be produced one column at a time:

DF <- setDT(DF)[, vn[1]:=val[1]*DF[,2]]
DF <- setDT(DF)[, vn[2]:=val[2]*DF[,3]]

DFout <- structure(list(No_Adjusted_Gross_Income = c(183454, 241199, 249506
), NoR_from_1_to_5000 = c(1035373, 4272260, 1124098), NoR_from_5000_to_10000 = c(319540, 
4826042, 1959866), AGI_from_1_to_5000 = c(2588950186.5, 10682786130, 
2810807049), AGI_from_5000_to_10000 = c(2396550000, 36195315000, 
14698995000)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))

   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
1:                   183454            1035373                 319540         2588950187             2396550000
2:                   241199            4272260                4826042        10682786130            36195315000
3:                   249506            1124098                1959866         2810807049            14698995000

Answer 1:


This should work; lapply() is not needed:

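library( data.table )
setDT( DF )
# multiply column 2 by val[1] and column 3 by val[2], then add the results as the columns named in vn
DF[, (vn) := as.data.table( t( t( DF[, 2:3] ) * val ) ) ][]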


#    No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
# 1:                   183454            1035373                 319540         2588950187             2396550000
# 2:                   241199            4272260                4826042        10682786130            36195315000
# 3:                   249506            1124098                1959866         2810807049            14698995000
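The double transpose is needed because R recycles a vector down the columns of a matrix. After the inner t(), each original column becomes a row, so * val multiplies original column i by val[i]. The outer t() then restores the original shape. Base R's sweep() with MARGIN = 2 (columns) states that intent directly. The following minimal sketch compares the two; sweep() is a swapped-in alternative, not part of the answer above. DF, val and vn are rebuilt from the question so the snippet is self-contained.

```r
library(data.table)

DF <- data.table(
  No_Adjusted_Gross_Income = c(183454, 241199, 249506),
  NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
  NoR_from_5000_to_10000   = c(319540, 4826042, 1959866)
)
val <- c(2500.5, 7500)
vn  <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

m <- as.matrix(DF[, 2:3])
# Double transpose, as in the answer above
by_transpose <- t(t(m) * val)
# sweep() multiplies column j of m by val[j]
by_sweep <- sweep(m, 2, val, "*")

# Both give the same matrix; add it to DF under the names in vn
DF[, (vn) := as.data.table(by_sweep)]
DF
```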



Answer 2:


You can use apply() to compute the values, then use cbind() if you want to combine them with your original DF:

t(apply(DF[,2:3],1, function(x) x*val ))

     NoR_from_1_to_5000 NoR_from_5000_to_10000
[1,]         2588950187             2396550000
[2,]        10682786130            36195315000
[3,]         2810807049            14698995000
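The cbind() step mentioned above is not shown, so here is a minimal sketch of it. It assumes DF can be a plain data.frame and that the new columns should take the names in vn. The data are rebuilt from the question so the snippet runs on its own.

```r
DF <- data.frame(
  No_Adjusted_Gross_Income = c(183454, 241199, 249506),
  NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
  NoR_from_5000_to_10000   = c(319540, 4826042, 1959866)
)
val <- c(2500.5, 7500)
vn  <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

# apply() over rows returns one column per original row,
# so t() turns the result back into one row per observation
res <- t(apply(DF[, 2:3], 1, function(x) x * val))
colnames(res) <- vn  # name the new columns as requested

# Append the computed columns to the original data
DFout <- cbind(DF, res)
DFout
```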



Answer 3:


The OP has asked in a comment for a grouping variable.

Although the accepted answer apparently does what the OP initially asked for, I would like to suggest a completely different approach in which the data are stored and processed in tidy (long) format. IMHO, processing data in long format is much more straightforward and flexible, including for aggregation and grouping.

For this, the dataset is reshaped from wide, Excel-style format to long, SQL-style format:

library(data.table)
col <- "NoR"
long <- melt(DF, measure.vars = patterns(col), value.name = col, variable.name = "range")
long[, range := stringr::str_remove(range, paste0(col, "_"))]
long
   No_Adjusted_Gross_Income              range     NoR
1:                   183454     from_1_to_5000 1035373
2:                   241199     from_1_to_5000 4272260
3:                   249506     from_1_to_5000 1124098
4:                   183454 from_5000_to_10000  319540
5:                   241199 from_5000_to_10000 4826042
6:                   249506 from_5000_to_10000 1959866

In tidy (long) format there is one row for each observation and one column for each variable (see Chapter 12.2 of Hadley Wickham's book R for Data Science).
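If you prefer to avoid the stringr dependency, base R's sub() can strip the prefix in the same melt step. This is a swap-in for str_remove(), not part of the original answer. melt() returns range as a factor, and sub() coerces it to character. The question's DF is rebuilt with data.table() so the sketch is self-contained.

```r
library(data.table)

DF <- data.table(
  No_Adjusted_Gross_Income = c(183454, 241199, 249506),
  NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
  NoR_from_5000_to_10000   = c(319540, 4826042, 1959866)
)

col <- "NoR"
# Stack the NoR_* columns into one NoR column; range records the source column
long <- melt(DF, measure.vars = patterns(col), value.name = col,
             variable.name = "range")
# Drop the leading "NoR_" from the labels (the result is character)
long[, range := sub(paste0("^", col, "_"), "", range)]
long
```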

The vector of multipliers val also needs to be reshaped from wide to long format:

valDF <- long[, .(range = unique(range), val)]
valDF
                range    val
1:     from_1_to_5000 2500.5
2: from_5000_to_10000 7500.0

Now, valDF is also in tidy format as there is one row for each range.
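The unique(range) construction relies on the ranges in long appearing in the same order as the entries of val. The following sketch is slightly more defensive. It assumes vn follows the AGI_<range> naming used in the question and derives each range label from the matching name in vn, so every multiplier stays attached to its own range.

```r
library(data.table)

val <- c(2500.5, 7500)
vn  <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

# Strip the "AGI_" prefix so the labels match the range column of long
valDF <- data.table(range = sub("^AGI_", "", vn), val = val)
valDF
```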

Finally, we can add a new column AGI to long by an update join:

long[valDF, on = "range", AGI := val * NoR][]
   No_Adjusted_Gross_Income              range     NoR         AGI
1:                   183454     from_1_to_5000 1035373  2588950187
2:                   241199     from_1_to_5000 4272260 10682786130
3:                   249506     from_1_to_5000 1124098  2810807049
4:                   183454 from_5000_to_10000  319540  2396550000
5:                   241199 from_5000_to_10000 4826042 36195315000
6:                   249506 from_5000_to_10000 1959866 14698995000
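In the update join, val inside j resolves to the column of valDF (the i table), because long has no column of that name. The following minimal sketch spells this out with data.table's i. prefix, so the result does not depend on whether a global val also exists. long and valDF are rebuilt directly, with the contents shown above, so the snippet runs on its own.

```r
library(data.table)

# long-format data as produced by the melt() step above
long <- data.table(
  No_Adjusted_Gross_Income = rep(c(183454, 241199, 249506), 2),
  range = rep(c("from_1_to_5000", "from_5000_to_10000"), each = 3),
  NoR   = c(1035373, 4272260, 1124098, 319540, 4826042, 1959866)
)
valDF <- data.table(range = c("from_1_to_5000", "from_5000_to_10000"),
                    val   = c(2500.5, 7500))

# For each row of long, look up its range in valDF and add
# AGI = multiplier * NoR by reference; i.val refers to valDF's column
long[valDF, on = "range", AGI := i.val * NoR]
long
```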

If required for presentation, the dataset can be reshaped back from long to wide format:

dcast(long, No_Adjusted_Gross_Income ~ range, value.var = c("NoR", "AGI"))
   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
1:                   183454            1035373                 319540         2588950187             2396550000
2:                   241199            4272260                4826042        10682786130            36195315000
3:                   249506            1124098                1959866         2810807049            14698995000

which reproduces the OP's expected result. Note that the column names in vn are created automatically.


Aggregation and grouping can also be performed while reshaping:

dcast(long, No_Adjusted_Gross_Income ~ range, sum, value.var = c("NoR", "AGI"))
   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
1:                   183454            1035373                 319540         2588950187             2396550000
2:                   241199            4272260                4826042        10682786130            36195315000
3:                   249506            1124098                1959866         2810807049            14698995000

or

dcast(long, No_Adjusted_Gross_Income ~ ., sum, value.var = c("NoR", "AGI"))
   No_Adjusted_Gross_Income     NoR         AGI
1:                   183454 1354913  4985500187
2:                   241199 9098302 46878101130
3:                   249506 3083964 17509802049

Alternatively, aggregation & grouping can be performed in long format:

long[, lapply(.SD, sum), .SDcols = c("NoR", "AGI"), by = No_Adjusted_Gross_Income]
   No_Adjusted_Gross_Income     NoR         AGI
1:                   183454 1354913  4985500187
2:                   241199 9098302 46878101130
3:                   249506 3083964 17509802049
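For comparison, here is a sketch of the same per-income totals computed from the wide result. It assumes the DFout table from the question, rebuilt here with data.table(). Each variable's columns have to be picked out by name prefix before rowSums() can add them, which is the bookkeeping that the long-format version avoids.

```r
library(data.table)

# Wide result as in the question's DFout
DFout <- data.table(
  No_Adjusted_Gross_Income = c(183454, 241199, 249506),
  NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
  NoR_from_5000_to_10000   = c(319540, 4826042, 1959866),
  AGI_from_1_to_5000       = c(2588950186.5, 10682786130, 2810807049),
  AGI_from_5000_to_10000   = c(2396550000, 36195315000, 14698995000)
)

# Select each variable's columns by prefix and add them row by row
nor_cols <- grep("^NoR_", names(DFout), value = TRUE)
agi_cols <- grep("^AGI_", names(DFout), value = TRUE)
totals <- data.table(
  No_Adjusted_Gross_Income = DFout$No_Adjusted_Gross_Income,
  NoR = rowSums(DFout[, ..nor_cols]),
  AGI = rowSums(DFout[, ..agi_cols])
)
totals
```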


Source: https://stackoverflow.com/questions/62004219/using-lapply-to-create-new-columns-based-on-old-columns
