Maintain attributes of data frame columns after merge

Posted by 点点圈 on 2021-02-04 17:56:25

Question


It seems that merge causes columns of a data frame to lose their attributes:

attr(mtcars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear=3:5, my.opinion=c("not enough", "just right", "too many"))
merged.df <- merge(new.df, mtcars)

attr(merged.df$mpg, "units") returns NULL.

Is there a way to get merge to preserve attributes of columns?

(A workaround would be to query the attributes of each column of each data frame before the merge, and then to re-assign them after the merge. However that seems inelegant.)
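The workaround described above could be sketched roughly as follows. The helper name merge_restore_attrs is hypothetical, not part of base R; it assumes column names survive the merge unchanged and only restores attributes that the merged column no longer has.

```r
# Hypothetical helper (not part of base R): merge two data frames, then copy
# back any column attributes that merge() dropped.
merge_restore_attrs <- function(x, y, ...) {
  # Save each column's attributes before merging, keyed by column name.
  # For a name present in both inputs, the copy from x is used.
  saved <- c(lapply(x, attributes), lapply(y, attributes))
  out <- merge(x, y, ...)
  for (col in intersect(names(out), names(saved))) {
    # Restore only attributes that are missing after the merge, so that
    # anything merge() set itself (e.g. factor levels) is left alone.
    missing_attrs <- setdiff(names(saved[[col]]), names(attributes(out[[col]])))
    for (a in missing_attrs) {
      attr(out[[col]], a) <- saved[[col]][[a]]
    }
  }
  out
}

# Usage with the data from the question
attr(mtcars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear = 3:5,
                     my.opinion = c("not enough", "just right", "too many"))
merged.df <- merge_restore_attrs(new.df, mtcars)
attr(merged.df$mpg, "units")
```

Non-key columns that appear in both inputs are renamed by merge with .x/.y suffixes, so this sketch would not restore attributes for them.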


Answer 1:


If you don't mind using dplyr, this one seems to work.

Your data:

attr(mtcars$mpg, "units") <- "miles.per.gallon"
new.df <- data.frame(gear=3:5, my.opinion=c("not enough", "just right", "too many"))

> attr(mtcars$mpg, "units")
[1] "miles.per.gallon"

Join the two data frames with inner_join from dplyr:

inner.df<-dplyr::inner_join(new.df, mtcars,"gear")

The resulting data frame is as follows:

> inner.df
    gear my.opinion  mpg cyl  disp  hp drat    wt  qsec vs am carb
1     3 not enough 21.4   6 258.0 110 3.08 3.215 19.44  1  0    1
2     3 not enough 18.7   8 360.0 175 3.15 3.440 17.02  0  0    2
3     3 not enough 18.1   6 225.0 105 2.76 3.460 20.22  1  0    1
4     3 not enough 14.3   8 360.0 245 3.21 3.570 15.84  0  0    4
5     3 not enough 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3
6     3 not enough 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
7     3 not enough 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3
8     3 not enough 10.4   8 472.0 205 2.93 5.250 17.98  0  0    4
9     3 not enough 10.4   8 460.0 215 3.00 5.424 17.82  0  0    4
10    3 not enough 14.7   8 440.0 230 3.23 5.345 17.42  0  0    4
11    3 not enough 21.5   4 120.1  97 3.70 2.465 20.01  1  0    1
12    3 not enough 15.5   8 318.0 150 2.76 3.520 16.87  0  0    2
13    3 not enough 15.2   8 304.0 150 3.15 3.435 17.30  0  0    2
14    3 not enough 13.3   8 350.0 245 3.73 3.840 15.41  0  0    4
15    3 not enough 19.2   8 400.0 175 3.08 3.845 17.05  0  0    2
16    4 just right 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4
17    4 just right 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4
18    4 just right 22.8   4 108.0  93 3.85 2.320 18.61  1  1    1
19    4 just right 24.4   4 146.7  62 3.69 3.190 20.00  1  0    2
20    4 just right 22.8   4 140.8  95 3.92 3.150 22.90  1  0    2
21    4 just right 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4
22    4 just right 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4
23    4 just right 32.4   4  78.7  66 4.08 2.200 19.47  1  1    1
24    4 just right 30.4   4  75.7  52 4.93 1.615 18.52  1  1    2
25    4 just right 33.9   4  71.1  65 4.22 1.835 19.90  1  1    1
26    4 just right 27.3   4  79.0  66 4.08 1.935 18.90  1  1    1
27    4 just right 21.4   4 121.0 109 4.11 2.780 18.60  1  1    2
28    5   too many 26.0   4 120.3  91 4.43 2.140 16.70  0  1    2
29    5   too many 30.4   4  95.1 113 3.77 1.513 16.90  1  1    2
30    5   too many 15.8   8 351.0 264 4.22 3.170 14.50  0  1    4
31    5   too many 19.7   6 145.0 175 3.62 2.770 15.50  0  1    6
32    5   too many 15.0   8 301.0 335 3.54 3.570 14.60  0  1    8

The attribute is maintained:

> attr(inner.df$mpg, "units")
[1] "miles.per.gallon"



Answer 2:


There's also data.table, whose join keeps the attribute as well:

library(data.table)

dt1 = as.data.table(mtcars)
dt2 = as.data.table(new.df)

inner.dt <- dt1[dt2, on = "gear"]

attr(inner.dt$mpg, "units")

...

> attr(inner.dt$mpg, "units")
[1] "miles.per.gallon"

but in this benchmark it is slower than dplyr:

library(microbenchmark)
microbenchmark(dplyr::inner_join(new.df, mtcars,"gear"),
               dt1[dt2, on = "gear"])

...

> microbenchmark(dplyr::inner_join(new.df, mtcars,"gear"),
+                    dt1[dt2, on = "gear"])
Unit: microseconds
             expr     min       lq     mean  median      uq      max neval
 dplyr            544.877 568.5840 625.6442 606.319 658.870 1005.197   100
 data.table       860.186 892.1915 961.2788 938.618 979.711 1510.166   100



Answer 3:


You can write a method for merge and arrange for that to preserve the attributes:

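merge.foo <- function(...) {
  args <- list(...)
  dfs <- args[[1]]  # the list of data frames, classed "foo"
  # Collect every column's attributes into one flat list keyed by column name
  col_attrs <- unlist(lapply(unname(dfs), function(x) lapply(x, attributes)),
                      recursive = FALSE)
  # Merge the data frames pairwise; each step dispatches to merge.data.frame
  out <- Reduce(merge, dfs)
  # Restore the saved attributes on columns that still exist after the merge
  for (col in intersect(names(col_attrs), names(out))) {
    attributes(out[[col]]) <- col_attrs[[col]]
  }
  out
}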

To use it, put the data frames in a list, give that list a class attribute (here 'foo'), and pass it to merge as the single argument.
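As a rough illustration of that calling convention, the sketch below wraps the list construction in a small helper. The name merge_keep_attrs is hypothetical, and the snippet assumes merge.foo has already been defined as above; the usage part is guarded so that it is skipped otherwise.

```r
# Hypothetical wrapper: bundle the data frames into a list classed "foo" so
# that merge() dispatches to the merge.foo method defined above.
merge_keep_attrs <- function(...) {
  dfs <- structure(list(...), class = "foo")
  merge(dfs)
}

# Usage with the data from the question; runs only once merge.foo exists
if (exists("merge.foo")) {
  attr(mtcars$mpg, "units") <- "miles.per.gallon"
  new.df <- data.frame(gear = 3:5,
                       my.opinion = c("not enough", "just right", "too many"))
  merged.df <- merge_keep_attrs(new.df, mtcars)
  print(attr(merged.df$mpg, "units"))
}
```

Calling merge(structure(list(new.df, mtcars), class = "foo")) directly is equivalent; the wrapper only saves typing.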



Source: https://stackoverflow.com/questions/20306853/maintain-attributes-of-data-frame-columns-after-merge
