Extract cells and adjacent cells that may appear anywhere within a dataframe in R

Question

I have a dataframe that contains information about vegetation cover and percent coverage, collected using a quadrat. The dataframe is set up so that each row represents a single quadrat. If there are multiple species within one quadrat, they are all listed in the same row, with the respective % coverage always in the next column. Here is an example; species are represented as four-letter codes:

The problem is that the species were not recorded in any particular order, and not all species occur in every quadrat. There can also be any number of species per quadrat. I need to be able to extract each species AND its respective coverage, and place them into another dataframe for further analysis. For example, species "bope" from the example data above would look like this:

Any help greatly appreciated. Brian

Answer 1:

You could accomplish this by reshaping the data into a long format and then filtering by row values.

df = data.frame(Quadrat = 1:6, Date = seq.Date(as.Date("2014-01-01"), by = 1, length = 6), Species_1 = c("unk1", "bope", "bope", "stgu", "bg","bope"),
                covrage = sample(1:100,6), Species_2 = c("bope", "bial", "stgu", "bg","unk1", "bg"), covrage2 = sample(1:100,6))

> df
  Quadrat       Date Species_1 covrage Species_2 covrage2
1       1 2014-01-01      unk1      76      bope       63
2       2 2014-01-02      bope      82      bial       33
3       3 2014-01-03      bope      41      stgu        5
4       4 2014-01-04      stgu       6        bg       45
5       5 2014-01-05        bg      65      unk1       21
6       6 2014-01-06      bope      15        bg       96

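# Convert the species columns from factor to character so that reshape() can stack them.
# Only needed when data.frame() created factors (the default before R 4.0.0); harmless otherwise.
df$Species_1 = as.character(df$Species_1)
df$Species_2 = as.character(df$Species_2)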


df2 = reshape(df, varying = list(c("Species_1", "Species_2"), c("covrage", "covrage2")), v.names = c("Species", "Covrage"), direction = "long")

> df2
    Quadrat       Date time Species Covrage id
1.1       1 2014-01-01    1    unk1      76  1
2.1       2 2014-01-02    1    bope      82  2
3.1       3 2014-01-03    1    bope      41  3
4.1       4 2014-01-04    1    stgu       6  4
5.1       5 2014-01-05    1      bg      65  5
6.1       6 2014-01-06    1    bope      15  6
1.2       1 2014-01-01    2    bope      63  1
2.2       2 2014-01-02    2    bial      33  2
3.2       3 2014-01-03    2    stgu       5  3
4.2       4 2014-01-04    2      bg      45  4
5.2       5 2014-01-05    2    unk1      21  5
6.2       6 2014-01-06    2      bg      96  6

> df2[df2$Species == "bope", colnames(df2) %in% c("Quadrat", "Covrage")]
    Quadrat Covrage
2.1       2      82
3.1       3      41
6.1       6      15
1.2       1      63
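
The question mentions that a quadrat can hold any number of species, so the same long-format idea can be sketched without naming each pair by hand. This is only a minimal sketch under stated assumptions: the data frame `wide`, its `Cover_*` column names and the values are hypothetical, every species column is assumed to start with "Species", and its coverage is assumed to sit in the column immediately to its right.

```r
# Hypothetical wide data: three species/coverage pairs per quadrat, NA where unused
wide <- data.frame(
  Quadrat   = 1:3,
  Species_1 = c("unk1", "bope", "stgu"),
  Cover_1   = c(76, 82, 6),
  Species_2 = c("bope", "bial", NA),
  Cover_2   = c(63, 33, NA),
  Species_3 = c("bg", NA, NA),
  Cover_3   = c(10, NA, NA),
  stringsAsFactors = FALSE
)

# Locate every species column; its coverage is assumed to be the next column
sp_cols <- grep("^Species", names(wide))

# Build one long piece per species/coverage pair, then stack the pieces
long <- do.call(rbind, lapply(sp_cols, function(i) {
  data.frame(
    Quadrat  = wide$Quadrat,
    Species  = as.character(wide[[i]]),
    Coverage = wide[[i + 1]],
    stringsAsFactors = FALSE
  )
}))

# Drop the empty slots left by quadrats with fewer species
long <- long[!is.na(long$Species), ]

# Extract one species together with its quadrat and coverage
bope <- long[long$Species == "bope", c("Quadrat", "Coverage")]
print(bope)
```

Because the pairs are found by position, adding a fourth or fifth species pair to the data should need no change to the code, provided the column layout assumption still holds.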

Answer 2:

If you don't need the results by date, the date column can be ignored.

Here is an example dataframe:

species1 = replicate(20, paste(sample(LETTERS, 5), collapse = ""))
coverage1 = rnorm(20, 50, 30)
species2 = sample(species1)
coverage2 = sample(1:100, 20,replace = TRUE)
df = data.frame(species1, coverage1, species2, coverage2)

df
   species1   coverage1 species2 coverage2
1     KIRGD  -6.1879727    OBHTY        96
2     SXKAB  70.4472228    GUROP        40
3     LSWME  59.4121446    OMABR        29
4     KVSRD  53.8434373    PAQCJ        12
5     KHRUD  62.8253485    SXKAB        57
6     FOAGY  83.1087433    WUEMQ         4
7     QHYZL  52.2393233    KIRGD        47
8     EGDHA  82.2169139    RJKUS        72
9     GXFAR  58.1819166    SXNGO        16
10    SXNGO  -0.6093836    QHYZL         2
11    ZJQOA  99.1073472    KHRUD        28
12    PAQCJ  -1.0029008    TEPIZ        40
13    TEPIZ  55.5824570    WNLYJ        31
14    RJKUS  55.7524571    GDQOV        27
15    WUEMQ   9.4777950    LSWME         9
16    GDQOV  31.9365398    KVSRD        28
17    OBHTY   5.8709309    GXFAR        89
18    OMABR -20.5623502    ZJQOA        85
19    WNLYJ  75.9212241    FOAGY        11
20    GUROP  60.7119029    EGDHA        38

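To get the coverage for each species:

# All species codes that occur in either species column
species = unique(c(as.character(df$species1), as.character(df$species2)))
# Coverage columns; each one sits immediately to the right of its species column
cols = grep("coverage", colnames(df))
# For each coverage column, look up every species in the adjacent species column.
# match() compares whole strings, returns the first matching row, and gives NA if a species is absent.
coverage = lapply(cols, function(y) df[match(species, df[[y - 1]]), y])
df2 = data.frame(species, coverage1 = coverage[[1]], coverage2 = coverage[[2]])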

df2
   species   coverage1 coverage2
1    KIRGD  -6.1879727        47
2    SXKAB  70.4472228        57
3    LSWME  59.4121446         9
4    KVSRD  53.8434373        28
5    KHRUD  62.8253485        28
6    FOAGY  83.1087433        11
7    QHYZL  52.2393233         2
8    EGDHA  82.2169139        38
9    GXFAR  58.1819166        89
10   SXNGO  -0.6093836        16
11   ZJQOA  99.1073472        85
12   PAQCJ  -1.0029008        12
13   TEPIZ  55.5824570        40
14   RJKUS  55.7524571        72
15   WUEMQ   9.4777950         4
16   GDQOV  31.9365398        27
17   OBHTY   5.8709309        96
18   OMABR -20.5623502        29
19   WNLYJ  75.9212241        31
20   GUROP  60.7119029        40
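
One point to keep in mind when looking up species codes: grep() treats its first argument as a regular expression and matches substrings, so a short code can also hit longer codes that contain it. The codes below are made up purely for illustration; an exact comparison with == or match() avoids the problem.

```r
# Made-up species codes, for illustration only
codes <- c("bg", "bgra", "bope")

partial <- grep("bg", codes)      # substring match: hits both "bg" and "bgra"
exact   <- which(codes == "bg")   # whole-string comparison: hits only "bg"
first   <- match("bg", codes)     # position of the first exact match

print(partial)
print(exact)
print(first)
```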

Source: https://stackoverflow.com/questions/43860974/extract-cells-and-adjacent-cells-that-may-appear-anywhere-within-a-dataframe-in

Tags

dataframe

extract