Question
I have a dataframe of vegetation cover data (species and percent coverage) collected using quadrats. Each row represents a single quadrat. If a quadrat contains multiple species, they are all listed in the same row, and each species is always followed by its percent coverage in the next column. Here is an example; species are represented as 4-letter codes:

The problem is that the species were not recorded in any particular order, and not all species occur in every quadrat. There can also be any number of species per quadrat. I need to extract each species and its respective coverage and place them into another dataframe for further analysis. For example, species "bope" from the example data above would look like this:

Any help greatly appreciated. Brian
Answer 1:
You could accomplish this by reshaping the data into a long format and then filtering by row values.
# Example data: one row per quadrat, each species column followed by its coverage column
df = data.frame(Quadrat = 1:6,
                Date = seq.Date(as.Date("2014-01-01"), by = 1, length.out = 6),
                Species_1 = c("unk1", "bope", "bope", "stgu", "bg", "bope"),
                covrage = sample(1:100, 6),
                Species_2 = c("bope", "bial", "stgu", "bg", "unk1", "bg"),
                covrage2 = sample(1:100, 6))
> df
  Quadrat       Date Species_1 covrage Species_2 covrage2
1       1 2014-01-01      unk1      76      bope       63
2       2 2014-01-02      bope      82      bial       33
3       3 2014-01-03      bope      41      stgu        5
4       4 2014-01-04      stgu       6        bg       45
5       5 2014-01-05        bg      65      unk1       21
6       6 2014-01-06      bope      15        bg       96
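# Make sure the species columns are character rather than factor before stacking them
df$Species_1 = as.character(df$Species_1)
df$Species_2 = as.character(df$Species_2)
# Stack the species columns into "Species" and the coverage columns into "Covrage"
df2 = reshape(df,
              varying = list(c("Species_1", "Species_2"), c("covrage", "covrage2")),
              v.names = c("Species", "Covrage"),
              direction = "long")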
> df2
    Quadrat       Date time Species Covrage id
1.1       1 2014-01-01    1    unk1      76  1
2.1       2 2014-01-02    1    bope      82  2
3.1       3 2014-01-03    1    bope      41  3
4.1       4 2014-01-04    1    stgu       6  4
5.1       5 2014-01-05    1      bg      65  5
6.1       6 2014-01-06    1    bope      15  6
1.2       1 2014-01-01    2    bope      63  1
2.2       2 2014-01-02    2    bial      33  2
3.2       3 2014-01-03    2    stgu       5  3
4.2       4 2014-01-04    2      bg      45  4
5.2       5 2014-01-05    2    unk1      21  5
6.2       6 2014-01-06    2      bg      96  6
> df2[df2$Species == "bope", colnames(df2) %in% c("Quadrat", "Covrage")]
    Quadrat Covrage
2.1       2      82
3.1       3      41
6.1       6      15
1.2       1      63
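Since the question mentions that a quadrat can hold any number of species, the same stacking idea can be extended beyond two pairs. Below is a minimal base R sketch, assuming every species column is immediately followed by its coverage column. The helper name `stack_pairs`, the toy data, and the `Cover_*` column names are made up for illustration and are not part of the original answer:

```r
# Hypothetical helper: stack every (species, coverage) column pair into long format.
# Assumes each species column is immediately followed by its coverage column.
stack_pairs <- function(df, id_cols, species_cols) {
  pieces <- lapply(species_cols, function(j) {
    data.frame(df[id_cols],
               Species = as.character(df[[j]]),
               Coverage = df[[j + 1]],
               stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, pieces)
  # Drop empty slots, i.e. quadrats with fewer species than there are column pairs
  long[!is.na(long$Species) & long$Species != "", ]
}

# Toy data (made up); quadrat 3 contains only one species
quad <- data.frame(
  Quadrat = 1:3,
  Species_1 = c("unk1", "bope", "stgu"),
  Cover_1 = c(76, 82, 6),
  Species_2 = c("bope", "bial", NA),
  Cover_2 = c(63, 33, NA),
  stringsAsFactors = FALSE
)

long <- stack_pairs(quad, id_cols = "Quadrat", species_cols = c(2, 4))
bope <- long[long$Species == "bope", c("Quadrat", "Coverage")]
print(bope)
```

With the toy data, `bope` keeps quadrats 2 and 1 with coverage 82 and 63. The species column positions are passed in explicitly, so nothing depends on how the columns happen to be named.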
Answer 2:
If the dates don't matter, the date column can be ignored.
Here is an example dataframe:
species1 = replicate(20, paste(sample(LETTERS, 5), collapse = ""))
coverage1 = rnorm(20, 50, 30)
species2 = sample(species1)
coverage2 = sample(1:100, 20,replace = TRUE)
df = data.frame(species1, coverage1, species2, coverage2)
df
   species1   coverage1 species2 coverage2
1     KIRGD  -6.1879727    OBHTY        96
2     SXKAB  70.4472228    GUROP        40
3     LSWME  59.4121446    OMABR        29
4     KVSRD  53.8434373    PAQCJ        12
5     KHRUD  62.8253485    SXKAB        57
6     FOAGY  83.1087433    WUEMQ         4
7     QHYZL  52.2393233    KIRGD        47
8     EGDHA  82.2169139    RJKUS        72
9     GXFAR  58.1819166    SXNGO        16
10    SXNGO  -0.6093836    QHYZL         2
11    ZJQOA  99.1073472    KHRUD        28
12    PAQCJ  -1.0029008    TEPIZ        40
13    TEPIZ  55.5824570    WNLYJ        31
14    RJKUS  55.7524571    GDQOV        27
15    WUEMQ   9.4777950    LSWME         9
16    GDQOV  31.9365398    KVSRD        28
17    OBHTY   5.8709309    GXFAR        89
18    OMABR -20.5623502    ZJQOA        85
19    WNLYJ  75.9212241    FOAGY        11
20    GUROP  60.7119029    EGDHA        38
To get the coverage for each species:
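# All species codes that occur in either species column
species = unique(c(as.character(df$species1), as.character(df$species2)))
# Quick check on one pair: coverage1 for each species, matched exactly against species1
sapply(species, function(x) df$coverage1[df$species1 == x])
# Positions of the coverage columns; each sits directly after its species column
cols = grep("coverage", colnames(df))
# Exact lookup per pair; assumes each species occurs exactly once in every species column
coverage = lapply(cols, function(y) sapply(species, function(x) df[df[[y - 1]] == x, y], USE.NAMES = FALSE))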
df2 = data.frame(species, coverage1 = coverage[[1]], coverage2 = coverage[[2]])
df2
   species   coverage1 coverage2
1    KIRGD  -6.1879727        47
2    SXKAB  70.4472228        57
3    LSWME  59.4121446         9
4    KVSRD  53.8434373        28
5    KHRUD  62.8253485        28
6    FOAGY  83.1087433        11
7    QHYZL  52.2393233         2
8    EGDHA  82.2169139        38
9    GXFAR  58.1819166        89
10   SXNGO  -0.6093836        16
11   ZJQOA  99.1073472        85
12   PAQCJ  -1.0029008        12
13   TEPIZ  55.5824570        40
14   RJKUS  55.7524571        72
15   WUEMQ   9.4777950         4
16   GDQOV  31.9365398        27
17   OBHTY   5.8709309        96
18   OMABR -20.5623502        29
19   WNLYJ  75.9212241        31
20   GUROP  60.7119029        40
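The lookup above relies on every species showing up in each species column, which the question says is not guaranteed. One possible way to tolerate gaps is `match()`, which returns `NA` for a species that is absent from a column. This is only a sketch with made-up data, not the answerer's code:

```r
# Made-up example in which not every species appears in both species columns
df <- data.frame(
  species1 = c("bope", "stgu", "bial"),
  coverage1 = c(82, 6, 33),
  species2 = c("stgu", "bope", "unk1"),
  coverage2 = c(5, 63, 21),
  stringsAsFactors = FALSE
)

species <- unique(c(df$species1, df$species2))

# match() gives the row of the first occurrence of each species, or NA if absent,
# so indexing the coverage column with it yields NA for missing species
df2 <- data.frame(
  species = species,
  coverage1 = df$coverage1[match(species, df$species1)],
  coverage2 = df$coverage2[match(species, df$species2)],
  stringsAsFactors = FALSE
)
print(df2)
```

Note that `match()` only reports the first occurrence, so if a species turns up in the same column for several quadrats, only one value is kept. In that case the long format from Answer 1 is probably the safer route.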
Source: https://stackoverflow.com/questions/43860974/extract-cells-and-adjacent-cells-that-may-appear-anywhere-within-a-dataframe-in