Question
I have tried, without success, to do the task described below, so any help will be much appreciated.
The large table below contains data on fishers' quota ownership (and another variable, 'cpue') over time. I categorized fishers according to the number of quotas they own ('category'). Fishers may increase or reduce the number of quotas they own, so their ownership category may also change. I need to extract a row every time a fisher's ownership changes: the row for the last year before the number of quotas increased or decreased. For instance, if the number of quotas was 20 in 2000 and 45 in 2001, I need the row for the year 2000. Additionally, I need a new column with a category indicating which ownership levels the fisher is moving between. The second table below shows the new data frame that I need to create from the extracted rows.
My data:
ID fisher year qtty category cpue
1 1 1998 13 1 0.5994452
2 1 1999 13 1 0.6176183
3 1 2000 13 1 0.6871764
4 1 2001 20 2 0.3228005
5 1 2002 20 2 0.6505336
6 1 2003 20 2 0.8615834
7 1 2004 20 2 0.6871764
8 1 2005 20 2 0.7469739
9 1 2006 20 2 0.7380952
10 1 2007 45 3 0.7516396
11 1 2008 45 3 0.6808454
12 1 2009 45 3 0.6734158
13 1 2010 45 3 0.70367
14 1 2011 45 3 0.5434572
15 1 2012 45 3 0.6181238
16 2 2000 50 3 0.5191856
17 2 2001 50 3 0.6098226
18 2 2002 50 3 1.0018519
19 2 2003 50 3 1.2049724
20 2 2004 50 3 0.5857708
21 2 2005 10 1 0.6744186
22 2 2006 10 1 0.8123333
23 2 2007 10 1 0.3228005
24 2 2008 10 1 0.6505336
25 2 2009 10 1 0.8615834
26 2 2010 0 4 0
27 3 1998 25 2 0.7469739
28 3 1999 25 2 0.7380952
29 3 2000 25 2 0.7516396
30 3 2001 25 2 0.6808454
31 3 2002 10 1 0.6734158
32 3 2003 10 1 0.70367
33 3 2004 10 1 0.5434572
34 3 2005 10 1 0.6181238
35 3 2006 45 3 0.4698849
36 3 2007 45 3 1.0714286
37 3 2008 45 3 1.242439
38 3 2009 45 3 1.0614261
39 3 2010 45 3 0.9761391
40 3 2011 45 3 1.0041898
41 3 2012 45 3 0.9429851
42 4 2005 45 3 0.9310958
43 4 2006 50 3 0.8932985
44 4 2007 50 3 0.7867613
45 4 2008 20 2 0.7994713
46 4 2009 20 2 0.9368927
47 4 2010 10 1 0.8123333
48 4 2011 0 4 0
49 5 1998 20 2 0.4698849
50 5 1999 20 2 1.0714286
51 5 2000 20 2 1.242439
52 5 2001 20 2 1.0614261
53 5 2002 20 2 0.9761391
54 5 2003 20 2 1.0041898
55 5 2004 20 2 0.7469739
56 5 2005 0 4 0.7380952
57 6 2000 55 3 0.7516396
58 6 2001 55 3 0.6808454
59 6 2002 55 3 0.6734158
60 6 2003 55 3 0.6505336
61 6 2004 55 3 0.8615834
62 6 2005 55 3 0.6871764
63 6 2006 55 3 0.6181238
64 6 2007 0 4 0
This is what I need:
ID fisher year qtty category cpue category2
3 1 2000 13 1 0.6871764 1
25 2 2009 10 1 0.8615834 1
34 3 2005 10 1 0.6181238 1
47 4 2010 10 1 0.8123333 1
9 1 2006 20 2 0.7380952 2
30 3 2001 25 2 0.6808454 3
46 4 2009 20 2 0.9368927 3
44 4 2007 50 3 0.7867613 4
20 2 2004 50 3 0.5857708 5
25 2 2009 10 1 0.8615834 6
47 4 2010 10 1 0.8123333 6
55 5 2004 20 2 0.7469739 7
63 6 2006 55 3 0.6181238 8
The ownership categories are 1 (1-15 quotas), 2 (16-40 quotas), 3 (>40 quotas) and 4 (0 quotas, those who exited the fishery). The new category should show the transition between ownership categories (e.g. category2 = 1 is the transition from ownership level 1 to ownership level 2). Full details are in the following table:
From to category2
1 2 1
2 3 2
2 1 3
3 2 4
3 1 5
1 0 6
2 0 7
3 0 8
Thanks!!
Answer 1:
With data as your first data frame and cats as the category table:
> w<-which(diff(data$fisher)==0 & diff(data$category)!= 0)
> merge(data.frame(data[w,],From=data$category[w],to=data$category[w+1]),cats,all.x=T)[,-(1:2)]
ID fisher year qtty category cpue category2
1 3 1 2000 13 1 0.6871764 1
2 34 3 2005 10 1 0.6181238 NA
3 25 2 2009 10 1 0.8615834 6
4 47 4 2010 10 1 0.8123333 6
5 46 4 2009 20 2 0.9368927 3
6 30 3 2001 25 2 0.6808454 3
7 9 1 2006 20 2 0.7380952 2
8 55 5 2004 20 2 0.7469739 7
9 20 2 2004 50 3 0.5857708 5
10 44 4 2007 50 3 0.7867613 4
11 63 6 2006 55 3 0.6181238 8
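The one-liner above can be hard to follow, so here is a minimal sketch of the same diff()/which()/merge() idea on a made-up six-row data frame. The names toy, moves and result are illustrative only, and cats is assumed to code exits as 4 rather than 0, which is what the output above implies (rows 25 and 47 receive category2 6).

```r
# Toy data (hypothetical): two fishers, three years each.
toy <- data.frame(
  ID       = 1:6,
  fisher   = c(1L, 1L, 1L, 2L, 2L, 2L),
  year     = rep(2000:2002, 2),
  category = c(1L, 1L, 2L, 3L, 1L, 4L)
)

# Transition lookup, assuming exits are coded as category 4.
cats <- data.frame(
  From      = c(1L, 2L, 2L, 3L, 3L, 1L, 2L, 3L),
  to        = c(2L, 3L, 1L, 2L, 1L, 4L, 4L, 4L),
  category2 = 1:8
)

# Positions where the next row is the same fisher but a different category.
w <- which(diff(toy$fisher) == 0 & diff(toy$category) != 0)

# Row before each change, plus the categories on either side of the change.
moves <- data.frame(toy[w, ], From = toy$category[w], to = toy$category[w + 1])

# Left join on (From, to); transitions missing from cats would get NA.
result <- merge(moves, cats, by = c("From", "to"), all.x = TRUE)
result <- result[order(result$ID), c(names(toy), "category2")]
print(result)
```

Because of all.x = TRUE, a transition that is not listed in cats (such as 1 to 3 in the question's table) is kept with category2 set to NA instead of being dropped.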
Answer 2:
This should work for you, if I understood your problem correctly. df is the big dataset you've shown in your question -
library(data.table)
dt <- data.table(df)
# Change in quota to the following year within each fisher; the last year of
# each fisher has no following year, so pad with NA to keep the group length.
dt[, qttychange := c(diff(qtty), NA), by = "fisher"]
# Rows where the quota changes in the following year (NA rows are dropped).
categorychanges <- dt[qttychange != 0]
# Category in the following year, computed per fisher so that one fisher's
# last row is not paired with the next fisher's first row.
dt[, nextcategory := shift(category, type = "lead"), by = "fisher"]
dt[qttychange == 0, nextcategory := NA]
# Frequency of each category transition.
categorytable <- dt[!is.na(nextcategory), list(freq = .N), by = c("category", "nextcategory")]
Output -
> categorychanges
ID fisher year qtty category cpue qttychange
1: 3 1 2000 13 1 0.6871764 7
2: 9 1 2006 20 2 0.7380952 25
3: 20 2 2004 50 3 0.5857708 -40
4: 25 2 2009 10 1 0.8615834 -10
5: 30 3 2001 25 2 0.6808454 -15
6: 34 3 2005 10 1 0.6181238 35
7: 42 4 2005 45 3 0.9310958 5
8: 44 4 2007 50 3 0.7867613 -30
9: 46 4 2009 20 2 0.9368927 -10
10: 47 4 2010 10 1 0.8123333 -10
11: 55 5 2004 20 2 0.7469739 -20
12: 63 6 2006 55 3 0.6181238 -55
> categorytable
category nextcategory freq
1: 1 2 1
2: 2 3 1
3: 3 1 1
4: 1 4 2
5: 2 1 2
6: 1 3 1
7: 3 3 1
8: 3 2 1
9: 2 4 1
10: 3 4 1
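The tables above list the changed rows and the transition frequencies, but they do not attach the category2 label the question asks for. One possible way to add it is an update join against a lookup table. This is only a sketch on made-up data: toy, cats and moves are hypothetical names, exits are assumed to be coded as 4, and rows are selected on a change of category rather than a change of quantity.

```r
library(data.table)

# Toy data (hypothetical): two fishers, three years each.
toy <- data.table(
  ID       = 1:6,
  fisher   = c(1L, 1L, 1L, 2L, 2L, 2L),
  year     = rep(2000:2002, 2),
  category = c(1L, 1L, 2L, 3L, 1L, 4L)
)

# Transition lookup, assuming exits are coded as category 4.
cats <- data.table(
  From      = c(1L, 2L, 2L, 3L, 3L, 1L, 2L, 3L),
  to        = c(2L, 3L, 1L, 2L, 1L, 4L, 4L, 4L),
  category2 = 1:8
)

# Next year's category within each fisher (NA on each fisher's last row).
toy[, nextcategory := shift(category, type = "lead"), by = fisher]

# Keep the row just before each category change.
moves <- toy[!is.na(nextcategory) & nextcategory != category]

# Update join: copy category2 from cats where (category, nextcategory)
# matches (From, to); unmatched transitions stay NA.
moves[cats, category2 := i.category2,
      on = .(category = From, nextcategory = to)]
print(moves)
```

Filtering on the category instead of the quantity skips moves such as 45 to 50 quotas, which change qtty but stay inside category 3.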
Answer 3:
The output you provide is a bit inconsistent, i.e. there are some duplicate rows and some mismatches between the category2 you define and the category2 you show in the expected output.
Also, the last data frame, which defines category2, (i) uses 0, which you have not mentioned as a quota category, and (ii) does not provide a category2 for the 1 to 3 transition. So I replaced 0 with 4 and added a category2 for the 1 to 3 transition.
I hope I've not misunderstood, but the result looks similar to what you expect:
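library(zoo)  # for rollapply()
# For each fisher, keep the rows just before a category change and attach
# the "from" and "to" categories as columns "1" and "2".
newDF <- do.call(rbind, lapply(split(DF, DF$fisher), function(x) {
  res <- x[diff(x$category) != 0, ]
  # Consecutive pairs of the distinct categories; this assumes a fisher
  # never returns to a category held earlier.
  cbind(res, rollapply(unique(x$category), width = 2, c))
}))
# Look up category2 by matching "from to to" strings against trans.
newDF$category2 <- unlist(apply(newDF[, c("1", "2")], 1, function(x)
  trans$category2[grep(paste(x, collapse = " to "),
                       paste(trans$From, trans$to, sep = " to "))]),
  use.names = FALSE)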
newDF
# ID fisher year qtty category cpue 1 2 category2
#1.3 3 1 2000 13 1 0.6871764 1 2 1
#1.9 9 1 2006 20 2 0.7380952 2 3 2
#2.20 20 2 2004 50 3 0.5857708 3 1 5
#2.25 25 2 2009 10 1 0.8615834 1 4 6
#3.30 30 3 2001 25 2 0.6808454 2 1 3
#3.34 34 3 2005 10 1 0.6181238 1 3 not given
#4.44 44 4 2007 50 3 0.7867613 3 2 4
#4.46 46 4 2009 20 2 0.9368927 2 1 3
#4.47 47 4 2010 10 1 0.8123333 1 4 6
#5 55 5 2004 20 2 0.7469739 2 4 7
#6 63 6 2006 55 3 0.6181238 3 4 8
Columns 1 and 2 of newDF are the "from - to" transition.
DF is your large dataframe and trans is your last dataframe with the transitions (as I changed it):
DF <- structure(list(ID = 1:64, fisher = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(1998L, 1999L,
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 2012L, 2000L, 2001L, 2002L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L,
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L), qtty = c(13L,
13L, 13L, 20L, 20L, 20L, 20L, 20L, 20L, 45L, 45L, 45L, 45L, 45L,
45L, 50L, 50L, 50L, 50L, 50L, 10L, 10L, 10L, 10L, 10L, 0L, 25L,
25L, 25L, 25L, 10L, 10L, 10L, 10L, 45L, 45L, 45L, 45L, 45L, 45L,
45L, 45L, 50L, 50L, 20L, 20L, 10L, 0L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 0L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 0L), category = c(1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 4L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L),
cpue = c(0.5994452, 0.6176183, 0.6871764, 0.3228005, 0.6505336,
0.8615834, 0.6871764, 0.7469739, 0.7380952, 0.7516396, 0.6808454,
0.6734158, 0.70367, 0.5434572, 0.6181238, 0.5191856, 0.6098226,
1.0018519, 1.2049724, 0.5857708, 0.6744186, 0.8123333, 0.3228005,
0.6505336, 0.8615834, 0, 0.7469739, 0.7380952, 0.7516396,
0.6808454, 0.6734158, 0.70367, 0.5434572, 0.6181238, 0.4698849,
1.0714286, 1.242439, 1.0614261, 0.9761391, 1.0041898, 0.9429851,
0.9310958, 0.8932985, 0.7867613, 0.7994713, 0.9368927, 0.8123333,
0, 0.4698849, 1.0714286, 1.242439, 1.0614261, 0.9761391,
1.0041898, 0.7469739, 0.7380952, 0.7516396, 0.6808454, 0.6734158,
0.6505336, 0.8615834, 0.6871764, 0.6181238, 0)), .Names = c("ID",
"fisher", "year", "qtty", "category", "cpue"), class = "data.frame", row.names = c(NA,
-64L))
trans <- structure(list(From = c("1", "2", "2", "3", "3", "1", "2", "3",
"1"), to = c("2", "3", "1", "2", "1", "4", "4", "4", "3"), category2 = c("1",
"2", "3", "4", "5", "6", "7", "8", "not given")), .Names = c("From",
"to", "category2"), row.names = c(NA, 9L), class = "data.frame")
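Because the transitions above are derived from unique(x$category), they line up with the changed rows only when a fisher never returns to a category held earlier (for example 1, then 2, then 1 again). A base R sketch that avoids this assumption computes the next category per fisher with ave() and looks up category2 through a named vector. The data and the names toy, trans_toy and moves are made up for illustration, with exits assumed to be coded as 4.

```r
# Toy data (hypothetical): fisher 1 goes 1 -> 2 -> 1, fisher 2 goes 3 -> 4.
toy <- data.frame(
  ID       = 1:7,
  fisher   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  year     = c(2000:2003, 2000:2002),
  category = c(1L, 2L, 2L, 1L, 3L, 3L, 4L)
)

# Transition lookup, assuming exits are coded as category 4.
trans_toy <- data.frame(
  From      = c(1L, 2L, 2L, 3L, 3L, 1L, 2L, 3L),
  to        = c(2L, 3L, 1L, 2L, 1L, 4L, 4L, 4L),
  category2 = 1:8
)

# Next year's category within each fisher (NA on each fisher's last row).
toy$to <- ave(toy$category, toy$fisher, FUN = function(v) c(v[-1], NA))

# Keep the row just before each category change.
moves <- toy[!is.na(toy$to) & toy$to != toy$category, ]

# Named vector keyed by "from to"; unmatched keys would give NA.
lookup <- setNames(trans_toy$category2, paste(trans_toy$From, trans_toy$to))
moves$category2 <- unname(lookup[paste(moves$category, moves$to)])
print(moves)
```

Here fisher 1 contributes two rows (1 to 2, then 2 back to 1), which is the case the unique()-based pairing cannot represent.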
Source: https://stackoverflow.com/questions/19743957/extracting-row-from-a-data-frame-according-a-criterion-based-if-values-through-r