Problems with aggregating and crosstabulating data

吃可爱长大的小学妹 提交于 2019-12-12 01:13:36

问题


My data looks like this, ca 3300 rows of data:

Year     Location     Catch
1991       0313       45100
1989       0711         323
1991       0312        1100
1991       0313       45100
1989       0711         323
1991       0312         400 
1990       0313      101000
1981       0711         623
1999       0312         410
2000       0313      145100
1987       0711         323
1987       1285         770  
....

Years cover the period 1977-2015, and there are approx 500 different locations, not all having data in every year.

I need an output like this, summing up the catch for each cell, tabulated by location (rows) and year (columns):

Location   '1977' '1978' '1979' '1980' '1981' '1982' '1983' ...
0312         456  11100  12560    320   4566      0  12010  ...
0313         121    100   4500    760    112  12050 100100  ...
0711        5500   6500      0   1205   1201    560  90500  ...
0712         325    215    600  10100      0   4500  11050  ...
1285       10005   2700   1900 101000     50   7500   6050  ...
...          ...    ...    ...    ...    ...    ...    ...

I have asked a similar question previously, and I have tried different versions of the solution suggested successfully for that question, but without success. This is a little different and slightly more complex, and I am getting something wrong.

Grateful for any help.


回答1:


You can probably achieve what you are looking for with xtabs which, below, calculates the sum of each location by year for the table that you posted in your answer.

xtabs(Catch ~ Location + Year, df)
        Year
Location   1981   1987   1989   1990   1991   1999   2000
    312       0      0      0      0   1500    410      0
    313       0      0      0 101000  90200      0 145100
    711     623    323    646      0      0      0      0
    1285      0    770      0      0      0      0      0

data

df <- read.table(header=TRUE, text="Year     Location     Catch
1991       0313       45100
1989       0711         323
1991       0312        1100
1991       0313       45100
1989       0711         323
1991       0312         400 
1990       0313      101000
1981       0711         623
1999       0312         410
2000       0313      145100
1987       0711         323
1987       1285         770 ")



回答2:


Here is a tidyverse way. After grouping by 'Year', 'Location', get the sum of 'Catch' and then spread from 'long' to 'wide' format

library(tidyverse)
df %>%
    group_by(Year, Location) %>%
    summarise(Catch = sum(Catch)) %>% 
    spread(Year, Catch, fill = 0)
# A tibble: 4 × 8
#  Location `1981` `1987` `1989` `1990` `1991` `1999` `2000`
#*    <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1      312      0      0      0      0   1500    410      0
#2      313      0      0      0 101000  90200      0 145100
#3      711    623    323    646      0      0      0      0
#4     1285      0    770      0      0      0      0      0


来源:https://stackoverflow.com/questions/42930879/problems-with-aggregating-and-crosstabulating-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!