R - graph frequency of observations over time with small value range

China☆狼群 submitted on 2019-12-07 01:55:10

Question


I'm trying to graph the frequency of observations over time. I have a dataset where hundreds of laws are coded 0-3. I'd like to know if outcomes 2-3 are occurring more often as time progresses. Here is a sample of mock data:

Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

If I plot

plot(Data$year, Data$score)

I get a checkered matrix where every single spot is filled in, but I can't tell which numbers occur more often. Is there a way to color or to change the size of each point by the number of observations of a given row/year?

A few notes may help in answering the question:

1). I don't know how to sample data where certain numbers occur more frequently than others. My sampling procedure draws equally from all numbers. If there is a better way to create reproducible data with more observations in later years, I would like to know how.

2). This seemed like it would be best visualized in a scatter plot, but I could be wrong. I'm open to other visualizations.

Thanks!


Answer 1:


Here's how I would approach this (hope this is what you need)

Create the data (Note: when using sample in questions, always use set.seed too so it will be reproducible)

set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)
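
On the first note in the question: one way to make some values occur more often is the prob argument of sample. This is only a sketch; the weights and the names score_fixed and DataSkewed are made up for illustration and are not part of the original data.

```r
set.seed(123)

# Fixed weights: score 4 is four times as likely as score 1
score_fixed <- sample(1:4, 200, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.4))

# Year-dependent weights (hypothetical): scores 3 and 4 become likelier later
year <- sample(1998:2004, 200, replace = TRUE)
w <- (year - 1998) / 6  # 0 for 1998, 1 for 2004
score <- vapply(
  w,
  function(wi) sample(1:4, 1, prob = c(2 - wi, 2 - wi, 1 + wi, 1 + wi)),
  integer(1)
)
DataSkewed <- data.frame(year = year, score = score)

# Share of scores 3-4 per year, to check the drift
round(tapply(DataSkewed$score >= 3, DataSkewed$year, mean), 2)
```

sample rescales prob internally, so the weights do not need to sum to 1.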

Find frequencies of score per year using table

Data2 <- as.data.frame.matrix(table(Data))
Data2$year <- row.names(Data2)

Use melt to convert it back to long format

library(reshape2)
Data2 <- melt(Data2, "year")
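
If you would rather avoid reshape2, base R can give a similar long table in one step, since as.data.frame on a table returns one row per year/score combination. A minimal sketch, assuming the same mock data; the name Counts is hypothetical, and the count column is called value here only to mirror what melt produces.

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# One row per year/score pair; responseName sets the name of the count column
Counts <- as.data.frame(table(Data), responseName = "value")
head(Counts)
```

Note that year and score come back as factors in this version, and the grouping column is named score rather than variable.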

Plot the data, using a different color per group and scaling point size by frequency

library(ggplot2)
ggplot(Data2, aes(year, variable, size = value, color = variable)) +
  geom_point()

Alternatively, you could use both fill and size to describe frequency, something like

ggplot(Data2, aes(year, variable, size = value, fill = value)) +
  geom_point(shape = 21)




Answer 2:


So many answers... You seem to want to know if the frequency of outcomes 2-3 is increasing over time, so why not plot that directly:

set.seed(1)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(0:3, 200, replace = TRUE))
library(ggplot2)
ggplot(Data, aes(x=factor(year),y=score, group=(score>1)))+
  stat_summary(aes(color=(score>1)),fun=length, geom="line")+  # use fun.y in ggplot2 < 3.3.0
  scale_color_discrete("score",labels=c("0 - 1","2 - 3"))+
  labs(x="",y="Frequency")
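
The numbers behind such a plot can also be tabulated directly in base R. A small sketch, assuming the same mock data as above; high, counts and prop_high are names chosen for the example.

```r
set.seed(1)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(0:3, 200, replace = TRUE))

high <- Data$score > 1  # TRUE for scores 2-3

# Counts per year for the two groups (FALSE = 0-1, TRUE = 2-3)
counts <- table(year = Data$year, high = high)
counts

# Proportion of scores 2-3 within each year
prop_high <- tapply(high, Data$year, mean)
round(prop_high, 3)
```

Looking at the proportion rather than the raw count helps when the number of laws differs a lot between years.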




Answer 3:


Here's another approach:

ggplot(Data, aes(year)) + geom_histogram(aes(fill = after_stat(count))) + facet_wrap(~ score)

Each facet represents one "score" value, as noted in the title of each facet. You can easily get a feeling for the counts by looking at the height of the bars and their colour (lighter blue indicating higher counts).


Of course you could also do this only for the score %in% 2:3, if you don't want score 1 and 4 included. In such a case, you could do:

ggplot(Data[Data$score %in% 2:3,], aes(year)) + 
     geom_histogram(aes(fill = after_stat(count))) + facet_wrap(~ score)



Answer 4:


> with(Data, round( prop.table(table(year,score), 1), 3)  )

      score
year       1     2     3     4
  1998 0.308 0.231 0.231 0.231
  1999 0.136 0.273 0.227 0.364
  2000 0.281 0.250 0.219 0.250
  2001 0.129 0.290 0.226 0.355
  2002 0.217 0.174 0.261 0.348
  2003 0.286 0.286 0.200 0.229
  2004 0.387 0.129 0.194 0.290

png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()

There are other methods one could use if the number of points is so large that jittering doesn't let you determine counts by eye. You can use a transparent color, which lets you judge the density of points. The last 2 hex digits in an 8-position hex number preceded by an octothorpe are the alpha transparency of a color. See ?rgb and ?col2rgb. Compare these two plots with new data that allows you to have differences in proportions:

Data <- data.frame(
   year = rep(1998:2004, length=49000),
   score = sample(1:7, 49000, prob=(1:7)/5, replace = TRUE)
 )

png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()

 png(); plot(jitter(Data$year), jitter(Data$score) ,
        col="#bbbbbb11" );dev.off()
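
To make the hex notation concrete, here is a short sketch that takes the color string used above apart and rebuilds it from its components. The variable names parts and faint_grey are just for the example.

```r
# "#bbbbbb11": red = green = blue = 0xbb (187), alpha = 0x11 (17 out of 255)
parts <- col2rgb("#bbbbbb11", alpha = TRUE)
parts

# Build the same color from numeric components on a 0-255 scale
faint_grey <- rgb(187, 187, 187, alpha = 17, maxColorValue = 255)
faint_grey

# Opacity as a fraction: roughly 0.07, so single points are very faint
parts["alpha", 1] / 255
```

With an alpha this low, a cell only looks dark where many points overlap, which is what makes the denser score/year combinations stand out.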




Answer 5:


Another alternative:

df<-aggregate(Data$score,by= list(Data$year),table)
matplot(df$Group.1,(df[,2]))
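
A slightly more defensive variant of the same idea, as a sketch: aggregate stores the per-year counts in a matrix column x, but only if table returns the same number of counts for every year. Fixing the factor levels (an assumption that the scores are 1 to 4, as in the mock data) guards against a score being absent in some year. The line types, colors and legend are illustrative choices.

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# Fixed levels make table() return four counts for every year
df <- aggregate(factor(Data$score, levels = 1:4), by = list(Data$year), table)

str(df$x)  # count matrix: one row per year, one column per score

# One line per score, with a legend so the lines can be told apart
matplot(df$Group.1, df$x, type = "l", lty = 1, col = 1:4,
        xlab = "year", ylab = "count")
legend("topright", legend = 1:4, col = 1:4, lty = 1, title = "score")
```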

Hope it helps.



Source: https://stackoverflow.com/questions/27626915/r-graph-frequency-of-observations-over-time-with-small-value-range
