R - How to create a seasonal plot - Different lines for years

ⅰ亾dé卋堺 submitted on 2021-02-19 03:50:10

Question


I asked the same question yesterday but didn't get any suggestions, so I decided to delete the old one and ask again with additional information.

So here again:

I have a dataframe like this:

Link to the original dataframe: https://megastore.uni-augsburg.de/get/JVu_V51GvQ/

      Date   DENI011
1 1993-01-01   9.946
2 1993-01-02  13.663
3 1993-01-03   6.502
4 1993-01-04   6.031
5 1993-01-05  15.241
6 1993-01-06   6.561
     ....
     ....
6569 2010-12-26  44.113
6570 2010-12-27  34.764
6571 2010-12-28  51.659
6572 2010-12-29  28.259
6573 2010-12-30  19.512
6574 2010-12-31  30.231

I want to create a plot that enables me to compare the monthly values in the DENI011 over the years. So I want to have something like this:

http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Seasonal%20Plot

Jan-Dec on the x-scale, values on the y-scale and the years displayed by different colored lines.

I found several similar questions here, but nothing works for me. I tried to follow the instructions on the website with the example, but the problem is that I can't create a ts object.

Then I tried it this way:

Ref_Data$MonthN <- as.numeric(format(as.Date(Ref_Data$Date),"%m")) # Month's number
Ref_Data$YearN <- as.numeric(format(as.Date(Ref_Data$Date),"%Y"))
Ref_Data$Month  <- months(as.Date(Ref_Data$Date), abbreviate=TRUE) # Month's abbr.

g <- ggplot(data = Ref_Data, aes(x = MonthN, y = DENI011, group = YearN, colour=YearN)) + 
  geom_line() +
  scale_x_discrete(breaks = Ref_Data$MonthN, labels = Ref_Data$Month)

That didn't work either; the plot looks horrible. I don't need to put all the years from 1993 to 2010 in one plot. A few years would be enough, for example 1998-2006.

Any suggestions on how to solve this?


Answer 1:


As others have noted, in order to create a plot such as the one you used as an example, you'll have to aggregate your data first. However, it's also possible to retain daily data in a similar plot.

reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2018-02-11

library(tidyverse)
library(lubridate)

# Import the data
url <- "https://megastore.uni-augsburg.de/get/JVu_V51GvQ/"
raw <- read.table(url, stringsAsFactors = FALSE)

# Parse the dates, and use lower case names
df <- as_tibble(raw) %>% 
  rename_all(tolower) %>% 
  mutate(date = ymd(date))

One trick to achieve this would be to set the year component in your date variable to a constant, effectively collapsing the dates to a single year, and then controlling the axis labelling so that you don't include the constant year in the plot.

# Define the plot
p <- df %>% 
  mutate(
    year = factor(year(date)),     # use year to define separate curves
    date = update(date, year = 1)  # use a constant year for the x-axis
  ) %>% 
  ggplot(aes(date, deni011, color = year)) +
    scale_x_date(date_breaks = "1 month", date_labels = "%b")

# Raw daily data
p + geom_line()

In this case, though, your daily data are quite variable, so this is a bit of a mess. You could home in on a single year to see the daily variation a bit better.

# Home in on a single year
p + geom_line(aes(group = year), color = "black", alpha = 0.1) +
  geom_line(data = function(x) filter(x, year == 2010), size = 1)

But ultimately, if you want to look at several years at a time, it's probably a good idea to present smoothed lines rather than raw daily values, or indeed some monthly aggregate.

# Smoothed version
p + geom_smooth(se = FALSE)
#> `geom_smooth()` using method = 'loess'
#> Warning: Removed 117 rows containing non-finite values (stat_smooth).
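
The year-collapsing trick used above can also be sketched in base R, without lubridate. This is only an illustration under assumptions: the sample dates are made up, and the constant year 2000 is my choice (a leap year, so that 29 February stays a valid date; with a non-leap constant year that day may not map to a valid date).

```r
# Hypothetical sample dates (not taken from the original data set)
dates <- as.Date(c("1996-02-29", "1998-07-15", "2010-12-31"))

# Keep month and day, but force every date into the constant year 2000
collapsed <- as.Date(format(dates, "2000-%m-%d"))

# Keep the real year separately; it defines one curve per year
years <- factor(format(dates, "%Y"))

print(data.frame(original = dates, collapsed = collapsed, year = years))
```

The collapsed column would go on the x-axis and the year factor on the colour aesthetic, as in the plot definition above.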




Answer 2:


If you don't want to load dplyr, here is the same data preparation in base R (ggplot2 is still needed for the plot). It uses exactly the same strategy and gives the same results as www's answer.

library(ggplot2)

dat <- read.delim("~/Downloads/df1.dat", sep = " ")

dat$Date <- as.Date(dat$Date)

dat$month <- factor(months(dat$Date, TRUE), levels = month.abb)
dat$year <- gsub("-.*", "", dat$Date)

month_summary <- aggregate(DENI011 ~ month + year, data = dat, mean)

ggplot(month_summary, aes(month, DENI011, color = year, group = year)) +
    geom_path()
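
The question mentions that a few years, such as 1998-2006, would be enough. A minimal sketch of that restriction, using the same monthly-mean approach, might look like the following. The data here are synthetic stand-ins (random values over the same date range), and the name few_years is hypothetical. The month label is built from the month number and month.abb so that it does not depend on the session locale.

```r
# Synthetic stand-in for the real data (values are made up)
set.seed(1)
dat <- data.frame(
  Date = seq(as.Date("1993-01-01"), as.Date("2010-12-31"), by = "day")
)
dat$DENI011 <- runif(nrow(dat), 5, 50)

# Jan-Dec factor built from the month number (locale independent)
dat$month <- factor(month.abb[as.integer(format(dat$Date, "%m"))],
                    levels = month.abb)
dat$year <- as.integer(format(dat$Date, "%Y"))

# One mean value per month and year
month_summary <- aggregate(DENI011 ~ month + year, data = dat, FUN = mean)

# Keep only 1998-2006, then make year discrete for colouring
few_years <- subset(month_summary, year >= 1998 & year <= 2006)
few_years$year <- factor(few_years$year)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(few_years, aes(month, DENI011, color = year, group = year)) +
    geom_line()
  print(p)
}
```

Filtering while year is still numeric keeps the comparison simple; it is converted to a factor only afterwards, so ggplot2 gives each year its own discrete colour.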



Answer 3:


Your original data contain many values per month, so plotting them directly puts multiple points at each month position. That is why the line looks strange.

If you want to create something similar to the example you provided, you have to summarize your data by year and month. Below I calculate the mean for each year and month of your data. In addition, you need to convert year and month to factors if you want to plot them as discrete variables.

library(dplyr)
library(ggplot2)
Ref_Data2 <- Ref_Data %>%
  group_by(MonthN, YearN, Month) %>%
  summarize(DENI011 = mean(DENI011)) %>%
  ungroup() %>%
  # Convert the Month column to factor variable with levels from Jan to Dec
  # Convert the YearN column to factor
  mutate(Month = factor(Month, levels = unique(Month)),
         YearN = as.factor(YearN))

g <- ggplot(data = Ref_Data2, 
            aes(x = Month, y = DENI011, group = YearN, colour = YearN)) + 
  geom_line() 
g
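
The question also mentions trouble creating a ts object. Once the data are reduced to one value per month, as in the summaries above, a monthly ts object can be sketched as below. This is an illustration on synthetic data, not the original file; the name deni_ts is hypothetical, and the sketch assumes every month from January 1993 to December 2010 has at least one non-missing value (otherwise the series would be misaligned).

```r
# Synthetic stand-in for the real data (values are made up)
set.seed(42)
dat <- data.frame(
  Date = seq(as.Date("1993-01-01"), as.Date("2010-12-31"), by = "day")
)
dat$DENI011 <- runif(nrow(dat), 5, 50)

dat$year  <- as.integer(format(dat$Date, "%Y"))
dat$month <- as.integer(format(dat$Date, "%m"))

# One mean per month and year, sorted chronologically
monthly <- aggregate(DENI011 ~ month + year, data = dat, FUN = mean)
monthly <- monthly[order(monthly$year, monthly$month), ]

# Monthly time series starting in January 1993
deni_ts <- ts(monthly$DENI011, start = c(1993, 1), frequency = 12)
print(window(deni_ts, start = c(1998, 1), end = c(1998, 12)))
```

A ts object like this is the kind of input that seasonal-plot helpers such as forecast::ggseasonplot() expect, which appears to be what the linked example relies on; that call is not run here.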



Source: https://stackoverflow.com/questions/48722758/r-how-to-create-a-seasonal-plot-different-lines-for-years
