dplyr

plot polynomial regression line with ggplot stat_smooth

懵懂的女人 submitted on 2021-02-20 19:05:21
Question: I'm trying to create a scatter plot with a second-degree polynomial regression line using ggplot2's stat_smooth. Here is the code: df.car_spec_data <- read.csv(url("http://www.sharpsightlabs.com/wp-content/uploads/2015/01/auto-snout_car-specifications_COMBINED.txt")) df.car_spec_data$year <- as.character(df.car_spec_data$year) df.car_spec_data %>% group_by(year) %>% summarise(maxspeed=max(top_speed_mph, na.rm=T)) %>% ggplot(aes(x=year, y=maxspeed, group=1))+geom_point(color='red', alpha=0.3,
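For reference, a quadratic fit line can be sketched with `stat_smooth(method = "lm", formula = y ~ poly(x, 2))`. The example below is a minimal sketch and not the asker's code: it assumes the built-in `mtcars` data set as a stand-in for the car-specification file so that it runs offline, and the column choices (`wt`, `mpg`) are purely illustrative.

```r
library(ggplot2)

# Assumption: mtcars stands in for the data set used in the question.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red", alpha = 0.3) +
  # method = "lm" with a poly(x, 2) formula fits a second-degree polynomial
  # by ordinary least squares; se = FALSE hides the confidence band.
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)

print(p)
```

Raising the second argument of `poly()` gives a higher-degree curve with the same call.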

dplyr Update a cell in a data.frame

我怕爱的太早我们不能终老 submitted on 2021-02-20 18:52:44
Question: df <-data.frame(x=c(1:5),y=c(letters[1:5])) Let's say I want to modify the last row: update.row<-filter(df,x==5) %>% mutate(y="R") How do I update this row in the data.frame? The only way I found, albeit a strange one, is to do an anti-join and append the results: df <-anti_join(df,update.row,by="x") %>% bind_rows(update.row) However, this seems like a very inelegant way to achieve a simple task. Any ideas are much appreciated... Answer 1: If you are insistent on dplyr, perhaps df <-data.frame(x
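The answer above is cut off, so the following is only a sketch of one common way to update a single cell without the anti-join round trip; it is not necessarily what the answer proposed. It assumes `y` is a character column (hence `stringsAsFactors = FALSE`), because `if_else()` requires both branches to have the same type.

```r
library(dplyr)

# Assumption: y is stored as character, not as a factor.
df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE)

# dplyr: rewrite y only where the condition holds; other rows keep their value.
df_dplyr <- df %>% mutate(y = if_else(x == 5, "R", y))

# Base R equivalent: logical indexing assigns directly into the matching cell.
df_base <- df
df_base$y[df_base$x == 5] <- "R"
```

Both versions keep the original row order, which the anti-join plus `bind_rows()` approach does not guarantee in general.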

Take difference between first and last observations in a row, where each row is different

倾然丶 夕夏残阳落幕 submitted on 2021-02-20 17:56:41
Question: I have data that looks like the following:

  Region X2012 X2013 X2014 X2015 X2016 X2017
1      1    10    11    12    13    14    15
2      2    NA    17    14    NA    23    NA
3      3    12    18    18    NA    23    NA
4      4    NA    NA    15    28    NA    38
5      5    14  18.5    16    27    25    39
6      6    15    NA    17  27.5    NA    39

The numbers are irrelevant here; what I am trying to do is take the difference between the earliest and latest observed values in each row and store it in a new column, where:

Region Diff
1 (15 - 10) = 5
2 (23 - 17) = 6

and so on, not actually showing the
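A minimal base R sketch of the row-wise calculation described above, using the sample data from the question. The helper logic (drop the `NA`s in each row, then subtract the first remaining value from the last) is one possible approach, not the accepted answer, and the `^X[0-9]{4}$` pattern assumes every year column is named like `X2012`.

```r
df <- data.frame(
  Region = 1:6,
  X2012 = c(10, NA, 12, NA, 14, 15),
  X2013 = c(11, 17, 18, NA, 18.5, NA),
  X2014 = c(12, 14, 18, 15, 16, 17),
  X2015 = c(13, NA, NA, 28, 27, 27.5),
  X2016 = c(14, 23, 23, NA, 25, NA),
  X2017 = c(15, NA, NA, 38, 39, 39)
)

# Assumption: year columns are exactly those named "X" plus four digits,
# and they already appear in chronological order.
year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)

df$Diff <- apply(df[year_cols], 1, function(row) {
  obs <- row[!is.na(row)]                  # keep observed values only
  if (length(obs) == 0) return(NA_real_)   # a row with no data has no difference
  unname(obs[length(obs)] - obs[1])        # latest minus earliest
})

df[c("Region", "Diff")]
# Diff: 5, 6, 11, 23, 25, 24
```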

Using n() at the same time as calculating other summary statistics

我的未来我决定 submitted on 2021-02-20 09:10:58
Question: I am having trouble preparing a summary table using dplyr based on the data set below: set.seed(1) df <- data.frame(rep(sample(c(2012,2016),10, replace = T)), sample(c('Treat','Control'),10,replace = T), runif(10,0,1), runif(10,0,1), runif(10,0,1)) colnames(df) <- c('Year','Group','V1','V2','V3') I want to calculate the mean, median, and standard deviation, and count the number of observations, for each combination of Year and Group. I have successfully used this code to get mean, median and sd:
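The asker's own code is cut off above, so the following is only a sketch of one way to get the count alongside the other statistics. It assumes dplyr 1.0 or later (for `across()`), and it builds the data frame with named columns directly instead of calling `colnames()` afterwards.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  Year  = sample(c(2012, 2016), 10, replace = TRUE),
  Group = sample(c("Treat", "Control"), 10, replace = TRUE),
  V1 = runif(10), V2 = runif(10), V3 = runif(10)
)

summary_tbl <- df %>%
  group_by(Year, Group) %>%
  summarise(
    # One column per variable/statistic pair, named e.g. V1_mean, V2_sd
    across(V1:V3, list(mean = mean, median = median, sd = sd)),
    # n() counts the rows in the current group
    n = n(),
    .groups = "drop"
  )

summary_tbl
```

Because `across()` is restricted to `V1:V3`, the `n` column is computed once per group and is not itself summarised.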

mapping (ordered) factors to colors in ggplot

有些话、适合烂在心里 submitted on 2021-02-20 04:05:55
Question: Consider this example: data_frame(mylabel = c('month 18', 'month 19', 'month 20', 'month 21', 'month 22'), value = c(5,10,-2,2,0), time = c(1,2,3,4,5)) %>% ggplot(aes( x= time, y = value, color = mylabel)) + geom_point(size = 7) Here you can see that the variable mylabel has a natural ordering: month 18 comes before month 19, etc. However, this natural ordering is not preserved by the colors chosen by ggplot. In my real dataset, I have about 50 different months and I would like to use a color
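The question is cut off, so the sketch below only illustrates one way to make colors follow the label order: convert `mylabel` to an ordered factor and use a sequential discrete scale. It assumes the rows are already in chronological order (so order of appearance defines the factor levels) and uses `scale_color_viridis_d()` as an illustrative palette choice, not one named in the question.

```r
library(ggplot2)

plot_data <- data.frame(
  mylabel = c("month 18", "month 19", "month 20", "month 21", "month 22"),
  value = c(5, 10, -2, 2, 0),
  time = 1:5,
  stringsAsFactors = FALSE
)

# Assumption: rows are in chronological order, so unique() yields the levels
# in the intended sequence (alphabetical sorting would misplace "month 100").
plot_data$mylabel <- factor(
  plot_data$mylabel,
  levels = unique(plot_data$mylabel),
  ordered = TRUE
)

p <- ggplot(plot_data, aes(x = time, y = value, color = mylabel)) +
  geom_point(size = 7) +
  # A sequential palette assigns colors along the level order
  scale_color_viridis_d()

print(p)
```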
