dplyr | 易学教程

plot polynomial regression line with ggplot stat_smooth

Question: I'm trying to create a scatter plot with a second-degree polynomial regression line using ggplot2's stat_smooth. Here is the code:
    df.car_spec_data <- read.csv(url("http://www.sharpsightlabs.com/wp-content/uploads/2015/01/auto-snout_car-specifications_COMBINED.txt"))
    df.car_spec_data$year <- as.character(df.car_spec_data$year)
    df.car_spec_data %>%
      group_by(year) %>%
      summarise(maxspeed=max(top_speed_mph, na.rm=T)) %>%
      ggplot(aes(x=year, y=maxspeed, group=1))+geom_point(color='red', alpha=0.3,
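A second-degree fit of this kind is often drawn by passing a polynomial formula to `stat_smooth`. The following is only a minimal sketch under stated assumptions: it uses synthetic data in a hypothetical `cars_by_year` data frame instead of the CSV above, and it keeps `year` numeric (rather than character, as in the question) so that `poly()` can be applied to it.

```r
library(ggplot2)

# Synthetic stand-in for the summarised data (hypothetical values,
# not taken from the CSV referenced in the question).
set.seed(42)
cars_by_year <- data.frame(year = 1990:2012)
cars_by_year$maxspeed <- 150 + 4 * (cars_by_year$year - 1990) -
  0.1 * (cars_by_year$year - 1990)^2 + rnorm(nrow(cars_by_year), sd = 3)

# Scatter plot plus a least-squares fit of a degree-2 polynomial in x.
p <- ggplot(cars_by_year, aes(x = year, y = maxspeed)) +
  geom_point(color = "red", alpha = 0.3) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)
```

Printing `p` draws the plot. If `year` has to stay a character column, converting it inside `aes()` with `as.numeric(year)` is one possible workaround.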

dplyr Update a cell in a data.frame

Question:
    df <- data.frame(x=c(1:5), y=c(letters[1:5]))
Let's say I want to modify the last row:
    update.row <- filter(df, x==5) %>% mutate(y="R")
How do I update this row in the data.frame? The only way I have found, albeit a strange one, is to do an anti-join and append the results:
    df <- anti_join(df, update.row, by="x") %>% bind_rows(update.row)
However, this seems like a very inelegant way to achieve a simple task. Any ideas are much appreciated.
Answer 1: If you are insistent on dplyr, perhaps df <-data.frame(x
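Since the answer above is cut off, here is one possible dplyr approach, offered as a sketch rather than as the answerer's method: `mutate()` combined with base R's `replace()`. It assumes `y` is a character column, which is why `stringsAsFactors = FALSE` is set explicitly.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE is an assumption
# so that y is character in every R version.
df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE)

# Overwrite y only in the rows where x == 5; all other rows are untouched.
df <- df %>% mutate(y = replace(y, x == 5, "R"))
```

Unlike the anti-join approach, this keeps the original row order. Without dplyr, `df$y[df$x == 5] <- "R"` has the same effect.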

Take difference between first and last observations in a row, where each row is different

Question: I have data that looks like the following:
      Region X2012 X2013 X2014 X2015 X2016 X2017
    1      1    10  11.0    12  13.0    14    15
    2      2    NA  17.0    14    NA    23    NA
    3      3    12  18.0    18    NA    23    NA
    4      4    NA    NA    15  28.0    NA    38
    5      5    14  18.5    16  27.0    25    39
    6      6    15    NA    17  27.5    NA    39
The numbers are irrelevant here, but what I am trying to do is take the difference between the earliest and latest observed points in each row to make a new column for the difference, where:
    Region Diff
    1      (15 - 10) = 5
    2      (23 - 17) = 6
and so on, not actually showing the
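One way to compute such a difference is to drop the `NA` values in each row and subtract the first remaining value from the last. The sketch below uses base R only; the helper name `first_last_diff` is hypothetical, and the data frame is retyped from the table in the question.

```r
# Data retyped from the question.
df <- data.frame(
  Region = 1:6,
  X2012 = c(10, NA, 12, NA, 14, 15),
  X2013 = c(11, 17, 18, NA, 18.5, NA),
  X2014 = c(12, 14, 18, 15, 16, 17),
  X2015 = c(13, NA, NA, 28, 27, 27.5),
  X2016 = c(14, 23, 23, NA, 25, NA),
  X2017 = c(15, NA, NA, 38, 39, 39)
)

# Hypothetical helper: latest observed value minus earliest observed value.
first_last_diff <- function(row) {
  obs <- row[!is.na(row)]
  if (length(obs) == 0) return(NA_real_)  # row with no observations at all
  obs[length(obs)] - obs[1]
}

# Apply row-wise over the year columns (everything except Region).
df$Diff <- unname(apply(as.matrix(df[, -1]), 1, first_last_diff))
```

For regions 1 and 2 this gives 5 and 6, matching the values stated in the question. The approach relies on the year columns being in chronological order from left to right.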

Using n() at the same time as calculating other summary statistics

Question: I am having trouble preparing a summary table using dplyr based on the data set below:
    set.seed(1)
    df <- data.frame(rep(sample(c(2012,2016),10, replace = T)), sample(c('Treat','Control'),10,replace = T), runif(10,0,1), runif(10,0,1), runif(10,0,1))
    colnames(df) <- c('Year','Group','V1','V2','V3')
I want to calculate the mean, median, and standard deviation, and count the number of observations, for each combination of Year and Group. I have successfully used this code to get mean, median and sd:
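The asker's own code is cut off above, so the following is an independent sketch of one way to get the count alongside the other statistics. It assumes dplyr 1.0.0 or later (for `across()` and the `.groups` argument), builds the data with named columns instead of `colnames<-`, and omits the redundant `rep()` call. The name `summary_tbl` is introduced here for illustration.

```r
library(dplyr)

set.seed(1)
df <- data.frame(
  Year  = sample(c(2012, 2016), 10, replace = TRUE),
  Group = sample(c("Treat", "Control"), 10, replace = TRUE),
  V1 = runif(10), V2 = runif(10), V3 = runif(10)
)

summary_tbl <- df %>%
  group_by(Year, Group) %>%
  summarise(
    n = n(),  # number of rows in the current Year/Group combination
    # mean, median and sd for each of V1..V3; columns are named V1_mean, etc.
    across(V1:V3, list(mean = mean, median = median, sd = sd)),
    .groups = "drop"
  )
```

`n()` takes no arguments and counts the rows of the current group, so it can sit next to `across()` in the same `summarise()` call. Groups with a single row get `NA` for the standard deviation.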

mapping (ordered) factors to colors in ggplot

Question: Consider this example:
    data_frame(mylabel = c('month 18', 'month 19', 'month 20', 'month 21', 'month 22'), value = c(5,10,-2,2,0), time = c(1,2,3,4,5)) %>%
      ggplot(aes(x = time, y = value, color = mylabel)) + geom_point(size = 7)
Here you can see that the variable mylabel has a natural ordering: month 18 comes before month 19, and so on. However, this natural ordering is not preserved by the colors chosen by ggplot. In my real dataset, I have about 50 different months and I would like to use a color
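The question is truncated, so the exact color scheme the asker wanted is unknown. As a sketch of one common approach, the label can be turned into an ordered factor whose levels follow `time`, and a sequential discrete palette can then be applied. Using `scale_color_viridis_d()` (available in ggplot2 3.0.0 and later) is an assumption here, as is the plain `data.frame` named `d`.

```r
library(ggplot2)

d <- data.frame(
  mylabel = c("month 18", "month 19", "month 20", "month 21", "month 22"),
  value = c(5, 10, -2, 2, 0),
  time = c(1, 2, 3, 4, 5)
)

# Order the factor levels by time so the level order matches the natural
# order of the months rather than alphabetical sorting.
d$mylabel <- factor(d$mylabel,
                    levels = unique(d$mylabel[order(d$time)]),
                    ordered = TRUE)

# A sequential palette assigns colors along the level order.
p <- ggplot(d, aes(x = time, y = value, color = mylabel)) +
  geom_point(size = 7) +
  scale_color_viridis_d()
```

Setting the levels from `time` matters once labels such as "month 9" and "month 10" appear, because alphabetical sorting would place "month 10" first.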
