
Generating variable names for dataframes based on the loop number in a loop in R

情到浓时终转凉″ submitted on 2021-02-17 06:04:40
Question: I am working on developing and optimizing a linear model using the lm() function, and subsequently the step() function for optimization. I have added a variable to my data frame using a random generator of 0s and 1s (50% chance each). I use this variable to subset the data frame into a training set and a validation set: if a record is not assigned to the training set, it is assigned to the validation set. By using these subsets I am able to estimate how good the fit of the model is (by using
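A minimal sketch of the random 50/50 split described above, repeated in a loop. It assumes a hypothetical data frame `df` with numeric columns `x` and `y`, and a hypothetical `n_iter` of 3. Keeping each split in a list indexed by the loop number is usually easier to work with; `assign()` with `paste0()` is shown as the alternative when separately named variables (`train_1`, `train_2`, ...) are really wanted.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical example data; replace with your own data frame
df <- data.frame(x = rnorm(100), y = rnorm(100))

n_iter <- 3
splits <- vector("list", n_iter)
names(splits) <- paste0("split_", seq_len(n_iter))

for (i in seq_len(n_iter)) {
  # 0/1 indicator with a 50% chance each; 1 = training, 0 = validation
  in_train <- rbinom(nrow(df), size = 1, prob = 0.5) == 1
  train <- df[in_train, ]
  valid <- df[!in_train, ]

  # Option 1: store everything for iteration i in one list element
  splits[[i]] <- list(train = train, valid = valid,
                      fit = lm(y ~ x, data = train))

  # Option 2: create variables whose names contain the loop number
  assign(paste0("train_", i), train)
  assign(paste0("valid_", i), valid)
}
```

A variable created with `assign()` is read back with `get(paste0("train_", i))`, whereas with the list it is simply `splits[[i]]$train`, which also makes it easy to loop over all fits afterwards.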

r-project SixSigma ss.rr gives Error in `row.names<-.data.frame`(`*tmp*`, value = value) : invalid 'row.names' length

别说谁变了你拦得住时间么 submitted on 2021-02-17 05:56:30
Question: I have the data.frame below: > str(luc) 'data.frame': 19 obs. of 4 variables: $ driver : Factor w/ 16 levels "nr #1","nr #10",..: 1 9 10 11 12 13 14 15 16 2 ... $ position: Factor w/ 16 levels "pos #1","pos #10",..: 1 9 10 11 12 13 14 15 16 2 ... $ ate : num 2 2 2 2 2 2 2 2 2 1 ... $ i2 : num 0.00656 0.00676 0.00679 0.00681 0.00666 0.00657 0.00674 0.00676 0.00682 0.00684 ... > luc driver position ate i2 1 nr #1 pos #1 2 0.00656 2 nr #2 pos #2 2 0.00676 3 nr #3 pos #3 2 0.00679 4 nr #4 pos #4 2 0

R Help converting factor data from long to wide and assigning logical value

别来无恙 submitted on 2021-02-17 05:54:28
Question: I have data in long format, as seen below: Data: id code 1 EP 2 EP 3 EP 4 UM 5 UM 1 UM 2 UM 10 UM 6 BZ 7 BZ 14 BZ 2 BZ 8 TVOL 9 TVOL 16 TVOL 10 NW 11 NW 7 NW 12 SM 13 SM 3 SM 14 GS 15 GS 1 GS 2 GS 9 GS I would like to create a wide data frame with each "code" as its own column, marked TRUE/FALSE depending on whether there is an associated "id", as seen in the minimal example below: id code.EP code.UM code.BZ code.TVOL code.NW code.SM code.GS 1 TRUE TRUE FALSE FALSE FALSE FALSE TRUE 2 TRUE FALSE
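One base R way to get this shape, sketched here as an assumption rather than a known accepted answer: cross-tabulate `id` against `code` with `table()` and turn counts above zero into TRUE. The data below are the long-format rows from the question; turning `code` into a factor with `levels = unique(code)` keeps the columns in order of first appearance instead of alphabetical order.

```r
# Long-format data from the question
long <- data.frame(
  id   = c(1, 2, 3, 4, 5, 1, 2, 10, 6, 7, 14, 2, 8, 9, 16,
           10, 11, 7, 12, 13, 3, 14, 15, 1, 2, 9),
  code = c(rep("EP", 3), rep("UM", 5), rep("BZ", 4), rep("TVOL", 3),
           rep("NW", 3), rep("SM", 3), rep("GS", 5))
)

# Keep the codes in order of first appearance rather than alphabetical
long$code <- factor(long$code, levels = unique(long$code))

# id x code counts; any count above zero means the pair occurs
present <- unclass(table(long$id, long$code)) > 0

# Convert the logical matrix to a data frame and add the id column
wide <- as.data.frame(present)
names(wide) <- paste0("code.", names(wide))
wide <- cbind(id = as.integer(rownames(present)), wide)
rownames(wide) <- NULL
```

`unclass()` is used so that `as.data.frame()` keeps the matrix layout; calling it on a `table` object would instead return a long id/code/Freq data frame.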

Add column to data frame with sequence depending on other column

試著忘記壹切 submitted on 2021-02-17 05:50:27
Question: I have two columns of data like this: I want to add a column (or modify the second column) so that it holds a sequence of integers that restarts at 1 wherever a 1 already appears. Result changes to: I can do this with a loop, but what is the "right" R way of doing it? Here's my loop: for(i in 1:length(df2$col2)) { df2$col3[i] <- ifelse(df2$col2[i] == 1, 1, df2$col3[i - 1] + 1) if(is.na(df2$col2[i])) df2$col3[i] <- df2$col3[i - 1] + 1 } Here is a sample data set with 20 rows: 478.69, 320.45, 503.7,
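A vectorized sketch of what the loop computes: every 1 starts a new run, and every other value (including NA) continues the current run. `cumsum()` turns the positions of the 1s into run identifiers, and `ave(..., FUN = seq_along)` numbers the rows within each run. The `col2` vector here is hypothetical, and the sketch assumes the first row holds a 1; rows before the first 1 would simply be numbered as a run of their own.

```r
# Hypothetical col2: 1 starts a new run, anything else (or NA) continues it
col2 <- c(1, NA, NA, 1, 5, 1, NA)

# Run identifier: increases by one at every 1; NA counts as "not a 1"
run_id <- cumsum(!is.na(col2) & col2 == 1)

# Number the rows within each run: 1, 2, 3, ...
col3 <- ave(seq_along(col2), run_id, FUN = seq_along)
```

On the data frame from the question this becomes `df2$col3 <- ave(seq_len(nrow(df2)), cumsum(!is.na(df2$col2) & df2$col2 == 1), FUN = seq_along)`.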

How to merge multiple rows by a given condition and sum?

為{幸葍}努か submitted on 2021-02-17 05:50:06
Question: I have long-format data with ID, time and state columns. Within each ID, I would like states s_2 and s_3 to be merged into one row, with their time values summed. Let's say I have data: ID state time 1 s_1 4 1 s_2 6 1 s_3 7 2 s_1 2 2 s_2 12 2 s_3 5 2 s_4 4 3 s_1 10 3 s_2 2 3 s_3 3 that I'd like to convert into: ID state time 1 s_1 4 1 s_2+ 13 2 s_1 2 2 s_2+ 17 2 s_4 4 3 s_1 10 3 s_2+ 5 Any ideas? Answer 1: Change the labels of the state values and then group by and sum. library(dplyr) df %>% group_by(ID, state =
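The answer's idea (relabel, then group and sum) can also be sketched in base R without dplyr; this is a swapped-in equivalent, not the answer's own code. `ifelse()` gives s_2 and s_3 the shared label "s_2+", and `aggregate()` sums `time` within each ID/state pair. The data are the example rows from the question.

```r
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  state = c("s_1", "s_2", "s_3", "s_1", "s_2", "s_3", "s_4", "s_1", "s_2", "s_3"),
  time  = c(4, 6, 7, 2, 12, 5, 4, 10, 2, 3),
  stringsAsFactors = FALSE
)

# Give s_2 and s_3 a shared label so they fall into the same group
df$state <- ifelse(df$state %in% c("s_2", "s_3"), "s_2+", df$state)

# Sum time within each ID/state combination, then sort by ID and state
out <- aggregate(time ~ ID + state, data = df, FUN = sum)
out <- out[order(out$ID, out$state), ]
rownames(out) <- NULL
```

`stringsAsFactors = FALSE` matters on R versions before 4.0: with a factor column, `ifelse()` would return the underlying integer codes instead of the labels.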

R: How to Count All Character Values Separated By Commas In A Column?

我怕爱的太早我们不能终老 submitted on 2021-02-17 05:49:08
Question: Below are a couple of rows of some test data I am using. I want to count the frequency of all the codes in the ICD10Code column, which are separated by commas. In the code segment below, I used group_by because every "PatientId" value had duplicates in that column but unique values in other columns. How can I count the frequency of all the character values? PatientId ReferralSource NextAppt Age InsuranceName ICD10Code 1584 St Francis Y 34 SLIDING FEE SCHEDULE M5136,
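A base R sketch of the counting step: split every entry on commas, flatten the pieces into one vector, trim the surrounding spaces, and tabulate. The `icd` vector is hypothetical because the question's data are cut off; with a real data frame it would be `df$ICD10Code` (wrapped in `as.character()` if the column is a factor).

```r
# Hypothetical ICD10Code values; the question's full data are not shown
icd <- c("M5136, M545", "M545", "G8929, M5136, M545")

# Split on commas, flatten, and strip surrounding whitespace
codes <- trimws(unlist(strsplit(icd, ",")))
codes <- codes[codes != ""]  # drop empty pieces, e.g. from ",," in the data

# Frequency of each code, as a named integer vector
freq <- table(codes)
counts <- setNames(as.vector(freq), names(freq))
```

`sort(freq, decreasing = TRUE)` lists the most frequent codes first. If each patient's duplicated rows should only be counted once, the column would need de-duplicating (for example with `unique()` per patient) before splitting.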

Merging polygons and summing their values

女生的网名这么多〃 submitted on 2021-02-17 05:47:28
Question: I have a data frame with many overlapping polygons that I would like to combine into a single shape whose value equals the sum of the values given to each individual polygon. Some example data: df <- data.frame(x = c(0.5, 1.5, 4.5, 5.5), y = c(1, 1, 1, 1), id = c('a', 'b', 'c', 'd'), score = c(1, 3, 2, 4)) s_df <- SpatialPointsDataFrame(df[, c('x', 'y')], df[, 3:4]) %>% as('sf') %>% st_buffer(dist = 1) plot(s_df) I can get the union of these polygons by using the st_union function
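A sketch of one way to attach the summed score to the union, using sf only. It assumes "combine" means dissolving each connected group of overlapping circles into one shape, and it builds the circles with `st_as_sf()` instead of sp's `SpatialPointsDataFrame()`. `st_union()` dissolves all circles, `st_cast(..., "POLYGON")` splits the result into its separate pieces, and `st_intersects()` finds which original circles fall in each piece so their scores can be summed.

```r
library(sf)

df <- data.frame(x = c(0.5, 1.5, 4.5, 5.5), y = c(1, 1, 1, 1),
                 id = c("a", "b", "c", "d"), score = c(1, 3, 2, 4))

# Same buffered circles as in the question, built directly with sf
polys <- st_buffer(st_as_sf(df, coords = c("x", "y")), dist = 1)

# Dissolve the overlaps, then split the multipolygon into separate polygons
merged_geom <- st_cast(st_union(st_geometry(polys)), "POLYGON")

# For each merged polygon, sum the scores of the circles it contains
hits <- st_intersects(merged_geom, polys)
merged <- st_sf(score = vapply(hits, function(i) sum(polys$score[i]), numeric(1)),
                geometry = merged_geom)
```

With this data, circles a and b form one shape and c and d another. If a separate value is wanted for every overlap region instead, `st_intersection()` called on the sf object alone returns the non-overlapping pieces together with `n.overlaps` and `origins` columns.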

gsub with “|” character in R

谁说胖子不能爱 submitted on 2021-02-17 05:45:31
Question: I have a data frame with strings in a variable that contain the | character. I want to remove everything downstream of the | character. For example, considering the string heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding I wish to have only: heat-shock protein hsp70, putative Do I need an escape character for the | character? If I do: a <- c("foo_5", "bar_7") gsub("*_.", "", a) I get: [1] "foo" "bar" i.e. I
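Yes, an escape is needed: in a regular expression `|` means alternation, so it has to be written as `"\\|"` (or as the character class `"[|]"`) to match a literal pipe. A minimal sketch using the example string from the question; `\\s*` also removes the space before the first `|`, and `.*$` matches everything after it.

```r
x <- "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding"

# Drop optional whitespace, the first literal "|", and everything after it
cleaned <- sub("\\s*\\|.*$", "", x)
cleaned
#> [1] "heat-shock protein hsp70, putative"
```

`sub()` is vectorized, so the same call works on a whole data frame column, e.g. `df$description <- sub("\\s*\\|.*$", "", df$description)` (the column name is hypothetical).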