subset | 易学教程

R finding rows of a data frame where certain columns match those of another [duplicate]

Question: This question already has answers here: Extract data from one data frame to another data frame with different row length (4 answers). Closed 6 years ago. I have an R question that I'm not even sure how to word in one sentence, and I couldn't find an answer for it yet. I have two data frames that I would like to 'intersect' to find all rows where the values match in two columns. I've tried connecting two intersect() and which() statements with &&, but neither has given me what I want yet.
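One common way to approach this kind of two-column match is merge() with both key columns listed in `by`. The sketch below is only an illustration: the data frames `df1` and `df2` and the column names `id` and `grp` are hypothetical, since the excerpt does not show the asker's data.

```r
# Hypothetical example data; the original question's data is not shown.
df1 <- data.frame(id = c(1, 2, 3, 4),
                  grp = c("a", "b", "c", "d"),
                  x = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(2, 3, 4, 5),
                  grp = c("b", "x", "d", "e"),
                  y = c(TRUE, FALSE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

# Inner join: keeps only rows where BOTH id and grp agree in df1 and df2.
matched <- merge(df1, df2, by = c("id", "grp"))

# If only the rows of df1 are wanted (no columns from df2),
# build a combined key and test membership with %in%.
key1 <- paste(df1$id, df1$grp, sep = "\r")
key2 <- paste(df2$id, df2$grp, sep = "\r")
df1_matched <- df1[key1 %in% key2, ]
```

Note that `&&` compares only single logical values, so it is not suited to combining vector conditions; the element-wise operator is `&`.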

Subset a dataframe by multiple factor levels [duplicate]

Question: This question already has answers here: Select rows from a data frame based on values in a vector (3 answers). Closed 2 years ago. How can I avoid using a loop to subset a dataframe based on multiple factor levels? In the following example my desired output is a dataframe. The dataframe should contain the rows of the original dataframe where the value in "Code" equals one of the values in "selected". Working example:

    #sample data
    Code<-c("A","B","C","D","C","D","A","A")
    Value<-c(1, 2, 3, 4,
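A loop-free subset of this kind is usually written with `%in%`. The following is a minimal sketch: the `Code` vector comes from the excerpt, but the `Value` vector and the contents of `selected` are assumptions, because the original example is cut off.

```r
# Code is taken from the excerpt; Value and selected are hypothetical
# stand-ins because the original example is truncated.
Code <- c("A", "B", "C", "D", "C", "D", "A", "A")
Value <- 1:8
df <- data.frame(Code, Value)
selected <- c("A", "B")

# %in% returns one TRUE/FALSE per row, so no loop is needed.
result <- df[df$Code %in% selected, ]
```

If `Code` is a factor, the unused levels remain in `result$Code`; droplevels(result) removes them if that matters.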

Select multiple elements from a list

Question: I have a list in R some 10,000 elements long. Say I want to select only elements 5, 7, and 9. I'm not sure how I would do that without a for loop. I want to do something like mylist[[c(5,7,9)]] but that doesn't work. I've also tried the lapply function but haven't been able to get that working either.

Answer 1: mylist[c(5,7,9)] should do it. You want the sublists returned as sublists of the result list; you don't use [[]] (or rather, the function is [[ ) for that -- as Dason mentions in comments,
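To make the difference between `[` and `[[` concrete, here is a small sketch. The contents of `mylist` are hypothetical; only the indexing expressions mirror the excerpt.

```r
# Hypothetical list standing in for the asker's 10,000-element list.
mylist <- as.list(letters[1:10])

# Single brackets return a list containing the selected elements.
picked <- mylist[c(5, 7, 9)]

# Double brackets extract ONE element and return the element itself.
single <- mylist[[5]]

# With a vector index, [[ indexes recursively, i.e. mylist[[c(5, 7, 9)]]
# is read as mylist[[5]][[7]][[9]], which is why it fails here.
```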

Brackets make a vector different. How exactly is vector expression evaluated?

Question: I have a data frame as follows:

    planets  type                diameter  rotation  rings
    Mercury  Terrestrial planet     0.382     58.64  FALSE
    Venus    Terrestrial planet     0.949   -243.02  FALSE
    Earth    Terrestrial planet     1.000      1.00  FALSE
    Mars     Terrestrial planet     0.532      1.03  FALSE
    Jupiter  Gas giant             11.209      0.41  TRUE
    Saturn   Gas giant              9.449      0.43  TRUE
    Uranus   Gas giant              4.007     -0.72  TRUE
    Neptune  Gas giant              3.883      0.67  TRUE

I wanted to select the last 3 rows:

    planets_df[nrow(planets_df)-3:nrow(planets_df),]

However, I've got something I didn't
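The likely cause is operator precedence: in R, `:` binds more tightly than binary `-`, so `n - 3:n` is evaluated as `n - (3:n)`. The sketch below shows this on a cut-down, hypothetical copy of `planets_df` (only two of the columns are reproduced) together with two ways to get the last three rows.

```r
# Cut-down stand-in for planets_df (only two columns reproduced).
planets_df <- data.frame(
  planets = c("Mercury", "Venus", "Earth", "Mars",
              "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883),
  stringsAsFactors = FALSE
)
n <- nrow(planets_df)

# ':' is evaluated before '-', so this is n - (3:n): 5 4 3 2 1 0.
# The 0 index is silently dropped, giving rows 5 down to 1.
idx_original <- n - 3:n

# Parenthesising the lower bound gives the intended rows 6, 7, 8.
idx_fixed <- (n - 2):n
last3 <- planets_df[idx_fixed, ]

# tail() avoids the index arithmetic altogether.
last3_tail <- tail(planets_df, 3)
```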

Subset data based on Minimum Value

Question: This might be an easy one. Here's the data:

    dat <- read.table(header=TRUE, text="
    Seg     ID    Distance
    Seg46   V21   160.37672
    Seg72   V85   191.24400
    Seg373  V85   167.38930
    Seg159  V147   14.74852
    Seg233  V171  193.01636
    Seg234  V171  200.21458
    ")
    dat
    Seg     ID    Distance
    Seg46   V21   160.37672
    Seg72   V85   191.24400
    Seg373  V85   167.38930
    Seg159  V147   14.74852
    Seg233  V171  193.01636
    Seg234  V171  200.21458

I am intending to get a table like the following that will give me Seg for the minimized distance (as duplication is seen in ID
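One base-R way to keep the row with the smallest Distance for each ID is to compare each value with its group minimum computed by ave(). This is a sketch using the data from the excerpt, not necessarily the answer accepted on the original page; if two rows tie for the minimum within an ID, both are kept.

```r
dat <- read.table(header = TRUE, text = "
Seg     ID    Distance
Seg46   V21   160.37672
Seg72   V85   191.24400
Seg373  V85   167.38930
Seg159  V147   14.74852
Seg233  V171  193.01636
Seg234  V171  200.21458
")

# ave() returns, for every row, the minimum Distance within that row's ID.
group_min <- ave(dat$Distance, dat$ID, FUN = min)

# Keep the rows whose Distance equals their group minimum.
min_rows <- dat[dat$Distance == group_min, ]
```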

Subsetting R array: dimension lost when its length is 1

Question: When subsetting arrays, R behaves differently depending on whether one of the dimensions is of length 1 or not. If a dimension has length 1, that dimension is lost during subsetting:

    ax <- array(1:24, c(2,3,4))
    ay <- array(1:12, c(1,3,4))
    dim(ax)
    #[1] 2 3 4
    dim(ay)
    #[1] 1 3 4
    dim(ax[,1:2,])
    #[1] 2 2 4
    dim(ay[,1:2,])
    #[1] 2 4

From my point of view, ax and ay are the same, and performing the same subset operation on them should return an array with the same dimensions. I can see that the way
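The usual way to keep length-1 dimensions is the `drop = FALSE` argument of `[`. A short sketch using the arrays from the excerpt (the result names `ay_dropped`, `ay_kept` and `ax_kept` are introduced here for illustration):

```r
ax <- array(1:24, c(2, 3, 4))
ay <- array(1:12, c(1, 3, 4))

# Default behaviour: dimensions of length 1 are dropped.
ay_dropped <- ay[, 1:2, ]

# drop = FALSE keeps every dimension, including those of length 1.
ay_kept <- ay[, 1:2, , drop = FALSE]
ax_kept <- ax[, 1:2, , drop = FALSE]

dim(ay_dropped)  # 2 4
dim(ay_kept)     # 1 2 4
dim(ax_kept)     # 2 2 4
```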

Remove rows from a single-column data frame

Question: When I try to remove the last row from a single-column data frame, I get a vector back instead of a data frame:

    > df = data.frame(a=1:10)
    > df
        a
    1   1
    2   2
    3   3
    4   4
    5   5
    6   6
    7   7
    8   8
    9   9
    10 10
    > df[-(length(df[,1])),]
    [1] 1 2 3 4 5 6 7 8 9

The behavior I'm looking for is what happens when I use this command on a two-column data frame:

    > df = data.frame(a=1:10,b=11:20)
    > df
        a  b
    1   1 11
    2   2 12
    3   3 13
    4   4 14
    5   5 15
    6   6 16
    7   7 17
    8   8 18
    9   9 19
    10 10 20
    > df[-(length(df[,1])),]
      a  b
    1 1 11
    2 2 12
    3 3 13
    4
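As with arrays, data frame subsetting with `[` simplifies a single-column result to a vector unless `drop = FALSE` is given. A minimal sketch using the one-column data frame from the excerpt (the names `as_vector` and `as_frame` are introduced here for illustration):

```r
df <- data.frame(a = 1:10)

# Default: a one-column result is simplified to a plain vector.
as_vector <- df[-nrow(df), ]

# drop = FALSE keeps the data frame structure.
as_frame <- df[-nrow(df), , drop = FALSE]
```

nrow(df) is used here in place of length(df[,1]); both give the number of rows.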

Subset data based on partial match of column names

Question: I need to subset a df to include certain strings. Some of these are full column names, and the following works fine:

    testData[,c("FullColName1","FullColName2","FullColName3")]

My problem is that I need to expand this to also include column names that contain specific strings that may partially match some other column names. These strings include letters and symbols: "PartString1()","PartString2()". I tried putting wildcards around these. (I've indicated this below with the prefix "star"
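Because the partial strings contain parentheses, which are special characters in regular expressions, one option is grepl() with `fixed = TRUE` so that the strings are matched literally. The sketch below is only an illustration: the column names in `testData` are invented, since the asker's real names are not shown.

```r
# Hypothetical data; check.names = FALSE keeps the parentheses in the name.
testData <- data.frame(
  FullColName1 = 1:3,
  FullColName2 = 4:6,
  `abc_PartString1()_x` = 7:9,
  Other = 10:12,
  check.names = FALSE
)

full_names <- c("FullColName1", "FullColName2")
partial_strings <- c("PartString1()", "PartString2()")

# For each partial string, flag the columns whose name contains it literally,
# then combine the flags with an element-wise OR.
partial_hit <- Reduce(`|`, lapply(partial_strings, function(p) {
  grepl(p, names(testData), fixed = TRUE)
}))

keep <- names(testData) %in% full_names | partial_hit
result <- testData[, keep, drop = FALSE]
```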

SQL: How To Select Earliest Row

Question: I have a report that looks something like this:

    CompanyA  Workflow27  June5
    CompanyA  Workflow27  June8
    CompanyA  Workflow27  June12
    CompanyB  Workflow13  Apr4
    CompanyB  Workflow13  Apr9
    CompanyB  Workflow20  Dec11
    CompanyB  Workflow20  Dec17

This is done with SQL (specifically, T-SQL on SQL Server 2005):

    SELECT company , workflow , date FROM workflowTable

I would like the report to show just the earliest dates for each workflow:

    CompanyA  Workflow27  June5
    CompanyB  Workflow13  Apr4
    CompanyB  Workflow20  Dec11

Subset panel data by group [duplicate]

Question: This question already has answers here: How to select the first and last row within a grouping variable in a data frame? (4 answers). Closed 10 months ago. I would like to subset an unbalanced panel data set by group. For each group, I would like to keep the two observations in the first and the last years. How do I best do this in R? For example:

    dt <- data.frame(name= rep(c("A", "B", "C"), c(3,2,3)), year=c(2001:2003,2000,2002,2000:2001,2003))
    > dt
      name year
    1    A 2001
    2    A 2002
    3    A 2003
    4    B
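One base-R sketch for keeping the first and last year within each group compares each year with its group minimum and maximum via ave(). It uses the `dt` data frame from the excerpt; it is one possible approach, not necessarily the one given in the linked answers, and the helper names `is_first` and `is_last` are introduced here.

```r
dt <- data.frame(name = rep(c("A", "B", "C"), c(3, 2, 3)),
                 year = c(2001:2003, 2000, 2002, 2000:2001, 2003))

# Flag rows whose year is the earliest or the latest within their group.
is_first <- dt$year == ave(dt$year, dt$name, FUN = min)
is_last  <- dt$year == ave(dt$year, dt$name, FUN = max)

result <- dt[is_first | is_last, ]
```

A group with a single observation would contribute just that one row, since it is both the first and the last.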