subset | 易学教程

Subset dataframe based on number of observations in each column

Question: I have a problem and would appreciate a hand. I tried to come up with a solution, but I have no idea how to work it out. Please use this to recreate my data frame:

structure(list(A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 14L, 46L, 19L, 93L, 5L, 94L), A2 = c(50L, NA, 73L, 58L, 47L, 74L, 39L, NA, NA, NA, NA, NA, NA, NA), A3 = c(NA, 38L, 10L, 41L, NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)), .Names = c("A1", "A2", "A3"), class = "data.frame", row.names = c(NA, -14L))

I have
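The question is cut off before it states the goal, so the following is only a sketch of what the title suggests: counting the non-NA observations in each column and keeping the columns that reach a threshold. The threshold `min_obs` and the names `n_obs` and `kept` are hypothetical.

```r
# Recreate the data frame from the question.
df <- data.frame(
  A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 14L, 46L, 19L, 93L, 5L, 94L),
  A2 = c(50L, NA, 73L, 58L, 47L, 74L, 39L, NA, NA, NA, NA, NA, NA, NA),
  A3 = c(NA, 38L, 10L, 41L, NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)
)

# Number of non-missing observations per column.
n_obs <- colSums(!is.na(df))

# Hypothetical threshold: keep columns with at least this many observations.
min_obs <- 10
kept <- df[, n_obs >= min_obs, drop = FALSE]
```

With this data, `n_obs` is 14, 6 and 10 for A1, A2 and A3, so `kept` holds A1 and A3.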

Subsetting a dataframe for a specified month and year

Question: I have a data frame where the first column is a date in d/m/y format and the second is a numeric value (sales). I want to create subsets for each month of one year (e.g. 11/11, 12/11, etc.). I tried the code suggested in this answer: "subset a data.frame with multiple conditions". It works when only the condition on the month is imposed:

subset(sales, format.Date(date, "%m")=="11")

but it returns an empty subset with the error message "invalid 'x' argument" when I add the year condition:

subset(sales,
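A minimal sketch of combining the month and year conditions, assuming the date column holds d/m/y strings with a two-digit year. The sample rows and the column names `date` and `value` are made up for illustration, and the sketch does not diagnose the error in the truncated question. It converts the column to class `Date` first, since `format()` only extracts date parts reliably from a real date object.

```r
# Hypothetical sample data; column names `date` and `value` are assumptions.
sales <- data.frame(
  date  = c("05/11/11", "20/11/11", "03/12/11", "15/11/12"),
  value = c(100, 150, 200, 250),
  stringsAsFactors = FALSE
)

# Parse the d/m/y strings into Date objects.
sales$date <- as.Date(sales$date, format = "%d/%m/%y")

# Combine both conditions with `&` inside a single subset() call.
nov_2011 <- subset(sales, format(date, "%m") == "11" & format(date, "%Y") == "2011")

# Alternatively, build one data frame per month in a single step,
# keyed by "YYYY-MM".
by_month <- split(sales, format(sales$date, "%Y-%m"))
```

`split()` avoids writing one `subset()` call per month: `by_month[["2011-11"]]` holds the same rows as `nov_2011`.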

subset a data.frame with multiple conditions

Question: Suppose my data looks like this:

2372 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 1.3 05/07/2006
9104 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.34 07/23/2006
9212 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.33 02/11/2007
2094 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 1.4 05/06/2007
16763 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.61 05/11/2009
1076 Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR 0.48 05/12/2002
1077 Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR 0.3 05/07/2006

I

Group rows in data frame based on time difference between consecutive rows

Question: I have a data frame of this type:

YEAR MONTH DAY HOUR    LON    LAT
1860    10   3   13 -19.50   3.00
1860    10   3   17 -19.50   4.00
1860    10   3   21 -19.50   5.00
1860    10   5    5 -20.50   6.00
1860    10   5   13 -21.50   7.00
1860    10   5   17 -21.50   8.00
1860    10   6    1 -22.50   9.00
1860    10   6    5 -22.50  10.00
1860    12   5    9 -22.50  -7.00
1860    12   5   18 -23.50  -8.00
1860    12   5   22 -23.50  -9.00
1860    12   6    6 -24.50 -10.00
1860    12   6   10 -24.50 -11.00
1860    12   6   18 -24.50 -12.00

What I would like to do is to calculate the interpolating line for every subset of
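The grouping step named in the title can be sketched as follows. The excerpt is cut off before it defines what separates one subset from the next, so the 12-hour threshold `max_gap` is purely an assumption, as are the names `time`, `gap_hours` and `group`. Only the first few rows of the question's data are used.

```r
# A few rows from the question's data (YEAR is recycled across rows).
track <- data.frame(
  YEAR  = 1860,
  MONTH = c(10, 10, 10, 10, 12, 12),
  DAY   = c(3, 3, 3, 5, 5, 5),
  HOUR  = c(13, 17, 21, 5, 9, 18)
)

# Build a proper timestamp from the separate date-part columns.
time <- ISOdatetime(track$YEAR, track$MONTH, track$DAY, track$HOUR, 0, 0, tz = "UTC")

# Time difference between consecutive rows, in hours.
gap_hours <- as.numeric(diff(time), units = "hours")

# Hypothetical rule: a gap longer than 12 hours starts a new group.
max_gap <- 12
track$group <- cumsum(c(TRUE, gap_hours > max_gap))
```

The `cumsum()` over a logical "new group starts here" flag gives each run of closely spaced rows its own integer id, which can then be passed to `split()` or used as a grouping variable.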

Faster way to subset on rows of a data frame in R?

Question: I have been using these two methods interchangeably to subset data from a data frame in R.

Method 1: subset_df <- df[which(df$age>5) , ]
Method 2: subset_df <- subset(df, age>5)

I have two questions about them.
1. Which one is faster, considering that I have very large data?
2. The post "Subsetting data frames in R" suggests that there is in fact a difference between the two methods above: one of them handles NA correctly. Which one is safe to use, then?

Answer 1: The question asks for a faster way to
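On the NA part of the question, a small sketch with made-up data may help. In base R both methods shown above drop rows where the condition evaluates to NA; the variant that behaves differently is plain logical indexing without `which()`, which returns an all-NA row for every NA in the condition. The data frame below is hypothetical.

```r
# Hypothetical data with one missing age.
df <- data.frame(age = c(3, 7, NA, 10))

# Plain logical indexing: the NA in the condition yields an all-NA row.
by_logical <- df[df$age > 5, , drop = FALSE]

# Method 1: which() returns only the positions that are TRUE, so NA is dropped.
by_which <- df[which(df$age > 5), , drop = FALSE]

# Method 2: subset() also treats NA in the condition as FALSE.
by_subset <- subset(df, age > 5)

nrow(by_logical)  # 3 rows, one of them NA
nrow(by_which)    # 2 rows
nrow(by_subset)   # 2 rows
```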

Subsets in Prolog

Question: I'm looking for a predicate that works like this:

?- subset([1,2,3], X).
X = [] ;
X = [1] ;
X = [2] ;
X = [3] ;
X = [1, 2] ;
X = [1, 2, 3] ;
X = [2, 3] ;
...

I've seen some subset implementations, but they all work when you want to check whether one list is a subset of another, not when you want to generate the subsets. Any ideas?

Answer 1: Here goes an implementation:

subset([], []).
subset([E|Tail], [E|NTail]):- subset(Tail, NTail).
subset([_|Tail], NTail):- subset(Tail, NTail).

It will generate

Why is subsetting on a “logical” type slower than subsetting on “numeric” type?

Question: Suppose we have a vector (or a data.frame, for that matter) as follows:

set.seed(1)
x <- sample(10, 1e6, TRUE)

And one wants to get all values of x where x > 4, say:

a1 <- x[x > 4] # (or)
a2 <- x[which(x > 4)]
identical(a1, a2) # TRUE

I think most people would prefer x[x > 4]. But surprisingly (at least to me), subsetting using which is faster!

require(microbenchmark)
microbenchmark(x[x > 4], x[which(x > 4)], times = 100)

Unit: milliseconds
expr min lq median uq max neval
x[x > 4] 56.59467 57

subset a column in data frame based on another data frame/list

Question: I have the following table1, which is a data frame composed of 6 columns and 8083 rows. Below I am displaying the head of this table1:

|gene ID        |   prom_65|   prom_66|  amast_69|  amast_70|   p_value|
|:--------------|---------:|---------:|---------:|---------:|---------:|
|LdBPK_321470.1 |   24.7361|   25.2550|   31.2974|   45.4209| 0.2997430|
|LdBPK_251900.1 |  107.3580|  112.9870|   77.4182|   86.3211| 0.0367792|
|LdBPK_331430.1 |   72.0639|   86.1486|   68.5747|   77.8383| 0.2469355|
|LdBPK_100640.1 |   43.8766|   53.4004

How can I subset rows in a data frame in R based on a vector of values?

Question: I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B, and vice versa, in order to eliminate noise from a graph that's going into a report. (Don't worry, this data isn't being permanently deleted!) I have read the following:

Selecting columns in R data frame based on those *not* in a vector
http://www.ats.ucla.edu/stat/r/faq/subset_R.htm
How to combine multiple conditions to subset a data-frame using "OR"?

But I'm still not
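One common way to trim two data frames to their shared values is `%in%` against the intersection of a key column. The sketch below assumes both data sets share a key column, here called `id`; the excerpt does not show the real structure, so the column names and values are hypothetical.

```r
# Hypothetical data sets with a shared key column `id`.
A <- data.frame(id = c(1, 2, 3, 4), a_val = c(10, 20, 30, 40))
B <- data.frame(id = c(2, 3, 5), b_val = c("x", "y", "z"))

# Keys present in both data sets.
common <- intersect(A$id, B$id)

# Keep only the rows whose key appears in both; the originals are untouched.
A_trim <- A[A$id %in% common, ]
B_trim <- B[B$id %in% common, ]
```

Because the result is assigned to new objects, `A` and `B` stay intact, which matches the note in the question that nothing is being permanently deleted.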
