subset

Subsetting a Python DataFrame

喜你入骨 submitted on 2019-11-28 15:57:56
I am transitioning from R to Python and have just begun using pandas. I have R code that subsets nicely:

    k1 <- subset(data, Product == p.id & Month < mn & Year == yr, select = c(Time, Product))

Now I want to do something similar in Python. This is what I have so far:

    import pandas as pd
    data = pd.read_csv("../data/monthly_prod_sales.csv")
    # first, index the dataset by Product, and get all rows that match a given 'p.id' and time.
    data.set_index('Product')
    k = data.ix[[p.id, 'Time']]
    # then, index this subset with Time and do more subsetting..

I am beginning to feel that I am doing this the wrong way.
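For comparison, here is a minimal pandas sketch of the R subset() call in the question, using boolean masks with .loc. The sample rows and the values of p_id, mn and yr are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for monthly_prod_sales.csv; only the column names
# (Time, Product, Month, Year) come from the question.
data = pd.DataFrame({
    "Time": [1, 2, 3, 4, 5],
    "Product": [10, 10, 20, 10, 10],
    "Month": [1, 5, 2, 3, 2],
    "Year": [2012, 2012, 2012, 2012, 2011],
})

p_id, mn, yr = 10, 4, 2012  # assumed filter values

# Each comparison yields a boolean Series; combine them with & and keep
# every comparison in its own parentheses.
mask = (data["Product"] == p_id) & (data["Month"] < mn) & (data["Year"] == yr)

# .loc takes the row mask first and the columns to keep second,
# which plays the role of select= in R's subset().
k1 = data.loc[mask, ["Time", "Product"]]
print(k1)
```

No set_index call is needed for this kind of filtering; the mask works directly on ordinary columns.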

Tables whose sole purpose is to specify a subset of another table

别说谁变了你拦得住时间么 submitted on 2019-11-28 14:37:50
The database I'm designing has an employees table. There can be multiple types of employees, one of which is medical employees. The database also needs to describe a many-to-many relation between medical employees and the competences they have. Is it okay to create a table medical_employees with only an id column, whose only purpose is to specify which employees are medics? The id column has a foreign key constraint that references the employees table. The code below should make my question clearer:

    /* Defines a generic employee */
    CREATE TABLE employees (
        id INT PRIMARY KEY AUTO_INCREMENT,
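One way to try the design out is a small in-memory SQLite sketch driven from Python. It assumes the id-only medical_employees table described in the question plus two hypothetical tables, competences and medical_employee_competences (those names and the name columns are illustrative, not from the question). It uses SQLite syntax rather than the MySQL-style AUTO_INCREMENT of the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE employees (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL                 -- hypothetical column
);
-- Subtype table: a row here marks an employee as a medic.
CREATE TABLE medical_employees (
    id INTEGER PRIMARY KEY REFERENCES employees(id)
);
CREATE TABLE competences (             -- hypothetical table
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Many-to-many link that can only point at medics, not at arbitrary employees.
CREATE TABLE medical_employee_competences (
    medic_id      INTEGER NOT NULL REFERENCES medical_employees(id),
    competence_id INTEGER NOT NULL REFERENCES competences(id),
    PRIMARY KEY (medic_id, competence_id)
);
""")

conn.executemany("INSERT INTO employees (id, name) VALUES (?, ?)",
                 [(1, "Ada"), (2, "Bob")])
conn.execute("INSERT INTO medical_employees (id) VALUES (1)")  # only Ada is a medic
conn.execute("INSERT INTO competences (id, name) VALUES (1, 'first aid')")


def try_link(medic_id, competence_id):
    """Return True if the link row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO medical_employee_competences VALUES (?, ?)",
            (medic_id, competence_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_link(1, 1))  # Ada is a medic
print(try_link(2, 1))  # Bob is an employee but not a medic
```

The point of the sketch is that the link table references medical_employees rather than employees, so the database itself refuses competences for non-medics.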

Subset panel data by group [duplicate]

十年热恋 submitted on 2019-11-28 14:23:15
This question already has an answer here: How to select the first and last row within a grouping variable in a data frame? (3 answers)

I would like to subset an unbalanced panel data set by group. For each group, I would like to keep the two observations in the first and the last years. How do I best do this in R? For example:

    dt <- data.frame(name = rep(c("A", "B", "C"), c(3,2,3)), year = c(2001:2003,2000,2002,2000:2001,2003))
    > dt
      name year
    1    A 2001
    2    A 2002
    3    A 2003
    4    B 2000
    5    B 2002
    6    C 2000
    7    C 2001
    8    C 2003

What I would like to have:

      name year
    1    A 2001
    3    A 2003
    4    B 2000
    5    B 2002
    6    C 2000
    8    C
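The selection logic can be sketched in plain Python (not R, which the question asks about) to make the intended result concrete. The data below is the example panel from the question; the function name first_and_last is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# The example panel from the question as (name, year) pairs.
rows = [("A", 2001), ("A", 2002), ("A", 2003),
        ("B", 2000), ("B", 2002),
        ("C", 2000), ("C", 2001), ("C", 2003)]


def first_and_last(rows):
    """Keep, for each name, the observations with the earliest and latest year."""
    kept = []
    # groupby needs input sorted by the grouping key; sorting the (name, year)
    # tuples also puts the years within each group in ascending order.
    for _name, group in groupby(sorted(rows), key=itemgetter(0)):
        group = list(group)
        kept.append(group[0])
        if len(group) > 1:  # do not duplicate the row of a single-row group
            kept.append(group[-1])
    return kept


print(first_and_last(rows))
```

For the sample data this keeps 2001 and 2003 for A, 2000 and 2002 for B, and 2000 and 2003 for C.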

Subset dataframe based on number of observations in each column

社会主义新天地 submitted on 2019-11-28 14:12:40
I have a problem and would appreciate a hand. I tried to come up with a solution, but I have no idea how to work it out. Please use this to recreate my dataframe:

    structure(list(A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 14L, 46L, 19L, 93L, 5L, 94L),
                   A2 = c(50L, NA, 73L, 58L, 47L, 74L, 39L, NA, NA, NA, NA, NA, NA, NA),
                   A3 = c(NA, 38L, 10L, 41L, NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)),
              .Names = c("A1", "A2", "A3"), class = "data.frame", row.names = c(NA, -14L))

I have this dataframe:

    A1 A2 A3
    87 50 NA
    67 NA 38
    80 73 10
    36 58 41
    71 47 NA
     6 74 66
    26 39 NA
    15 NA  7
    14 NA 29

How to subset a matrix with different column positions for each row? [duplicate]

自古美人都是妖i submitted on 2019-11-28 13:48:36
This question already has an answer here: Subset a matrix according to a columns vector (2 answers)

I want to subset a matrix using a different (but single) column for each row. Probably apply could do the job? Smart subsetting could probably work too, but I haven't found a solution. Computation time is an issue: I have a solution with a for loop, but loading the matrix into RAM several times is just too slow. Here is an example. Matrix M and vector v are given:

    M <- matrix(1:15, nrow=5, ncol=3)
         [,1] [,2] [,3]
    [1,]    1    6   11
    [2,]    2    7   12
    [3,]    3    8   13
    [4,]    4    9   14
    [5,]    5   10   15
    v <- c(3,1,1,2,1)

and the
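The operation being asked for pairs each row index with one column index. A minimal sketch in plain Python, using M and v from the question (the helper name pick_per_row is made up), shows the idea. In R itself, indexing with a two-column matrix of (row, column) pairs, as in M[cbind(seq_len(nrow(M)), v)], performs the same lookup without a loop.

```python
# M mirrors matrix(1:15, nrow=5, ncol=3), which R fills column by column.
M = [[1, 6, 11],
     [2, 7, 12],
     [3, 8, 13],
     [4, 9, 14],
     [5, 10, 15]]
v = [3, 1, 1, 2, 1]  # 1-based column to take from each row, as in the question


def pick_per_row(matrix, cols):
    """Return one element per row: row i contributes the entry in column cols[i]."""
    if len(matrix) != len(cols):
        raise ValueError("need exactly one column index per row")
    # Subtract 1 because the column indices follow R's 1-based convention.
    return [row[c - 1] for row, c in zip(matrix, cols)]


print(pick_per_row(M, v))  # [11, 2, 3, 9, 5]
```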

Why does tapply take the subset as NA and not exclude them totally

a 夏天 submitted on 2019-11-28 12:49:42
Question

I have a question. I want to make a barplot with means and error bars, grouped by two factors. To get the means and the standard errors I used the function tapply. However, for one of the factors I want to drop one level. So what I did was:

    dataFE <- data[-which(plant=="FS"),]
    # this works fine, I get exactly the data set I want without the FS level of the factor plant

Then to get the mean and standard error I use this:

    means <- with(dataFE, as.matrix(tapply(leaves, list(plant

Update subset of values in a dataframe column

﹥>﹥吖頭↗ submitted on 2019-11-28 11:49:54
Here's an excerpt of my dataframe:

     x y        se
     4 a  7.146329
    15 a  8.458633
    17 a  9.286849
    11 b  6.700024
     8 b  4.697962
    12 c  7.884244
    10 c  7.834816
    17 c  7.762385
    12 d  5.910785
    15 d  12.98158

I need to update the first column so that 1 is subtracted from each number, but only for conditions a and b. That is, instead of c(4, 15, 17, 11, 8, 12, 10, 17, 12, 15), I would get c(3, 14, 16, 10, 7, 12, 10, 17, 12, 15).

You could use ifelse here. Assuming the data frame is named df1:

    df1$x <- ifelse(df1$y %in% c("a", "b"), df1$x - 1, df1$x)

Source: https://stackoverflow.com/questions/42706372/update-subset-of-values-in

subset() a factor by its number of observations

久未见 submitted on 2019-11-28 11:40:57
I have a problem with the subset() function. How can I subset a factor of my dataframe by its number of observations?

    NAME     CLASS COLOR  VALUE
    antonio  B     YELLOW     5
    antonio  B     BLUE       8
    antonio  B     BLUE       7
    antonio  B     BLUE      12
    luca     C     YELLOW    99
    luca     B     YELLOW    87
    luca     B     YELLOW    98
    giovanni A     BLUE      48

I would like to obtain the rows where the combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, so that I can compute a mean of VALUE. In this case I would obtain:

    NAME     CLASS COLOR  VALUE
    antonio  B     BLUE   mean

because antonio is the only one with three observations for each factor. Thank you so much, Nik

You can use the table
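To make the expected result concrete, here is a minimal sketch of the count-then-average logic in plain Python rather than R. The records are the rows from the question; the function name and the min_count parameter are made up for illustration.

```python
from collections import defaultdict

# Rows from the question as (NAME, CLASS, COLOR, VALUE).
records = [
    ("antonio", "B", "YELLOW", 5),
    ("antonio", "B", "BLUE", 8),
    ("antonio", "B", "BLUE", 7),
    ("antonio", "B", "BLUE", 12),
    ("luca", "C", "YELLOW", 99),
    ("luca", "B", "YELLOW", 87),
    ("luca", "B", "YELLOW", 98),
    ("giovanni", "A", "BLUE", 48),
]


def means_for_frequent_groups(records, min_count=3):
    """Mean VALUE for each (NAME, CLASS, COLOR) combination seen at least min_count times."""
    groups = defaultdict(list)
    for name, cls, color, value in records:
        groups[(name, cls, color)].append(value)
    return {
        key: sum(values) / len(values)
        for key, values in groups.items()
        if len(values) >= min_count
    }


print(means_for_frequent_groups(records))  # {('antonio', 'B', 'BLUE'): 9.0}
```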

Generate a powerset of a set without keeping a stack in Erlang or Ruby

↘锁芯ラ submitted on 2019-11-28 11:38:08
I would like to generate the powerset of a rather big set (about 30-50 elements), and I know that storing the full powerset requires space for 2^n subsets. Is it possible to generate one subset at a time? That is, generate the powerset iteratively, saving each generated subset to disk or a database, removing it from the stack/memory, and only then continuing to generate the other subsets? Unfortunately, I have failed to adapt existing Erlang and Ruby examples to my needs.

Edit: Added the enumerator (as @Jörg W Mittag) if no block is given.

    class Array
      def powerset
        return to_enum(:powerset) unless block_given?
        1.upto(self.size) do
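The one-subset-at-a-time idea can be sketched with a generator. This is a Python illustration of the technique, not a completion of the truncated Ruby code, and unlike a loop that starts at size 1 it also yields the empty subset.

```python
from itertools import combinations


def powerset(items):
    """Yield every subset of items one at a time, smallest subsets first."""
    items = list(items)
    for size in range(len(items) + 1):
        # combinations is lazy as well, so only the current subset exists
        # in memory; nothing accumulates between iterations.
        for subset in combinations(items, size):
            yield subset


for subset in powerset(["a", "b", "c"]):
    print(subset)  # replace with a write to disk or a database
```

Memory use stays small because each subset can be discarded once it has been written out, but the running time is still proportional to 2^n, which is substantial for 30 to 50 elements.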

subset a data.frame with multiple conditions

那年仲夏 submitted on 2019-11-28 10:32:09
Suppose my data looks like this:

    2372  Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE    1.3  05/07/2006
    9104  Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE    0.34 07/23/2006
    9212  Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE    0.33 02/11/2007
    2094  Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE    1.4  05/06/2007
    16763 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE    0.61 05/11/2009
    1076  Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR 0.48 05/12/2002
    1077  Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR 0.3  05/07/2006

I want to be able to subset by the Analyte and a partial match on the date (namely, I just want the year). I
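The two conditions (an exact match on the analyte, plus a match on the year part of the date) can be sketched in plain Python rather than R. The rows are the sample data from the question. The field order, the function name, and the assumption that dates are month/day/year are mine, since the excerpt shows no column headers.

```python
from datetime import datetime

# Sample rows from the question; field meanings are assumed:
# (row id, state, system id, system name, analyte, value, sample date)
rows = [
    (2372, "Kansas", "KS2000111", "HUMBOLDT, CITY OF", "ATRAZINE", 1.3, "05/07/2006"),
    (9104, "Kansas", "KS2000111", "HUMBOLDT, CITY OF", "ATRAZINE", 0.34, "07/23/2006"),
    (9212, "Kansas", "KS2000111", "HUMBOLDT, CITY OF", "ATRAZINE", 0.33, "02/11/2007"),
    (2094, "Kansas", "KS2000111", "HUMBOLDT, CITY OF", "ATRAZINE", 1.4, "05/06/2007"),
    (16763, "Kansas", "KS2000111", "HUMBOLDT, CITY OF", "ATRAZINE", 0.61, "05/11/2009"),
    (1076, "Kansas", "KS2000111", "HUMBOLDT, CITY OF", "METOLACHLOR", 0.48, "05/12/2002"),
    (1077, "Kansas", "KS2000111", "HUMBOLDT, CITY OF", "METOLACHLOR", 0.3, "05/07/2006"),
]


def subset_by_analyte_and_year(rows, analyte, year):
    """Keep rows whose analyte matches exactly and whose date falls in the given year."""
    return [
        row for row in rows
        # Parse the date instead of matching on text so only the year is compared.
        if row[4] == analyte and datetime.strptime(row[6], "%m/%d/%Y").year == year
    ]


for row in subset_by_analyte_and_year(rows, "ATRAZINE", 2006):
    print(row)
```

Parsing the date first avoids accidental matches that a plain substring search on "2006" could produce in other fields.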