subset | 易学教程

Pandas: Use iterrows on Dataframe subset

Question: What is the best way to use iterrows on a subset of a DataFrame? Take the following simple example:

    import datetime as DT
    import pandas as pd

    df = pd.DataFrame({
        'Product': list('AAAABBAA'),
        'Quantity': [5, 2, 5, 10, 1, 5, 2, 3],
        'Start': [
            DT.datetime(2013, 1, 1, 9, 0),
            DT.datetime(2013, 1, 1, 8, 5),
            DT.datetime(2013, 2, 5, 14, 0),
            DT.datetime(2013, 2, 5, 16, 0),
            DT.datetime(2013, 2, 8, 20, 0),
            DT.datetime(2013, 2, 8, 16, 50),
            DT.datetime(2013, 2, 8, 7, 0),
            DT.datetime(2013, 7, 4, 8, 0)]})
    df = df.set_index(['Start'])

Now I would like to

How to subset data with advanced string matching

Question: I have the following data frame, from which I would like to extract rows based on matching strings.

    > GEMA_EO5
    gene_symbol  fold_EO   p_value   RefSeq_ID                            BH_p_value
    KNG1         3.433049  8.56e-28  NM_000893,NM_001102416               1.234245e-24
    REXO4        3.245317  1.78e-27  NM_020385                            2.281367e-24
    VPS29        3.827665  2.22e-25  NM_057180,NM_016226                  2.560770e-22
    CYP51A1      3.363149  5.95e-25  NM_000786,NM_001146152               6.239386e-22
    TNPO2        4.707600  1.60e-23  NM_001136195,NM_001136196,NM_013433  1.538000e-20
    NSDHL        2.703922  6.74e-23  NM_001129765,NM

checking for equality

Question: I want to check for equality within a data set. The data set looks like this:

    Equips <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,6,7,8)
    Notifs <- c(10,10,20,55,63,67,71,73,73,73,81,81,83,32,32,32,32,
                47,48,45,45,45,51,51,55,56,69,65,88)
    Comps <- c("Motor","Ventil","Motor","Gehäuse","Ventil","Motor","Steuerung","Motor",
               "Ventil","Gehäuse","Gehäuse","Ventil","Motor","Schraube","Motor","Festplatte",
               "Heizgerät","Motor","Schraube","Schraube","Lichtmaschine","Bremse","Lichtmaschine",

How to extract a dataframe which is within a list in R, using a condition?

Question: I have a list containing data frames of various dimensions. I want to extract the data frames that have more than 30 rows. I tried:

    DR <- sapply(list, function(x) subset(list, nrow(list$'x')=30))

but it raises an error. Please help!

Answer 1: Assuming your list is called list_df, we can use Filter:

    Filter(function(x) nrow(x) == 30, list_df)

Or sapply:

    list_df[sapply(list_df, nrow) == 30]

We can also use purrr::keep:

    purrr::keep(list_df, ~nrow(.) == 30)

Note that these conditions keep data frames with exactly 30 rows, mirroring the attempted code; to keep those with more than 30 rows, replace == with >.

Source: https://stackoverflow.com/questions/58850863/how

how to pass an expression through a function for the subset function to evaluate in R

Question: I'm trying to write a subset method for a different object class, and I'd like users to be able to call it the same way they use the subset.data.frame function. I've read a few related articles like this and this, but I don't think they're the solution here. I believe I'm using the wrong environment, but I don't understand enough about environments or the substitute function to figure out why the first half of this code works but the second half doesn't. Could anyone explain what I'm

How do I extract specific elements from an array?

Question: Say I have an array

    a = [1,2,3,4,5,6,7,8,9,10]

and I want a subset of it: the 1st, 5th and 7th elements. Is it possible to extract these from the array in a simple way? I was thinking of something like:

    a[0,4,6] = [1,5,7]

but that doesn't work. Also, is there a way to return all elements except those at the specified indices? For example, something like:

    a[-0,-4,-6] = [2,3,4,6,8,9,10]

Answer 1: Here's one way:

    [0,4,6].map{|i| a[i]}

Answer 2: You can simply do:

    [1] pry(main)> [1,2,3,4,5,6,7,8,9,10].values_at(0, 4,
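A minimal sketch of both parts of the question, assuming plain Ruby with no gems. Array#values_at is the method the second answer uses. The complement via reject.with_index is one possible approach and is not taken from the answers shown. The names wanted, picked and rest are illustrative only.

```ruby
# Sample array from the question.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Zero-based positions of the 1st, 5th and 7th elements (illustrative name).
wanted = [0, 4, 6]

# Array#values_at returns the elements at the given indices, in order.
picked = a.values_at(*wanted)

# For the complement, drop every element whose index is listed in `wanted`.
rest = a.reject.with_index { |_, i| wanted.include?(i) }

p picked  # [1, 5, 7]
p rest    # [2, 3, 4, 6, 8, 9, 10]
```

Splatting the index list into values_at keeps the positions in a single variable, so the same list drives both the selection and its complement.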

How can I select a row by row name in a subsetted data frame in R?

Question: I want to select rows by name in a data frame that is a subset of a larger one. The subsetted data frame appears to have retained the row names of the original data frame, such that:

    > DFsubset[1:3,]
        x1 x2 x3
    271  3  5  2
    553  2  4  1
    563  2  5  3

while using the printed row name returns the following:

    > DFsubset[271,]
    Error in xj[i, , drop = FALSE] : subscript out of bounds

How can I select these rows based on the row names from the original DF, i.e. 271, 553, 563?

Answer 1: You need to reference the rownames

R Subset Dataset Using Regular Expression

Question: Is there a way to make the R code below run faster (i.e. vectorized, avoiding for loops)? My example contains two data frames. The first has dimension n1*p, and one of its p columns contains names. The second is a column vector (n2*1) that also contains names. I want to keep all rows of the first data frame in which some part of a name from the second data frame appears in the name column. Sorry for the clumsy explanation. Example (Data frame 1): x

geom_smooth on a subset of data

Question: Here is some data and a plot:

    library(ggplot2)

    set.seed(18)
    data = data.frame(y=c(rep(0:1,3),rnorm(18,mean=0.5,sd=0.1)),
                      colour=rep(1:2,12),
                      x=rep(1:4,each=6))
    ggplot(data,aes(x=x,y=y,colour=factor(colour)))+
      geom_point()+
      geom_smooth(method='lm',formula=y~x,se=F)

As you can see, the linear regression is highly influenced by the values where x=1. Can I get linear regressions calculated for x >= 2 while still displaying the values for x=1 (where y equals either 0 or 1)? The resulting graph would be exactly the same except for the