subset

Python regular expression to filter list of strings matching a pattern

佐手、 submitted on 2019-12-01 19:41:56
Question: I use R a lot more, and this is easier for me to do in R:
    > test <- c('bbb', 'ccc', 'axx', 'xzz', 'xaa')
    > test[grepl("^x",test)]
    [1] "xzz" "xaa"
But how do I do it in Python if test is a list? P.S. I am learning Python using Google's Python exercises, and I would prefer to use regular expressions.
Answer 1: You can use the following to list the strings that start with 'x', or to check whether any string in the list does:
    >>> [e for e in test if e.startswith('x')]
    ['xzz', 'xaa']
    >>> any(e.startswith('x') for e in test)
    True
Answer 2: You could
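Since the question asks specifically for a regular expression, here is a minimal sketch of the regex equivalent of the R `grepl("^x", test)` call, using only the standard `re` module. The variable names `pattern` and `matches` are illustrative and do not come from the original answers.

```python
import re

test = ['bbb', 'ccc', 'axx', 'xzz', 'xaa']

# Compile once; '^x' anchors the match at the start of each string.
pattern = re.compile(r'^x')

# Keep only the strings for which the pattern is found.
matches = [s for s in test if pattern.search(s)]
print(matches)
```

For a fixed prefix like this one, `str.startswith` is simpler; the regex form is mainly useful when the pattern is more complex than a literal prefix.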

indexing spherical subset of 3d grid data in numpy

二次信任 submitted on 2019-12-01 19:23:09
Question: I have a 3D grid with coordinates
    x = linspace(0, Lx, Nx)
    y = linspace(0, Ly, Ny)
    z = linspace(0, Lz, Nz)
and I need to index the points (i.e. x[i], y[j], z[k]) within some radius R of a position (x0, y0, z0). N_i can be quite large. I can do a simple loop to find what I need:
    points=[]
    i0,j0,k0 = floor( (x0,y0,z0)/grid_spacing )
    Nr = (i0,j0,k0)/grid_spacing + 2
    for i in range(i0-Nr, i0+Nr):
        for j in range(j0-Nr, j0+Nr):
            for k in range(k0-Nr, k0+Nr):
                if norm(array([i,j,k])*grid_spacing - (x0,y0,z0)) <

indexing spherical subset of 3d grid data in numpy

随声附和 submitted on 2019-12-01 17:52:42
I have a 3D grid with coordinates
    x = linspace(0, Lx, Nx)
    y = linspace(0, Ly, Ny)
    z = linspace(0, Lz, Nz)
and I need to index the points (i.e. x[i], y[j], z[k]) within some radius R of a position (x0, y0, z0). N_i can be quite large. I can do a simple loop to find what I need:
    points=[]
    i0,j0,k0 = floor( (x0,y0,z0)/grid_spacing )
    Nr = (i0,j0,k0)/grid_spacing + 2
    for i in range(i0-Nr, i0+Nr):
        for j in range(j0-Nr, j0+Nr):
            for k in range(k0-Nr, k0+Nr):
                if norm(array([i,j,k])*grid_spacing - (x0,y0,z0)) < cutoff:
                    points.append((i,j,k))
but this is quite slow. Is there a more natural/faster way to do this
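One common way to avoid the triple loop is NumPy broadcasting. The following is a minimal sketch, not taken from any answer to the question. The grid sizes, the centre `(x0, y0, z0)`, and the radius `R` are made-up values chosen for illustration.

```python
import numpy as np

# Hypothetical grid parameters (assumptions for this sketch).
Lx, Ly, Lz = 1.0, 1.0, 1.0
Nx, Ny, Nz = 11, 11, 11
x = np.linspace(0, Lx, Nx)
y = np.linspace(0, Ly, Ny)
z = np.linspace(0, Lz, Nz)

# Hypothetical sphere centre and radius.
x0, y0, z0 = 0.5, 0.5, 0.5
R = 0.15

# Broadcast three 1-D axes to an (Nx, Ny, Nz) array of squared distances.
dist2 = ((x[:, None, None] - x0) ** 2
         + (y[None, :, None] - y0) ** 2
         + (z[None, None, :] - z0) ** 2)

# Boolean mask of the grid points inside the sphere.
mask = dist2 < R ** 2

# Each row of `points` is an (i, j, k) index triple.
points = np.argwhere(mask)
print(len(points))
```

This builds a full `(Nx, Ny, Nz)` array, so for very large grids it may be worth slicing `x`, `y`, and `z` down to the bounding box of the sphere first and applying the same broadcasting to the slices.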

Filter rows based on multiple column conditions R

▼魔方 西西 submitted on 2019-12-01 16:28:34
Question: Suppose I have a dataset that has 100-odd columns, and I need to keep only those rows that meet one condition applied across all 100 columns. How do I do this? Suppose it looks like the data below, and I need to keep only the rows where any of Col1, Col2, Col3, or Col4 is > 0:
    Col1 Col2 Col3 Col4
       1    1    3    4
       0    0    4    2
       4    3    4    3
       2    1    0    2
       1    2    0    3
       0    0    0    0
In the above example, all rows except the last will make it. I need to place the results in the same data frame as the original. Not sure if I can use the lapply to loop

geom_smooth on a subset of data

…衆ロ難τιáo~ submitted on 2019-12-01 15:27:23
Here is some data and a plot:
    set.seed(18)
    data = data.frame(y=c(rep(0:1,3),rnorm(18,mean=0.5,sd=0.1)),colour=rep(1:2,12),x=rep(1:4,each=6))
    ggplot(data,aes(x=x,y=y,colour=factor(colour)))+geom_point()+
      geom_smooth(method='lm',formula=y~x,se=F)
As you can see, the linear regression is highly influenced by the values where x=1. Can I get linear regressions calculated for x >= 2 but still display the values for x=1 (where y equals either 0 or 1)? The resulting graph would be exactly the same except for the linear regressions, which would not "suffer" from the influence of the values at abscissa = 1. It's as

How to assign values to a column for a subset of data frame rows

倖福魔咒の submitted on 2019-12-01 14:19:08
I have a large data frame and I am trying to assign values to a particular data column for specific subsets.
    subset(P2Y12R_binding_summary,(SYSTEM=="4NTJ")&(VARIANT=="D294N"))
      SYSTEM VARIANT  MODEL EPSIN INP dE_water_free dE_ERR_water_free dE_water_periodic dE_ERR_water_periodic
    1   4NTJ   D294N LVLSET     1   1       -42.155          29.28460           -42.205              29.52604
    2   4NTJ   D294N LVLSET     1   2       -34.225          29.75176           -34.235              29.96571
    3   4NTJ   D294N LVLSET    20   1       -65.163          40.62241           -65.163              40.52564
    4   4NTJ   D294N LVLSET    20   2       -57.454          41.04459           -57.454              41.26962
    5   4NTJ   D294N    SES     1   1       -23.406          30.56636           -23.335              30.75794
    6   4NTJ   D294N    SES     1   2       -15.434          30

How to assign values to a column for a subset of data frame rows

烈酒焚心 submitted on 2019-12-01 13:07:25
Question: I have a large data frame and I am trying to assign values to a particular data column for specific subsets.
    subset(P2Y12R_binding_summary,(SYSTEM=="4NTJ")&(VARIANT=="D294N"))
      SYSTEM VARIANT  MODEL EPSIN INP dE_water_free dE_ERR_water_free dE_water_periodic dE_ERR_water_periodic
    1   4NTJ   D294N LVLSET     1   1       -42.155          29.28460           -42.205              29.52604
    2   4NTJ   D294N LVLSET     1   2       -34.225          29.75176           -34.235              29.96571
    3   4NTJ   D294N LVLSET    20   1       -65.163          40.62241           -65.163              40.52564
    4   4NTJ   D294N LVLSET    20   2       -57.454          41.04459

R Dynamic split/subset of dataframe by selected rownumbers- Parsing textgrid praat

北战南征 submitted on 2019-12-01 11:28:59
I am trying to process a "segmentation file" called .TextGrid (generated by the Praat program). The original format looks like this:
    File type = "ooTextFile"
    Object class = "TextGrid"
    xmin = 0
    xmax = 243.761375
    tiers? <exists>
    size = 17
    item []:
        item [1]:
            class = "IntervalTier"
            name = "phones"
            xmin = 0
            xmax = 243.761
            intervals: size = 2505
            intervals [1]:
                xmin = 0
                xmax = 0.4274939687384032
                text = "_"
            intervals [2]:
                xmin = 0.4274939687384032
                xmax = 0.472
                text = "v"
            intervals [3]:
                [...]
(This is then repeated to EOF, with intervals [3 to n] for n Item (layer of annotation) in a file. Somebody

Data.table: how to get the blazingly fast subsets it promises and apply to a second data.table

ぃ、小莉子 submitted on 2019-12-01 11:16:52
I'm trying to enrich one dataset (adherence) based on subsets from another (lsr). For each individual row in adherence, I want to calculate (as a third column) the medication available for implementing the prescribed regimen. I have a function that returns the relevant result, but it runs for days on just a subset of the total data I have to run it on. The datasets are:
    library(dplyr)
    library(tidyr)
    library(lubridate)
    library(data.table)
    adherence <- cbind.data.frame(c("1", "2", "3", "1", "2", "3"), c("2013-01-01", "2013-01-01", "2013-01-01", "2013-02-01", "2013-02-01", "2013-02-01"))
    names

Dynamic Programming approach for a subset sum

萝らか妹 submitted on 2019-12-01 10:50:34
Given the following input:
    10 4 3 5 5 7
where
    10 = total score
    4 = number of players
    3 = score of player 1
    5 = score of player 2
    5 = score of player 3
    7 = score of player 4
I am to print the players whose combined score adds up to the total. So the output can be 1 4, because player 1 + player 4 score = 3 + 7 -> 10, or the output can be 2 3, because player 2 + player 3 score = 5 + 5 -> 10. So it is quite similar to a subset sum problem. I am relatively new to dynamic programming, but I have been getting help on Stack Overflow, reading dynamic programming tutorials, and watching a few videos online for the past 3 days. The following code
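The asker's own code is cut off in this listing, so the following is only a minimal sketch of one dynamic programming approach to the problem as stated, not a reconstruction of that code. The function name `find_players` is hypothetical, and the sketch assumes all scores are non-negative integers.

```python
def find_players(total, scores):
    """Return 1-based player numbers whose scores sum to total, or None."""
    # reachable maps each achievable sum to one list of players producing it.
    reachable = {0: []}
    for player, score in enumerate(scores, start=1):
        # Iterate over a snapshot so each player is used at most once.
        for current_sum, chosen in list(reachable.items()):
            new_sum = current_sum + score
            # Assumes non-negative scores, so sums above total can be dropped.
            if new_sum <= total and new_sum not in reachable:
                reachable[new_sum] = chosen + [player]
    return reachable.get(total)


result = find_players(10, [3, 5, 5, 7])
print(result)
```

The sketch returns only the first combination it finds for each sum. Listing every valid combination, such as both 1 4 and 2 3 for the sample input, would require storing all player lists per sum rather than one.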