apply | 易学教程

constructing a function using colnames as variables

Question: I'd like to collect terms under multiple columns of the annot data.frame. Below is the first row of information for a toy dataset for annot. colnames(annot) # [1] "HUGO.Name" "Common.Name" "Gene.Class" "Cell.Type" "Annotation" annot[1,] # HUGO.Name Common.Name Gene.Class Cell.Type # 1 CCL1 CCL1 Immune Response - Cell Type specific aDC # Annotation # 1 Cell Type specific, Chemokines and receptors, Inflammatory response So far, I've been writing the colnames iteratively, but I'd like to learn

Conditional subsetting by POSIXct interval and another field containing interval

Question: Given a dataset Dat with species (SP), area (AR), and time (TM, in POSIXct), I want to subset the data for individuals that were present with species A, within a half hour before and after it was recorded, and within the same area or the two adjacent areas (+ and - 1). For example, if species A was present at 1:00 in area 4, I wish to subset all species present from 12:30 to 1:30 on the same day in areas 3, 4 and 5. As an example: SP TM AR B 1-jan-03 07:22 1 F 1-jan-03 09:22 4 A 1

Using lapply and which to subset dataframe by both characteristic and function

Question: I have a dataframe with 5 dimensions of data that looks like this: > dim(alldata) [1] 162 6 > head(alldata) value layer Kmultiplier Resolution Season Variable 1: 0.01308008 b .01K 1km Baseflow Evapotranspiration 2: 0.03974779 b .01K 1km Peak Flow Evapotranspiration 3: 0.02396524 b .01K 1km Summer Flow Evapotranspiration 4: -0.15670996 b .01K 1km Baseflow Discharge 5: 0.06774948 b .01K 1km Peak Flow Discharge 6: -0.04138313 b .01K 1km Summer Flow Discharge What I'd like to do is get the mean

Rolling percentage add along column

Question: I feel this should be easy in base R, but I just can't figure it out. I have a simple dataframe; let's say it looks like this: tbl <- read.table(text = "Field1 Field2 100 200 150 180 200 160 280 250 300 300 300 250", header = TRUE) Now, what I want to do is create a function that will apply a rolling % addition, something like: fn <- function(tbl, pct) {} which accepts the dataframe above as tbl. It adds a percentage fraction of the current row to the NEXT row down based on pct, and rolls

Improve efficiency for removing duplicate values per row and shift values in R

Question: I have a huge dataset (> 2.5 Million). A small subset looks like this (code reproducible): temp <- data.frame(list(col1 = c("424", "560", "557"), col2 = c("276", "427", "V46"), col3 = c("780", "V45", "584"), col4 = c("276", "V45", "995"), col5 = c("428", "799", "427"))) > temp col1 col2 col3 col4 col5 1 424 276 780 276 428 2 560 427 V45 V45 799 3 557 V46 584 995 427 I am trying to remove duplicates per row and shift values left, using this code: library(plyr) temp <- apply(temp,1,function
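The per-row logic described above (drop repeated values, shift the survivors left) can be illustrated in a few lines. This is a language-neutral sketch written in Python rather than the R code from the question, which is cut off; the helper name `dedupe_row` and the choice of `None` as the padding value are assumptions made for illustration.

```python
def dedupe_row(row, fill=None):
    """Drop repeated values in a row, keep first-seen order, pad on the right."""
    # dict.fromkeys keeps only the first occurrence of each value, in order.
    unique = list(dict.fromkeys(row))
    # Pad so every row keeps its original width (the fill value is an assumption).
    return unique + [fill] * (len(row) - len(unique))


# The three sample rows from the question's `temp` data frame.
rows = [
    ["424", "276", "780", "276", "428"],
    ["560", "427", "V45", "V45", "799"],
    ["557", "V46", "584", "995", "427"],
]
deduped = [dedupe_row(r) for r in rows]
```

Each row is processed in a single pass, so the cost grows linearly with the number of cells.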

Why does function apply complain about long lists?

Question: As part of some Eulerian travails, I'm trying to code a Sieve of Eratosthenes with a factorization wheel. My code so far is: (defun ring (&rest content) "Returns a circular list containing the elements in content. The returned list starts with the first element of content." (setf (cdr (last content)) content)) (defun factorization-wheel (lst) "Returns a circular list containing a factorization wheel using the list of prime numbers in lst" (let ((circumference (apply #'* lst))) (loop for i

3D array -> apply -> 3D array

Question: It seems apply will not re-assemble 3D arrays when operating on just one margin. Consider: arr <- array( runif(2*4*3), dim=c(2, 4, 3), dimnames=list(a=paste0("a", 1:2), b=paste0("b", 1:4), c=paste0("c", 1:3)) ) # , , c = c1 # # b # a b1 b2 b3 b4 # a1 0.7321399 0.8851802 0.2469866 0.9307044 # a2 0.5896138 0.6183046 0.7732842 0.6652637 # # , , c = c2 # b # a b1 b2 b3 b4 # a1 0.5894680 0.7839048 0.3854357 0.56555024 # a2 0.6158995 0.6530224 0.8401427 0.04044974 # # , , c = c3 # b # a b1 b2 b3 b4

Parallelize pandas apply

Question: New to pandas, I already want to parallelize a row-wise apply operation. So far I found "Parallelize apply after pandas groupby". However, that only seems to work for grouped data frames. My use case is different: I have a list of holidays, and for my current row/date I want to find the number of days before and after this day to the next holiday. This is the function I call via apply: def get_nearest_holiday(x, pivot): nearestHoliday = min(x, key=lambda x: abs(x- pivot)) difference = abs(nearestHoliday
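The lookup quoted above is cut off, so here is a minimal standard-library sketch of the same idea. The argument name `holidays`, the returned tuple, and the sample dates are assumptions for illustration; the original function's return value is not visible in the excerpt.

```python
from datetime import date


def get_nearest_holiday(holidays, pivot):
    """Return (nearest holiday, absolute distance in days) for the date `pivot`."""
    # Subtracting two dates gives a timedelta; abs() makes earlier and later
    # holidays comparable, and min() picks the closest one.
    nearest = min(holidays, key=lambda h: abs(h - pivot))
    return nearest, abs((nearest - pivot).days)


# Hypothetical holiday list and query date.
holidays = [date(2016, 1, 1), date(2016, 3, 25), date(2016, 12, 25)]
nearest, days_away = get_nearest_holiday(holidays, date(2016, 3, 20))
```

Because the function depends only on its arguments, it could be mapped over the dates of a column independently, which is the property a parallel version would rely on.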

Add 2 new columns to existing dataframe using apply

Question: I want to use the apply function so that it: - Takes 2 columns as inputs - Outputs two new columns based on a function. An example is this add_multiply function. #function with 2 column inputs and 2 outputs def add_multiply (a,b): return (a+b, a*b ) #example dataframe df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) #this doesn't work df[['add', 'multiply']] = df.apply(lambda x: add_multiply(x['col1'], x['col2']), axis=1) ideal result: col1 col2 add multiply 1 3 4 3 2 4 6 8 Answer 1: You can add
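The answer above is truncated, so the following is only one possible approach and not necessarily the one it goes on to describe. It assumes a pandas version that supports the `result_type` argument of `DataFrame.apply`, which expands each returned tuple into separate columns.

```python
import pandas as pd


def add_multiply(a, b):
    """Return the sum and the product of two values."""
    return a + b, a * b


df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# result_type='expand' turns each returned tuple into its own columns,
# so the result can be assigned to two new column names at once.
df[['add', 'multiply']] = df.apply(
    lambda row: add_multiply(row['col1'], row['col2']),
    axis=1,
    result_type='expand',
)
```

For simple arithmetic like this, operating on whole columns (for example `df['col1'] + df['col2']`) avoids the row-wise apply altogether and is usually faster.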

R - How to vectorize with apply family function and avoid while/for loops in this case?

Question: In this case (more details can be found in this question: Count how many observations in the rest of the data fits multiple conditions? (R)), there is a dataset called event, containing thousands of events (observations); I selected several rows to show you the data structure. It contains the "STATEid", the "date" of occurrence, and geographical coordinates in two variables, "LON" and "LAT". I am trying to calculate a new variable (column) for each row. This new variable should be: "Given any