apply | 易学教程

pandas apply() with and without lambda

Question: What rule governs how a function is called when it is passed to pandas apply() through a lambda versus directly? Examples below. Without the lambda, the entire Series (df[column name]) is apparently passed to the "test" function, which throws an error when it tries a boolean operation on a Series. If the same function is called via a lambda, it works: apply iterates over the rows, each row is passed as "x", and df[column name] returns a single value for that column in the current row. It's like lambda is removing a

Apply function to each cell of matrix in R

Question: I'm trying to apply a function to each cell of a data table in R, creating a second table from the results. For example, imagine I have Matrix A

Ad1  Ad2  Ad3  Ad4
AA     6    0   10
AB     7   10   12
AC     0    0   15

and I'm trying to create Matrix B

Ad1  Ad2  Ad3  Ad4
AA     1    0    1
AB     1    0    1
AC     0    0    1

so that each cell takes the value 1 if that cell has a value > 0 AND the sum of the column minus that cell is also greater than 0. For instance, AA~Ad2 is 6 and the sum of the column minus that cell is 7 (6 + 7 + 0 - 6)
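
The question is about R; the sketch below restates the rule in plain Python only to pin down the logic, and is not the asker's code or an R solution. It assumes the first column (Ad1) holds the row labels AA, AB, AC, so only Ad2 to Ad4 carry numbers. The function name `flag_cells` is hypothetical.

```python
# Numeric part of Matrix A from the question: rows AA, AB, AC; columns Ad2..Ad4.
matrix_a = [
    [6, 0, 10],
    [7, 10, 12],
    [0, 0, 15],
]

def flag_cells(matrix):
    """Return 1 where a cell is > 0 and the rest of its column sums to > 0."""
    # Column totals are computed once, then the cell itself is subtracted.
    col_sums = [sum(col) for col in zip(*matrix)]
    return [
        [int(cell > 0 and col_sums[j] - cell > 0) for j, cell in enumerate(row)]
        for row in matrix
    ]

matrix_b = flag_cells(matrix_a)
```

Computing the column sums up front avoids a nested loop over every other cell in the column.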

Replacing nested loop in R

Question: I am very new to R and searched the forums for this but couldn't find a close enough solution. I am trying to map IP addresses to their corresponding geolocations. I have 2 data sets. Set-a (1,60,000 rows): ip(int) | ID(int). Set-b (16,00,000 rows): Ip1(int) | Ip2(int) | Code(str) | Country(str) | Area1(str) | Area2(str). I am trying to do the following: if ip lies between Ip1 and Ip2, then add Country and Region to Set-a. I am doing the following (obviously not a very good way
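
One common way to avoid a nested loop for this kind of range lookup is to sort the ranges and binary-search them. The sketch below shows the idea in plain Python rather than R, on made-up data; it assumes the (Ip1, Ip2) ranges do not overlap, which the question does not state. The names `ranges`, `lookup_country`, and the country labels are hypothetical.

```python
from bisect import bisect_right

# Hypothetical sample of Set-b as (Ip1, Ip2, Country); assumed non-overlapping.
ranges = sorted([
    (100, 199, "AA"),
    (200, 299, "BB"),
    (400, 499, "CC"),
])
starts = [r[0] for r in ranges]

def lookup_country(ip):
    """Return the country whose [Ip1, Ip2] range contains ip, or None."""
    # Index of the last range whose start is <= ip.
    i = bisect_right(starts, ip) - 1
    if i >= 0 and ranges[i][0] <= ip <= ranges[i][1]:
        return ranges[i][2]
    return None

# Hypothetical sample of Set-a as (ip, ID).
set_a = [(150, 1), (299, 2), (350, 3), (50, 4)]
matched = [(ip, id_, lookup_country(ip)) for ip, id_ in set_a]
```

Each lookup costs a binary search over Set-b instead of a full scan, so the total work grows roughly with the size of Set-a times the logarithm of the size of Set-b.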

Why does apply convert logicals in data frames to strings of 5 characters?

Question: Suppose I have a data frame:

mydf <- data.frame(colA = c(1,20), colB = c("a", "ab"), colC = c(T, F))

Now suppose I want to apply a function to each row of the data frame. This function uses the boolean value of column C. When using apply, every non-string is converted to a string of the maximum length present in the column:

> apply(mydf, 1, '[', 3)
[1] " TRUE" "FALSE"

The string " TRUE" is no longer interpretable as a logical.

> ifelse(apply(mydf, 1, '[', 3), 1, 2)
[1] NA 2

I could solve

Python Pandas 'apply' returns series; can't convert to dataframe

Question: OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function that takes an input, a country name, and returns the latitude and longitude. I use apply to run the function, and it returns a pandas Series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to Python and still RTFMing. BTW, the geocoder function works great.

# Import libraries
import os
import pandas as pd
import numpy as np
from geopy.geocoders
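
A minimal sketch of one way to turn such a Series into a DataFrame, assuming the function returns a `(lat, lon)` tuple per country. The asker's geocoding function is not shown in the excerpt, so `locate` and `FAKE_COORDS` below are hypothetical stand-ins with rough coordinates that make no network call.

```python
import pandas as pd

# Hypothetical stand-in for the geocoder: a fixed lookup, no geopy involved.
FAKE_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def locate(country):
    # Returns a (lat, lon) tuple, or (None, None) for unknown names.
    return FAKE_COORDS.get(country, (None, None))

df = pd.DataFrame({"country": ["France", "Japan"]})

# Series.apply returns a Series whose elements are the (lat, lon) tuples.
coords = df["country"].apply(locate)

# Expand the tuples into two named columns, keeping the original index
# so the join below lines rows up correctly.
coords_df = pd.DataFrame(coords.tolist(), index=df.index, columns=["lat", "lon"])
result = df.join(coords_df)
```

Building the new frame from `coords.tolist()` and reusing `df.index` keeps the rows aligned even when the original index is not 0, 1, 2, and so on.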

Sliding window on each column of a matrix in R with parallel processing

Question: I have a large matrix with 2000 columns and 3000 rows. For each column, I want to do a sliding window where I sum 15 rows together, then go down one row and sum the next 15, etc., and create a new matrix with this information. I have a function that works (although it seems a bit slow) but would like to run it in parallel, as this is part of a larger script, and if I use apply functions without the parallel equivalent the open cluster shuts down. Moreover, I have to do this whole operation 100
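
The sliding-window sum itself can be sketched as a running total, shown here in plain Python rather than R and without any parallelism. The function names are hypothetical, and the example uses a tiny matrix with a window of 2 instead of the question's 3000 x 2000 matrix and window of 15.

```python
def sliding_sums(column, window):
    """Sum each run of `window` consecutive values, stepping one row at a time."""
    if window <= 0 or window > len(column):
        return []
    total = sum(column[:window])
    sums = [total]
    for i in range(window, len(column)):
        # Slide down one row: add the entering value, drop the leaving one.
        total += column[i] - column[i - window]
        sums.append(total)
    return sums

def sliding_sums_by_column(matrix, window):
    """Apply sliding_sums to every column of a row-major matrix."""
    summed = [sliding_sums(list(col), window) for col in zip(*matrix)]
    # Transpose back so the result is row-major like the input.
    return [list(row) for row in zip(*summed)]

# Tiny hypothetical matrix: 4 rows, 2 columns, window of 2.
small = [[1, 10], [2, 20], [3, 30], [4, 40]]
windowed = sliding_sums_by_column(small, 2)
```

Because each step only adds one value and subtracts another, the cost per column is proportional to the number of rows and does not depend on the window size.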

Count how many observations in the rest of the data fit multiple conditions? (R)

Question: Friends, I am new to R programming. I have been trying to write a user-defined function for days but have not yet nailed it. This is a dataset called event, containing thousands of events (observations); I selected several rows to show you the data structure. It contains the "STATEid", the "date" of occurrence, and geographical coordinates in two variables, "LON" and "LAT". I am trying to calculate a new variable (column) for each row. This new variable should be: "Given any specific incident, count

Determine if Value in Final Column exists in respective rows

Question: I have a dataframe as follows: df1 ColA ColB ColC ColD ColE COlF ColG Recs 1 A-1 A - 3 B B NA C 1 B-1 C R D E NA B 1 NA A B A B How do I determine whether the last value, from the column Recs, is found in its respective row? I tried the code below, but it doesn't work because there are duplicates in my normal dataset:

df1$Exist <- apply(df1, 1, FUN = function(x) c("No", "Yes")[(anyDuplicated(x[!is.na(x) & x != "" ])!=0) +1])

There are also blanks, NAs, and character values that have spaces and dashes. Final

constructing a function using colnames as variables

Question: I'd like to collect terms under multiple columns of the annot data.frame. Below is the first row of information for a toy dataset for annot.

colnames(annot)
# [1] "HUGO.Name" "Common.Name" "Gene.Class" "Cell.Type" "Annotation"
annot[1,]
# HUGO.Name Common.Name Gene.Class Cell.Type
# 1 CCL1 CCL1 Immune Response - Cell Type specific aDC
# Annotation
# 1 Cell Type specific, Chemokines and receptors, Inflammatory response

So far, I've been writing the colnames iteratively, but I'd like to learn