apply

pandas apply() with and without lambda

亡梦爱人 submitted on 2019-12-23 16:26:03
Question: What is the rule or process when a function is called with pandas apply() through a lambda versus without one? Examples below. Without the lambda, the entire Series (df[column name]) is apparently passed to the "test" function, which throws an error when it tries to perform a boolean operation on a Series. If the same function is called via a lambda, it works: apply iterates over the rows, passes each one as "x", and df[column name] returns a single value for that column in the current row. It's like lambda is removing a
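The asker's code is cut off in this excerpt, so the following is only a minimal sketch of the situation described, with a made-up DataFrame and a hypothetical `label` function standing in for "test". It assumes the error comes from evaluating a whole Series in an `if`. The point it illustrates is that the lambda itself changes nothing; what matters is whether the function receives a whole Series, a single element, or a single row.

```python
import pandas as pd

# Made-up data; the original question's DataFrame is not shown.
df = pd.DataFrame({"a": [1, 5, 3], "b": [10, 20, 30]})


def label(value):
    # Hypothetical stand-in for the question's "test" function.
    # The `if` needs a single True/False, so `value` must be a scalar.
    return "big" if value > 2 else "small"


# Passing the whole column: `value > 2` is a boolean Series, and using a
# Series as an `if` condition raises ValueError (ambiguous truth value).
try:
    label(df["a"])
    raised = False
except ValueError:
    raised = True

# Series.apply hands the function one element at a time.
per_element = df["a"].apply(label)

# DataFrame.apply with axis=1 hands the function one row at a time;
# row["a"] is then a single value, so the same function works.
per_row = df.apply(lambda row: label(row["a"]), axis=1)
```

Both `per_element` and `per_row` hold the same labels here; the lambda is only a convenient way to pick one value out of each row before calling the function.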

Apply function to each cell of matrix in R

杀马特。学长 韩版系。学妹 submitted on 2019-12-23 15:56:17
Question: I'm trying to apply a function to each cell of a data table in R, creating a second table from the results. For example, imagine I have Matrix A:

Ad1  Ad2  Ad3  Ad4
AA     6    0   10
AB     7   10   12
AC     0    0   15

and I'm trying to create Matrix B:

Ad1  Ad2  Ad3  Ad4
AA     1    0    1
AB     1    0    1
AC     0    0    1

in which each cell takes the value 1 if that cell has a value > 0 AND the sum of its column minus that cell is also greater than 0. For instance, AA~Ad2 is 6 and the sum of the rest of the column is 7 (6 + 7 + 0 - 6)
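The question is about R, and no answer is included in this excerpt. As a rough sketch of the stated rule in Python with NumPy (an analog, not the asker's code), the per-cell loop can be replaced by two whole-array comparisons. The array below reuses the numeric columns Ad2 to Ad4 of Matrix A; the variable names are assumptions.

```python
import numpy as np

# Numeric part of Matrix A (columns Ad2, Ad3, Ad4; rows AA, AB, AC).
A = np.array([
    [6, 0, 10],
    [7, 10, 12],
    [0, 0, 15],
])

# Sum of each column, broadcast against A, minus the cell itself:
# rest[i, j] is the sum of column j excluding row i.
rest = A.sum(axis=0) - A

# A cell becomes 1 only if it is positive AND the rest of its column is positive.
B = ((A > 0) & (rest > 0)).astype(int)
```

For this input, `B` matches Matrix B from the question. The same idea carries over to R with `colSums` and recycling or `sweep`, which is left aside here.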

Replacing nested loop in R

痴心易碎 submitted on 2019-12-23 12:16:23
Question: I am very new to R and searched the forums for this, but couldn't find a close enough solution. I am trying to map IP addresses to their corresponding geolocations. I have 2 data sets. Set-a (1,60,000 rows): ip(int) | ID(int). Set-b (16,00,000 rows): Ip1(int) | Ip2(int) | Code(str) | Country(str) | Area1(str) | Area2(str). I am trying to do the following: if ip lies between Ip1 and Ip2, add Country & Region to Set-a. I am doing the following (obviously not a very good way
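The asker's nested loop is cut off here, and the question concerns R. As a hedged illustration of one common way to avoid the nested loop, the sketch below uses a binary search in Python with NumPy. It assumes the ranges in Set-b do not overlap and are sorted by Ip1, which the excerpt does not confirm. The function name `lookup_country` and the sample values are made up.

```python
import numpy as np


def lookup_country(ips, ip1, ip2, country):
    """Return the country for each ip, or None if no range contains it.

    Assumes ip1 is sorted ascending and the [ip1, ip2] ranges do not overlap.
    """
    ips = np.asarray(ips)
    ip1 = np.asarray(ip1)
    ip2 = np.asarray(ip2)

    # Index of the last range whose start is <= ip (-1 if there is none).
    idx = np.searchsorted(ip1, ips, side="right") - 1
    safe = np.clip(idx, 0, None)  # avoid negative indexing for the -1 case

    # The candidate range matches only if the ip is also <= its end.
    inside = (idx >= 0) & (ips <= ip2[safe])
    return [country[i] if ok else None for i, ok in zip(safe, inside)]


# Tiny made-up example: three ranges, five addresses.
matches = lookup_country(
    ips=[5, 15, 27, 40, 0],
    ip1=[1, 11, 30],
    ip2=[10, 20, 35],
    country=["AA", "BB", "CC"],
)
```

Each lookup costs one binary search instead of a scan over every range, which is the main reason to prefer it at the data sizes mentioned in the question.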

Why does apply convert logicals in data frames to strings of 5 characters?

女生的网名这么多〃 submitted on 2019-12-23 11:45:15
Question: Suppose I have a data frame:

    mydf <- data.frame(colA = c(1,20), colB = c("a", "ab"), colC = c(T, F))

Now suppose I want to apply a function to each row of the data frame. This function uses the boolean value of column C. When using apply, every non-string is converted to a string of the maximum length present in the column:

    > apply(mydf, 1, '[', 3)
    [1] " TRUE" "FALSE"

The string " TRUE" is no longer interpretable as a logical.

    > ifelse(apply(mydf, 1, '[', 3), 1, 2)
    [1] NA 2

I could solve

Python Pandas 'apply' returns series; can't convert to dataframe

喜夏-厌秋 submitted on 2019-12-23 09:46:28
Question: OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function that takes an input (a country name) and returns the latitude and longitude. I use apply to run the function and it returns a pandas Series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to Python and still RTFMing. BTW, the geocoder function works great.

    # Import libraries
    import os
    import pandas as pd
    import numpy as np
    from geopy.geocoders
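The asker's geocoding function is not visible in this excerpt, so the sketch below swaps in a hypothetical `locate` function backed by a small made-up dictionary instead of geopy. It assumes the function returns a (latitude, longitude) tuple, in which case apply yields a Series of tuples that can be expanded into two columns.

```python
import pandas as pd

# Made-up coordinates standing in for a real geocoder lookup.
FAKE_COORDS = {"France": (46.2, 2.2), "Japan": (36.2, 138.3)}


def locate(country):
    # Hypothetical stand-in for the asker's geopy-based function.
    return FAKE_COORDS.get(country, (None, None))


df = pd.DataFrame({"country": ["France", "Japan", "Atlantis"]})

# Series.apply returns a Series whose values are (lat, lon) tuples.
coords = df["country"].apply(locate)

# Expand the tuples into a two-column DataFrame that shares df's index,
# then attach it to the original frame.
latlon = pd.DataFrame(coords.tolist(), index=df.index, columns=["lat", "lon"])
result = df.join(latlon)
```

Keeping `index=df.index` matters when the original frame does not use a default 0..n-1 index; otherwise the join would misalign rows.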

Sliding window on each column of a matrix in R with parallel processing

早过忘川 submitted on 2019-12-23 05:41:30
Question: I have a large matrix with 2000 columns and 3000 rows. For each column, I want to run a sliding window that sums 15 rows, then moves down one row and sums the next 15, and so on, and create a new matrix from the results. I have a function that works (although it seems a bit slow), but I would like to run it in parallel: this is part of a larger script, and if I use apply functions without their parallel equivalents, the open cluster shuts down. Moreover, I have to do this whole operation 100
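The asker's R function is not shown. As a sketch of the windowed sum itself (in Python with NumPy, and not a parallel version), a cumulative sum lets every window be computed as the difference of two running totals, which may make parallelism unnecessary at this matrix size. The function name `rolling_column_sums` is an assumption.

```python
import numpy as np


def rolling_column_sums(m, window):
    """Sum every run of `window` consecutive rows, separately for each column.

    Returns an array with m.shape[0] - window + 1 rows.
    """
    m = np.asarray(m)
    # Running totals down each column, with a leading row of zeros so that
    # totals[k] is the sum of the first k rows.
    totals = np.cumsum(m, axis=0)
    totals = np.vstack([np.zeros((1, m.shape[1]), dtype=totals.dtype), totals])
    # Sum of rows i .. i+window-1 equals totals[i + window] - totals[i].
    return totals[window:] - totals[:-window]


# Small made-up example: 6 rows, 2 columns, window of 3 rows.
example = np.arange(12).reshape(6, 2)
sums = rolling_column_sums(example, 3)
```

With floating-point data, the subtraction of large running totals can accumulate rounding error, so a direct windowed sum is the safer choice when exact agreement matters.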

Count how many observations in the rest of the data fit multiple conditions? (R)

流过昼夜 submitted on 2019-12-23 04:26:39
Question: Friends, I am new to R programming. I have been trying to write a user-defined function for days but have not nailed it yet. This is a dataset called event, containing thousands of events (observations); I selected several rows to show you the data structure. It contains the "STATEid", the "date" of occurrence, and geographical coordinates in two variables, "LON" and "LAT". I am trying to calculate a new variable (column) for each row. This new variable should be: "Given any specific incident, count

Determine if Value in Final Column exists in respective rows

隐身守侯 submitted on 2019-12-23 04:22:34
Question: I have a dataframe as follows: df1 ColA ColB ColC ColD ColE COlF ColG Recs 1 A-1 A - 3 B B NA C 1 B-1 C R D E NA B 1 NA A B A B How do I determine whether the value in the last column, Recs, is found elsewhere in its respective row? I tried the code below, but it doesn't work because there are duplicates in my normal dataset:

    df1$Exist <- apply(df1, 1, FUN = function(x) c("No", "Yes")[(anyDuplicated(x[!is.na(x) & x != "" ])!=0) +1])

There are also blanks, NAs, and character values that contain spaces and dashes. Final

constructing a function using colnames as variables

人走茶凉 submitted on 2019-12-22 17:30:12
Question: I'd like to collect terms under multiple columns of the annot data.frame. Below is the first row of a toy dataset for annot.

    colnames(annot)
    # [1] "HUGO.Name" "Common.Name" "Gene.Class" "Cell.Type" "Annotation"
    annot[1,]
    # HUGO.Name Common.Name Gene.Class Cell.Type
    # 1 CCL1 CCL1 Immune Response - Cell Type specific aDC
    # Annotation
    # 1 Cell Type specific, Chemokines and receptors, Inflammatory response

So far, I've been writing the colnames iteratively, but I'd like to learn