apply

Counting the number of rows of a series of csv files

放荡痞女 posted on 2019-12-04 06:54:01
I'm working through an R tutorial and suspect that I have to use one of these functions, but I'm not sure which. (Yes, I researched them, but until I become more fluent in R terminology they are quite confusing.) In my working directory there is a folder "specdata", which contains hundreds of CSV files named 001.csv - 300.csv. The function I am working on must count the total number of rows across a specified set of CSV files. So if the argument to the function is 1:10 and each of those files has ten rows, it should return 100. Here's what I have so far: complete <- function(directory,id = 1:332) {

python count how many times a string is present in the entire row of a pandas dataframe

你。 posted on 2019-12-04 05:58:05
Question: I have a question based upon my earlier question. The code below runs fine and tells me whether the search_string is present anywhere in a row. How could I modify the last line so that it gives me the count of matches instead of 1 or 0? For example, for the first row it should return 4, as my search_string is present in 4 locations in that row. sales = [{'account': 'Jones LLC jones', 'Jan': '150', 'Feb': '200', 'Mar': '140 jones jones'}, {'account': 'Alpha Co', 'Jan': 'Jones', 'Feb':
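One way the per-row count could be obtained is sketched below. It uses only the first row of the question's data, since the rest of the list is cut off in the excerpt. The search term 'jones', the case-insensitive matching, and the helper name count_matches are assumptions made for illustration, not part of the original code.

```python
import re
import pandas as pd

# Only the first row of the question's data survives intact in the excerpt.
sales = [{'account': 'Jones LLC jones', 'Jan': '150', 'Feb': '200', 'Mar': '140 jones jones'}]
df = pd.DataFrame(sales)

search_string = 'jones'  # assumed search term, matched case-insensitively


def count_matches(row, pattern):
    # Hypothetical helper: count non-overlapping matches in every cell of the row.
    return sum(len(re.findall(pattern, str(cell), flags=re.IGNORECASE)) for cell in row)


# axis=1 passes each row to the helper; extra keyword arguments are forwarded to it.
df['match_count'] = df.apply(count_matches, axis=1, pattern=re.escape(search_string))
print(df['match_count'].tolist())
```

For the first row this gives 4: two matches in the account column and two in Mar.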

Find a percentage based on multiple columns of criteria in R

喜你入骨 posted on 2019-12-04 04:52:39
Question: I have multiple columns and I would like to find the percentage that one column's value makes up among the rows where the other columns are the same. For example:

ST  cd  variable
1   1   23432
1   1   2345
1   2   908890
1   2   350435
1   2   2343432
2   1   9999
2   1   23432

So what I'd like to do is: if ST and cd are the same, find the percentage of variable for that row over all rows with the same ST and cd. So in the end it would look like:

ST  cd  variable  percentage
1   1   23432     90.90%
1   1   2345      9.10%
1   2   908890    25.30%
1   2   350435    9.48%
1   2   2343432   65.23

mapply over two lists [closed]

混江龙づ霸主 posted on 2019-12-04 04:20:58
Question: This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center. Closed 7 years ago. I recently asked a question about using an apply function over two lists. Each list is a list of data frames created by splitting a large data frame. For

Pandas transform() vs apply()

醉酒当歌 posted on 2019-12-04 01:35:39
I don't understand why apply and transform return different dtypes when called on the same data frame. The way I explained the two functions to myself before went something along the lines of "apply collapses the data, and transform does exactly the same thing as apply but preserves the original index and doesn't collapse." Consider the following. df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4], 'cat': [1,1,0,0,1,0,0,0,0,1]}) Let's identify those ids which have a nonzero entry in the cat column. >>> df.groupby('id')['cat'].apply(lambda x: (x == 1).any()) id 1 True 2 True 3 False 4 True Name:
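The shape difference described above can be sketched as follows, using the data frame from the question. The variable names per_group and per_row are illustrative. The dtype that transform returns has varied between pandas versions, so the sketch prints the dtypes rather than assuming them.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})
grouped = df.groupby('id')['cat']

# apply: one result per group, indexed by the group key.
per_group = grouped.apply(lambda x: (x == 1).any())

# transform: the per-group scalar is broadcast back to every row of that group,
# so the result keeps the original index and length.
per_row = grouped.transform(lambda x: (x == 1).any())

print(len(per_group), per_group.dtype)
print(len(per_row), per_row.dtype)
```

Here per_group has four entries (one per id) while per_row has ten (one per original row), which is what makes the transform result usable directly as a row mask after converting it to bool.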

pandas, apply with args which are dataframe row entries

纵饮孤独 posted on 2019-12-04 01:11:28
Question: I have a pandas dataframe 'df' with two columns 'A' and 'B'. I have a function with two arguments: def myfunction(B, A): # do something here to get the result return result and I would like to apply it row by row to df using the 'apply' function: df['C'] = df['B'].apply(myfunction, args=(df['A'],)) but I get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). What's happening here? It seems it takes df['A'] as the whole Series! not
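A likely reading of the error: Series.apply forwards every entry of args unchanged to each call, so myfunction receives the whole df['A'] Series as its second argument instead of one value per row. Below is a minimal sketch of a row-wise alternative. The sample data and the body of myfunction are placeholders invented for illustration; the question shows neither.

```python
import pandas as pd

# Made-up sample data; the question does not show the real frame.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})


def myfunction(B, A):
    # Placeholder body; the question elides the real computation.
    return B + A


# With axis=1 each row is passed as a Series, so both values come from the same row.
df['C'] = df.apply(lambda row: myfunction(row['B'], row['A']), axis=1)
print(df)
```

For simple arithmetic like this placeholder, calling myfunction(df['B'], df['A']) on the whole columns would also work and avoids the per-row overhead, but that depends on what the real function does.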

The value is returned instead of NULL when using function with OUTER APPLY

时光毁灭记忆、已成空白 posted on 2019-12-04 00:34:08
I am getting strange results when using an inline function. Here is the code: IF EXISTS ( SELECT * FROM sys.objects AS o WHERE name = 'vendor_relation_users' ) DROP FUNCTION dbo.vendor_relation_users; GO CREATE FUNCTION [dbo].[vendor_relation_users] ( @user_name CHAR(12) ) RETURNS TABLE AS RETURN (SELECT @user_name AS user_name WHERE @user_name NOT LIKE '06%'); GO DECLARE @u CHAR(12) = '066BDLER' SELECT a.user_name, is_v.user_name FROM (SELECT @u AS user_name) a OUTER APPLY [dbo].[vendor_relation_users](@u) AS is_v SELECT a.user_name, is_v.user_name FROM (SELECT @u AS user_name) a OUTER APPLY

Why is pandas.DataFrame.apply printing out junk?

你。 posted on 2019-12-04 00:23:03
Question: Consider this simple dataframe: a b 0 1 2 1 2 3 I perform a .apply as such: In [4]: df.apply(lambda x: [x.values]) Out[4]: a [[140279910807944, 140279910807920]] b [[140279910807944, 140279910807920]] dtype: object In [5]: df.apply(lambda x: [x.values]) Out[5]: a [[37, 37]] b [[37, 37]] dtype: object In [6]: df.apply(lambda x: [x.values]) Out[6]: a [[11, 11]] b [[11, 11]] dtype: object Why is pandas printing out junk each time? I've verified this happens in v0.20. Edit: Looking for an answer,

Apply a list of n functions to each row of a dataframe?

我的未来我决定 posted on 2019-12-03 19:13:36
Question: I have a list of functions: funs <- list(fn1 = function(x) x^2, fn2 = function(x) x^3, fn3 = function(x) sin(x), fn4 = function(x) x+1) #in reality these are all f = splinefun() And I have a dataframe: mydata <- data.frame(x1 = c(1, 2, 3, 2), x2 = c(3, 2, 1, 0), x3 = c(1, 2, 2, 3), x4 = c(1, 2, 1, 2)) #actually a 500x15 dataframe of 500 samples from 15 parameters For each of the i rows, I would like to evaluate function j on the j-th column and sum the results: unlist(funs) attach(mydata) a

Convert for loop to apply

人盡茶涼 posted on 2019-12-03 17:35:11
In R, how do you replace the following code using functions like apply, lapply, rapply, do.call, etc.? u <- 10:12 slist <- list() for (i in 1:length(u)) { p <- combn(u, i) for (j in 1:ncol(p)) { s <- paste(p[,j], collapse=",") slist[[s]] <- 0 } } For this part: for (j in 1:ncol(p)) { s <- paste(p[,j], collapse=",") I tried something like: s <- apply(p, 2, function(x) paste(x, collapse=",")) Which works. But then for that slist[[s]] <- 0 part inside that same for-loop, I don't know what to do. Edit: This is what I'm trying to do. For the vector u, I'm producing a list of all the subsets in