dataframe | 易学教程

Passing column name as parameter to a function using dplyr

Question: I have a dataframe like below:

    transid <- c(1, 2, 3, 4, 5, 6, 7, 8)
    accountid <- c("a", "a", "b", "a", "b", "b", "a", "b")
    month <- c(1, 1, 1, 2, 2, 3, 3, 3)
    amount <- c(10, 20, 30, 40, 50, 60, 70, 80)
    transactions <- data.frame(transid, accountid, month, amount)

I am trying to write a function that computes the total monthly amount for each accountid using dplyr package verbs.

    my_sum <- function(df, col1, col2, col3) {
      df %>%
        group_by_(col1, col2) %>%
        summarise_(total_sum = sum(col3))
    }

    my_sum(transactions, "accountid", "month", "amount")

To get the result like below:
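
For comparison, the same parameterised aggregation can be sketched in pandas, where column names are ordinary strings and need no special evaluation. This is an analogue, not an answer to the dplyr question itself: the function name `my_sum` and the output column `total_sum` mirror the question, and the rest is an assumption made for illustration.

```python
import pandas as pd

# Same data as in the question, rebuilt as a pandas DataFrame
transactions = pd.DataFrame({
    "transid": [1, 2, 3, 4, 5, 6, 7, 8],
    "accountid": ["a", "a", "b", "a", "b", "b", "a", "b"],
    "month": [1, 1, 1, 2, 2, 3, 3, 3],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80],
})

def my_sum(df, col1, col2, col3):
    """Sum col3 within each (col1, col2) group; columns are passed by name."""
    return (
        df.groupby([col1, col2], as_index=False)[col3]
          .sum()
          .rename(columns={col3: "total_sum"})
    )

result = my_sum(transactions, "accountid", "month", "amount")
print(result)
```

Because the column names arrive as plain strings, the function body can pass them straight to `groupby` and to the column selection.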

In a dataframe, I want to compare column A and column B and extract value in which A >= B?

Question:

    Cars      A   B
    Honda     5   3
    Kia       7   5
    BMW       4   8
    Mazda     6  10
    Hyundai  15  12
    Lexus    22  19
    Toyota   40  50
    Jeep     60  50

The above is my dataframe. From this I want to compare column A with column B and extract the values in A that are greater than or equal to B (A >= B). I tried to solve this with pmax(Cars$A, Cars$B), but it gave me this result: 5, 7, 8, 10, 15, 22, 50, 60. The result I want: 5, 7, 15, 22, 60.

Answer 1: pmax is the parallel maximum; from ?pmax: Returns the (regular or parallel) maxima and minima of the
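
The answer text is cut off in the source. As an independent illustration of the filtering the question asks for (keep A only where A >= B, rather than taking an element-wise maximum), here is a minimal pandas sketch. The frame name `cars` and the variable names are assumptions; the original question is about R.

```python
import pandas as pd

# Data copied from the question's table
cars = pd.DataFrame({
    "Cars": ["Honda", "Kia", "BMW", "Mazda", "Hyundai", "Lexus", "Toyota", "Jeep"],
    "A": [5, 7, 4, 6, 15, 22, 40, 60],
    "B": [3, 5, 8, 10, 12, 19, 50, 50],
})

# The row-wise comparison yields a boolean mask; indexing A with it keeps
# only the values of A in rows where A >= B.
mask = cars["A"] >= cars["B"]
selected = cars.loc[mask, "A"].tolist()
print(selected)  # [5, 7, 15, 22, 60]
```

The key difference from an element-wise maximum is that rows failing the condition are dropped instead of being replaced by the value from B.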

Combine multiple dictionaries into one pandas dataframe in long format

Question: I have several dictionaries set up as follows:

    Dict1 = {'Orange': ['1', '2', '3', '4']}
    Dict2 = {'Red': ['3', '4', '5']}

And I'd like the output to be one combined dataframe:

    | Type   | Value |
    |--------|-------|
    | Orange | 1     |
    | Orange | 2     |
    | Orange | 3     |
    | Orange | 4     |
    | Red    | 3     |
    | Red    | 4     |
    | Red    | 5     |

I tried splitting everything out but I only get Dict2 in this dataframe.

    mydicts = [Dict1, Dict2]
    for x in mydicts:
        for k, v in x.items():
            df = pd.DataFrame(v)
            df['Type'] = k

Answer 1: One option is using
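
The answer is truncated in the source, so the approach it goes on to describe is unknown. Independently of it, the loop in the question keeps only Dict2 because `df` is reassigned on every iteration. One possible fix, sketched here under the assumption that the desired columns are `Type` and `Value` as in the table above, is to collect one small frame per key and concatenate them once at the end.

```python
import pandas as pd

Dict1 = {'Orange': ['1', '2', '3', '4']}
Dict2 = {'Red': ['3', '4', '5']}
mydicts = [Dict1, Dict2]

# Build one frame per (key, list) pair instead of overwriting a single df.
# The scalar key is broadcast down the 'Type' column to match the list length.
frames = [
    pd.DataFrame({'Type': k, 'Value': v})
    for d in mydicts
    for k, v in d.items()
]

# Stack the per-key frames into one long-format frame with a fresh index
df = pd.concat(frames, ignore_index=True)
print(df)
```

The values stay strings, as in the original dictionaries; converting them to numbers would be a separate step.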

Split, apply, and combine multiple data frames into one data frame

Question: I have completed an origin-destination cost matrix (23 origins, ~600,000 destinations) for traveling through a street network in ArcGIS and disaggregated the resulting matrix into DBF tables by store ID using a Python script. I have loaded each DBF table into an R session as follows:

    # Import OD cost matrix results for each store
    origins <- read.dbf('ODM_origins.dbf')
    store_17318 <- read.dbf('table_17318.dbf')
    store_17358 <- read.dbf('table_17358.dbf')
    store_17601 <- read.dbf('table_17601.dbf

I have to compare data from each row of a Pandas DataFrame with data from the rest of the rows, is there a way to speed up the computation?

Question: Let's say I have a pandas DataFrame (loaded from a csv file) with this structure (the number of var and err columns is not fixed, and it varies from file to file):

    var_0; var_1; var_2;
    32; 9; 41;
    47; 22; 41;
    15; 12; 32;
    3; 4; 4;
    10; 9; 41;
    43; 21; 45;
    32; 14; 32;
    51; 20; 40;

Let's discard the err_ds_j and the err_mean columns for the sake of this question. I have to perform an automatic comparison of the values of each row with the values of the other rows; as an example: I have to compare

How to sort data frame by column values?

Question: I am relatively new to Python and pandas data frames, so maybe I have missed something very easy here. I had a data frame with many rows and columns, but in the end I managed to get only one row with the maximum value from each column. I used this code to do that:

    import pandas as pd

    d = {'A': [1.2, 2, 4, 6],
         'B': [2, 8, 10, 12],
         'C': [5, 3, 4, 5],
         'D': [3.5, 9, 1, 11],
         'E': [5, 8, 7.5, 3],
         'F': [8.8, 4, 3, 2]}
    df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
    print(df)

    Out: A B C
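
The question is cut off before it states exactly what should be sorted. Assuming the goal is to take the maximum of each column and then order those maxima, a minimal sketch could look like this; the names `col_max` and `sorted_max` are hypothetical.

```python
import pandas as pd

d = {'A': [1.2, 2, 4, 6],
     'B': [2, 8, 10, 12],
     'C': [5, 3, 4, 5],
     'D': [3.5, 9, 1, 11],
     'E': [5, 8, 7.5, 3],
     'F': [8.8, 4, 3, 2]}
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

# One maximum per column, returned as a Series indexed by column name
col_max = df.max()

# Order the maxima from largest to smallest
sorted_max = col_max.sort_values(ascending=False)
print(sorted_max)
```

If the maxima are needed as a one-row DataFrame rather than a Series, `sorted_max.to_frame().T` gives that shape with the columns already in sorted order.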

Pandas: Sort innermost column group-wise based on other multilevel column

Question: Consider below df:

    In [3771]: df = pd.DataFrame({'A': ['a'] * 11,
                                  'B': ['b'] * 11,
                                  'C': ['C1', 'C1', 'C2', 'C1', 'C3', 'C3', 'C2', 'C3', 'C3', 'C2', 'C2'],
                                  'D': ['D1', 'D2', 'D1', 'D3', 'D3', 'D2', 'D4', 'D4', 'D1', 'D2', 'D3'],
                                  'E': [{'value': '4', 'percentage': None},
                                        {'value': 5, 'percentage': None},
                                        {'value': 12, 'percentage': None},
                                        {'value': 5, 'percentage': None},
                                        {'value': '12', 'percentage': None},
                                        {'value': 'N/A', 'percentage': None},
                                        {},
                                        {'value': 19, 'percentage': None},
                                        {'value':