dataframe

Joining a list of tuples within a pandas dataframe

Submitted by ↘锁芯ラ on 2021-02-05 07:12:05

Question: I want to join a list of tuples within a dataframe. I have tried several methods of doing this within the dataframe, with join and with a lambda:

    import pandas as pd
    from nltk import word_tokenize, pos_tag, pos_tag_sents

    data = {'Categories': ['animal', 'plant', 'object'],
            'Type': ['tree', 'dog', 'rock'],
            'Comment': ['The NYC tree is very big',
                        'NY The cat from the UK is small',
                        'The rock was found in LA.']}

    def posTag(data):
        data = pd.DataFrame(data)
        comments = data['Comment'].tolist()
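The question above is cut off, but the core task of joining a list of (word, tag) tuples into one string per row can be sketched as follows. The 'Tagged' column and its contents are hypothetical stand-ins for what nltk's pos_tag would produce, so the example runs without nltk:

```python
import pandas as pd

# Hypothetical tagged data: each cell holds a list of (word, tag) tuples,
# i.e. the shape of nltk.pos_tag output for a tokenized comment.
df = pd.DataFrame({
    'Tagged': [[('The', 'DT'), ('dog', 'NN')],
               [('rock', 'NN'), ('was', 'VBD')]]
})

# Join each list of tuples into a single "word/TAG" string per row.
df['Joined'] = df['Tagged'].apply(
    lambda pairs: ' '.join('/'.join(p) for p in pairs)
)
print(df['Joined'].tolist())  # ['The/DT dog/NN', 'rock/NN was/VBD']
```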

pandas dataframe count unique list

Submitted by 眉间皱痕 on 2021-02-05 07:00:46

Question: If the type of a column in a dataframe is int, float, or string, we can get its unique values with columnName.unique(). But what if the column holds lists, e.g. [1, 2, 3]? How could I get the unique values of this column?

Answer 1: I think you can convert the values to tuples, and then unique works nicely:

    df = pd.DataFrame({'col': [[1,1,2], [2,1,3,3], [1,1,2], [1,1,2]]})
    print(df)

                col
    0     [1, 1, 2]
    1  [2, 1, 3, 3]
    2     [1, 1, 2]
    3     [1, 1, 2]

    print(df['col'].apply(tuple).unique())
    [(1, 1, 2) (2, 1, 3, 3)]

    L = [list(x
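The answer's code breaks off at "L = [list(x". A plausible completion of that fragment, converting the unique tuples back to lists, might look like this (the list comprehension's tail is an assumption based on the truncated line):

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})

# Lists are unhashable, so convert each to a tuple before calling unique().
unique_tuples = df['col'].apply(tuple).unique()

# Convert the unique tuples back to lists, if lists are needed downstream.
L = [list(x) for x in unique_tuples]
print(L)  # [[1, 1, 2], [2, 1, 3, 3]]
```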

Select rows from dataframe with unique combination of values from multiple columns

Submitted by 元气小坏坏 on 2021-02-05 06:59:05

Question: I have a data.frame in R that is a catalog of results from baseball games for every team over a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of that row. For example:

    team  opponent_team  date        result  team_runs  opponent_runs
    BAL   BOS            2010-04-05  W       5          4
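The original question is about an R data.frame; as a rough pandas analogue (the sample rows below are invented, only the column names come from the question), one way to drop each mirror-image row is to build an order-independent key from the two team columns and keep the first row per (pair, date):

```python
import pandas as pd

# Two rows describing the same game from each team's perspective.
df = pd.DataFrame({
    'team': ['BAL', 'BOS'],
    'opponent_team': ['BOS', 'BAL'],
    'date': ['2010-04-05', '2010-04-05'],
    'result': ['W', 'L'],
})

# Sorting the two team names makes the key identical for both mirror rows.
key = df[['team', 'opponent_team']].apply(lambda r: tuple(sorted(r)), axis=1)

# Keep only the first occurrence of each (pair, date) combination.
deduped = df[~pd.DataFrame({'key': key, 'date': df['date']}).duplicated()]
print(len(deduped))  # 1
```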

Calculating mean of a specific column by specific rows

Submitted by 南楼画角 on 2021-02-05 06:55:11

Question: I have a dataframe that looks like in the pictures. Now I want to add a new column that shows the average power for each day (the data is sampled every 5 minutes), but separately for day and night according to the day_or_night column (day = 0, night = 1). I've gotten this far:

    train['avg_by_day'][train['day_or_night']==1] = train['power'][train['day_or_night']==1].mean()
    train['avg_by_day'][train['day_or_night']==0] = train['power'][train['day_or_night']==0].mean()

but this just adds the
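The chained-assignment pattern in the question is fragile in pandas; a sketch of the groupby/transform alternative, on an invented toy frame and grouping only on day_or_night (the part of the goal the question's own code attempts):

```python
import pandas as pd

# Toy stand-in for the 'train' frame from the question.
df = pd.DataFrame({
    'power': [10.0, 20.0, 30.0, 40.0],
    'day_or_night': [0, 0, 1, 1],
})

# transform('mean') broadcasts each group's mean back onto its own rows,
# writing the whole column in one assignment instead of two slice writes.
df['avg_by_day'] = df.groupby('day_or_night')['power'].transform('mean')
print(df['avg_by_day'].tolist())  # [15.0, 15.0, 35.0, 35.0]
```

To average per calendar day as well, the groupby key could be extended, e.g. `df.groupby([dates.dt.date, 'day_or_night'])`, assuming a datetime column is available.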

Single column with value counts from multiple column dataframe

Submitted by 梦想的初衷 on 2021-02-05 06:54:04

Question: I would like to sum the frequencies over multiple columns with pandas. The number of columns can vary between 2 and 15. Here is an example with just 3 columns:

    code1  code2  code3
    27     5      56
    534    27     78
    27     312    55
    89     312    27

And I would like to have the following result:

    code  frequency
    5     1
    27    4
    55    1
    56    2
    78    1
    312   2
    534   1

Counting values inside one column is not the problem; I just need a sum of the frequencies with which a value appears anywhere in the dataframe, no matter the number of columns.

Answer 1: You could stack and
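The answer breaks off after "stack and"; the likely continuation (a guess consistent with that fragment) pairs stack() with value_counts(), using the question's own sample data:

```python
import pandas as pd

df = pd.DataFrame({'code1': [27, 534, 27, 89],
                   'code2': [5, 27, 312, 312],
                   'code3': [56, 78, 55, 27]})

# stack() flattens all columns into one long Series; value_counts() then
# tallies each code regardless of which column it came from.
freq = df.stack().value_counts().sort_index()
print(freq.to_dict())
```

This works for any number of columns, since stack() consumes whatever columns the frame has.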

How to cbind many data frames with a loop?

Submitted by 旧时模样 on 2021-02-05 06:50:32

Question: I have 105 data frames of xts/zoo class, and I want to combine their 6th columns into a single data frame. So I created a data frame that contains all the data frame names, to use it in a for loop:

    mydata <- AAL
    for (i in 2:105) {
      k <- top100[i, 1]  # The first column contains all the data frame names
      mydata <- cbind(mydata, k)
    }

It's obviously wrong, but I have no idea either how to cbind so many data frames with completely different names (my data frame names are NASDAQ symbols), nor how
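The question is in R, where the usual fix is to look each frame up by name with get() or mget() instead of using the name string itself. The same look-up-by-name idea can be sketched in pandas, with a dict standing in for the 105 frames (the symbols and data below are invented):

```python
import pandas as pd

# Hypothetical stand-in for the named xts frames: a dict keyed by symbol,
# which plays the role of R's environment looked up via get()/mget().
frames = {
    'AAL': pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
    'MSFT': pd.DataFrame({'a': [5, 6], 'b': [7, 8]}),
}

# Take one column from each frame by position (the question wants the 6th,
# i.e. iloc[:, 5]; these toy frames only have two columns, so index 1 is used)
# and concatenate side by side, labeling each column with its symbol.
combined = pd.concat(
    {sym: f.iloc[:, 1] for sym, f in frames.items()}, axis=1
)
print(list(combined.columns))  # ['AAL', 'MSFT']
```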

Pandas: Fill missing values using the last available ones

Submitted by 荒凉一梦 on 2021-02-05 06:45:06

Question: I have a dataframe as follows:

               A    B
    zDate
    01-JAN-17  100  200
    02-JAN-17  111  203
    03-JAN-17  NaN  202
    04-JAN-17  109  205
    05-JAN-17  101  211
    06-JAN-17  105  NaN
    07-JAN-17  104  NaN

What is the best way to fill the missing values using the last available ones? The following is the intended result:

               A    B
    zDate
    01-JAN-17  100  200
    02-JAN-17  111  203
    03-JAN-17  111  202
    04-JAN-17  109  205
    05-JAN-17  101  211
    06-JAN-17  105  211
    07-JAN-17  104  211

Answer 1: Use the ffill function, which is the same as fillna with method='ffill':

    df = df.ffill()

Day-of-year values starting from a particular date

Submitted by 删除回忆录丶 on 2021-02-05 06:40:55

Question: I have a dataframe with a date column. The duration is 365 days, starting at 02/11/2017 and ending at 01/11/2018:

    Date
    02/11/2017
    03/11/2017
    05/11/2017
    .
    .
    01/11/2018

I want to add an adjacent column called Day_Of_Year, as follows:

    Date        Day_Of_Year
    02/11/2017  1
    03/11/2017  2
    05/11/2017  4
    .
    .
    01/11/2018  365

I apologize if it's a very basic question, but unfortunately I haven't been able to get started with this. I could use datetime(), but that would return values such as 1 for 1st January, 2 for 2nd
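One way to number days from an arbitrary start date (rather than from 1 January, as the built-in day-of-year would) is to subtract the first date and count elapsed days. A sketch using the question's own sample dates, parsed day-first:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['02/11/2017', '03/11/2017', '05/11/2017', '01/11/2018']
})

# Parse DD/MM/YYYY strings, then count days elapsed since the first date,
# offset by 1 so the start date itself is day 1.
dates = pd.to_datetime(df['Date'], dayfirst=True)
df['Day_Of_Year'] = (dates - dates.iloc[0]).dt.days + 1
print(df['Day_Of_Year'].tolist())  # [1, 2, 4, 365]
```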