dataframe

Joining a list of tuples within a pandas dataframe

Submitted by ↘锁芯ラ on 2021-02-05 07:12:05

Question: I want to join a list of tuples within a dataframe. I have tried several methods of doing this within the dataframe, with join and with a lambda:

    import pandas as pd
    from nltk import word_tokenize, pos_tag, pos_tag_sents

    data = {'Categories': ['animal', 'plant', 'object'],
            'Type': ['tree', 'dog', 'rock'],
            'Comment': ['The NYC tree is very big',
                        'NY The cat from the UK is small',
                        'The rock was found in LA.']}

    def posTag(data):
        data = pd.DataFrame(data)
        comments = data['Comment'].tolist()
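The question above is cut off, but the core task of joining a list of (word, tag) tuples into one string per row can be sketched as follows. The 'Tagged' column and its contents are hypothetical stand-ins for what nltk's pos_tag would produce, so the example runs without nltk:

```python
import pandas as pd

# Hypothetical tagged data: each cell holds a list of (word, tag) tuples,
# i.e. the shape of nltk.pos_tag output for a tokenized comment.
df = pd.DataFrame({
    'Tagged': [[('The', 'DT'), ('dog', 'NN')],
               [('rock', 'NN'), ('was', 'VBD')]]
})

# Join each list of tuples into a single "word/TAG" string per row.
df['Joined'] = df['Tagged'].apply(
    lambda pairs: ' '.join('/'.join(p) for p in pairs)
)
print(df['Joined'].tolist())  # ['The/DT dog/NN', 'rock/NN was/VBD']
```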

pandas dataframe count unique list

Submitted by 眉间皱痕 on 2021-02-05 07:00:46

Question: If the type of a column in a dataframe is int, float, or string, we can get its unique values with columnName.unique(). But what if the column holds lists, e.g. [1, 2, 3]? How could I get the unique values of this column?

Answer 1: I think you can convert the values to tuples, and then unique works nicely:

    df = pd.DataFrame({'col': [[1,1,2], [2,1,3,3], [1,1,2], [1,1,2]]})
    print(df)

                col
    0     [1, 1, 2]
    1  [2, 1, 3, 3]
    2     [1, 1, 2]
    3     [1, 1, 2]

    print(df['col'].apply(tuple).unique())
    [(1, 1, 2) (2, 1, 3, 3)]

    L = [list(x
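The answer's code breaks off at "L = [list(x". A plausible completion of that fragment, converting the unique tuples back to lists, might look like this (the list comprehension's tail is an assumption based on the truncated line):

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})

# Lists are unhashable, so convert each to a tuple before calling unique().
unique_tuples = df['col'].apply(tuple).unique()

# Convert the unique tuples back to lists, if lists are needed downstream.
L = [list(x) for x in unique_tuples]
print(L)  # [[1, 1, 2], [2, 1, 3, 3]]
```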

Select rows from dataframe with unique combination of values from multiple columns

Submitted by 元气小坏坏 on 2021-02-05 06:59:05

Question: I have a data.frame in R that is a catalog of results from baseball games for every team over a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of that row. For example:

    team  opponent_team  date        result  team_runs  opponent_runs
    BAL   BOS            2010-04-05  W       5          4
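The original question is about an R data.frame; as a rough pandas analogue (the sample rows below are invented, only the column names come from the question), one way to drop each mirror-image row is to build an order-independent key from the two team columns and keep the first row per (pair, date):

```python
import pandas as pd

# Two rows describing the same game from each team's perspective.
df = pd.DataFrame({
    'team': ['BAL', 'BOS'],
    'opponent_team': ['BOS', 'BAL'],
    'date': ['2010-04-05', '2010-04-05'],
    'result': ['W', 'L'],
})

# Sorting the two team names makes the key identical for both mirror rows.
key = df[['team', 'opponent_team']].apply(lambda r: tuple(sorted(r)), axis=1)

# Keep only the first occurrence of each (pair, date) combination.
deduped = df[~pd.DataFrame({'key': key, 'date': df['date']}).duplicated()]
print(len(deduped))  # 1
```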

Calculating mean of a specific column by specific rows

Submitted by 南楼画角 on 2021-02-05 06:55:11

Question: I have a dataframe that looks like in the pictures. Now I want to add a new column that shows the average power for each day (the data is sampled every 5 minutes), but separately for day and night according to the day_or_night column (day = 0, night = 1). I've gotten this far:

    train['avg_by_day'][train['day_or_night']==1] = train['power'][train['day_or_night']==1].mean()
    train['avg_by_day'][train['day_or_night']==0] = train['power'][train['day_or_night']==0].mean()

but this just adds the
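The chained-assignment pattern in the question is fragile in pandas; a sketch of the groupby/transform alternative, on an invented toy frame and grouping only on day_or_night (the part of the goal the question's own code attempts):

```python
import pandas as pd

# Toy stand-in for the 'train' frame from the question.
df = pd.DataFrame({
    'power': [10.0, 20.0, 30.0, 40.0],
    'day_or_night': [0, 0, 1, 1],
})

# transform('mean') broadcasts each group's mean back onto its own rows,
# writing the whole column in one assignment instead of two slice writes.
df['avg_by_day'] = df.groupby('day_or_night')['power'].transform('mean')
print(df['avg_by_day'].tolist())  # [15.0, 15.0, 35.0, 35.0]
```

To average per calendar day as well, the groupby key could be extended, e.g. `df.groupby([dates.dt.date, 'day_or_night'])`, assuming a datetime column is available.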

Single column with value counts from multiple column dataframe

Submitted by 梦想的初衷 on 2021-02-05 06:54:04

Question: I would like to sum the frequencies over multiple columns with pandas. The number of columns can vary between 2 and 15. Here is an example with just 3 columns:

    code1  code2  code3
    27     5      56
    534    27     78
    27     312    55
    89     312    27

And I would like to have the following result:

    code  frequency
    5     1
    27    4
    55    1
    56    2
    78    1
    312   2
    534   1

Counting values inside one column is not the problem; I just need a sum of the frequencies with which a value appears anywhere in the dataframe, no matter the number of columns.

Answer 1: You could stack and
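The answer breaks off after "stack and"; the likely continuation (a guess consistent with that fragment) pairs stack() with value_counts(), using the question's own sample data:

```python
import pandas as pd

df = pd.DataFrame({'code1': [27, 534, 27, 89],
                   'code2': [5, 27, 312, 312],
                   'code3': [56, 78, 55, 27]})

# stack() flattens all columns into one long Series; value_counts() then
# tallies each code regardless of which column it came from.
freq = df.stack().value_counts().sort_index()
print(freq.to_dict())
```

This works for any number of columns, since stack() consumes whatever columns the frame has.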

How to cbind many data frames with a loop?

Submitted by 旧时模样 on 2021-02-05 06:50:32

Question: I have 105 data frames of xts/zoo class, and I want to combine their 6th columns into a single data frame. So I created a data frame that contains all the data frame names, to use it in a for loop:

    mydata <- AAL
    for (i in 2:105) {
      k <- top100[i, 1]  # The first column contains all the data frame names
      mydata <- cbind(mydata, k)
    }

It's obviously wrong, but I have no idea either how to cbind so many data frames with completely different names (my data frame names are NASDAQ symbols), nor how
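The question is in R, where the usual fix is to look each frame up by name with get() or mget() instead of using the name string itself. The same look-up-by-name idea can be sketched in pandas, with a dict standing in for the 105 frames (the symbols and data below are invented):

```python
import pandas as pd

# Hypothetical stand-in for the named xts frames: a dict keyed by symbol,
# which plays the role of R's environment looked up via get()/mget().
frames = {
    'AAL': pd.DataFrame({'a': [1, 2], 'b': [3, 4]}),
    'MSFT': pd.DataFrame({'a': [5, 6], 'b': [7, 8]}),
}

# Take one column from each frame by position (the question wants the 6th,
# i.e. iloc[:, 5]; these toy frames only have two columns, so index 1 is used)
# and concatenate side by side, labeling each column with its symbol.
combined = pd.concat(
    {sym: f.iloc[:, 1] for sym, f in frames.items()}, axis=1
)
print(list(combined.columns))  # ['AAL', 'MSFT']
```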

Pandas: Fill missing values using the last available ones

Submitted by 荒凉一梦 on 2021-02-05 06:45:06

Question: I have a dataframe as follows:

               A    B
    zDate
    01-JAN-17  100  200
    02-JAN-17  111  203
    03-JAN-17  NaN  202
    04-JAN-17  109  205
    05-JAN-17  101  211
    06-JAN-17  105  NaN
    07-JAN-17  104  NaN

What is the best way to fill the missing values using the last available ones? The following is the intended result:

               A    B
    zDate
    01-JAN-17  100  200
    02-JAN-17  111  203
    03-JAN-17  111  202
    04-JAN-17  109  205
    05-JAN-17  101  211
    06-JAN-17  105  211
    07-JAN-17  104  211

Answer 1: Use the ffill function, which is the same as fillna with method='ffill':

    df = df.ffill()

Day-of-year values starting from a particular date

Submitted by 删除回忆录丶 on 2021-02-05 06:40:55

Question: I have a dataframe with a date column. The duration is 365 days, starting at 02/11/2017 and ending at 01/11/2018:

    Date
    02/11/2017
    03/11/2017
    05/11/2017
    .
    .
    01/11/2018

I want to add an adjacent column called Day_Of_Year, as follows:

    Date        Day_Of_Year
    02/11/2017  1
    03/11/2017  2
    05/11/2017  4
    .
    .
    01/11/2018  365

I apologize if it's a very basic question, but unfortunately I haven't been able to get started with this. I could use datetime(), but that would return values such as 1 for 1st January, 2 for 2nd
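One way to number days from an arbitrary start date (rather than from 1 January, as the built-in day-of-year would) is to subtract the first date and count elapsed days. A sketch using the question's own sample dates, parsed day-first:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['02/11/2017', '03/11/2017', '05/11/2017', '01/11/2018']
})

# Parse DD/MM/YYYY strings, then count days elapsed since the first date,
# offset by 1 so the start date itself is day 1.
dates = pd.to_datetime(df['Date'], dayfirst=True)
df['Day_Of_Year'] = (dates - dates.iloc[0]).dt.days + 1
print(df['Day_Of_Year'].tolist())  # [1, 2, 4, 365]
```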