pandas

Specifying colors for multiple lines on plot using matplotlib and pandas [duplicate]

六眼飞鱼酱① submitted on 2021-02-16 20:48:12
Question: This question already has an answer here: "Matplotlib: change the colors of the result of group by" (1 answer). Closed 7 months ago. Related question: "Pandas dataframe groupby plot". I have a dataframe similar to the one in that question, but with around 8 ticker symbols. I've defined a list of colours called 'colors' that corresponds to the tickers, but when I run df.groupby('ticker')['adj_close'].plot(color=colors), all the lines on the plot for each of the tickers are the same colour (i.e. the first
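One possible way to get a different colour per ticker is to iterate over the groups and pass a single colour to each plot call. The sketch below is not the linked answer; the data, the 'day' column and the three colour names are made up for illustration, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: three tickers instead of the eight in the question.
df = pd.DataFrame({
    "day": [1, 2, 3] * 3,
    "ticker": ["AAA"] * 3 + ["BBB"] * 3 + ["CCC"] * 3,
    "adj_close": [10.0, 11.0, 10.5, 20.0, 19.5, 21.0, 5.0, 5.5, 6.0],
})
colors = ["tab:red", "tab:green", "tab:blue"]  # one colour per ticker, in sorted ticker order

fig, ax = plt.subplots()
# groupby yields groups in sorted key order, so zip pairs each ticker with one colour.
for (ticker, group), color in zip(df.groupby("ticker"), colors):
    group.plot(x="day", y="adj_close", ax=ax, color=color, label=ticker)

# Collect the colour actually assigned to each drawn line.
line_colors = [line.get_color() for line in ax.get_lines()]
```

Each call draws exactly one line, so the colour argument is unambiguous; a dict keyed by ticker could replace the list if the pairing should not depend on sort order.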

How to perform a multiple groupby and transform count with a condition in pandas

佐手、 submitted on 2021-02-16 20:48:07
Question: This is an extension of the question linked here. I am trying to add an extra column to the groupby:

    # Import pandas library
    import pandas as pd
    import numpy as np

    # data
    data = [['tom', 10, 2, 'c', 100, 'x'], ['tom', 16, 3, 'a', 100, 'x'], ['tom', 22, 2, 'a', 100, 'x'], ['matt', 10, 1, 'c', 100, 'x'], ['matt', 15, 5, 'b', 100, 'x'], ['matt', 14, 1, 'b', 100, 'x']]

    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])
    df['AttemptsbyRating'] = df.groupby
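The question is cut off before the condition is stated, so the following is only a sketch of the general pattern: build a boolean mask, group it by several keys, and broadcast the per-group count back with transform. The condition (Score greater than 1), the grouping keys (Name and Category) and the new column name are assumptions, not taken from the original post.

```python
import pandas as pd

data = [['tom', 10, 2, 'c', 100, 'x'], ['tom', 16, 3, 'a', 100, 'x'],
        ['tom', 22, 2, 'a', 100, 'x'], ['matt', 10, 1, 'c', 100, 'x'],
        ['matt', 15, 5, 'b', 100, 'x'], ['matt', 14, 1, 'b', 100, 'x']]
df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])

# Hypothetical condition: rows whose Score is greater than 1.
meets_condition = (df['Score'] > 1).astype(int)

# Summing the 0/1 mask per (Name, Category) group counts the matching rows;
# transform returns one value per original row, aligned with df's index.
df['CountScoreAbove1'] = meets_condition.groupby([df['Name'], df['Category']]).transform('sum')
```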

Pandas column content to new columns, with other original columns

我的梦境 submitted on 2021-02-16 20:27:29
Question: I have a table like the one below, and I want to make a new table from it (using the values in the 'Color' column). I've tried:

    import pandas as pd
    import functools

    data = {'Seller': ["Mike","Mike","Mike","Mike","David","David","Pete","Pete","Pete"],
            'Code': ["9QBR1","9QBR1","9QBW2","9QBW2","9QD1X","9QD1X","9QEBO","9QEBO","9QEBO"],
            'From': ["2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03"],
            'Color_date': ["2020-02-14","2020-02-14","2020

Replacing an unknown number in Pandas data frame with previous number

非 Y 不嫁゛ submitted on 2021-02-16 20:22:25
Question: I have some data frames I am trying to upload to a database. They are lists of values, but some of the columns contain the string 'null', which causes errors. I would like to use a function to remove these 'null' strings and am trying to use replace to backfill them, as below:

    df.replace("null", method = bfill)

but it gives me the error message:

    ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

I have also tried putting "bfill" instead and it
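A minimal sketch of one way to handle this, assuming the goal in the title (fill with the previous value): first turn the 'null' strings into real missing values, then forward-fill. The two-column frame is invented for illustration. Note that a ParserError of this form is normally raised by the CSV reader, so the quoted error probably comes from a read step rather than from replace itself; that is an inference, since the post is truncated.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 'null' strings mixed into otherwise numeric text.
df = pd.DataFrame({"a": ["1", "null", "3"], "b": ["null", "5", "null"]})

# Step 1: convert the 'null' marker into NaN so pandas treats it as missing.
cleaned = df.replace("null", np.nan)

# Step 2: fill each gap with the previous valid value in the same column.
# Use cleaned.bfill() instead if the next value should be used.
filled = cleaned.ffill()
```

A value with no predecessor (the first row of column 'b' here) stays missing after the forward fill and would need a separate default.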

Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?

三世轮回 submitted on 2021-02-16 20:22:17
Question: I have a database as partially shown below. For each date, there are entries for duration (1-20 per date), with items (100s) listed for each duration. Each item has several associated data points in adjacent columns, including an identifier. For each date, I want to select the largest duration. Then, I want to find the item whose value is closest to a given input value. I would then like to obtain the ID of that item so that I can follow its value through its time in the database.
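The steps described above could be sketched as follows. The column names (date, duration, item_id, value), the sample rows and the target of 0.30 are all hypothetical, since the original table is not shown.

```python
import pandas as pd

# Hypothetical data: two dates, two durations per date, two items per duration.
df = pd.DataFrame({
    "date": ["2021-01-01"] * 4 + ["2021-01-02"] * 4,
    "duration": [1, 1, 2, 2, 1, 1, 3, 3],
    "item_id": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "value": [0.10, 0.50, 0.32, 0.45, 0.20, 0.40, 0.70, 0.33],
})
target = 0.30  # the given input value (assumed)

# Step 1: keep only the rows whose duration is the maximum for their date.
longest = df[df["duration"] == df.groupby("date")["duration"].transform("max")]

# Step 2: within each date, find the row label with the smallest distance to the target.
distance = (longest["value"] - target).abs()
closest_idx = distance.groupby(longest["date"]).idxmin()

# Step 3: look up the adjacent columns, including the ID, for those rows.
closest = df.loc[closest_idx, ["date", "duration", "item_id", "value"]]
```

The resulting item_id values can then be used to filter the full frame and follow each item over time.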

Python Pandas dataframe find missing values

余生颓废 submitted on 2021-02-16 20:18:15
Question: I'm trying to find missing values and then drop them. I tried looking for an answer online but can't seem to find one. Extracted Dataframe: In the df, the entries for 1981 and 1982 should be '-', i.e. missing values. I would like to find the missing values and then drop them. Exported Dataframe using isnull: I used df.isnull(), but for 1981 and 1982 it returns 'False', which means data is present. These entries are '-', however, and should therefore be treated as missing values. I had
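isnull() only recognises real missing values such as NaN, so a '-' string counts as data. A minimal sketch of one fix, assuming a hypothetical frame with 'year' and 'value' columns, is to convert the marker to NaN before checking and dropping:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: '-' marks the missing entries for 1981 and 1982.
df = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983],
    "value": ["12.5", "-", "-", "14.1"],
})

# Turn the '-' placeholder into NaN so pandas can detect it.
df["value"] = df["value"].replace("-", np.nan)

missing_mask = df["value"].isnull()     # True where the value was '-'
cleaned = df.dropna(subset=["value"])   # rows with a missing value removed
```

If the data comes from a CSV file, passing na_values="-" to pd.read_csv achieves the same conversion at load time.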

How to rank rows by id in Pandas Python

隐身守侯 submitted on 2021-02-16 20:13:25
Question: I have a Dataframe like this:

    id  points1  points2
    1   44       53
    1   76       34
    1   63       66
    2   23       34
    2   44       56

I want output like this:

    id  points1  points2  points1_rank  points2_rank
    1   44       53       3             2
    1   76       34       1             3
    1   63       66       2             1
    2   23       79       2             1
    2   44       56       1             2

Basically, I want to groupby('id') and find the rank of each column among rows with the same id. I tried this:

    features = ["points1","points2"]
    df = pd.merge(df, df.groupby('id')[features].rank().reset_index(), suffixes=["", "_rank"], how='left', on=['id'])

But I get KeyError: 'id'.

Answer 1: You
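The answer is cut off, so the following is a sketch rather than the original answer. groupby(...).rank() returns a frame with the same index as the input and no 'id' column, which is why merging on 'id' fails; joining on the index avoids the merge entirely. The expected output ranks the largest value first, so ascending=False is assumed, and the data below uses the points2 values from the expected-output table (79 in the fourth row).

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "points1": [44, 76, 63, 23, 44],
    "points2": [53, 34, 66, 79, 56],
})
features = ["points1", "points2"]

# Rank each feature within its id group, largest value = rank 1.
# The result keeps df's index, so it can be joined back directly.
ranks = (
    df.groupby("id")[features]
      .rank(ascending=False)
      .astype(int)            # safe here because there are no ties or NaN
      .add_suffix("_rank")
)
result = df.join(ranks)
```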

Sort date in string format in a pandas dataframe?

懵懂的女人 submitted on 2021-02-16 20:06:58
Question: I have a dataframe like this. How can I sort it?

    df = pd.DataFrame({'Date':['Oct20','Nov19','Jan19','Sep20','Dec20']})

        Date
    0  Oct20
    1  Nov19
    2  Jan19
    3  Sep20
    4  Dec20

I am familiar with sorting a list of date strings:

    a.sort(key=lambda date: datetime.strptime(date, "%d-%b-%y"))

Any thoughts? Should I split it?

Answer 1: First convert the column to datetimes and get the positions of the sorted values with Series.argsort, which are then used to reorder the rows with DataFrame.iloc: df = df.iloc[pd.to_datetime(df['Date'], format='
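The code in the answer is truncated at the format string. A sketch of the approach it describes, assuming the format '%b%y' (abbreviated month name followed by a two-digit year), could look like this:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['Oct20', 'Nov19', 'Jan19', 'Sep20', 'Dec20']})

# Parse the strings as dates; '%b%y' is an assumed format, e.g. 'Oct20' -> October 2020.
parsed = pd.to_datetime(df['Date'], format='%b%y')

# argsort gives the row positions in chronological order; iloc reorders by position.
order = parsed.argsort().to_numpy()
df_sorted = df.iloc[order]
```

The original strings are kept as they are; only the row order changes.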

Backfilling columns by groups in Pandas

不打扰是莪最后的温柔 submitted on 2021-02-16 20:06:38
Question: I have a csv like

    A,B,C,D
    1,2,,
    1,2,30,100
    1,2,40,100
    4,5,,
    4,5,60,200
    4,5,70,200
    8,9,,

In row 1 and row 4 the C value is missing (NaN). I want to take their values from rows 2 and 5 respectively (the first occurrence with the same A,B values). If no matching row is found, just put 0 (as in the last line). Expected output:

    A,B,C,D
    1,2,30,
    1,2,30,100
    1,2,40,100
    4,5,60,
    4,5,60,200
    4,5,70,200
    8,9,0,

Using fillna I found bfill (use NEXT valid observation to fill gap), but the NEXT observation has to be taken logically
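One way to restrict the backfill to rows sharing the same A,B pair is to group before filling. This is a sketch under the assumption that only column C should be filled, as in the expected output, where D stays empty.

```python
import io
import pandas as pd

csv_text = """A,B,C,D
1,2,,
1,2,30,100
1,2,40,100
4,5,,
4,5,60,200
4,5,70,200
8,9,,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Backfill C only within each (A, B) group, so values never leak across groups;
# groups with no valid C at all (8,9 here) remain NaN and are then set to 0.
df["C"] = df.groupby(["A", "B"])["C"].bfill().fillna(0)
```

bfill uses the next valid value inside the group, which matches the first valid occurrence when the gap is at the start of the group; df.groupby(["A", "B"])["C"].transform("first") is an alternative if every gap should always receive the group's first valid value.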
