pandas | 易学教程

Specifying colors for multiple lines on plot using matplotlib and pandas [duplicate]

Question: This question already has an answer here: Matplotlib: change the colors of the result of group by (1 answer). Closed 7 months ago. Pandas dataframe groupby plot. I have a dataframe similar to the one in the above question, but it has around 8 ticker symbols. I've defined a list of colours called 'colors' that corresponds to the tickers, but when I do: df.groupby('ticker')['adj_close'].plot(color=colors) all the lines on the plot for each of the tickers are the same colour (i.e. the first
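
One possible way around this is sketched below. It assumes the cause is that the groupby call hands the same `color=colors` argument to every group's plot, so it assigns one colour per group explicitly instead. The data, the `date` column and the three ticker names are hypothetical stand-ins, not the asker's dataframe.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data: three tickers instead of the eight in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02"] * 3),
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "adj_close": [1.0, 1.1, 2.0, 2.1, 3.0, 2.9],
})
colors = ["red", "green", "blue"]  # one colour per ticker, in sorted ticker order

fig, ax = plt.subplots()
# groupby yields (key, sub-frame) pairs in sorted key order; pair each with a colour.
for (ticker, group), color in zip(df.groupby("ticker"), colors):
    ax.plot(group["date"], group["adj_close"], color=color, label=ticker)
ax.legend()

line_colors = [line.get_color() for line in ax.get_lines()]
```

Because `zip` pairs groups and colours by position, the colour list has to follow the sorted order of the ticker symbols.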

How to perform a multiple groupby and transform count with a condition in pandas

Question: This is an extension of the question here. I am trying to add an extra column to the groupby:

    # Import pandas library
    import pandas as pd
    import numpy as np
    # data
    data = [['tom', 10, 2, 'c', 100, 'x'], ['tom', 16, 3, 'a', 100, 'x'], ['tom', 22, 2, 'a', 100, 'x'], ['matt', 10, 1, 'c', 100, 'x'], ['matt', 15, 5, 'b', 100, 'x'], ['matt', 14, 1, 'b', 100, 'x']]
    # Create the pandas DataFrame
    df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])
    df['AttemptsbyRating'] = df.groupby
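
The excerpt is cut off before the condition is stated, so the following is only a sketch of the general pattern: group by several columns and use `transform` to broadcast a conditional count back to every row. The grouping keys (`Name`, `Category`), the condition (`Score > 2`) and the column name `HighScoreCount` are assumptions made for illustration.

```python
import pandas as pd

data = [['tom', 10, 2, 'c', 100, 'x'], ['tom', 16, 3, 'a', 100, 'x'],
        ['tom', 22, 2, 'a', 100, 'x'], ['matt', 10, 1, 'c', 100, 'x'],
        ['matt', 15, 5, 'b', 100, 'x'], ['matt', 14, 1, 'b', 100, 'x']]
df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score',
                                 'Category', 'Rating', 'Other'])

# Hypothetical condition: count, per (Name, Category) group, the rows with Score > 2.
# transform broadcasts the per-group scalar back onto every row of that group.
df['HighScoreCount'] = (
    df.groupby(['Name', 'Category'])['Score']
      .transform(lambda s: s.gt(2).sum())
)
```

With this data only the (tom, a) and (matt, b) groups contain a score above 2, so their rows get a count of 1 and the others get 0.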

Pandas column content to new columns, with other original columns

Question: I have a table like the one below, and I want to make a new table from it (using the values in the 'Color' column). I've tried:

    import pandas as pd
    import functools
    data = {'Seller': ["Mike","Mike","Mike","Mike","David","David","Pete","Pete","Pete"],
            'Code' : ["9QBR1","9QBR1","9QBW2","9QBW2","9QD1X","9QD1X","9QEBO","9QEBO","9QEBO"],
            'From': ["2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03"],
            'Color_date' : ["2020-02-14","2020-02-14","2020
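
The data and the desired output are truncated here, so this is only a rough sketch of the usual long-to-wide reshape: move the values of one column into new column headers while keeping the other original columns. The 'Color' values and most of the dates below are hypothetical.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the asker's table.
df = pd.DataFrame({
    'Seller': ["Mike", "Mike", "David"],
    'Code': ["9QBR1", "9QBR1", "9QD1X"],
    'From': ["2020-01-03", "2020-01-03", "2020-01-03"],
    'Color': ["Red", "Blue", "Red"],                           # assumed values
    'Color_date': ["2020-02-14", "2020-02-15", "2020-02-16"],  # assumed values
})

# Keep Seller/Code/From as row keys and spread each Color into its own column.
wide = (
    df.set_index(['Seller', 'Code', 'From', 'Color'])['Color_date']
      .unstack('Color')
      .reset_index()
)
wide.columns.name = None  # drop the leftover "Color" label on the column axis
```

`unstack` needs each (Seller, Code, From, Color) combination to be unique; with duplicates, an aggregating reshape such as `pivot_table` would be needed instead.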

Replacing an unknown number in Pandas data frame with previous number

Question: I have some data frames I am trying to upload to a database. They are lists of values, but some of the columns contain the string 'null', and this is causing errors. I would like to use a function to remove these 'null' strings, and I am trying to use replace to backfill them as below: df.replace("null", method = bfill) but it is giving me the error message: ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2 I have also tried putting "bfill" instead and it
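
A minimal sketch of one way to do this, assuming the columns are meant to be numeric: first turn the 'null' strings into real missing values, then fill them. The frame below is hypothetical. Note that a `ParserError` of this form is raised by the CSV parser behind `pd.read_csv`, so the quoted error probably comes from loading the file rather than from `replace`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 'null' strings mixed into numeric columns.
df = pd.DataFrame({"a": [1, "null", 3], "b": ["null", 5, 6]})

# Convert the placeholder string into NaN and make the columns numeric.
cleaned = df.replace("null", np.nan).astype(float)

backfilled = cleaned.bfill()  # fill each gap with the NEXT valid value
forward = cleaned.ffill()     # fill each gap with the PREVIOUS valid value
```

The title asks for the previous number, which corresponds to `ffill`, while the attempt in the question uses a backfill; both are shown so the difference is visible. A gap in the first row has no previous value, so `ffill` leaves it missing.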

Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?

Question: I have a database as partially shown below. For each date, there are entries for duration (1-20 per date), with items (100s) listed for each duration. Each item has several associated data points in adjacent columns, including an identifier. For each date, I want to select the largest duration. Then, I want to find the item with a value closest to a given input value. I would then like to obtain the ID for that item, to be able to follow the value of this item through its time in the database.
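
The steps described above can be sketched as follows. The column names (`date`, `duration`, `value`, `id`), the data and the `target` value are all hypothetical, since the actual table is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the database extract.
df = pd.DataFrame({
    "date": ["2020-01-01"] * 3 + ["2020-01-02"] * 3,
    "duration": [5, 20, 20, 10, 10, 3],
    "value": [1.0, 2.0, 3.5, 4.0, 2.9, 3.0],
    "id": ["A", "B", "C", "D", "E", "F"],
})
target = 3.0  # the given input value (assumed)

# Step 1: keep only the rows whose duration is the largest for their date.
max_duration = df.groupby("date")["duration"].transform("max")
longest = df[df["duration"] == max_duration]

# Step 2: within each date, find the row whose value is closest to the target.
distance = (longest["value"] - target).abs()
closest_idx = distance.groupby(longest["date"]).idxmin()

# Step 3: read the adjacent columns, including the ID, from the selected rows.
result = longest.loc[closest_idx, ["date", "duration", "value", "id"]]
```

The IDs in `result["id"]` can then be used to filter the full table, for example with `df[df["id"].isin(result["id"])]`, to follow those items over time.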

Python Pandas dataframe find missing values

Question: I'm trying to find missing values and then drop them. I tried looking for the answer online but can't seem to find it. Extracted Dataframe: In the df, the entries for 1981 and 1982 are '-', i.e. missing values. I would like to find the missing values and then drop them. Exported Dataframe using isnull: I used df.isnull(), but for 1981 and 1982 it returns 'False', which means there is data. However, the value is '-', so it should be considered missing. I had
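
`isnull()` only recognises real missing values such as NaN or None, so a literal '-' string counts as data. A minimal sketch of one fix, using a hypothetical frame with `year` and `value` columns, is to convert the placeholder first:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: '-' marks the missing entries for 1981 and 1982.
df = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983],
    "value": ["5.2", "-", "-", "6.1"],
})

found_before = df["value"].isnull().any()  # False: '-' is an ordinary string

# Turn the placeholder into NaN so that isnull() and dropna() can see it.
cleaned = df.replace("-", np.nan)
missing_mask = cleaned["value"].isnull()
dropped = cleaned.dropna()
```

If the data comes from `pd.read_csv`, its `na_values` parameter can mark '-' as missing at load time instead.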

How to rank rows by id in Pandas Python

Question: I have a DataFrame like this:

    id  points1  points2
    1   44       53
    1   76       34
    1   63       66
    2   23       34
    2   44       56

I want output like this:

    id  points1  points2  points1_rank  points2_rank
    1   44       53       3             2
    1   76       34       1             3
    1   63       66       2             1
    2   23       79       2             1
    2   44       56       1             2

Basically, I want to groupby('id') and find the rank of each column within the same id. I tried this:

    features = ["points1","points2"]
    df = pd.merge(df, df.groupby('id')[features].rank().reset_index(), suffixes=["", "_rank"], how='left', on=['id'])

But I get KeyError: 'id'.

Answer 1: You
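
The answer is cut off, so the following is only a sketch of one approach, not necessarily the one it gave. `groupby('id')[features].rank()` returns just the ranked columns on the original row index, so after `reset_index()` there is no 'id' column to merge on, which is what raises the KeyError. Joining on the index avoids the merge altogether. The data uses 79 for the fourth row's `points2`, as in the expected output (the input table shows 34), and `ascending=False` is assumed because the expected ranks give 1 to the largest value.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "points1": [44, 76, 63, 23, 44],
    "points2": [53, 34, 66, 79, 56],  # 79 taken from the expected output
})
features = ["points1", "points2"]

# Rank within each id; the result keeps df's row index, so it can be joined back.
ranks = (
    df.groupby("id")[features]
      .rank(ascending=False)
      .astype(int)            # safe here because there are no ties or NaNs
      .add_suffix("_rank")
)
out = df.join(ranks)
```

With tied values `rank` returns fractional ranks by default, in which case the `astype(int)` step should be dropped or a `method` such as `"min"` chosen.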

Sort date in string format in a pandas dataframe?

Question: I have a dataframe like this. How can I sort it?

    df = pd.DataFrame({'Date':['Oct20','Nov19','Jan19','Sep20','Dec20']})

        Date
    0  Oct20
    1  Nov19
    2  Jan19
    3  Sep20
    4  Dec20

I am familiar with sorting a list of date strings: a.sort(key=lambda date: datetime.strptime(date, "%d-%b-%y")) Any thoughts? Should I split it?

Answer 1: First convert the column to datetimes and get the positions of the sorted values with Series.argsort, which are then used to change the ordering with DataFrame.iloc: df = df.iloc[pd.to_datetime(df['Date'], format='
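
The answer's code is truncated at the format string. A runnable sketch of the technique it describes follows; the format `"%b%y"` (abbreviated month name followed by a two-digit year) is an assumption inferred from values such as 'Oct20', not taken from the answer.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['Oct20', 'Nov19', 'Jan19', 'Sep20', 'Dec20']})

# Parse the strings as dates; "%b%y" is an assumed format for values like 'Oct20'.
parsed = pd.to_datetime(df['Date'], format="%b%y")

# argsort gives the row positions in chronological order; iloc reorders by position.
order = parsed.argsort().to_numpy()
sorted_df = df.iloc[order]
```

The 'Date' column itself stays as strings; only the row order changes.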

Backfilling columns by groups in Pandas

Question: I have a csv like:

    A,B,C,D
    1,2,,
    1,2,30,100
    1,2,40,100
    4,5,,
    4,5,60,200
    4,5,70,200
    8,9,,

In row 1 and row 4 the C value is missing (NaN). I want to take their values from rows 2 and 5 respectively (the first occurrence of the same A,B values). If no matching row is found, just put 0 (as in the last line). Expected output:

    A,B,C,D
    1,2,30,
    1,2,30,100
    1,2,40,100
    4,5,60,
    4,5,60,200
    4,5,70,200
    8,9,0,

Using fillna I found bfill: use NEXT valid observation to fill gap but the NEXT observation has to be taken logically
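
A minimal sketch of one way to restrict the backfill to rows with the same A,B values: group on those columns, backfill C inside each group, then put 0 where nothing was found. It assumes, as in the sample, that the row with the missing value comes before the valid rows of its group.

```python
import io

import pandas as pd

csv_text = """A,B,C,D
1,2,,
1,2,30,100
1,2,40,100
4,5,,
4,5,60,200
4,5,70,200
8,9,,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Backfill C only within rows sharing the same (A, B); gaps with no match become 0.
df["C"] = df.groupby(["A", "B"])["C"].bfill().fillna(0)
```

Column D is left untouched, which matches the expected output where D stays empty in the filled rows.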
