pandas | 易学教程

make upper case and replace space in column dataframe

Question: For a specific column of a pandas DataFrame, I would like to make all the elements uppercase and replace the spaces.

    import pandas as pd

    df = pd.DataFrame(data=[['AA 123',00],[99,10],['bb 12',10]],columns=['A','B'],index=[0,1,2])
    # find the elements of 'A' that are strings
    temp1 = [isinstance(s, str) for s in df['A'].values]
    # make upper case and replace any space
    temp2 = df['A'][temp1].str.upper()
    temp2 = temp2.str.replace(r'\s', '')
    # replace in dataframe
    df['A'].loc[temp2.index] = temp2.values

I get
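One possible way to do this without the chained assignment is sketched below. It assumes the goal is to delete whitespace (as the `''` replacement in the question suggests) and to leave non-string entries untouched; the `is_str` name is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['AA 123', 0], [99, 10], ['bb 12', 10]],
    columns=['A', 'B'],
    index=[0, 1, 2],
)

# Boolean mask marking the rows of 'A' that hold strings
is_str = df['A'].map(lambda v: isinstance(v, str))

# Upper-case and strip whitespace for those rows only, writing back
# through a single .loc call so the original frame is modified
df.loc[is_str, 'A'] = (
    df.loc[is_str, 'A'].str.upper().str.replace(r'\s', '', regex=True)
)

print(df)
```

Passing `regex=True` explicitly matters here, because the pattern `\s` is only treated as a regular expression when that flag is on.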

Call a report from a dictionary of dataframes

Question: In my previous question, I asked how to iterate over multiple CSV files (around 100 different files of stock symbols) and calculate their daily returns at once. I would like to know how to get the max/min values of these returns for each file and print a report. Here is the creation of the dictionaries as per Mr. Trenton McKinney:

    import pandas as pd
    from pathlib import Path

    # create the path to the files
    p = Path('c:/Users/<<user_name>>/Documents/stock_files')
    # get all the files
    files = p
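A rough sketch of such a report follows. The excerpt does not show how the dictionary is built, so `df_dict`, the symbol names, the `Close` column and the prices are all made-up stand-ins, and the daily return is assumed to be a simple percentage change.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary of DataFrames read from the CSV files
df_dict = {
    'AAA': pd.DataFrame({'Close': [10.0, 11.0, 9.9, 10.89]}),
    'BBB': pd.DataFrame({'Close': [20.0, 19.0, 20.9]}),
}

report = {}
for name, frame in df_dict.items():
    # Daily return as the percentage change between consecutive rows
    returns = frame['Close'].pct_change()
    report[name] = {'max': returns.max(), 'min': returns.min()}

# One row per file, one column per statistic
report_df = pd.DataFrame.from_dict(report, orient='index')
print(report_df)
```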

pd.merge_asof() based on Time-Difference not merging all values - Pandas

Question: I have two dataframes, one with news and the other with stock prices. Both dataframes have a "Date" column. I want to merge them on a gap of 5 days. Let's say my news dataframe is df1 and the price dataframe is df2. My df1 looks like this:

    News_Dates   News
    2018-09-29   Huge blow to ABC Corp. as they lost the 2012 tax case
    2018-09-30   ABC Corp. suffers a loss
    2018-10-01   ABC Corp to Sell stakes
    2018-12-20   We are going to comeback strong said ABC CEO
    2018-12-22   Shares are down massively for
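For orientation, here is a minimal sketch of `pd.merge_asof` with a 5-day tolerance. The price frame is not visible in the excerpt, so its `Dates` and `Price` columns and their values are invented, and matching each news row to the next price within 5 days (`direction='forward'`) is only one reading of "a gap of 5 days".

```python
import pandas as pd

df1 = pd.DataFrame({
    'News_Dates': pd.to_datetime(
        ['2018-09-29', '2018-09-30', '2018-10-01', '2018-12-20']
    ),
    'News': [
        'Huge blow to ABC Corp. as they lost the 2012 tax case',
        'ABC Corp. suffers a loss',
        'ABC Corp to Sell stakes',
        'We are going to comeback strong said ABC CEO',
    ],
})

# Hypothetical price data; column names and values are assumptions
df2 = pd.DataFrame({
    'Dates': pd.to_datetime(['2018-10-02', '2018-12-28']),
    'Price': [100.0, 90.0],
})

# Both key columns must be sorted before calling merge_asof
merged = pd.merge_asof(
    df1.sort_values('News_Dates'),
    df2.sort_values('Dates'),
    left_on='News_Dates',
    right_on='Dates',
    direction='forward',
    tolerance=pd.Timedelta(days=5),
)
print(merged)
```

Rows with no price inside the tolerance window keep their news text but get missing values in the price columns, which is a common reason a merge looks like it is "not merging all values".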

Pandas: find all unique values in one column and normalize all values in another column to their last value

How to visualize a list of strings on a colorbar in matplotlib

Question: I have a dataset like

    x = 3,4,6,77,3
    y = 8,5,2,5,5
    labels = "null","exit","power","smile","null"

Then I use

    from matplotlib import pyplot as plt
    plt.scatter(x,y)
    colorbar = plt.colorbar(labels)
    plt.show()

to make a scatter plot, but I cannot make the colorbar show the labels as its colors. How can I do this?

Answer 1: I'm not sure it's a good idea to do that for scatter plots in general (you have the same description for different data points, so maybe just use a legend here?), but I guess a specific
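One way this could be approached is to map each distinct label to an integer code, color the points by that code, and then relabel the colorbar ticks. The sketch below is not taken from the truncated answer; the `viridis` colormap, the alphabetical ordering of labels and the `Agg` backend (used so it runs without a display) are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to show a window
import matplotlib.pyplot as plt

x = [3, 4, 6, 77, 3]
y = [8, 5, 2, 5, 5]
labels = ["null", "exit", "power", "smile", "null"]

# Map every distinct label to an integer code
unique_labels = sorted(set(labels))
codes = [unique_labels.index(lab) for lab in labels]

n = len(unique_labels)
fig, ax = plt.subplots()
# Discrete colormap with one color per label; the vmin/vmax offsets
# center each integer code inside its own color band
sc = ax.scatter(x, y, c=codes, cmap=plt.get_cmap("viridis", n),
                vmin=-0.5, vmax=n - 0.5)

# Put one tick per code on the colorbar and show the label text there
cbar = fig.colorbar(sc, ax=ax, ticks=list(range(n)))
cbar.ax.set_yticklabels(unique_labels)
```

Points that share a label share a color, so the two "null" points are drawn identically.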

Selecting first row with groupby and NaN columns

Question: I'm trying to select the first row of each group of a data frame.

    import pandas as pd
    import numpy as np

    x = [{'id':"a",'val':np.nan, 'val2':-1},{'id':"a",'val':'TREE','val2':15}]
    df = pd.DataFrame(x)
    #   id   val  val2
    # 0  a   NaN    -1
    # 1  a  TREE    15

When I try to do this with groupby, I get

    df.groupby('id', as_index=False).first()
    #   id   val  val2
    # 0  a  TREE    -1

The row returned to me is nowhere in the original data frame. Do I need to do something special with NaN values in columns other than the
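The behavior above comes from `GroupBy.first()` picking the first non-null value in each column independently, so values from different rows can be combined. A small sketch of one alternative, `head(1)`, which returns whole rows by position and keeps NaN values:

```python
import numpy as np
import pandas as pd

x = [{'id': "a", 'val': np.nan, 'val2': -1},
     {'id': "a", 'val': 'TREE', 'val2': 15}]
df = pd.DataFrame(x)

# head(1) keeps the first row of every group exactly as it appears
first_rows = df.groupby('id').head(1)
print(first_rows)
```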

Pandas merge two DF with rows replacement

Question: I am facing an issue merging two DataFrames into one while keeping, for every duplicated id value, the row from the second DataFrame. Example:

    df1 = pd.DataFrame({
        'id': ['id1', 'id2', 'id3', 'id4'],
        'com': [134.6, 223, 0, 123],
        'malicious': [False, False, True, False]
    })
    df2 = pd.DataFrame({
        'id': ['id7', 'id2', 'id5', 'id6'],
        'com': [134.6, 27.6, 0, 123],
        'malicious': [False, False, False, False]
    })

df1

        id    com  malicious
    0  id1  134.6      False
    1  id2  223.0      False
    2  id3    0.0       True
    3  id4  123.0      False

df2

        id  com  malicious  date
    0
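The expected output is cut off in the excerpt, so the following is only a sketch of one reading: stack both frames and, where an id appears in both, keep the row that came from df2. It uses the constructors from the question and ignores the `date` column that appears only in the printed df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['id1', 'id2', 'id3', 'id4'],
    'com': [134.6, 223, 0, 123],
    'malicious': [False, False, True, False],
})
df2 = pd.DataFrame({
    'id': ['id7', 'id2', 'id5', 'id6'],
    'com': [134.6, 27.6, 0, 123],
    'malicious': [False, False, False, False],
})

# df2 is concatenated last, so keep='last' prefers its rows on duplicate ids
merged = (
    pd.concat([df1, df2], ignore_index=True)
    .drop_duplicates(subset='id', keep='last')
    .reset_index(drop=True)
)
print(merged)
```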

Python - the best way to create a new dataframe from two other dataframes with different shapes?

Question: Essentially, I'm trying to build a new dataframe from two others, but the situation is a little complicated and I'm not sure what the best way to do this is. In DF1, each row is data about objects defined by IDs, and it looks something like this:

    ID  Name  datafield1  datafield2
    1   Foo   info1       info2
    2   bar   info3       info4
    3   Foos  info5       info6

DF2 has monthly data about each object formatted like this:

    ID  Name  Month  data
    1   Foo   1/20   53.6
    1   Foo   2/20   47.2
    1   Foo   3/20   12.7
    1   Foo   4/20   3.2
    2   Bar   1/20   82.2
    2   Bar   2

How to print rolling window equation process from pandas dataframe in python?

Question: I created a sample pandas dataframe and tried to sum every 3 rows:

    import pandas as pd
    import numpy as np

    d={'A':[100,110,120,175,164,169,155,153,156,200]}
    df=pd.DataFrame(d)

         A
    0  100
    1  110
    2  120
    3  175
    4  164
    5  169
    6  155
    7  153
    8  156
    9  200

    0      NaN
    1      NaN
    2    330.0   #this is the result tho
    3    405.0
    4    459.0
    5    508.0
    6    488.0
    7    477.0
    8    464.0
    9    509.0
    Name: sum, dtype: float64

And I want to display the equation process like this:

    NaN
    NaN
    330.0 = 100+110+120
    405.0 = 110+120+175
    459.0 .
    508.0 .
    488.0 .
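A minimal sketch of one way to print the terms behind each rolling sum. It assumes the sums come from `rolling(3).sum()` (the excerpt does not show that call) and builds the text with a plain Python loop; the `window` and `lines` names are illustrative.

```python
import pandas as pd

d = {'A': [100, 110, 120, 175, 164, 169, 155, 153, 156, 200]}
df = pd.DataFrame(d)

window = 3
df['sum'] = df['A'].rolling(window).sum()

values = df['A'].tolist()
lines = []
for i, total in enumerate(df['sum']):
    if pd.isna(total):
        # Not enough rows yet to fill the window
        lines.append('NaN')
    else:
        # The window covers rows i - window + 1 through i
        terms = '+'.join(str(v) for v in values[i - window + 1:i + 1])
        lines.append(f'{total} = {terms}')

print('\n'.join(lines))
```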