pandas

make upper case and replace space in column dataframe

走远了吗. submitted on 2021-02-17 06:02:51
Question: For a specific column of a pandas DataFrame, I would like to make all the elements uppercase and replace the spaces.

    import pandas as pd

    df = pd.DataFrame(data=[['AA 123',00],[99,10],['bb 12',10]],columns=['A','B'],index=[0,1,2])
    # find elements of 'A' that are strings
    temp1 = [isinstance(s, str) for s in df['A'].values]
    # make upper case and replace any space
    temp2 = df['A'][temp1].str.upper()
    temp2 = temp2.str.replace(r'\s', '')
    # replace in dataframe
    df['A'].loc[temp2.index] = temp2.values

I get
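The question is cut off before the error message, so the following is only a sketch of one common way to do this kind of update. It assumes the trouble comes from the chained `df['A'].loc[...]` assignment and does the change in a single `.loc` call with a boolean mask instead. The name `is_str` is made up for illustration, and `regex=True` is passed explicitly because the default of `str.replace` differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(data=[['AA 123', 0], [99, 10], ['bb 12', 10]],
                  columns=['A', 'B'], index=[0, 1, 2])

# Boolean mask (hypothetical name) marking the rows of 'A' that hold strings
is_str = df['A'].map(lambda s: isinstance(s, str))

# One .loc assignment on the frame itself, so no chained indexing is involved
df.loc[is_str, 'A'] = (
    df.loc[is_str, 'A'].str.upper().str.replace(r'\s', '', regex=True)
)

print(df['A'].tolist())
```

Non-string entries such as `99` are left untouched because the mask excludes them.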

Call a report from a dictionary of dataframes

☆樱花仙子☆ submitted on 2021-02-17 05:48:08
Question: In my previous question, I asked how to iterate over multiple CSV files (for example, 100 different files of stock symbols) and calculate their daily returns at once. I would now like to know how to get the max/min values of these returns for each file and print a report. Here is the creation of the dictionaries as per Mr. Trenton McKinney:

    import pandas as pd
    from pathlib import Path

    # create the path to the files
    p = Path('c:/Users/<<user_name>>/Documents/stock_files')
    # get all the files
    files = p
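The rest of the original code is not shown, so here is a minimal sketch of the reporting step only. It assumes the files have already been read into a dictionary of DataFrames keyed by symbol. The dictionary `df_dict`, the symbols, the `Close` column and the numbers are all made-up stand-ins, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary built from the CSV files
df_dict = {
    'AAA': pd.DataFrame({'Close': [10.0, 11.0, 9.9, 10.89]}),
    'BBB': pd.DataFrame({'Close': [20.0, 19.0, 19.95]}),
}

rows = []
for symbol, frame in df_dict.items():
    # Daily return as the percentage change of the (assumed) 'Close' column
    returns = frame['Close'].pct_change()
    rows.append({'symbol': symbol,
                 'max_return': returns.max(),
                 'min_return': returns.min()})

# One row per file, which can be printed or saved as the report
report = pd.DataFrame(rows).set_index('symbol')
print(report)
```

Collecting plain dictionaries in a list and building the report once at the end keeps the loop simple and avoids growing a DataFrame row by row.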

pd.merge_asof() based on Time-Difference not merging all values - Pandas

*爱你&永不变心* submitted on 2021-02-17 05:38:10
Question: I have two dataframes, one with news and the other with stock prices. Both dataframes have a "Date" column. I want to merge them with a gap of 5 days. Let's say my news dataframe is df1 and the price dataframe is df2. My df1 looks like this:

    News_Dates  News
    2018-09-29  Huge blow to ABC Corp. as they lost the 2012 tax case
    2018-09-30  ABC Corp. suffers a loss
    2018-10-01  ABC Corp to Sell stakes
    2018-12-20  We are going to comeback strong said ABC CEO
    2018-12-22  Shares are down massively for
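The price dataframe and the failing call are cut off above, so this is only a sketch of how `pd.merge_asof` behaves with a 5-day `tolerance`. The `df2` contents, the `Price` column and the choice of `direction='forward'` are assumptions made for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'News_Dates': pd.to_datetime(['2018-09-29', '2018-09-30',
                                  '2018-10-01', '2018-12-20']),
    'News': ['Huge blow to ABC Corp. as they lost the 2012 tax case',
             'ABC Corp. suffers a loss',
             'ABC Corp to Sell stakes',
             'We are going to comeback strong said ABC CEO'],
})

# Hypothetical price data; only the 'Date' column name comes from the question
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2018-10-03', '2019-01-15']),
    'Price': [100.0, 90.0],
})

# Both sides must be sorted on their keys. Each news row is matched to the
# next price row at most 5 days later, or left as NaN if there is none.
merged = pd.merge_asof(
    df1.sort_values('News_Dates'),
    df2.sort_values('Date'),
    left_on='News_Dates',
    right_on='Date',
    direction='forward',
    tolerance=pd.Timedelta(days=5),
)
print(merged[['News_Dates', 'Date', 'Price']])
```

Note that `merge_asof` behaves like a left join that picks at most one right-hand row per left-hand row, so it never returns every price inside the 5-day window. That may be one reason a result looks like it is "not merging all values".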

pd.merge_asof() based on Time-Difference not merging all values - Pandas

痞子三分冷 submitted on 2021-02-17 05:36:34
Question: I have two dataframes, one with news and the other with stock prices. Both dataframes have a "Date" column. I want to merge them with a gap of 5 days. Let's say my news dataframe is df1 and the price dataframe is df2. My df1 looks like this:

    News_Dates  News
    2018-09-29  Huge blow to ABC Corp. as they lost the 2012 tax case
    2018-09-30  ABC Corp. suffers a loss
    2018-10-01  ABC Corp to Sell stakes
    2018-12-20  We are going to comeback strong said ABC CEO
    2018-12-22  Shares are down massively for

Pandas: find all unique values in one column and normalize all values in another column to their last value

吃可爱长大的小学妹 submitted on 2021-02-17 05:25:07
Question: Problem: I want to find all unique values in one column and normalize the corresponding values in another column to their last value. I want to achieve this with the pandas module in Python 3.

Example: Original dataset

    Fruit  | Amount
    Orange | 90
    Orange | 80
    Orange | 10
    Apple  | 100
    Apple  | 50
    Orange | 20
    Orange | 60   --> latest value of Orange. Used to normalize Orange
    Apple  | 75
    Apple  | 25
    Apple  | 40   --> latest value of Apple. Used to normalize Apple

Desired output: Ratio column with normalized
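The desired output is cut off, so the sketch below assumes that "normalize" means dividing each `Amount` by the last `Amount` recorded for the same `Fruit`. It uses `groupby(...).transform('last')`, which broadcasts each group's last value back onto every row of that group.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Orange', 'Orange', 'Orange', 'Apple', 'Apple',
              'Orange', 'Orange', 'Apple', 'Apple', 'Apple'],
    'Amount': [90, 80, 10, 100, 50, 20, 60, 75, 25, 40],
})

# Last Amount per Fruit, aligned to the original row order
last_amount = df.groupby('Fruit')['Amount'].transform('last')

# Assumed definition of the Ratio column: Amount divided by the group's last value
df['Ratio'] = df['Amount'] / last_amount
print(df)
```

With this definition the last row of each fruit gets a ratio of 1.0, and the first Orange row gets 90 / 60 = 1.5.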

How to visualize a list of strings on a colorbar in matplotlib

别来无恙 submitted on 2021-02-17 05:20:11
Question: I have a dataset like

    x = 3,4,6,77,3
    y = 8,5,2,5,5
    labels = "null","exit","power","smile","null"

Then I use

    from matplotlib import pyplot as plt
    plt.scatter(x,y)
    colorbar = plt.colorbar(labels)
    plt.show()

to make a scatter plot, but I cannot get the colorbar to show the labels as its colors. How can I do this?

Answer 1: I'm not sure if it's a good idea to do that for scatter plots in general (you have the same description for different data points, maybe just use some legend here?), but I guess a specific
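The answer is truncated, so the following is not the answerer's code. It is a sketch of one possible approach: map each distinct label to an integer code, color the points by that code with a discrete colormap, and put the label names on the colorbar ticks. The `viridis` colormap and the names `unique_labels`, `codes` and `cmap` are arbitrary choices made for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import numpy as np

x = [3, 4, 6, 77, 3]
y = [8, 5, 2, 5, 5]
labels = ["null", "exit", "power", "smile", "null"]

# Sorted distinct labels plus, for each point, the index of its label
unique_labels, codes = np.unique(labels, return_inverse=True)
n = len(unique_labels)

# Discrete colormap with one color per distinct label
cmap = plt.get_cmap('viridis', n)

fig, ax = plt.subplots()
# vmin/vmax are shifted by 0.5 so each integer code sits in the middle of its color band
sc = ax.scatter(x, y, c=codes, cmap=cmap, vmin=-0.5, vmax=n - 0.5)

cbar = fig.colorbar(sc, ax=ax, ticks=np.arange(n))
cbar.set_ticklabels(list(unique_labels))
```

As the answer hints, a legend may read better than a colorbar when there are only a handful of categories.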

Selecting first row with groupby and NaN columns

拥有回忆 submitted on 2021-02-17 05:19:40
Question: I'm trying to select the first row of each group of a data frame.

    import pandas as pd
    import numpy as np

    x = [{'id':"a",'val':np.nan, 'val2':-1},{'id':"a",'val':'TREE','val2':15}]
    df = pd.DataFrame(x)
    #   id   val  val2
    # 0  a   NaN    -1
    # 1  a  TREE    15

When I try to do this with groupby, I get

    df.groupby('id', as_index=False).first()
    #   id   val  val2
    # 0  a  TREE    -1

The row returned to me is nowhere in the original data frame. Do I need to do something special with NaN values in columns other than the
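For context, `GroupBy.first()` returns the first non-null value of each column separately, which is why the result mixes values from two different rows. A minimal sketch of one alternative is `GroupBy.head(1)`, which returns whole rows as they appear in the frame. The variable name `first_rows` is only for illustration.

```python
import numpy as np
import pandas as pd

x = [{'id': "a", 'val': np.nan, 'val2': -1},
     {'id': "a", 'val': 'TREE', 'val2': 15}]
df = pd.DataFrame(x)

# head(1) keeps the first physical row of each group, NaN values included
first_rows = df.groupby('id').head(1)
print(first_rows)
```

`df.drop_duplicates('id')` gives the same rows here, since it keeps the first occurrence of each `id` by default.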

Pandas merge two DF with rows replacement

半腔热情 submitted on 2021-02-17 05:18:25
Question: I am facing an issue merging two DataFrames into one while keeping, for every duplicated id value, the row from the second DataFrame. Example:

    df1 = pd.DataFrame({
        'id': ['id1', 'id2', 'id3', 'id4'],
        'com': [134.6, 223, 0, 123],
        'malicious': [False, False, True, False]
    })
    df2 = pd.DataFrame({
        'id': ['id7', 'id2', 'id5', 'id6'],
        'com': [134.6, 27.6, 0, 123],
        'malicious': [False, False, False, False]
    })

    df1
        id    com  malicious
    0  id1  134.6      False
    1  id2  223.0      False
    2  id3    0.0       True
    3  id4  123.0      False

    df2
        id  com  malicious  date
    0
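The expected output is cut off, so this sketch assumes the goal is: keep every id from both frames, and where an id appears in both, keep the row from `df2`. One way to get that is to concatenate the frames and drop duplicate ids with `keep='last'`. The `date` column shown in the truncated printout is not part of the given constructors, so it is left out here.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['id1', 'id2', 'id3', 'id4'],
    'com': [134.6, 223, 0, 123],
    'malicious': [False, False, True, False],
})
df2 = pd.DataFrame({
    'id': ['id7', 'id2', 'id5', 'id6'],
    'com': [134.6, 27.6, 0, 123],
    'malicious': [False, False, False, False],
})

# df2 comes last in the concatenation, so keep='last' prefers its rows
result = (
    pd.concat([df1, df2], ignore_index=True)
      .drop_duplicates(subset='id', keep='last')
      .reset_index(drop=True)
)
print(result)
```

Here `id2` ends up with `com` equal to 27.6, the value from `df2`.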

Python - the best way to create a new dataframe from two other dataframes with different shapes?

女生的网名这么多〃 submitted on 2021-02-17 05:13:20
Question: Essentially, I'm trying to build a new dataframe from two others, but the situation is a little complicated and I'm not sure what the best way to do this is. In DF1, each row is data about objects defined by IDs, and it looks something like this:

    ID  Name  datafield1  datafield2
    1   Foo   info1       info2
    2   bar   info3       info4
    3   Foos  info5       info6

DF2 has monthly data about each object, formatted like this:

    ID  Name  Month  data
    1   Foo   1/20   53.6
    1   Foo   2/20   47.2
    1   Foo   3/20   12.7
    1   Foo   4/20   3.2
    2   Bar   1/20   82.2
    2   Bar   2
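The target layout is not visible in the truncated question, so the sketch below is only one plausible reading: turn the monthly rows of DF2 into one column per month and attach them to DF1 by `ID`. DF2 is limited to the complete rows shown above, and the names `wide` and `combined` are made up.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Foo', 'bar', 'Foos'],
    'datafield1': ['info1', 'info3', 'info5'],
    'datafield2': ['info2', 'info4', 'info6'],
})

# Only the complete DF2 rows visible in the question
df2 = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2],
    'Name': ['Foo', 'Foo', 'Foo', 'Foo', 'Bar'],
    'Month': ['1/20', '2/20', '3/20', '4/20', '1/20'],
    'data': [53.6, 47.2, 12.7, 3.2, 82.2],
})

# Long to wide: one row per ID, one column per Month
wide = df2.pivot(index='ID', columns='Month', values='data')

# Left join keeps every object from df1, even those without monthly data
combined = df1.merge(wide, left_on='ID', right_index=True, how='left')
print(combined)
```

If the monthly data should stay in long form instead, a plain `df2.merge(df1, on='ID')` would be the analogous starting point.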

How to print rolling window equation process from pandas dataframe in python?

和自甴很熟 submitted on 2021-02-17 05:11:24
Question: I created a sample pandas dataframe and tried to sum every 3 rows:

    import pandas as pd
    import numpy as np

    d={'A':[100,110,120,175,164,169,155,153,156,200]}
    df=pd.DataFrame(d)

         A
    0  100
    1  110
    2  120
    3  175
    4  164
    5  169
    6  155
    7  153
    8  156
    9  200

    0      NaN
    1      NaN
    2    330.0   #this is the result tho
    3    405.0
    4    459.0
    5    508.0
    6    488.0
    7    477.0
    8    464.0
    9    509.0
    Name: sum, dtype: float64

And I want to display the equation process like this:

    NaN
    NaN
    330.0 = 100+110+120
    405.0 = 110+120+175
    459.0 .
    508.0 .
    488.0 .
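The line that produced the rolling sum is not shown in the excerpt, so the sketch below assumes it was a 3-row `rolling(...).sum()` on column `A`. It then builds one text line per row by joining the values that fall inside each window. The names `window`, `sums` and `lines` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [100, 110, 120, 175, 164, 169, 155, 153, 156, 200]})

window = 3  # assumed window size, matching the sums shown in the question
sums = df['A'].rolling(window).sum()

lines = []
for i, total in enumerate(sums):
    if pd.isna(total):
        # Not enough rows yet to fill the window
        lines.append('NaN')
    else:
        # The rows that contributed to this window, oldest first
        terms = df['A'].iloc[i - window + 1:i + 1]
        lines.append(f"{total} = " + '+'.join(str(v) for v in terms))

print('\n'.join(lines))
```

The list `lines` can also be assigned back to the frame as a new column if the equations should sit next to the data.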