pandas

Check if multiple pd.DataFrames are equal

一笑奈何 submitted on 2021-02-10 06:22:19
Question: Is there a Pythonic way (no loops or recursion) to check if more than two pd.DataFrames (e.g., a list of pd.DataFrames) are equal to each other?

Answer 1: Something like:

    all(x.equals(dfs[0]) for x in dfs)

with dfs the list of dataframes. This checks if they are all equal to the first, which I think is equivalent to asking if they are all equal to one another.

Source: https://stackoverflow.com/questions/60452817/check-if-multiple-pd-dataframes-are-equal
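The answer's one-liner can be exercised as follows; the frames here are hypothetical stand-ins for the question's list:

```python
import pandas as pd

# Three identical frames: the check should pass.
dfs = [pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]}) for _ in range(3)]
all_equal = all(x.equals(dfs[0]) for x in dfs)

# Append one differing frame: the check should now fail.
dfs.append(pd.DataFrame({"a": [1, 9], "b": [3.0, 4.0]}))
all_equal_after = all(x.equals(dfs[0]) for x in dfs)
```

Note that `equals` also requires matching dtypes, so a frame of ints and an otherwise-identical frame of floats compare unequal.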

How to avoid storing null values in HBase with pandas in Python?

浪子不回头ぞ submitted on 2021-02-10 06:21:15
Question: I have some sample data as below:

       test_a  test_b  test_c  test_d  test_date
    1  a       500     0.1     111     20191101
    2  a       NaN     0.2     NaN     20191102
    3  a       200     0.1     111     20191103
    4  a       400     NaN     222     20191104
    5  a       NaN     0.2     333     20191105

I would like to store those data in HBase, and I use the below code to achieve it:

    from test.db import impala, hbasecon, HiveClient
    import pandas as pd

    sql = """
    SELECT test_a
          ,test_b
          ,test_c
          ,test_d
          ,test_date
      FROM table_test
    """
    conn_impa =
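The question is cut off before the HBase write, but a common way to keep NaN cells out of the store is to drop them per row before building the put payload. A minimal sketch on data shaped like the question's (the one-dict-per-row payload shape is an assumption, modeled on happybase-style puts):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "test_b": [500.0, np.nan, 200.0],
    "test_c": [0.1, 0.2, 0.1],
    "test_d": [111.0, np.nan, 111.0],
})

# Build one dict per row, skipping NaN cells so they are never sent to HBase.
payloads = [
    {col: str(val) for col, val in row.items() if pd.notna(val)}
    for _, row in df.iterrows()
]
```

Each payload dict would then be passed to the table's put call; rows with missing values simply omit those columns.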

Pandas set_option - more than one option per line

核能气质少年 submitted on 2021-02-10 06:20:17
Question: This may be a stupid question, but... when setting options after importing pandas, I set them one at a time, such as:

    pd.set_option('max_rows', 1000)
    pd.set_option('notebook_repr_html', False)

Is there any way to combine them? I tried passing a list of options, but it didn't work. Not a big deal if there is only one way to do it.

Answer 1: There isn't a native way to do multiple options on one line. I guess you could do something like:

    [pd.set_option(option, setting) for option, setting in [('max_rows'
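One way to apply several settings in a single statement is to loop over a dict. The sketch below uses the full `display.*` option paths, which recent pandas versions require when the short name would be ambiguous:

```python
import pandas as pd

# All desired settings in one place; one loop applies them.
options = {
    "display.max_rows": 1000,
    "display.notebook_repr_html": False,
}
for opt, val in options.items():
    pd.set_option(opt, val)
```

For settings that should apply only temporarily, `pd.option_context("display.max_rows", 1000)` used as a context manager restores the previous values on exit.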

Plot a time series grouped by id

 ̄綄美尐妖づ submitted on 2021-02-10 06:15:18
Question: I want to plot a time series grouped by id, so that time is my x-value and 'value' is my y-value. How can I plot x and y grouped by 'id'?

    id  time  value
    1   1     0.3
    1   2     0.6
    1   3     0.9
    2   1     0.1
    2   2     0.3
    2   3     0.6
    3   1     0.2
    3   2     0.4
    3   3     0.5

Answer 1: I think you can use pivot with DataFrame.plot.bar or only DataFrame.plot:

    import matplotlib.pyplot as plt

    df = df.pivot(index='time', columns='id', values='value')
    print(df)

    id     1    2    3
    time
    1    0.3  0.1  0.2
    2    0.6  0.3  0.4
    3    0.9  0.6  0.5

    df.plot.bar()
    plt.show()
    df = df
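A self-contained version of the answer's pivot step, with the question's data reproduced; the plotting calls are left commented so the sketch runs headless:

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time":  [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "value": [0.3, 0.6, 0.9, 0.1, 0.3, 0.6, 0.2, 0.4, 0.5],
})

# Pivot to one column per id, indexed by time: the shape that
# DataFrame.plot() / DataFrame.plot.bar() draw as one line/bar group per id.
wide = df.pivot(index="time", columns="id", values="value")
# wide.plot.bar()   # uncomment to draw
# plt.show()
```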

Count by condition applied to the same column in Pandas

帅比萌擦擦* submitted on 2021-02-10 06:14:09
Question: This is my data frame:

    acc_index  veh_count  veh_type
    001        1          1
    002        2          1
    002        2          2
    003        2          1
    003        2          2
    004        1          1
    005        2          1
    005        2          3
    006        1          2
    007        2          1
    007        2          2
    008        2          1
    008        2          1
    009        3          1
    009        3          1
    009        3          2

acc_index is unique for each accident; veh_count shows how many vehicles are involved in one accident; veh_type shows the type of vehicles involved in an accident (1 = bicycle, 2 = car, 3 = bus). What I want to do is count the number of accidents between cars and bicycles (so, where veh_type=1 and veh_type=2 for the same
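The question is cut off before any answer, but the intended car-vs-bicycle count can be sketched by collecting the set of vehicle types per accident and testing whether it contains both 1 and 2 (data reproduced from the question; the approach is an assumption, not the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "acc_index": ["001", "002", "002", "003", "003", "004", "005", "005",
                  "006", "007", "007", "008", "008", "009", "009", "009"],
    "veh_count": [1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 3, 3, 3],
    "veh_type":  [1, 1, 2, 1, 2, 1, 1, 3, 2, 1, 2, 1, 1, 1, 1, 2],
})

# Set of vehicle types involved in each accident.
types = df.groupby("acc_index")["veh_type"].agg(set)

# Accidents involving at least one bicycle (1) AND one car (2).
n_bike_car = int(types.apply(lambda s: {1, 2} <= s).sum())
```

On the question's data this counts accidents 002, 003, 007, and 009, i.e. 4.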

Follow-up: rolling_apply deprecated

最后都变了- submitted on 2021-02-10 06:14:07
Question: Following up on this answer: is there a way to do a weighted-average rolling sum over a grouping?

    rsum = pd.rolling_apply(g.values, p, lambda x: np.nansum(w*x), min_periods=p)

rolling_apply is deprecated now. How would you change this to work under current functionality? Thank you.

Answer 1: As of 0.18+, use Series.rolling.apply:

    w = np.array([0.1, 0.1, 0.2, 0.6])
    df.groupby('ID').VALUE.apply(
        lambda x: x.rolling(window=4).apply(lambda x: np.dot(x, w), raw=False))

    0      NaN
    1      NaN
    2      NaN
    3    146.0
    4    166.0
    5    NaN
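The original frame is not shown in full, so the sketch below invents small per-group data to show the same groupby-plus-rolling pattern (`ID`/`VALUE` names kept from the answer; the numbers are illustrative). `raw=True` passes each window as an ndarray, which suits `np.dot` and avoids building a Series per window:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID":    ["a"] * 5 + ["b"] * 5,
    "VALUE": [100, 110, 120, 150, 170, 200, 210, 220, 250, 270],
})
w = np.array([0.1, 0.1, 0.2, 0.6])

# Weighted rolling sum of the last 4 values, computed independently per ID.
rsum = df.groupby("ID")["VALUE"].apply(
    lambda s: s.rolling(window=4).apply(lambda x: np.dot(x, w), raw=True)
)
```

The first three positions of each group are NaN, since the window of 4 is incomplete there.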

Pandas Time Series DataFrame Missing Values

半腔热情 submitted on 2021-02-10 06:09:06
Question: I have a dataset of total sales from 2008-2015. I have an entry for each day, so I have created a pandas DataFrame with a DatetimeIndex and a column for sales. It looks like this (screenshot not included). The problem is that I am missing data for most of 2010. These missing values are currently represented by 0.0, so if I plot the DataFrame I get (screenshot not included). I want to try to forecast values for 2016, possibly using an ARIMA model, so the first step I took was to perform a decomposition of this time series. Obviously if I
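The question is cut off, but before decomposing or fitting ARIMA the 0.0 placeholders need to become genuine missing values; a minimal sketch of that step, on toy daily data standing in for the 2010 gap (the replace-then-interpolate approach is an assumption, not the original answer):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sales series: zeros mark the missing stretch.
idx = pd.date_range("2010-01-01", periods=6, freq="D")
sales = pd.Series([5.0, 0.0, 0.0, 8.0, 0.0, 11.0], index=idx)

# Turn the 0.0 placeholders into NaN, then interpolate along the
# DatetimeIndex so gaps are filled proportionally to elapsed time.
filled = sales.replace(0.0, np.nan).interpolate(method="time")
```

This only makes sense when 0.0 truly means "no observation"; if zero-sales days are real, they must be distinguished from the gap first.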