pandas

Pandas DataFrame iloc spoils the data type

回眸只為那壹抹淺笑 submitted on 2021-02-07 06:50:59
Question: Using pandas 0.19.2. Here's an example:

    testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})
    testdf.dtypes

Output:

    A      int64
    B    float64
    dtype: object

Everything looks fine for now, but what I don't like is this (note that the first call is pd.Series.iloc and the second one is pd.DataFrame.iloc):

    print(type(testdf.A.iloc[0]))
    print(type(testdf.iloc[0].A))

Output:

    <class 'numpy.int64'>
    <class 'numpy.float64'>

I found it while trying to understand why pd.DataFrame.join() operation
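
This is expected upcasting rather than data corruption: testdf.iloc[0] builds a Series out of one row spanning both columns, so the int64 value is cast to the columns' common dtype float64. A minimal sketch of how to keep the original dtype by selecting the column before the row:

    import pandas as pd

    testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})

    # Row first: the row Series is upcast to the common dtype float64.
    print(type(testdf.iloc[0].A))      # <class 'numpy.float64'>

    # Column first: the column keeps its own dtype.
    print(type(testdf['A'].iloc[0]))   # <class 'numpy.int64'>
    print(type(testdf.at[0, 'A']))     # <class 'numpy.int64'>, fast scalar lookup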

Remove Header and Footer from Pandas Dataframe print

江枫思渺然 submitted on 2021-02-07 06:47:05
Question: The following code prints all the values I want, but it has "Date" as the first row and "Name: Close, Length: 1828, dtype: float64" as the last row:

    import pandas as pd
    from pandas.io.data import DataReader
    from datetime import datetime

    ibm = DataReader('IBM', 'yahoo', datetime(2009, 1, 1))
    pd.set_option('display.max_rows', len(ibm))
    print ibm["Close"]

How do I print the data without this first "Date" line and the last "Name: Close, Length: 1828, dtype: float64" line? Slicing doesn't work, I've tried
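
One direct route, assuming a reasonably recent pandas: Series.to_string() controls both lines. header=False drops the index-name line ("Date"), and the "Name: ... Length: ... dtype: ..." footer is omitted by to_string() unless explicitly requested:

    # header=False suppresses the "Date" index-name line; the Name/Length/dtype
    # footer is off by default in to_string().
    print(ibm["Close"].to_string(header=False))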

How can I delete rows for a particular Date in a Pandas dataframe?

蓝咒 submitted on 2021-02-07 06:46:28
Question: I've got a pandas DataFrame using Date as its index. How can I drop all rows that have the date "2000-01-06"? Sample code:

    import numpy as np
    import pandas as pd

    dates = pd.date_range('1/1/2000', periods=8)
    df = pd.DataFrame(np.random.randn(8, 3), index=dates, columns=['A', 'B', 'C'])
    df.index.name = 'Date'

Example DataFrame:

                       A         B         C
    Date
    2000-01-01 -0.501547 -0.227375  0.275931
    2000-01-02  0.994459  1.266288 -0.178603
    2000-01-03 -0.982746 -0.339685  0.157390
    2000-01-04 -1.013987 -1.074076 -2.346117
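
A minimal sketch of two equivalent ways to do it, assuming Date is a DatetimeIndex as in the sample code:

    # Drop by index label (raises KeyError if the date is absent):
    df = df.drop(pd.Timestamp('2000-01-06'))

    # Or keep every row except that date (silent if the date is absent):
    df = df[df.index != '2000-01-06']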

How to perform a cumulative sum of distinct values in pandas dataframe

China☆狼群 submitted on 2021-02-07 06:01:50
Question: I have a dataframe like this:

    id   date        company ...
    123  2019-01-01  A
    224  2019-01-01  B
    345  2019-01-01  B
    987  2019-01-03  C
    334  2019-01-03  C
    908  2019-01-04  C
    765  2019-01-04  A
    554  2019-01-05  A
    482  2019-01-05  D

and I want to get the cumulative number of unique values over time for the 'company' column, so that a company which appears at a later date is not counted again. My expected output is:

    date        cumulative_count
    2019-01-01  2
    2019-01-03  3
    2019-01-04  3
    2019-01-05  4

I've tried: df.groupby(['date'])
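
A minimal sketch of one approach: flag the first occurrence of each company with duplicated(), count the new companies per date, and take a running total:

    # True the first time a company appears anywhere in the frame.
    first_seen = ~df['company'].duplicated()

    out = (first_seen.groupby(df['date']).sum()   # new companies per date
                     .cumsum()                    # running total over dates
                     .reset_index(name='cumulative_count'))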

Load pandas dataframe with chunksize determined by column variable

喜你入骨 submitted on 2021-02-07 05:47:12
Question: If I have a csv file that's too large to load into memory with pandas (in this case 35 GB), I know it's possible to process the file in chunks with chunksize. However, I want to know whether it's possible to change the chunk size based on the values in a column. I have an ID column, and then several rows of information for each ID, like this:

    ID, Time, x, y
    sasd, 10:12, 1, 3
    sasd, 10:14, 1, 4
    sasd, 10:32, 1, 2
    cgfb, 10:02, 1, 6
    cgfb, 10:13, 1, 3
    aenr, 11:54, 2, 5
    tory, 10:27, 1, 3
    tory, 10:48, 3, 5
    etc...
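
read_csv cannot size its chunks by column value directly, but a fixed chunksize can be combined with a carry-over buffer so that no ID group is ever split across chunks. A minimal sketch, assuming the rows for each ID are contiguous in the file; the file name, chunk size, and process() handler are placeholders:

    import pandas as pd

    def process(group):
        ...  # hypothetical handler for one complete ID group

    leftover = pd.DataFrame()
    for chunk in pd.read_csv('data.csv', chunksize=100000, skipinitialspace=True):
        chunk = pd.concat([leftover, chunk], ignore_index=True)
        last_id = chunk['ID'].iloc[-1]
        # Rows for the final ID may continue in the next chunk, so hold them back.
        leftover = chunk[chunk['ID'] == last_id]
        for _, group in chunk[chunk['ID'] != last_id].groupby('ID', sort=False):
            process(group)

    if not leftover.empty:
        process(leftover)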

With `pandas.cut()`, how do I get integer bins and avoid getting a negative lowest bound?

旧巷老猫 submitted on 2021-02-07 05:35:06
Question: My dataframe has zero as its lowest value. I am trying to use the precision and include_lowest parameters of pandas.cut(), but I can't get the intervals to consist of integers rather than floats with one decimal, and I also can't get the leftmost interval to stop at zero.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    sns.set(style='white', font_scale=1.3)

    df = pd.DataFrame(range(0, 389, 8)[:-1], columns=['value'])
    df['binned_df_pd'] = pd.cut(df.value, bins=7, precision
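
A minimal sketch of one workaround: pass explicit integer bin edges instead of a bin count, and use right=False so the first interval starts exactly at zero instead of being extended below it. The exact edge arithmetic here is an assumption about the bins wanted:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(range(0, 389, 8)[:-1], columns=['value'])

    # Seven bins with integer edges; the +1 keeps the maximum value
    # inside the last half-open interval.
    edges = np.linspace(0, df['value'].max() + 1, 8).astype(int)
    df['binned'] = pd.cut(df['value'], bins=edges, right=False)
    # Intervals now read [0, 53), [53, 107), ... with no negative lower bound.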

Python Pandas Expand a Column of List of Lists to Two New Columns

自闭症网瘾萝莉.ら submitted on 2021-02-07 05:25:07
Question: I have a DataFrame which looks like this:

    name   id  apps
    john   1   [[app1, v1], [app2, v2], [app3, v3]]
    smith  2   [[app1, v1], [app4, v4]]

I want to expand the apps column so that it looks like this:

    name   id  app_name  app_version
    john   1   app1      v1
    john   1   app2      v2
    john   1   app3      v3
    smith  2   app1      v1
    smith  2   app4      v4

Any help is appreciated.

Answer 1: You can .apply(pd.Series) twice to get what you need as an intermediate step, then merge back to the original dataframe.

    import pandas as pd
    df = pd.DataFrame({
        'name': ['john
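
On pandas 0.25+, explode offers a shorter route than the double apply(pd.Series): produce one row per [app, version] pair, then split each pair into two columns. A minimal sketch with the sample data written out as Python lists:

    import pandas as pd

    df = pd.DataFrame({
        'name': ['john', 'smith'],
        'id': [1, 2],
        'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
                 [['app1', 'v1'], ['app4', 'v4']]],
    })

    out = df.explode('apps').reset_index(drop=True)  # one row per [app, version]
    out[['app_name', 'app_version']] = pd.DataFrame(out['apps'].tolist())
    out = out.drop(columns='apps')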

'Could not interpret input' error with Seaborn when plotting groupbys

眉间皱痕 submitted on 2021-02-07 05:20:15
Question: Say I have this dataframe:

    from pandas import DataFrame

    d = {
        'Path': ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
        'Detail': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
        'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
        'Value': [30, 20, 10, 40, 40, 50],
        'Field': [50, 70, 10, 20, 30, 30],
    }
    df = DataFrame(d)
    df.set_index(['Path', 'Detail'], inplace=True)
    df

                 Field Program  Value
    Path Detail
    abc  foo        50   prog1     30
         bar        70   prog1     20
    ghi  bar        10   prog1     10
         foo        20   prog2     40
    jkl  foo        30   prog3     40
         foo        30   prog3     50

I can aggregate it
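
Seaborn raises "Could not interpret input" when the name it is given lives in the DataFrame's index rather than its columns, which is exactly what set_index does to Path and Detail here. A minimal sketch of the usual fix; the barplot call is an assumption, since the question is cut off before the plotting code:

    import seaborn as sns

    # Move 'Path' and 'Detail' out of the MultiIndex into ordinary columns
    # so seaborn can resolve them by name.
    plot_df = df.reset_index()
    sns.barplot(x='Path', y='Value', hue='Program', data=plot_df)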