pandas

Pandas DataFrame iloc spoils the data type

回眸只為那壹抹淺笑 submitted on 2021-02-07 06:50:59
Question: Using pandas 0.19.2. Here's an example:

    testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})
    testdf.dtypes

Output:

    A      int64
    B    float64
    dtype: object

Everything looks fine for now, but what I don't like is this (note that the first call is pd.Series.iloc and the second one is pd.DataFrame.iloc):

    print(type(testdf.A.iloc[0]))
    print(type(testdf.iloc[0].A))

Output:

    <class 'numpy.int64'>
    <class 'numpy.float64'>

I found it while trying to understand why pd.DataFrame.join() operation
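
This is expected upcasting rather than data corruption: testdf.iloc[0] builds a Series out of one row spanning both columns, so the int64 value is cast to the columns' common dtype float64. A minimal sketch of how to keep the original dtype by selecting the column before the row:

    import pandas as pd

    testdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1.0, 2.0, 3.0, 4.0]})

    # Row first: the row Series is upcast to the common dtype float64.
    print(type(testdf.iloc[0].A))      # <class 'numpy.float64'>

    # Column first: the column keeps its own dtype.
    print(type(testdf['A'].iloc[0]))   # <class 'numpy.int64'>
    print(type(testdf.at[0, 'A']))     # <class 'numpy.int64'>, fast scalar lookup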

Remove Header and Footer from Pandas Dataframe print

江枫思渺然 submitted on 2021-02-07 06:47:05
Question: The following code prints all the values I want, but it has "Date" as the first row and "Name: Close, Length: 1828, dtype: float64" as the last row:

    import pandas as pd
    from pandas.io.data import DataReader
    from datetime import datetime

    ibm = DataReader('IBM', 'yahoo', datetime(2009, 1, 1))
    pd.set_option('display.max_rows', len(ibm))
    print ibm["Close"]

How do I print the data without this first "Date" line and the last "Name: Close, Length: 1828, dtype: float64" line? Slicing doesn't work, I've tried
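
One direct route, assuming a reasonably recent pandas: Series.to_string() controls both lines. header=False drops the index-name line ("Date"), and the "Name: ... Length: ... dtype: ..." footer is omitted by to_string() unless explicitly requested:

    # header=False suppresses the "Date" index-name line; the Name/Length/dtype
    # footer is off by default in to_string().
    print(ibm["Close"].to_string(header=False))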

How can I delete rows for a particular Date in a Pandas dataframe?

蓝咒 submitted on 2021-02-07 06:46:28
Question: I've got a pandas DataFrame using Date as its index. How can I drop all rows that have the date "2000-01-06"? Sample code:

    import numpy as np
    import pandas as pd

    dates = pd.date_range('1/1/2000', periods=8)
    df = pd.DataFrame(np.random.randn(8, 3), index=dates, columns=['A', 'B', 'C'])
    df.index.name = 'Date'

Example DataFrame:

                       A         B         C
    Date
    2000-01-01 -0.501547 -0.227375  0.275931
    2000-01-02  0.994459  1.266288 -0.178603
    2000-01-03 -0.982746 -0.339685  0.157390
    2000-01-04 -1.013987 -1.074076 -2.346117
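
A minimal sketch of two equivalent ways to do it, assuming Date is a DatetimeIndex as in the sample code:

    # Drop by index label (raises KeyError if the date is absent):
    df = df.drop(pd.Timestamp('2000-01-06'))

    # Or keep every row except that date (silent if the date is absent):
    df = df[df.index != '2000-01-06']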

How to perform a cumulative sum of distinct values in pandas dataframe

China☆狼群 submitted on 2021-02-07 06:01:50
Question: I have a dataframe like this:

    id   date        company ...
    123  2019-01-01  A
    224  2019-01-01  B
    345  2019-01-01  B
    987  2019-01-03  C
    334  2019-01-03  C
    908  2019-01-04  C
    765  2019-01-04  A
    554  2019-01-05  A
    482  2019-01-05  D

and I want to get the cumulative number of unique values over time for the 'company' column, so that a company which appears at a later date is not counted again. My expected output is:

    date        cumulative_count
    2019-01-01  2
    2019-01-03  3
    2019-01-04  3
    2019-01-05  4

I've tried: df.groupby(['date'])
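
A minimal sketch of one approach: flag the first occurrence of each company with duplicated(), count the new companies per date, and take a running total:

    # True the first time a company appears anywhere in the frame.
    first_seen = ~df['company'].duplicated()

    out = (first_seen.groupby(df['date']).sum()   # new companies per date
                     .cumsum()                    # running total over dates
                     .reset_index(name='cumulative_count'))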

Load pandas dataframe with chunksize determined by column variable

喜你入骨 submitted on 2021-02-07 05:47:12
Question: If I have a csv file that's too large to load into memory with pandas (in this case 35 GB), I know it's possible to process the file in chunks with chunksize. However, I want to know whether it's possible to change the chunk size based on the values in a column. I have an ID column, and then several rows of information for each ID, like this:

    ID, Time, x, y
    sasd, 10:12, 1, 3
    sasd, 10:14, 1, 4
    sasd, 10:32, 1, 2
    cgfb, 10:02, 1, 6
    cgfb, 10:13, 1, 3
    aenr, 11:54, 2, 5
    tory, 10:27, 1, 3
    tory, 10:48, 3, 5
    etc...
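
read_csv cannot size its chunks by column value directly, but a fixed chunksize can be combined with a carry-over buffer so that no ID group is ever split across chunks. A minimal sketch, assuming the rows for each ID are contiguous in the file; the file name, chunk size, and process() handler are placeholders:

    import pandas as pd

    def process(group):
        ...  # hypothetical handler for one complete ID group

    leftover = pd.DataFrame()
    for chunk in pd.read_csv('data.csv', chunksize=100000, skipinitialspace=True):
        chunk = pd.concat([leftover, chunk], ignore_index=True)
        last_id = chunk['ID'].iloc[-1]
        # Rows for the final ID may continue in the next chunk, so hold them back.
        leftover = chunk[chunk['ID'] == last_id]
        for _, group in chunk[chunk['ID'] != last_id].groupby('ID', sort=False):
            process(group)

    if not leftover.empty:
        process(leftover)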

With `pandas.cut()`, how do I get integer bins and avoid getting a negative lowest bound?

旧巷老猫 submitted on 2021-02-07 05:35:06
Question: My dataframe has zero as its lowest value. I am trying to use the precision and include_lowest parameters of pandas.cut(), but I can't get the intervals to consist of integers rather than floats with one decimal, and I also can't get the leftmost interval to stop at zero.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    sns.set(style='white', font_scale=1.3)

    df = pd.DataFrame(range(0, 389, 8)[:-1], columns=['value'])
    df['binned_df_pd'] = pd.cut(df.value, bins=7, precision
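
A minimal sketch of one workaround: pass explicit integer bin edges instead of a bin count, and use right=False so the first interval starts exactly at zero instead of being extended below it. The exact edge arithmetic here is an assumption about the bins wanted:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(range(0, 389, 8)[:-1], columns=['value'])

    # Seven bins with integer edges; the +1 keeps the maximum value
    # inside the last half-open interval.
    edges = np.linspace(0, df['value'].max() + 1, 8).astype(int)
    df['binned'] = pd.cut(df['value'], bins=edges, right=False)
    # Intervals now read [0, 53), [53, 107), ... with no negative lower bound.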

Python Pandas Expand a Column of List of Lists to Two New Columns

自闭症网瘾萝莉.ら submitted on 2021-02-07 05:25:07
Question: I have a DataFrame which looks like this:

    name   id  apps
    john   1   [[app1, v1], [app2, v2], [app3, v3]]
    smith  2   [[app1, v1], [app4, v4]]

I want to expand the apps column so that it looks like this:

    name   id  app_name  app_version
    john   1   app1      v1
    john   1   app2      v2
    john   1   app3      v3
    smith  2   app1      v1
    smith  2   app4      v4

Any help is appreciated.

Answer 1: You can .apply(pd.Series) twice to get what you need as an intermediate step, then merge back to the original dataframe.

    import pandas as pd
    df = pd.DataFrame({
        'name': ['john
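
On pandas 0.25+, explode offers a shorter route than the double apply(pd.Series): produce one row per [app, version] pair, then split each pair into two columns. A minimal sketch with the sample data written out as Python lists:

    import pandas as pd

    df = pd.DataFrame({
        'name': ['john', 'smith'],
        'id': [1, 2],
        'apps': [[['app1', 'v1'], ['app2', 'v2'], ['app3', 'v3']],
                 [['app1', 'v1'], ['app4', 'v4']]],
    })

    out = df.explode('apps').reset_index(drop=True)  # one row per [app, version]
    out[['app_name', 'app_version']] = pd.DataFrame(out['apps'].tolist())
    out = out.drop(columns='apps')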

'Could not interpret input' error with Seaborn when plotting groupbys

眉间皱痕 submitted on 2021-02-07 05:20:15
Question: Say I have this dataframe:

    from pandas import DataFrame

    d = {
        'Path': ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
        'Detail': ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
        'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
        'Value': [30, 20, 10, 40, 40, 50],
        'Field': [50, 70, 10, 20, 30, 30],
    }
    df = DataFrame(d)
    df.set_index(['Path', 'Detail'], inplace=True)
    df

                 Field Program  Value
    Path Detail
    abc  foo        50   prog1     30
         bar        70   prog1     20
    ghi  bar        10   prog1     10
         foo        20   prog2     40
    jkl  foo        30   prog3     40
         foo        30   prog3     50

I can aggregate it
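
Seaborn raises "Could not interpret input" when the name it is given lives in the DataFrame's index rather than its columns, which is exactly what set_index does to Path and Detail here. A minimal sketch of the usual fix; the barplot call is an assumption, since the question is cut off before the plotting code:

    import seaborn as sns

    # Move 'Path' and 'Detail' out of the MultiIndex into ordinary columns
    # so seaborn can resolve them by name.
    plot_df = df.reset_index()
    sns.barplot(x='Path', y='Value', hue='Program', data=plot_df)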