pandas

ValueError: cannot insert ID, already exists

北城余情 posted on 2021-02-05 20:22:37
Question: I have this data:

    ID  TIME
    1   2
    1   4
    1   2
    2   3

I want to group the data by ID and calculate the mean time and the size of each group:

    ID  MEAN_TIME  COUNT
    1   2.67       3
    2   3.00       1

If I run this code, I get the error "ValueError: cannot insert ID, already exists":

    result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index()

Answer 1: Use the parameter drop=True, which does not create a new column from the index but removes the index instead:

    result = df.groupby(['ID']).agg({'TIME': 'mean', 'ID': 'count'}).reset_index(drop
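The error arises because the aggregated frame already has a column named ID (the counts) while the group keys sit in an index that is also named ID, so reset_index() cannot insert a second ID column. Note that drop=True discards the group labels. One possible alternative, sketched here on the assumption that pandas 0.25 or later is available, is named aggregation, which gives the outputs distinct names. The names MEAN_TIME and COUNT are taken from the desired output in the question; counting the TIME column is an assumption that it has no missing values.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"ID": [1, 1, 1, 2], "TIME": [2, 4, 2, 3]})

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation) pair.
result = (
    df.groupby("ID")
      .agg(MEAN_TIME=("TIME", "mean"), COUNT=("TIME", "count"))
      .reset_index()  # no clash: the output columns are not called ID
)
print(result)
```

Because neither output column is named ID, reset_index() can move the group keys back into a regular column without a conflict.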

Formatting latex (to_latex) output

余生颓废 posted on 2021-02-05 20:11:08
Question: I've read about the to_latex method, but it's not clear how to use the formatters argument. Some of my numbers are too long, and for others I want thousands separators. A side issue: for multi-indexed tables, to_latex parses the indices together and emits stray & characters in the LaTeX output.

Answer 1: For a simple data frame. First, without formatters:

    In [11]: df
    Out[11]:
                  c1        c2
    first   0.821354  0.936703
    second  0.138376  0.482180

    In [12]: print df.to_latex()
    \begin{tabular}{|l|c
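As a rough illustration of the formatters argument, the sketch below passes a dictionary that maps column names to one-argument functions returning strings. The sample values and the chosen formats are made up for the example. Recent pandas versions render LaTeX through Styler and need Jinja2, so the sketch falls back to to_string, which accepts the same mapping, if that import fails.

```python
import pandas as pd

# Illustrative data (not from the original post): one large value, one long decimal.
df = pd.DataFrame(
    {"c1": [1234567.891, 0.138376], "c2": [0.936703, 98765.4321]},
    index=["first", "second"],
)

# One callable per column; each receives a cell value and returns a string.
formatters = {
    "c1": "{:,.2f}".format,  # thousands separator, two decimals
    "c2": "{:.3f}".format,   # shorten to three decimals
}

try:
    text = df.to_latex(formatters=formatters)
except ImportError:
    # Jinja2 is missing: show the same formatting as plain text instead.
    text = df.to_string(formatters=formatters)
print(text)
```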

Formatting latex (to_latex) output

吃可爱长大的小学妹 posted on 2021-02-05 20:10:33
Question: I've read about the to_latex method, but it's not clear how to use the formatters argument. Some of my numbers are too long, and for others I want thousands separators. A side issue: for multi-indexed tables, to_latex parses the indices together and emits stray & characters in the LaTeX output.

Answer 1: For a simple data frame. First, without formatters:

    In [11]: df
    Out[11]:
                  c1        c2
    first   0.821354  0.936703
    second  0.138376  0.482180

    In [12]: print df.to_latex()
    \begin{tabular}{|l|c

Formatting latex (to_latex) output

流过昼夜 posted on 2021-02-05 20:10:01
Question: I've read about the to_latex method, but it's not clear how to use the formatters argument. Some of my numbers are too long, and for others I want thousands separators. A side issue: for multi-indexed tables, to_latex parses the indices together and emits stray & characters in the LaTeX output.

Answer 1: For a simple data frame. First, without formatters:

    In [11]: df
    Out[11]:
                  c1        c2
    first   0.821354  0.936703
    second  0.138376  0.482180

    In [12]: print df.to_latex()
    \begin{tabular}{|l|c

Efficient way to read 15 M lines csv files in python

泄露秘密 posted on 2021-02-05 18:54:07
Question: For my application, I need to read multiple files with 15 M lines each, store them in a DataFrame, and save the DataFrame in HDF5 format. I've already tried different approaches, notably pandas.read_csv with chunksize and dtype specifications, and dask.dataframe. Both take around 90 seconds to process one file, so I'd like to know if there's a more efficient way to process these files as described. Below is some code from the tests I've done.

    import pandas as pd
    import
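For reference, the read_csv approach mentioned in the question (chunksize plus explicit dtype) might look like the sketch below. It is not the poster's code: the column names id and value, the dtypes, and the tiny in-memory CSV standing in for a 15 M line file are all assumptions made for the example.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for a large CSV file (hypothetical columns).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 0.5}" for i in range(10))

# Declaring dtypes up front avoids type inference and can reduce memory use.
dtypes = {"id": "int32", "value": "float32"}

# With chunksize, read_csv returns an iterator of DataFrames instead of one frame.
chunks = pd.read_csv(io.StringIO(csv_text), dtype=dtypes, chunksize=4)
df = pd.concat(chunks, ignore_index=True)

print(df.dtypes)
print(len(df))
```

Writing the result to HDF5 with DataFrame.to_hdf is left out here because it depends on the optional PyTables package.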

Efficient way to read 15 M lines csv files in python

别等时光非礼了梦想. posted on 2021-02-05 18:52:03
Question: For my application, I need to read multiple files with 15 M lines each, store them in a DataFrame, and save the DataFrame in HDF5 format. I've already tried different approaches, notably pandas.read_csv with chunksize and dtype specifications, and dask.dataframe. Both take around 90 seconds to process one file, so I'd like to know if there's a more efficient way to process these files as described. Below is some code from the tests I've done.

    import pandas as pd
    import

unpacking a sql select into a pandas dataframe

这一生的挚爱 posted on 2021-02-05 12:55:47
Question: Suppose I have a select roughly like this:

    select instrument, price, date from my_prices;

How can I unpack the prices returned into a single dataframe with a series for each instrument, indexed on date? To be clear, I'm looking for:

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: ...
    Data columns (total 2 columns):
    inst_1 ...
    inst_2 ...
    dtypes: float64(1), object(1)

I'm NOT looking for:

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: ...
    Data columns (total 2 columns):
    instrument
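One way to get from the long result set to one column per instrument is to load it with pd.read_sql_query and then pivot. The sketch below is only an illustration: it uses an in-memory SQLite database in place of the poster's database, and the rows inserted into my_prices are invented sample data.

```python
import sqlite3

import pandas as pd

# In-memory SQLite table standing in for the real my_prices table.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_prices (instrument text, price real, date text)")
conn.executemany(
    "insert into my_prices values (?, ?, ?)",
    [
        ("inst_1", 10.0, "2021-01-01"),
        ("inst_2", 20.0, "2021-01-01"),
        ("inst_1", 11.0, "2021-01-02"),
        ("inst_2", 21.0, "2021-01-02"),
    ],
)

# Long format: one row per (instrument, date); parse_dates converts the date column.
long_df = pd.read_sql_query(
    "select instrument, price, date from my_prices", conn, parse_dates=["date"]
)
conn.close()

# Wide format: dates become the index, instruments become the columns.
wide = long_df.pivot(index="date", columns="instrument", values="price")
print(wide)
```

pivot raises an error if an (instrument, date) pair occurs more than once; in that case pivot_table with an aggregation function is the usual substitute.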

Return Pandas dataframe from PostgreSQL query with sqlalchemy

梦想与她 posted on 2021-02-05 12:50:52
Question: I want to query a PostgreSQL database and return the output as a Pandas dataframe. I created a connection to the database with SQLAlchemy:

    from sqlalchemy import create_engine
    engine = create_engine('postgresql://user@localhost:5432/mydb')

I write a Pandas dataframe to a database table:

    i = pd.read_csv(path)
    i.to_sql('Stat_Table', engine, if_exists='replace')

Based on the docs, it looks like pd.read_sql_query() should accept a SQLAlchemy engine:

    a=pd.read_sql_query('select * from Stat_Table',con

Reassign index of a dataframe

陌路散爱 posted on 2021-02-05 12:30:19
Question: I have the following dataframe:

    Month
    1    -0.075844
    2    -0.089111
    3     0.042705
    4     0.002147
    5    -0.010528
    6     0.109443
    7     0.198334
    8     0.209830
    9     0.075139
    10   -0.062405
    11   -0.211774
    12   -0.109167
    1    -0.075844
    2    -0.089111
    3     0.042705
    4     0.002147
    5    -0.010528
    6     0.109443
    7     0.198334
    8     0.209830
    9     0.075139
    10   -0.062405
    11   -0.211774
    12   -0.109167
    Name: Passengers, dtype: float64

As you can see, the index runs from 1 to 12 twice; instead, I would like to change the index to 1-24. The problem is that when plotting
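A minimal sketch of replacing the repeated 1-12 labels with a running 1-24 index is shown below. The series is rebuilt with placeholder values rather than the figures from the post, and keeping the index name Month is an assumption.

```python
import pandas as pd

# Placeholder series whose index runs 1-12 twice, as in the question.
s = pd.Series(
    range(24),
    index=list(range(1, 13)) * 2,
    name="Passengers",
    dtype="float64",
)

# Assign a fresh index of the same length; the values stay in their original order.
s.index = pd.RangeIndex(1, len(s) + 1, name="Month")
print(s.index)
```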

not able to groupby by one level in My dataframe by pandas

|▌冷眼眸甩不掉的悲伤 posted on 2021-02-05 12:29:01
Question: I am importing an Excel document and creating a dataframe, df3. I want to group by Name only. The other duplicate data should appear as shown in the output.

    df3 = pd.read_excel('stats')
    print(df3)

    Name  ID  Month  Shift
    Jon   1   Feb    A
    Jon   1   Jan    B
    Jon   1   Mar    C
    Mike  1   Jan    A
    Mike  1   Jan    B
    Jon   1   Feb    C
    Jon   1   Jan    A

Output required: I want output like the one below, in the same format, and will save it to Excel. Please help, as I'm stuck here. Note: Month must be in ascending order. Will be grateful