pandas

pandas read_csv parses the header as string type, but I want integers

Submitted by 你。 on 2021-02-07 08:37:08
Question: For example, the CSV file is as below, where (1,2,3) is the header:

    1,2,3
    0,0,0

I read the CSV file using pd.read_csv and print it:

    import pandas as pd
    df = pd.read_csv('./test.csv')
    print(df[1])

This raises KeyError: 1. It seems that read_csv parses the header as strings. Is there any way to use integer types for the DataFrame columns?

Answer 1: I think the more general solution is to cast the column names to integers with astype:

    df = pd.read_csv('./test.csv')
    df.columns = df.columns.astype(int)

Another way is to first get only the first column and
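The answer's cast can be sketched end-to-end; here the two-line test.csv from the question is built in memory with io.StringIO so the snippet is self-contained:

```python
import io

import pandas as pd

# Simulate the test.csv from the question (header row 1,2,3; data row 0,0,0)
csv_text = "1,2,3\n0,0,0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())   # header parsed as strings: ['1', '2', '3']

# Cast the column labels to integers, as in the answer
df.columns = df.columns.astype(int)
print(df[1])                 # integer-based lookup now works
```

After the cast, `df[1]` selects the column labeled with the integer 1 instead of raising KeyError.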

How to use df.rolling(window, min_periods, win_type='exponential').sum()

Submitted by 自古美人都是妖i on 2021-02-07 08:36:37
Question: I would like to calculate the rolling exponentially weighted mean with df.rolling().mean(). I get stuck at win_type='exponential'. I have tried other win_types such as 'gaussian', but I think 'exponential' works a little differently:

    dfTemp.rolling(window=21, min_periods=10, win_type='gaussian').mean(std=1)   # works fine

but when it comes to 'exponential':

    dfTemp.rolling(window=21, min_periods=10, win_type='exponential').mean(tau=10)
    # ValueError: The 'exponential' window
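The error message and the original answer are truncated above. As a hedged alternative (not necessarily the author's fix), pandas' ewm computes exponentially weighted means directly, without going through win_type; halflife=2 and min_periods=3 below are illustrative values:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, dtype=float))

# Exponentially weighted mean; entries before min_periods observations are NaN
ewma = s.ewm(halflife=2, min_periods=3).mean()
print(ewma)
```

ewm weights decay over the whole history rather than over a fixed 21-row window, so it is not byte-for-byte equivalent to a rolling exponential window, but it is often what "rolling exponentially weighted mean" is after.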

Pandas: read_csv indicating 'space-delimited'

Submitted by 房东的猫 on 2021-02-07 08:35:28
Question: I have the following file.txt (abridged):

    SICcode Catcode Category                                SICname          MultSIC
    0111    A1500   Wheat, corn, soybeans and cash grain    Wheat            X
    0112    A1600   Other commodities (incl rice, peanuts)  Rice             X
    0115    A1500   Wheat, corn, soybeans and cash grain    Corn             X
    0116    A1500   Wheat, corn, soybeans and cash grain    Soybeans         X
    0119    A1500   Wheat, corn, soybeans and cash grain    Cash grains, NEC X
    0131    A1100   Cotton                                  Cotton           X
    0132    A1300   Tobacco & Tobacco products              Tobacco          X

I'm having some problems reading it into a
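The question breaks off before the reading code. Since the field values themselves contain single spaces, one hedged option is to split on runs of two or more spaces with the python engine; the sample below reconstructs an abridged file.txt in memory:

```python
import io

import pandas as pd

# Two-or-more spaces separate fields; single spaces occur inside field values
data = (
    "SICcode  Catcode  Category                              SICname  MultSIC\n"
    "0111     A1500    Wheat, corn, soybeans and cash grain  Wheat    X\n"
    "0131     A1100    Cotton                                Cotton   X\n"
)

df = pd.read_csv(io.StringIO(data), sep=r"\s{2,}", engine="python")
print(df)
```

Note that SICcode is inferred as an integer and loses its leading zero; pass dtype={'SICcode': str} to keep it. pd.read_fwf is another option when the file is strictly fixed-width.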

pandas to_html no value representation

Submitted by  ̄綄美尐妖づ on 2021-02-07 08:32:29

Question: When I run the line below, the NaN values in the DataFrame do not get modified. Using the exact same argument with .to_csv(), I get the expected result. Does .to_html() require something different?

    df.to_html('file.html', float_format='{0:.2f}'.format, na_rep="NA_REP")

Answer 1: It looks like float_format doesn't play nicely with na_rep. However, you can work around it if you pass a function to float_format that conditionally handles your NaNs along with the float formatting you want:

    >>>
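The answer's code is cut off at the prompt. A version-robust sketch of the same idea is to pre-format every cell with a function that handles NaN explicitly before rendering; the column name and values below are illustrative, while NA_REP is the label from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [1.23456, np.nan, 7.5]})

def fmt(v):
    # Handle NaN ourselves, then apply the desired float formatting
    return "NA_REP" if pd.isna(v) else "{0:.2f}".format(v)

# Map the formatter over every cell, then render the string frame
html = df.apply(lambda col: col.map(fmt)).to_html()
print("NA_REP" in html, "1.23" in html)
```

Because the frame is converted to strings first, the result no longer depends on how float_format and na_rep interact inside to_html.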

Does Spark DataFrame have an equivalent of Pandas' merge indicator?

Submitted by 假装没事ソ on 2021-02-07 08:17:51
Question: The Python Pandas library contains the following function:

    DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None,
                    left_index=False, right_index=False, sort=False,
                    suffixes=('_x', '_y'), copy=True, indicator=False)

The indicator field, combined with Pandas' value_counts() function, can be used to quickly determine how well a join performed. Example:

    In [48]: df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
    In [49]: df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right':
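The pandas side of the comparison can be reproduced in full; df2's col_right values are truncated above, so the ones below are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
# col_right values are hypothetical -- the question's listing is truncated
df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": ["x", "y", "z"]})

# indicator=True adds a _merge column marking each row's provenance
merged = df1.merge(df2, on="col1", how="outer", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)
```

value_counts over _merge summarizes the join: rows found in both frames, only the left, or only the right.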

Multiplying a pandas DataFrame and Series, element-wise

Submitted by 我是研究僧i on 2021-02-07 08:14:20
Question: Let's say I have a pandas Series:

    import pandas as pd
    x = pd.DataFrame({0: [1,2,3], 1: [4,5,6], 2: [7,8,9]})
    y = pd.Series([-1, 1, -1])

I want to multiply x and y in such a way that I get z:

    z = pd.DataFrame({0: [-1,2,-3], 1: [-4,5,-6], 2: [-7,8,-9]})

In other words, if element j of the series is -1, then all elements of the j-th row of x get multiplied by -1. If element k of the series is 1, then all elements of the k-th row of x get multiplied by 1. How do I do this?

Answer 1: You can do that:
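The answer breaks off after "You can do that:". One standard idiom is DataFrame.mul with axis=0, which broadcasts the Series down the rows:

```python
import pandas as pd

x = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})
y = pd.Series([-1, 1, -1])

# Multiply row i of x by y[i] (axis=0 aligns y with the row index)
z = x.mul(y, axis=0)
print(z)
```

The plain `x * y` operator would instead align y against the columns, which is why the explicit axis=0 is needed here.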

Fastest way to read huge MySQL table in python

Submitted by て烟熏妆下的殇ゞ on 2021-02-07 07:57:51
Question: I was trying to read a very huge MySQL table made of several million rows. I used the Pandas library and chunks. See the code below:

    import pandas as pd
    import numpy as np
    import pymysql.cursors

    connection = pymysql.connect(user='xxx', password='xxx', database='xxx', host='xxx')
    try:
        with connection.cursor() as cursor:
            query = "SELECT * FROM example_table;"
            chunks = []
            for chunk in pd.read_sql(query, connection, chunksize=1000):
                chunks.append(chunk)
                # print(len(chunks))
            result = pd
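The snippet above ends at result = pd. The chunking pattern can be demonstrated end-to-end with SQLite standing in for MySQL (the mechanics of read_sql with chunksize are the same) and pd.concat combining the chunks:

```python
import sqlite3

import pandas as pd

# SQLite stands in for MySQL here; the chunked-read pattern is identical
conn = sqlite3.connect(":memory:")
pd.DataFrame({"a": range(2500)}).to_sql("example_table", conn, index=False)

chunks = []
for chunk in pd.read_sql("SELECT * FROM example_table;", conn, chunksize=1000):
    chunks.append(chunk)

result = pd.concat(chunks, ignore_index=True)
print(len(chunks), len(result))
```

With chunksize=1000 and 2500 rows, read_sql yields three chunks (1000, 1000, 500 rows) that concat reassembles into the full table.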

How to find the intersection of a pair of columns in multiple pandas dataframes with pairs in any order?

Submitted by 淺唱寂寞╮ on 2021-02-07 07:56:56
Question: I have multiple pandas DataFrames; to keep it simple, let's say I have three:

    >>> df1 =
         col1 col2
    id1  A    B
    id2  C    D
    id3  B    A
    id4  E    F

    >>> df2 =
         col1 col2
    id1  B    A
    id2  D    C
    id3  M    N
    id4  F    E

    >>> df3 =
         col1 col2
    id1  A    B
    id2  D    C
    id3  N    M
    id4  E    F

The result needed is:

    >>> df =
         col1 col2
    id1  A    B
    id2  C    D
    id3  E    F

because the pairs (A, B), (C, D), (E, F) appear in all the DataFrames, although possibly in reversed order, while pandas merge only considers the order in which the columns are passed. To check my observation I tried the
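The question breaks off mid-sentence. One hedged sketch of the order-insensitive intersection: build a sorted-pair key for each row, drop duplicate keys, then successively merge the keys across frames (col1/col2 and the sample values are from the question; the keyed helper is illustrative):

```python
import functools

import pandas as pd

idx = ["id1", "id2", "id3", "id4"]
df1 = pd.DataFrame({"col1": list("ACBE"), "col2": list("BDAF")}, index=idx)
df2 = pd.DataFrame({"col1": list("BDMF"), "col2": list("ACNE")}, index=idx)
df3 = pd.DataFrame({"col1": list("ADNE"), "col2": list("BCMF")}, index=idx)

def keyed(df):
    # Order-insensitive key: each (col1, col2) pair sorted alphabetically
    key = [tuple(sorted(p)) for p in zip(df["col1"], df["col2"])]
    return df.assign(key=key).drop_duplicates("key")

frames = [keyed(d) for d in (df1, df2, df3)]

# Keep only rows whose key appears in every frame
common = functools.reduce(
    lambda acc, d: acc.merge(d[["key"]], on="key"), frames
).drop(columns="key")
print(common)
```

Because merging happens on the normalized key, (A, B) in one frame matches (B, A) in another, which is exactly what a plain merge on col1/col2 cannot do.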
