pandas

pandas read_csv parses the header as string type, but I want integers

Submitted by 你。 on 2021-02-07 08:37:08
Question: For example, the CSV file is as below, where (1,2,3) is the header:

    1,2,3
    0,0,0

I read the CSV file using pd.read_csv and print it:

    import pandas as pd
    df = pd.read_csv('./test.csv')
    print(df[1])

This raises KeyError: 1. It seems that read_csv parses the header as strings. Is there any way to use integer types for the DataFrame columns?

Answer 1: I think the more general solution is to cast the column names to integers with astype:

    df = pd.read_csv('./test.csv')
    df.columns = df.columns.astype(int)

Another way is to first get only the first column and
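The answer's cast can be sketched end-to-end; here the two-line test.csv from the question is built in memory with io.StringIO so the snippet is self-contained:

```python
import io

import pandas as pd

# Simulate the test.csv from the question (header row 1,2,3; data row 0,0,0)
csv_text = "1,2,3\n0,0,0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())   # header parsed as strings: ['1', '2', '3']

# Cast the column labels to integers, as in the answer
df.columns = df.columns.astype(int)
print(df[1])                 # integer-based lookup now works
```

After the cast, `df[1]` selects the column labeled with the integer 1 instead of raising KeyError.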

How to use df.rolling(window, min_periods, win_type='exponential').sum()

Submitted by 自古美人都是妖i on 2021-02-07 08:36:37
Question: I would like to calculate the rolling exponentially weighted mean with df.rolling().mean(). I get stuck at win_type='exponential'. I have tried other win_types such as 'gaussian', but I think 'exponential' works a little differently:

    dfTemp.rolling(window=21, min_periods=10, win_type='gaussian').mean(std=1)   # works fine

but when it comes to 'exponential':

    dfTemp.rolling(window=21, min_periods=10, win_type='exponential').mean(tau=10)
    # ValueError: The 'exponential' window
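The error message and the original answer are truncated above. As a hedged alternative (not necessarily the author's fix), pandas' ewm computes exponentially weighted means directly, without going through win_type; halflife=2 and min_periods=3 below are illustrative values:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, dtype=float))

# Exponentially weighted mean; entries before min_periods observations are NaN
ewma = s.ewm(halflife=2, min_periods=3).mean()
print(ewma)
```

ewm weights decay over the whole history rather than over a fixed 21-row window, so it is not byte-for-byte equivalent to a rolling exponential window, but it is often what "rolling exponentially weighted mean" is after.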

Pandas: read_csv indicating 'space-delimited'

Submitted by 房东的猫 on 2021-02-07 08:35:28
Question: I have the following file.txt (abridged):

    SICcode Catcode Category                                SICname          MultSIC
    0111    A1500   Wheat, corn, soybeans and cash grain    Wheat            X
    0112    A1600   Other commodities (incl rice, peanuts)  Rice             X
    0115    A1500   Wheat, corn, soybeans and cash grain    Corn             X
    0116    A1500   Wheat, corn, soybeans and cash grain    Soybeans         X
    0119    A1500   Wheat, corn, soybeans and cash grain    Cash grains, NEC X
    0131    A1100   Cotton                                  Cotton           X
    0132    A1300   Tobacco & Tobacco products              Tobacco          X

I'm having some problems reading it into a
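The question breaks off before the reading code. Since the field values themselves contain single spaces, one hedged option is to split on runs of two or more spaces with the python engine; the sample below reconstructs an abridged file.txt in memory:

```python
import io

import pandas as pd

# Two-or-more spaces separate fields; single spaces occur inside field values
data = (
    "SICcode  Catcode  Category                              SICname  MultSIC\n"
    "0111     A1500    Wheat, corn, soybeans and cash grain  Wheat    X\n"
    "0131     A1100    Cotton                                Cotton   X\n"
)

df = pd.read_csv(io.StringIO(data), sep=r"\s{2,}", engine="python")
print(df)
```

Note that SICcode is inferred as an integer and loses its leading zero; pass dtype={'SICcode': str} to keep it. pd.read_fwf is another option when the file is strictly fixed-width.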

pandas to_html no value representation

Submitted by  ̄綄美尐妖づ on 2021-02-07 08:32:29

Question: When I run the line below, the NaN values in the DataFrame do not get modified. Using the exact same argument with .to_csv(), I get the expected result. Does .to_html() require something different?

    df.to_html('file.html', float_format='{0:.2f}'.format, na_rep="NA_REP")

Answer 1: It looks like float_format doesn't play nicely with na_rep. However, you can work around it if you pass a function to float_format that conditionally handles your NaNs along with the float formatting you want:

    >>>
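The answer's code is cut off at the prompt. A version-robust sketch of the same idea is to pre-format every cell with a function that handles NaN explicitly before rendering; the column name and values below are illustrative, while NA_REP is the label from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [1.23456, np.nan, 7.5]})

def fmt(v):
    # Handle NaN ourselves, then apply the desired float formatting
    return "NA_REP" if pd.isna(v) else "{0:.2f}".format(v)

# Map the formatter over every cell, then render the string frame
html = df.apply(lambda col: col.map(fmt)).to_html()
print("NA_REP" in html, "1.23" in html)
```

Because the frame is converted to strings first, the result no longer depends on how float_format and na_rep interact inside to_html.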

Does Spark DataFrame have an equivalent of Pandas' merge indicator?

Submitted by 假装没事ソ on 2021-02-07 08:17:51
Question: The Python Pandas library contains the following function:

    DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None,
                    left_index=False, right_index=False, sort=False,
                    suffixes=('_x', '_y'), copy=True, indicator=False)

The indicator field, combined with Pandas' value_counts() function, can be used to quickly determine how well a join performed. Example:

    In [48]: df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
    In [49]: df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right':
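The pandas side of the comparison can be reproduced in full; df2's col_right values are truncated above, so the ones below are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
# col_right values are hypothetical -- the question's listing is truncated
df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": ["x", "y", "z"]})

# indicator=True adds a _merge column marking each row's provenance
merged = df1.merge(df2, on="col1", how="outer", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)
```

value_counts over _merge summarizes the join: rows found in both frames, only the left, or only the right.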

Multiplying a pandas DataFrame and Series, element-wise

Submitted by 我是研究僧i on 2021-02-07 08:14:20
Question: Let's say I have a pandas Series:

    import pandas as pd
    x = pd.DataFrame({0: [1,2,3], 1: [4,5,6], 2: [7,8,9]})
    y = pd.Series([-1, 1, -1])

I want to multiply x and y in such a way that I get z:

    z = pd.DataFrame({0: [-1,2,-3], 1: [-4,5,-6], 2: [-7,8,-9]})

In other words, if element j of the series is -1, then all elements of the j-th row of x get multiplied by -1. If element k of the series is 1, then all elements of the k-th row of x get multiplied by 1. How do I do this?

Answer 1: You can do that:
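The answer breaks off after "You can do that:". One standard idiom is DataFrame.mul with axis=0, which broadcasts the Series down the rows:

```python
import pandas as pd

x = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})
y = pd.Series([-1, 1, -1])

# Multiply row i of x by y[i] (axis=0 aligns y with the row index)
z = x.mul(y, axis=0)
print(z)
```

The plain `x * y` operator would instead align y against the columns, which is why the explicit axis=0 is needed here.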

Fastest way to read huge MySQL table in python

Submitted by て烟熏妆下的殇ゞ on 2021-02-07 07:57:51
Question: I was trying to read a very huge MySQL table made of several million rows. I used the Pandas library and chunks. See the code below:

    import pandas as pd
    import numpy as np
    import pymysql.cursors

    connection = pymysql.connect(user='xxx', password='xxx', database='xxx', host='xxx')
    try:
        with connection.cursor() as cursor:
            query = "SELECT * FROM example_table;"
            chunks = []
            for chunk in pd.read_sql(query, connection, chunksize=1000):
                chunks.append(chunk)
                # print(len(chunks))
            result = pd
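The snippet above ends at result = pd. The chunking pattern can be demonstrated end-to-end with SQLite standing in for MySQL (the mechanics of read_sql with chunksize are the same) and pd.concat combining the chunks:

```python
import sqlite3

import pandas as pd

# SQLite stands in for MySQL here; the chunked-read pattern is identical
conn = sqlite3.connect(":memory:")
pd.DataFrame({"a": range(2500)}).to_sql("example_table", conn, index=False)

chunks = []
for chunk in pd.read_sql("SELECT * FROM example_table;", conn, chunksize=1000):
    chunks.append(chunk)

result = pd.concat(chunks, ignore_index=True)
print(len(chunks), len(result))
```

With chunksize=1000 and 2500 rows, read_sql yields three chunks (1000, 1000, 500 rows) that concat reassembles into the full table.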

How to find the intersection of a pair of columns in multiple pandas dataframes with pairs in any order?

Submitted by 淺唱寂寞╮ on 2021-02-07 07:56:56
Question: I have multiple pandas DataFrames; to keep it simple, let's say I have three:

    >>> df1 =
         col1 col2
    id1  A    B
    id2  C    D
    id3  B    A
    id4  E    F

    >>> df2 =
         col1 col2
    id1  B    A
    id2  D    C
    id3  M    N
    id4  F    E

    >>> df3 =
         col1 col2
    id1  A    B
    id2  D    C
    id3  N    M
    id4  E    F

The result needed is:

    >>> df =
         col1 col2
    id1  A    B
    id2  C    D
    id3  E    F

because the pairs (A, B), (C, D), (E, F) appear in all the DataFrames, although possibly in reversed order, while pandas merge only considers the order in which the columns are passed. To check my observation I tried the
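The question breaks off mid-sentence. One hedged sketch of the order-insensitive intersection: build a sorted-pair key for each row, drop duplicate keys, then successively merge the keys across frames (col1/col2 and the sample values are from the question; the keyed helper is illustrative):

```python
import functools

import pandas as pd

idx = ["id1", "id2", "id3", "id4"]
df1 = pd.DataFrame({"col1": list("ACBE"), "col2": list("BDAF")}, index=idx)
df2 = pd.DataFrame({"col1": list("BDMF"), "col2": list("ACNE")}, index=idx)
df3 = pd.DataFrame({"col1": list("ADNE"), "col2": list("BCMF")}, index=idx)

def keyed(df):
    # Order-insensitive key: each (col1, col2) pair sorted alphabetically
    key = [tuple(sorted(p)) for p in zip(df["col1"], df["col2"])]
    return df.assign(key=key).drop_duplicates("key")

frames = [keyed(d) for d in (df1, df2, df3)]

# Keep only rows whose key appears in every frame
common = functools.reduce(
    lambda acc, d: acc.merge(d[["key"]], on="key"), frames
).drop(columns="key")
print(common)
```

Because merging happens on the normalized key, (A, B) in one frame matches (B, A) in another, which is exactly what a plain merge on col1/col2 cannot do.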
