pandas | 易学教程

Cosine Similarity rows in a dataframe of pandas

Question: I have a CSV file with content as below, and I want to calculate the cosine similarity between one ID and each of the remaining IDs in the file. I loaded it into a pandas DataFrame as follows:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # Parse each "Vector" entry into a flat NumPy array
    old_df['Vector'] = old_df.apply(lambda row: np.array(np.matrix(row.Vector)).ravel(), axis=1)
    l = []
    for a in old_df['Vector']:
        l.append(a)
    A = np.array(l)
    similarities = cosine_similarity(A)

The output looks fine. However, I do not know how to find which GUID (or ID) is similar to which other GUID (or ID), and I…
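
One way to connect the similarity matrix back to the IDs is to wrap it in a DataFrame whose index and columns are the IDs. The sketch below assumes the ID column is called "ID" and that each vector is stored as a space-separated string; the data is made up. It computes cosine similarity with plain NumPy (normalise rows, then take dot products) so it runs without scikit-learn, but the same labelling step applies to the matrix returned by cosine_similarity(A).

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "ID" column name and space-separated vector strings are assumptions.
old_df = pd.DataFrame({
    "ID": ["a", "b", "c"],
    "Vector": ["1 0 0", "0.9 0.1 0", "0 0 1"],
})

# Parse each vector string into floats and stack the rows into a 2-D array.
A = np.vstack([np.array(v.split(), dtype=float) for v in old_df["Vector"]])

# Cosine similarity with plain NumPy: scale every row to unit length,
# then the dot product of two unit rows is their cosine similarity.
unit = A / np.linalg.norm(A, axis=1, keepdims=True)
sim_arr = unit @ unit.T

# Ignore each ID's similarity with itself before looking for the best match.
np.fill_diagonal(sim_arr, np.nan)

# Label rows and columns with the IDs so results can be looked up by ID.
sim = pd.DataFrame(sim_arr, index=old_df["ID"], columns=old_df["ID"])
most_similar = sim.idxmax(axis=1)  # for each ID, the other ID with the highest similarity
print(most_similar)
```

With the labelled frame, sim.loc["a"] gives the similarity of ID "a" to every other ID, and sim.loc["a"].nlargest(k) returns its k closest matches.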

pandas read_csv. How to ignore delimiter before line break

Question: I'm reading a file of numerical values:

    data = pd.read_csv('data.dat', sep=' ', header=None)

In the text file, each row ends with a space, so pandas expects a value that is not there and appends a NaN to the end of each row. For example, "2.343 4.234 " is read as [2.343, 4.234, nan]. I can avoid this with usecols=[0, 1], but I would prefer a more general solution.

Answer 1: You can use a regular expression in the sep argument. Instead of specifying the separator as a single space, you can ask it to use…
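
A minimal sketch of the regex-separator idea, assuming the pattern r"\s+" (one or more whitespace characters); the answer above is cut off, so the exact pattern it suggested is not known. An in-memory string stands in for data.dat. A second, separator-independent workaround is also shown: keep sep=' ' and drop columns that are entirely NaN.

```python
import io

import pandas as pd

# Stand-in for 'data.dat': every row ends with a trailing space.
text = "2.343 4.234 \n1.5 2.5 \n"

# Whitespace regex separator: the trailing space no longer creates an empty column.
data = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Alternative: keep the single-space separator, then drop all-NaN columns.
data_alt = pd.read_csv(io.StringIO(text), sep=" ", header=None).dropna(axis=1, how="all")

print(data)
print(data_alt)
```

The dropna variant also removes a column that happens to be empty for legitimate reasons, so the regex separator is usually the more targeted fix.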

How to use rolling in pandas?

Question: I am working on the code below:

    import matplotlib.pyplot as plt

    # Resample, interpolate and inspect ozone data here
    data = data.resample('D').interpolate()
    data.info()

    # Create the rolling window
    rolling = data.rolling(360)['Ozone']  # (***)

    # Insert the rolling quantiles to the monthly returns
    data['q10'] = rolling.quantile(.1)
    data['q50'] = rolling.quantile(.5)
    data['q90'] = rolling.quantile(.9)

    # Plot the data
    data.plot()
    plt.show()

For the starred line (***), I was wondering: can I use the following instead? data['Ozone']…
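
A quick way to check whether selecting the column before or after .rolling() matters is to compute both on a small synthetic frame and compare. The values, dates, and the window of 3 (standing in for 360) are made up for illustration.

```python
import pandas as pd

# Hypothetical daily ozone readings; values and dates are invented.
data = pd.DataFrame(
    {"Ozone": [1.0, 2.0, 3.0, 4.0, 5.0], "Other": [5.0, 4.0, 3.0, 2.0, 1.0]},
    index=pd.date_range("2000-01-01", periods=5, freq="D"),
)

window = 3  # stands in for the 360 used in the question

# Form from the question: roll over the whole frame, then select the column.
rolling_a = data.rolling(window)["Ozone"]
# Alternative: select the column first, then roll over the single Series.
rolling_b = data["Ozone"].rolling(window)

q50_a = rolling_a.quantile(0.5)
q50_b = rolling_b.quantile(0.5)
print(q50_a.equals(q50_b))  # True
```

Both give the same quantiles; selecting the column first simply avoids setting up a rolling window over columns that are never used.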

pandas.io.common.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file

Question: I have 50+ large CSV files, each more than 10 MB, with more than 25 columns and more than 50K rows. They all share the same header, and I am trying to merge them into one CSV with the header written only once.

Option one. This code works for small CSV files (25+ columns, but file sizes in KB):

    import pandas as pd
    import glob

    interesting_files = glob.glob("*.csv")
    df_list = []
    for filename in sorted(interesting_files):
        df_list.append(pd.read…
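
The loop in the question is cut off, so the following is only a sketch of one common way to finish this kind of merge: read every file, concatenate with pd.concat, and write the result once so the header appears a single time. It does not address the buffer-overflow error itself. The helper name merge_csvs and the demo files in a temporary directory are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csvs(pattern: str, out_path: str) -> pd.DataFrame:
    """Concatenate every CSV matching `pattern` into one file with a single header."""
    frames = [pd.read_csv(name) for name in sorted(glob.glob(pattern))]
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(out_path, index=False)  # to_csv writes the header once
    return merged


# Self-contained demo: two small CSVs with the same header in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        pd.DataFrame({"a": [i, i + 1], "b": ["x", "y"]}).to_csv(
            os.path.join(tmp, f"part{i}.csv"), index=False
        )
    # Use a non-.csv extension so the output is not picked up by the glob on a rerun.
    out = os.path.join(tmp, "merged.out")
    merged = merge_csvs(os.path.join(tmp, "*.csv"), out)
    with open(out) as fh:
        lines = fh.read().splitlines()

print(lines)
```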

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 388: invalid continuation byte

Question: I am a real beginner in Python, and I have spent hours on this line; I can't get anywhere without fixing it.

    cadastro_2019_10 = pd.read_csv("inf_cadastral_fi_20191015.csv", delimiter=";")[["CNPJ_FUNDO", "DENOM_SOCIAL", "CLASSE"]]

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 49: invalid continuation byte

    cadastro_2019_10 = pd.read_csv("inf_cadastral_fi_20191015.csv", delimiter=";")[["CNPJ_FUNDO", "DENOM_SOCIAL", "CLASSE"]]

Again:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9…
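
Byte 0xC9 is "É" in Latin-1 (ISO-8859-1), a common encoding for Portuguese text, so one likely fix is to tell read_csv which encoding the file uses. The sketch assumes the file is Latin-1 encoded (cp1252 is a close alternative to try); the in-memory bytes and the extra OUTRA column are made up to stand in for the real file.

```python
import io

import pandas as pd

# Stand-in for the real file, encoded as Latin-1: "É" becomes the single byte 0xC9.
raw = (
    "CNPJ_FUNDO;DENOM_SOCIAL;CLASSE;OUTRA\n"
    "00.000.000/0001-00;FUNDO DE CRÉDITO;Fundo de Ações;x\n"
).encode("latin-1")

# Default UTF-8 decoding fails on the 0xC9 byte, as in the question.
try:
    pd.read_csv(io.BytesIO(raw), delimiter=";")
except UnicodeDecodeError as exc:
    print("utf-8 fails:", exc)

# Declaring the file's actual encoding lets pandas decode it.
cadastro = pd.read_csv(io.BytesIO(raw), delimiter=";", encoding="latin-1")[
    ["CNPJ_FUNDO", "DENOM_SOCIAL", "CLASSE"]
]
print(cadastro)
```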

Copy argument vs Series.copy()

Question:

    y = pd.Series(x, copy=True, dtype=float)
    z = pd.Series(x, copy=True)
    a = pd.Series(x)
    f = pd.Series.copy(x)

All of the above expressions give the same output as x, and even after I update x, the change is not reflected in them. So I need to know what the copy argument and Series.copy() are for, and also how to copy series x to another series so that any change made to x is also reflected in the new series. If anything is wrong or not possible, please forgive me... I'm a…
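
A minimal sketch of the two behaviours the question contrasts. Binding a second name to the same object is plain Python assignment, so every change to x is visible through it; Series.copy() (deep by default) produces an independent object. Whether pd.Series(x) without copy=True shares data with x has changed across pandas versions with Copy-on-Write, so the sketch deliberately does not rely on it. The values are made up.

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0])

alias = x               # no copy at all: both names refer to the same Series object
independent = x.copy()  # deep copy (deep=True is the default)

x.iloc[0] = 100.0       # modify x in place

print(alias.iloc[0])        # 100.0, because alias *is* x
print(independent.iloc[0])  # 1.0, the copy kept the original value
```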

How do I create a new column in pandas from the difference of two string columns?

Question: How can I create a new column in pandas that is the difference of two other columns containing strings? I have one column titled "Good_Address" with entries like "123 Fake Street Apt 101" and another column titled "Bad_Address" with entries like "123 Fake Street". I want the output in the column "Address_Difference" to be " Apt 101". I've tried:

    import pandas as pd

    data = pd.read_csv("AddressFile.csv")
    data['Address Difference'] = data['GOOD_ADR1'].replace(data[…
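
Series.str.replace takes one pattern for the whole column, not a different pattern per row, so a row-by-row replacement is needed. A minimal sketch, using the column names from the question's prose ("Good_Address", "Bad_Address") rather than 'GOOD_ADR1', with made-up rows:

```python
import pandas as pd

# Hypothetical rows; column names follow the question's description.
data = pd.DataFrame({
    "Good_Address": ["123 Fake Street Apt 101", "9 Elm Road Unit 4"],
    "Bad_Address": ["123 Fake Street", "9 Elm Road"],
})

# For each row, remove the first occurrence of the bad address from the good one.
data["Address_Difference"] = [
    good.replace(bad, "", 1)
    for good, bad in zip(data["Good_Address"], data["Bad_Address"])
]
print(data["Address_Difference"].tolist())  # [' Apt 101', ' Unit 4']
```

If the bad address is always a prefix, good.removeprefix(bad) (Python 3.9+) states that intent more precisely, since str.replace would also remove a match found in the middle of the string.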

How to left align a dataframe column in python?

Question: I have to left-align a description column in a pandas DataFrame in Python, similar to left- or right-aligning a cell in an Excel sheet. Is there any solution for this? An image of the dataset was attached for reference.

Answer 1: Try this:

    df.style.set_properties(subset=["col1", "col2"], **{'text-align': 'left'})

Answer 2: I think you can just remove the leading spaces:

    df.Description = df.Description.apply(lambda row: row.lstrip(' '))

Source: https://stackoverflow.com/questions/53460941/how-to-left-align-a-dataframe-column
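
Answer 1's Styler only changes how the table is rendered as HTML (for example in Jupyter); it does not modify the data. Answer 2 changes the data itself, and the same step can be written with the vectorised .str accessor. A minimal sketch with made-up descriptions:

```python
import pandas as pd

# Hypothetical descriptions with inconsistent leading spaces.
df = pd.DataFrame({"Description": ["  padded text", "no padding", "   more padding"]})

# Vectorised form of Answer 2: strip leading spaces from every entry.
df["Description"] = df["Description"].str.lstrip(" ")
print(df["Description"].tolist())  # ['padded text', 'no padding', 'more padding']
```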

Element-wise mean of a list of pandas DataFrames

Question: Is there a canonical way to compute the element-wise mean of a list of DataFrames with identical columns and indices? The best way I can think of is:

    from functools import reduce

    dfs = [df1, df2, df3, df4, df5]
    reduce(lambda x, y: x.add(y), dfs) / len(dfs)

Answer 1: Use concat with a mean per index value:

    df1 = pd.DataFrame({
        'C': [7, 8, 9],
        'D': [1, 3, 5],
    })
    df2 = pd.DataFrame({
        'C': [4, 2, 3],
        'D': [7, 1, 0],
    })
    df3 = pd.DataFrame({
        'C': [9, 4, 2],
        'D': [1, 7, 1],
    })
    from functools import reduce
    dfs = [df1, df2, …
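
A sketch of the concat-and-group approach Answer 1 describes, using its three frames: stack the frames vertically, then average the rows that share an index label. It assumes, as the question states, that all frames have identical indices and columns; the reduce version from the question is computed alongside for comparison.

```python
from functools import reduce

import numpy as np
import pandas as pd

# The three frames from Answer 1.
df1 = pd.DataFrame({'C': [7, 8, 9], 'D': [1, 3, 5]})
df2 = pd.DataFrame({'C': [4, 2, 3], 'D': [7, 1, 0]})
df3 = pd.DataFrame({'C': [9, 4, 2], 'D': [1, 7, 1]})
dfs = [df1, df2, df3]

# Stack all frames, then average rows that share the same index label.
mean_df = pd.concat(dfs).groupby(level=0).mean()

# The reduce-based approach from the question, for comparison.
reduce_mean = reduce(lambda x, y: x.add(y), dfs) / len(dfs)

print(mean_df)
print(np.allclose(mean_df.values, reduce_mean.values))  # True
```

Because the grouping is by index label, this form also works when rows appear in a different order in each frame, which the positional reduce approach only handles through add's index alignment.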