dataframe

How to Merge Columns in Rows in a Dataframe that fulfill a Condition, while deleting the Rows

喜你入骨, posted on 2021-01-28 12:36:24
Question: I don't think I can solve this with groupby() or agg() as in these questions (Question1, Question2). I have a pandas.DataFrame that has one identifier column (ID_Code) and some information columns (information 1 and information 2). I need to aggregate some of the identifiers: some rows have to be deleted, and their information has to be added to specific other rows. To illustrate my problem, here is something I made up:

    import pandas as pd
    inp = [{'ID_Code':1,'information 1':list(x * 3 for x in

Apply a function to multiple dataframes

狂风中的少年, posted on 2021-01-28 12:32:08
Question: I have many dataframes in which missing values are denoted by the character string 'NA', which R does not treat as missing. The lengthy solution would be to apply the following to each dataframe:

    mydf[mydf == 'NA'] <- NA

I want to apply the above to many dataframes. Consider the following example:

    set.seed(123)
    A=as.data.frame(matrix(sample(c('NA',1:10),10*10,T),10))
    B=as.data.frame(matrix(sample(c('NA',LETTERS[1:10]),10*10,T),10))
    C=as.data.frame(matrix(sample(c('NA'

Saving multiple dataframes to multiple excel sheets multiple times?

时光毁灭记忆、已成空白, posted on 2021-01-28 12:30:24
Question: I have a function that saves multiple dataframes as multiple tables to a single sheet of an Excel workbook:

    def multiple_dfs(df_list, sheets, file_name, spaces):
        writer = pd.ExcelWriter(file_name,engine='xlsxwriter')
        row = 0
        for dataframe in df_list:
            dataframe.to_excel(writer,sheet_name=sheets,startrow=row , startcol=0)
            row = row + len(dataframe.index) + spaces + 1
        writer.save()

If I call this function multiple times to write multiple tables to multiple sheets, I end up with just one workbook and one
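A likely cause, stated here as an assumption because the excerpt is cut off: each call builds a new `pd.ExcelWriter` for the same `file_name`, and the writer's default mode overwrites the existing file, so only the last call's sheet survives. The sketch below opens the writer once and writes every sheet before it is closed. The names `start_rows`, `write_tables`, and `save_workbook` are illustrative and not from the question, and the Excel engine is left to the pandas default, which has to be installed separately.

```python
import pandas as pd

def start_rows(df_list, spaces):
    """Row offset for each table: header row + data rows + blank spacer rows."""
    rows, row = [], 0
    for df in df_list:
        rows.append(row)
        row += len(df.index) + spaces + 1
    return rows

def write_tables(writer, df_list, sheet_name, spaces):
    """Stack the dataframes vertically on one sheet of an already open writer."""
    for df, row in zip(df_list, start_rows(df_list, spaces)):
        df.to_excel(writer, sheet_name=sheet_name, startrow=row, startcol=0)

def save_workbook(file_name, tables_by_sheet, spaces):
    """Write every sheet through a single writer so no call overwrites another.

    tables_by_sheet is a hypothetical dict: sheet name -> list of dataframes.
    """
    with pd.ExcelWriter(file_name) as writer:  # saved and closed on exit
        for sheet_name, df_list in tables_by_sheet.items():
            write_tables(writer, df_list, sheet_name, spaces)

# Layout check only (no file is written here): a 3-row and a 2-row table
example = [pd.DataFrame({'a': [1, 2, 3]}), pd.DataFrame({'b': [4, 5]})]
example_rows = start_rows(example, spaces=1)
```

A call such as `save_workbook('out.xlsx', {'first': [df_a, df_b], 'second': [df_c]}, spaces=1)` would then produce one workbook with two sheets, where the file name, sheet names, and dataframes are placeholders.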

Creating a Random Feature Array in Spark DataFrames

自闭症网瘾萝莉.ら, posted on 2021-01-28 12:15:56
Question: When creating an ALS model, we can extract a userFactors DataFrame and an itemFactors DataFrame. These DataFrames contain a column with an Array. I would like to generate some random data and union it to the userFactors DataFrame. Here is my code:

    val df1: DataFrame = Seq((123, 456, 4.0), (123, 789, 5.0), (234, 456, 4.5), (234, 789, 1.0)).toDF("user", "item", "rating")
    val model1 = (new ALS()
      .setImplicitPrefs(true)
      .fit(df1))
    val iF = model1.itemFactors
    val uF = model1.userFactors

I then

Check if two rows in pandas DataFrame has same set of values regard & regardless of column order

左心房为你撑大大i, posted on 2021-01-28 12:09:28
Question: I have two dataframes with the same index but different column names. The number of columns is the same. I want to check, index by index, 1) whether they have the same set of values regardless of column order, and 2) whether they have the same set of values with regard to column order.

    ind = ['aaa', 'bbb', 'ccc']
    df1 = pd.DataFrame({'old1': ['A','A','A'], 'old2': ['B','B','B'], 'old3': ['C','C','C']}, index=ind)
    df2 = pd.DataFrame({'new1': ['A','A','A'], 'new2': ['B','C','B'], 'new3': ['C','B','D']}, index=ind)
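One possible way to approach both checks, sketched under the assumption that the two frames share the same index and column count and that the values within a row can be sorted: compare the raw value arrays position by position for the order-sensitive check, and compare row-wise sorted copies for the order-insensitive one. The names `same_ordered` and `same_unordered` are illustrative and not from the question. Sorting compares multisets rather than strict sets, which only matters when a row contains duplicate values.

```python
import numpy as np
import pandas as pd

ind = ['aaa', 'bbb', 'ccc']
df1 = pd.DataFrame({'old1': ['A','A','A'], 'old2': ['B','B','B'], 'old3': ['C','C','C']}, index=ind)
df2 = pd.DataFrame({'new1': ['A','A','A'], 'new2': ['B','C','B'], 'new3': ['C','B','D']}, index=ind)

# Drop the column labels so that only positions and values are compared
a = df1.to_numpy()
b = df2.to_numpy()

# 2) same values in the same column positions
same_ordered = pd.Series((a == b).all(axis=1), index=df1.index)

# 1) same values after sorting each row, so column order no longer matters
same_unordered = pd.Series(
    (np.sort(a, axis=1) == np.sort(b, axis=1)).all(axis=1),
    index=df1.index,
)
```

For the example data, row `bbb` differs only in column order, so it passes the order-insensitive check but not the order-sensitive one, while row `ccc` fails both.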

Plotting binned correlation of two variables using common axis

Deadly, posted on 2021-01-28 11:40:21
Question: I have three lists that I have loaded into a pandas dataframe.

    import pandas as pd
    df = pd.DataFrame({'x': location})
    df = df.assign(y1 = variable1)
    df = df.assign(y2 = variable2)

I would like to plot the correlation of y1 with y2, with x as the common x-axis. That is, I would like to bin the y1 and y2 values according to x location, find the correlation of y1 with y2 within each bin, and then plot a line of the correlations across the whole x domain. So my final plot will have
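The binning step described above can be sketched as follows. This is a minimal illustration, not the asker's data: `location`, `variable1`, and `variable2` are replaced by synthetic arrays, and the choice of five equal-width bins (`n_bins`) is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the question's location, variable1, variable2
rng = np.random.default_rng(0)
location = np.linspace(0.0, 10.0, 200)
variable1 = rng.normal(size=200)
variable2 = variable1 + rng.normal(scale=0.5, size=200)

df = pd.DataFrame({'x': location, 'y1': variable1, 'y2': variable2})

n_bins = 5  # assumed; pick whatever resolution suits the data
df['x_bin'] = pd.cut(df['x'], bins=n_bins)  # equal-width bins over x

# One Pearson correlation of y1 with y2 per bin, keyed by the bin midpoint
rows = []
for interval, group in df.groupby('x_bin', observed=True):
    rows.append({'x_mid': interval.mid, 'corr': group['y1'].corr(group['y2'])})
binned = pd.DataFrame(rows)
```

With matplotlib installed, `binned.plot(x='x_mid', y='corr')` would draw the correlations as a single line across the x domain. Bins with fewer than two points yield NaN, so sparse data may need wider bins.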

Replace values within a groupby based on multiple conditions

狂风中的少年, posted on 2021-01-28 11:34:43
Question: My question is related to this one, but I'm still not seeing how I can apply the answer to my problem. I have a DataFrame like so:

    df = pd.DataFrame({
        'date': ['2001-01-01', '2001-02-01', '2001-03-01', '2001-04-01', '2001-02-01', '2001-03-01', '2001-04-01'],
        'cohort': ['2001-01-01', '2001-01-01', '2001-01-01', '2001-01-01', '2001-02-01', '2001-02-01', '2001-02-01'],
        'val': [100, 101, 102, 101, 200, 201, 201]
    })
    df

             date      cohort  val
    0  2001-01-01  2001-01-01  100
    1  2001-02-01  2001-01-01  101
    2  2001

Invalid Argument in pd.read_excel

狂风中的少年, posted on 2021-01-28 11:25:20
Question:

    f=[]
    for root, dirs, files in os.walk(os.path.abspath(r"F:\Mathnasium Project\Downloaded files")):
        for file in files:
            f.append(os.path.join("r"+'"'+root, file+'"'))
    for x in f:
        print(x)
        z=pd.read_excel(x)
        student_report=pd.merge(student_report,z,how='left',left_on='Student Name',right_on='Student')

An error comes up reporting an invalid argument in pd.read_excel():

    OSError: [Errno 22] Invalid argument: 'r"F:\\Mathnasium Project\\Downloaded files\\Abdelrahman Mahmoud LP 05_11_2020.xlsx"'

and I don't
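The path in the error message starts with the literal characters `r"` and ends with `"`, which suggests the prefix and quotes were concatenated into the string itself. The `r"..."` form is source-code syntax for a raw string literal and is not part of the path value, so `os.path.join(root, file)` is already a usable path. A minimal sketch, using a temporary directory and made-up file names in place of the question's folder (the helper name `collect_files` is illustrative):

```python
import os
import tempfile

def collect_files(top, suffix=".xlsx"):
    """Return full paths of files under `top` whose names end with `suffix`."""
    paths = []
    for root, dirs, files in os.walk(os.path.abspath(top)):
        for name in files:
            if name.endswith(suffix):
                # Join the directory and file name as they are: no "r", no quotes
                paths.append(os.path.join(root, name))
    return sorted(paths)

# Demonstration on a throwaway directory with placeholder files
with tempfile.TemporaryDirectory() as top:
    for name in ("a.xlsx", "b.xlsx", "notes.txt"):
        open(os.path.join(top, name), "w").close()
    found = collect_files(top)
    found_names = [os.path.basename(p) for p in found]
    all_exist = all(os.path.isfile(p) for p in found)
```

Each collected path can then be passed directly to `pd.read_excel(path)`.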

How to format columns dates in python that they are weekly based on eachother?

巧了我就是萌, posted on 2021-01-28 11:25:18
Question: I have a dataframe df that looks similar to this:

    identity  Start      End        week
    E         6/18/2020  7/2/2020   1
    E         6/18/2020  7/2/2020   2
    2D        7/18/2020  8/1/2020   1
    2D        7/18/2020  8/1/2020   2
    A1        9/6/2020   9/20/2020  1
    A1        9/6/2020   9/20/2020  2

The problem is that when I extracted the data, I only had the Start date and End date for every identity it replaced, but I have the data by week. All identities have the same number of weeks; sometimes all identities can have 5 or 6 weeks, but the number is always the same for each of them. I want to make

Python: How do I output a element to a specific column and rows depending on the result of if statement

妖精的绣舞, posted on 2021-01-28 11:05:19
Question: Working with Excel, I'm looking for a solution that copies specific cells to other cells depending on whether isOrganization is true. Using the pandas statement df['isOrganization'] = df['Code'].str.endswith('000'), I managed to list the true and false results with the print function. If isOrganization is true for a row, that row's values should be copied from columns E and F to columns B and C. Otherwise, the row's values should be copied from columns E and F to columns D and E. I.e.: This copies the