dataframe | 易学教程

How to Merge Columns in Rows in a Dataframe that fulfill a Condition, while deleting the Rows

阅读更多关于 How to Merge Columns in Rows in a Dataframe that fulfill a Condition, while deleting the Rows

Question: I don't think I can solve this with groupby() or agg() as in these questions (Question1, Question2). I have a pandas.DataFrame that has one identifier column (ID_Code) and some information columns (information 1 and information 2). I need to aggregate some of the identifiers: some rows have to be deleted and their information has to be added to specific other rows. To illustrate my problem, here is something I made up: import pandas as pd inp = [{'ID_Code':1,'information 1':list(x * 3 for x in
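
Because the example data is truncated, the following is only a minimal sketch under stated assumptions. It assumes the information columns hold lists and that you can write down which IDs are absorbed into which target IDs (the `merge_into` dict, `merge_rows` helper, and sample values are hypothetical). Once absorbed IDs are relabelled to their target, a plain groupby can concatenate the lists, which deletes the absorbed rows in the same step.

```python
import pandas as pd

# Hypothetical data: information columns hold lists, as in the question.
df = pd.DataFrame({
    "ID_Code": [1, 2, 3, 4],
    "information 1": [[3], [6], [9], [12]],
    "information 2": [["a"], ["b"], ["c"], ["d"]],
})

# Hypothetical mapping: rows with ID 2 and 4 are deleted and their
# information is appended to the rows with ID 1 and 3 respectively.
merge_into = {2: 1, 4: 3}


def merge_rows(frame: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Relabel absorbed IDs, then concatenate the list columns per ID."""
    out = frame.copy()
    # IDs not in the mapping keep their own value.
    out["ID_Code"] = out["ID_Code"].map(lambda i: mapping.get(i, i))
    info_cols = [c for c in out.columns if c != "ID_Code"]
    # sort=False keeps target IDs in order of first appearance;
    # the lambda flattens the group's lists into a single list.
    merged = out.groupby("ID_Code", sort=False)[info_cols].agg(
        lambda s: [item for lst in s for item in lst]
    )
    return merged.reset_index()


print(merge_rows(df, merge_into))
```

If the rule for which rows to absorb is a condition rather than a fixed list, the same approach works with the mapping built from that condition.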

Apply a function to multiple dataframes

阅读更多关于 Apply a function to multiple dataframes

Question: I have many dataframes in which missing values are denoted by the character string 'NA', which R does not recognize as missing. The lengthy solution would be to apply the following statement to each dataframe: mydf[mydf == 'NA'] <- NA I want to apply this to many dataframes. Consider the following example: set.seed(123) A=as.data.frame(matrix(sample(c('NA',1:10),10*10,T),10)) B=as.data.frame(matrix(sample(c('NA',LETTERS[1:10]),10*10,T),10)) C=as.data.frame(matrix(sample(c('NA'

Saving multiple dataframes to multiple excel sheets multiple times?

阅读更多关于 Saving multiple dataframes to multiple excel sheets multiple times?

Question: I have a function that saves multiple dataframes as multiple tables on a single sheet of an Excel workbook: def multiple_dfs(df_list, sheets, file_name, spaces): writer = pd.ExcelWriter(file_name,engine='xlsxwriter') row = 0 for dataframe in df_list: dataframe.to_excel(writer,sheet_name=sheets,startrow=row , startcol=0) row = row + len(dataframe.index) + spaces + 1 writer.save() If I call this function multiple times to write multiple tables to multiple sheets, I end up with just one workbook and one

Creating a Random Feature Array in Spark DataFrames

阅读更多关于 Creating a Random Feature Array in Spark DataFrames

Question: When creating an ALS model, we can extract a userFactors DataFrame and an itemFactors DataFrame. These DataFrames contain a column with an Array. I would like to generate some random data and union it with the userFactors DataFrame. Here is my code: val df1: DataFrame = Seq((123, 456, 4.0), (123, 789, 5.0), (234, 456, 4.5), (234, 789, 1.0)).toDF("user", "item", "rating") val model1 = (new ALS() .setImplicitPrefs(true) .fit(df1)) val iF = model1.itemFactors val uF = model1.userFactors I then

Check whether two rows in pandas DataFrames have the same set of values, with and without regard to column order

阅读更多关于 Check if two rows in pandas DataFrame has same set of values regard & regardless of column order

Question: I have two dataframes with the same index but different column names; the number of columns is the same. I want to check, index by index, 1) whether they have the same set of values regardless of column order, and 2) whether they have the same set of values taking column order into account. ind = ['aaa', 'bbb', 'ccc'] df1 = pd.DataFrame({'old1': ['A','A','A'], 'old2': ['B','B','B'], 'old3': ['C','C','C']}, index=ind) df2 = pd.DataFrame({'new1': ['A','A','A'], 'new2': ['B','C','B'], 'new3': ['C','B','D']}, index=ind)
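
A minimal sketch using the question's own data: dropping to NumPy arrays discards the differing column names, so an elementwise comparison covers the order-sensitive case and a per-row set comparison covers the order-insensitive case. The variable names `same_ordered` and `same_unordered` are illustrative, and reindexing `df2` to `df1.index` is an added safeguard in case the row order differs.

```python
import pandas as pd

ind = ["aaa", "bbb", "ccc"]
df1 = pd.DataFrame({"old1": ["A", "A", "A"], "old2": ["B", "B", "B"],
                    "old3": ["C", "C", "C"]}, index=ind)
df2 = pd.DataFrame({"new1": ["A", "A", "A"], "new2": ["B", "C", "B"],
                    "new3": ["C", "B", "D"]}, index=ind)

# Align rows by index label; column names are ignored from here on
# because the comparison works on the underlying NumPy arrays.
left = df1.to_numpy()
right = df2.reindex(df1.index).to_numpy()

# 2) Column order matters: every position in the row must match.
same_ordered = pd.Series((left == right).all(axis=1), index=df1.index)

# 1) Column order ignored: compare each pair of rows as sets.
#    Duplicates within a row are ignored; use sorted(r1) == sorted(r2)
#    instead if repeated values should count.
same_unordered = pd.Series(
    [set(r1) == set(r2) for r1, r2 in zip(left, right)], index=df1.index
)

print(pd.DataFrame({"unordered": same_unordered, "ordered": same_ordered}))
```

For this data, row `bbb` matches only when order is ignored, and row `ccc` does not match either way because it contains `D`.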

Plotting binned correlation of two variables using common axis

阅读更多关于 Plotting binned correlation of two variables using common axis

Question: I have three lists that I have loaded into a pandas dataframe. import pandas as pd df = pd.DataFrame({'x': location}) df = df.assign(y1 = variable1) df = df.assign(y2 = variable2) I would like to plot the correlation of y1 with y2, with x as the common x-axis. That is, I would like to bin the y1 and y2 values by x location, compute the correlation of y1 with y2 within each bin, and then plot a line of those correlations across the whole x domain. So my final plot will have
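
One way to sketch the binning step, assuming fixed bin edges are acceptable: `pd.cut` assigns each x to a bin, and a groupby then computes the Pearson correlation of y1 with y2 inside each bin. The lists `location`, `variable1`, `variable2` are not shown in the question, so the values below are synthetic stand-ins, and the edges `[0, 4, 8]` and the `binned_corr` helper are hypothetical.

```python
import pandas as pd

# Synthetic stand-ins for the question's lists.
location = [0, 1, 2, 3, 4, 5, 6, 7]
variable1 = [1, 2, 3, 4, 1, 2, 3, 4]
variable2 = [2, 4, 6, 8, 8, 6, 4, 2]

df = pd.DataFrame({"x": location, "y1": variable1, "y2": variable2})


def binned_corr(frame: pd.DataFrame, edges) -> pd.Series:
    """Pearson correlation of y1 with y2 within each x bin."""
    # right=False makes the bins half-open: [0, 4), [4, 8).
    bins = pd.cut(frame["x"], bins=edges, right=False)
    return frame.groupby(bins, observed=True)[["y1", "y2"]].apply(
        lambda g: g["y1"].corr(g["y2"])
    )


corr = binned_corr(df, [0, 4, 8])
# Bin midpoints give one x value per correlation for the line plot.
midpoints = [interval.mid for interval in corr.index]
# With matplotlib installed: plt.plot(midpoints, corr.to_numpy())
print(pd.Series(corr.to_numpy(), index=midpoints))
```

Bins containing fewer than two points yield NaN correlations, so the bin width should be chosen to keep enough points per bin.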

Replace values within a groupby based on multiple conditions

阅读更多关于 Replace values within a groupby based on multiple conditions

Question: My question is related to this one, but I still don't see how to apply the answer to my problem. I have a DataFrame like so: df = pd.DataFrame({ 'date': ['2001-01-01', '2001-02-01', '2001-03-01', '2001-04-01', '2001-02-01', '2001-03-01', '2001-04-01'], 'cohort': ['2001-01-01', '2001-01-01', '2001-01-01', '2001-01-01', '2001-02-01', '2001-02-01', '2001-02-01'], 'val': [100, 101, 102, 101, 200, 201, 201] }) df date cohort val 0 2001-01-01 2001-01-01 100 1 2001-02-01 2001-01-01 101 2 2001

Invalid Argument in pd.read_excel

阅读更多关于 Invalid Argument in pd.read_excel

Question: f=[] for root, dirs, files in os.walk(os.path.abspath(r"F:\Mathnasium Project\Downloaded files")): for file in files: f.append(os.path.join("r"+'"'+root, file+'"')) for x in f: print(x) z=pd.read_excel(x) student_report=pd.merge(student_report,z,how='left',left_on='Student Name',right_on='Student') An "invalid argument" error is raised by pd.read_excel(): OSError: [Errno 22] Invalid argument: 'r"F:\\Mathnasium Project\\Downloaded files\\Abdelrahman Mahmoud LP 05_11_2020.xlsx"' and I don't
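
The error message shows the literal characters `r"` and a trailing `"` inside the path, because the loop concatenates them onto `root` and `file`. The `r` prefix only changes how a string literal is written in source code; it is never part of the string's value, so `os.path.join(root, file)` is already a valid path. A minimal sketch follows; the `collect_excel_paths` helper is hypothetical, and filtering to `.xlsx` while skipping Excel's `~$` lock files is an added assumption.

```python
import os


def collect_excel_paths(folder):
    """Return the full paths of all .xlsx files under folder, recursively.

    os.path.join already yields a usable path string; no r prefix or
    quote characters are added to it.
    """
    paths = []
    for root, _dirs, files in os.walk(os.path.abspath(folder)):
        for name in files:
            # Skip non-Excel files and the "~$..." lock files Excel
            # creates next to open workbooks.
            if name.lower().endswith(".xlsx") and not name.startswith("~$"):
                paths.append(os.path.join(root, name))
    return sorted(paths)


# Usage with the question's folder (needs pandas plus an Excel engine):
# for path in collect_excel_paths(r"F:\Mathnasium Project\Downloaded files"):
#     z = pd.read_excel(path)
#     student_report = pd.merge(student_report, z, how="left",
#                               left_on="Student Name", right_on="Student")
```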

How to format date columns in Python so that they are weekly, based on each other?

阅读更多关于 How to format columns dates in python that they are weekly based on eachother?

Question: I have a dataframe df that looks similar to this:

identity  Start      End        week
E         6/18/2020  7/2/2020   1
E         6/18/2020  7/2/2020   2
2D        7/18/2020  8/1/2020   1
2D        7/18/2020  8/1/2020   2
A1        9/6/2020   9/20/2020  1
A1        9/6/2020   9/20/2020  2

The problem is that when I extracted the data, I only had the Start and End dates for each identity, repeated on every row, but the data is organized by week. All identities have the same number of weeks: sometimes it is 5 or 6 weeks, but it is always the same for every identity. I want to make
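
The question is cut off before stating the target format, so this is only a sketch under one assumption: week n of an identity starts (n - 1) * 7 days after that identity's Start date and ends six days later. The rows come from the question's table; the new column names `week_start` and `week_end` are hypothetical.

```python
import pandas as pd

# Rows from the question's table.
df = pd.DataFrame({
    "identity": ["E", "E", "2D", "2D", "A1", "A1"],
    "Start": ["6/18/2020"] * 2 + ["7/18/2020"] * 2 + ["9/6/2020"] * 2,
    "End": ["7/2/2020"] * 2 + ["8/1/2020"] * 2 + ["9/20/2020"] * 2,
    "week": [1, 2, 1, 2, 1, 2],
})

# Parse the month/day/year strings into real dates.
start = pd.to_datetime(df["Start"], format="%m/%d/%Y")

# Assumption: week n begins (n - 1) * 7 days after Start.
df["week_start"] = start + pd.to_timedelta((df["week"] - 1) * 7, unit="D")
df["week_end"] = df["week_start"] + pd.Timedelta(days=6)
print(df)
```

Because the offset comes from the `week` column, the same code handles 5 or 6 weeks per identity without changes.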

Python: How do I output an element to specific columns and rows depending on the result of an if statement?

阅读更多关于 Python: How do I output a element to a specific column and rows depending on the result of if statement

Question: Working with Excel data, I'm looking for a way to copy specific elements to other cells depending on whether isOrganization is true. Using the pandas statement df['isOrganization'] = df['Code'].str.endswith('000'), I managed to print the True and False results. If isOrganization is True for a row, the values in columns E and F of that row should be copied to columns B and C. Otherwise, the values in columns E and F should be copied to columns D and E. I.e.: This copies the
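
A minimal sketch of the conditional copy with a boolean mask and `.loc`, assuming the sheet has already been loaded into a DataFrame whose columns are named after the Excel letters (the sample values are hypothetical). The E/F values are snapshotted first because the False branch overwrites column E with F.

```python
import pandas as pd

# Hypothetical sheet; columns named after the Excel letters in the question.
df = pd.DataFrame({
    "Code": ["12000", "12345", "99000"],
    "B": [None] * 3, "C": [None] * 3, "D": [None] * 3,
    "E": ["e1", "e2", "e3"], "F": ["f1", "f2", "f3"],
})

# astype(str) guards against Code having been read as a number.
df["isOrganization"] = df["Code"].astype(str).str.endswith("000")
mask = df["isOrganization"]

# Snapshot E and F before anything is overwritten.
src = df[["E", "F"]].to_numpy()

# True rows: E, F -> B, C.
df.loc[mask, ["B", "C"]] = src[mask.to_numpy()]
# False rows: E, F -> D, E (E receives the old F value).
df.loc[~mask, ["D", "E"]] = src[~mask.to_numpy()]
print(df)
```

Writing the result back with `df.to_excel(...)` then replaces the row-by-row if statement.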