dataframe

Pandas: how to get rows with consecutive dates and sales of more than 1000?

Submitted by 江枫思渺然 on 2021-02-08 14:09:13
Question: I have a data frame called df:

    Date        Sales
    01/01/2020  812
    02/01/2020  981
    03/01/2020  923
    04/01/2020  1033
    05/01/2020  988
    ...         ...

How can I get the first occurrence of 7 consecutive days with sales above 1000? This is what I am doing to find the rows where sales are at least 1000:

    In [221]: df.loc[df["Sales"] >= 1000]
    Out[221]:
    Date        Sales
    04/01/2020  1033
    08/01/2020  1008
    09/01/2020  1091
    17/01/2020  1080
    18/01/2020  1121
    19/01/2020  1098
    ...         ...

Answer 1: You can assign a unique identifier per consecutive run of qualifying rows, then pick the first run that spans 7 days.
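A runnable pandas sketch of that run-identifier idea (the data below is made up to match the question's shape, and it assumes the dates are daily with no gaps; otherwise you would also check the Date differences):

    import pandas as pd

    # Illustrative data shaped like the question's frame.
    df = pd.DataFrame({
        "Date": pd.date_range("2020-01-01", periods=14, freq="D"),
        "Sales": [812, 981, 923, 1033, 988, 1008, 1091, 1080,
                  1121, 1098, 1200, 1150, 1300, 900],
    })

    mask = df["Sales"] >= 1000
    # A new id starts every time the mask flips, so each run of
    # consecutive True rows shares one id.
    run_id = (mask != mask.shift()).cumsum()

    # First run with at least 7 qualifying rows; keep its first 7 days.
    runs = df[mask].groupby(run_id[mask])
    first_week = next((g.head(7) for _, g in runs if len(g) >= 7), None)
    print(first_week)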

Can Dataframe joins in Spark preserve order?

Submitted by 余生颓废 on 2021-02-08 13:50:43
Question: I'm currently trying to join two DataFrames together while retaining the order of one of them. From "Which operations preserve RDD order?", it seems (correct me if this is inaccurate, because I'm new to Spark) that joins do not preserve order: rows are joined / "arrive" at the final DataFrame in no particular order, because the data sits in different partitions. How could one perform a join of two DataFrames while preserving the order of one table? E.g., ...
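Spark gives no ordering guarantee after a shuffle, so a common workaround is to record the left table's current order in a helper column before the join and sort on it afterwards. A PySpark sketch of that idea (all table and column names here are illustrative, and note that monotonically_increasing_id only captures the frame's existing partition-wise order):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    left = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["key", "v1"])
    right = spark.createDataFrame([("c", 30), ("a", 10), ("b", 20)], ["key", "v2"])

    # Remember the left table's order before the shuffle...
    left_tagged = left.withColumn("_row_order", F.monotonically_increasing_id())

    # ...join as usual, then restore the order and drop the helper column.
    joined = (left_tagged.join(right, on="key", how="left")
                         .orderBy("_row_order")
                         .drop("_row_order"))
    joined.show()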

Python/Pandas: Converting numbers to comma-separated thousands

Submitted by 我怕爱的太早我们不能终老 on 2021-02-08 13:45:24
Question: I have a dataframe with a column containing long numbers. I am trying to convert all the values in the numbers column to comma-separated thousands.

    col_1    col_2
    Rooney   34590927
    Ronaldo  5467382
    John     25647398

How do I iterate and get the following result?

Expected result:

    col_1    col_2
    Rooney   34,590,927
    Ronaldo  5,467,382
    John     25,647,398

Answer 1: You can use string formatting:

    df['col_2'] = pd.to_numeric(df['col_2'].fillna(0), errors='coerce')
    df['col_2'] = df['col_2'].map('{:,.0f}'.format)

...
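A self-contained version of that formatting idea; '{:,.0f}' keeps the comma grouping without decimal places, matching the expected result, and note the formatted column becomes strings:

    import pandas as pd

    df = pd.DataFrame({"col_1": ["Rooney", "Ronaldo", "John"],
                       "col_2": [34590927, 5467382, 25647398]})

    # Coerce to numeric first (bad values become NaN), then format
    # with a thousands separator; the result is a string column.
    df["col_2"] = (pd.to_numeric(df["col_2"], errors="coerce")
                     .map("{:,.0f}".format))
    print(df)
    #      col_1       col_2
    # 0   Rooney  34,590,927
    # 1  Ronaldo   5,467,382
    # 2     John  25,647,398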

Divide or split dataframe into multiple dfs based on empty row and header title

Submitted by ⅰ亾dé卋堺 on 2021-02-08 12:09:36
Question: I have a dataframe which holds multiple datasets in a single file. I want to divide it into multiple files, around 25 from the one file. The pattern is: wherever there is a blank row followed by a header title, a new df begins. I have tried "Splitting dataframes in R based on empty rows", but that does not take care of a blank row within a new df (V1 column, 9th row). I want the data divided on an empty row plus a header title. My data and the code I have tried are given below. Also, how can ...
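The question is about R, but the splitting logic itself is easy to sketch; here is the analogous idea in pandas, under two stated assumptions: the blocks live in a single column V1, and a title row is recognisable by some pattern (here, starting with "Header" — adapt the test to the real data). A new block starts only where a blank row is immediately followed by a title row, so a blank row inside a block does not split it:

    import pandas as pd

    # Toy one-column frame: each block starts with a title row, and a
    # blank row can also appear *inside* a block (like the 9th row of
    # the question's V1 column).
    raw = pd.DataFrame({"V1": ["Header A", "1", "2", "", "Header B",
                               "3", "", "4", "", "Header C", "5"]})

    is_blank = raw["V1"].eq("")
    is_title = raw["V1"].str.startswith("Header")  # assumed title test

    # New block only where a blank row is followed by a title row.
    starts = is_title & is_blank.shift(fill_value=False)
    starts.iloc[0] = True
    block_id = starts.cumsum()

    # One df per block; drop the blank rows from each piece (remove
    # the filter if blanks inside a block should be kept).
    blocks = [g[g["V1"].ne("")] for _, g in raw.groupby(block_id)]
    for b in blocks:
        print(b, end="\n\n")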

scala - how to substring column names after the last dot?

Submitted by 倖福魔咒の on 2021-02-08 11:27:34
Question: After exploding a nested structure I have a DataFrame with column names like this:

    sales_data.metric1
    sales_data.type.metric2
    sales_data.type3.metric3

When performing a select I'm getting the error:

    cannot resolve 'sales_data.metric1' given input columns:
    [sales_data.metric1, sales_data.type.metric2, sales_data.type3.metric3]

How should I select from the DataFrame so the column names are parsed correctly? I've tried the following: the substrings after the dots are extracted successfully, but ...
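The error occurs because Spark parses the dots as nested-field access; referring to the literal name needs backticks, after which each column can be aliased to the part after its last dot. The question is Scala, but the same idea as a PySpark sketch (the data here is made up):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, 2, 3)],
        ["sales_data.metric1", "sales_data.type.metric2",
         "sales_data.type3.metric3"],
    )

    # Backticks make Spark treat the dotted name literally instead of
    # as nested-field access; alias keeps only the part after the
    # last dot.
    renamed = df.select(
        [F.col("`{}`".format(c)).alias(c.rsplit(".", 1)[-1])
         for c in df.columns]
    )
    renamed.printSchema()  # metric1, metric2, metric3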

Multiply dataframe with values from other dataframe

Submitted by 回眸只為那壹抹淺笑 on 2021-02-08 11:21:29
Question: I have two dataframes:

    df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]], index=['a','b','c','a'], columns=['d','e'])

       d  e
    a  1  2
    b  3  4
    c  5  6
    a  7  8

    df2 = pd.DataFrame([['a', 10],['b',20],['c',30],['f',40]])

       0   1
    0  a  10
    1  b  20
    2  c  30
    3  f  40

I want each row of df1 to be multiplied by the factor corresponding to its index label in df2 (e.g. 20 for b), so my output should look like:

       d    e
    a  10   20
    b  60   80
    c  150  180
    a  70   80

Kindly provide a solution assuming df1 to be hundreds of rows in ...
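One vectorised way to do this in pandas, which avoids row iteration and so scales to large frames: turn df2 into a label-to-factor mapping, look the factors up by df1's index, and multiply along the rows (a sketch using the question's data):

    import pandas as pd

    df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                       index=['a', 'b', 'c', 'a'], columns=['d', 'e'])
    df2 = pd.DataFrame([['a', 10], ['b', 20], ['c', 30], ['f', 40]])

    # Label -> factor lookup; duplicate labels in df1's index are fine
    # because each row is mapped independently by position.
    factors = df2.set_index(0)[1]
    out = df1.mul(df1.index.map(factors), axis=0)
    print(out)
    #      d    e
    # a   10   20
    # b   60   80
    # c  150  180
    # a   70   80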

pandas sum the differences between two columns in each group

Submitted by 自作多情 on 2021-02-08 10:48:33
Question: I have a df that looks like:

    A           B           C  D
    2017-10-01  2017-10-11  M  2017-10
    2017-10-02  2017-10-03  M  2017-10
    2017-11-01  2017-11-04  B  2017-11
    2017-11-08  2017-11-09  B  2017-11
    2018-01-01  2018-01-03  A  2018-01

The dtypes of A and B are datetime64; C and D are strings. I would like to group by C and D and get the differences between B and A:

    df.groupby(['C', 'D']).apply(lambda row: row['B'] - row['A'])

but I don't know how to sum such differences within each group and assign the values to a new column, say E, ...
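One way to get there without apply: compute the row-wise difference once, then broadcast the per-group sum back with transform so the new column E lines up with the original rows (a sketch on the question's sample data):

    import pandas as pd

    df = pd.DataFrame({
        "A": pd.to_datetime(["2017-10-01", "2017-10-02", "2017-11-01",
                             "2017-11-08", "2018-01-01"]),
        "B": pd.to_datetime(["2017-10-11", "2017-10-03", "2017-11-04",
                             "2017-11-09", "2018-01-03"]),
        "C": ["M", "M", "B", "B", "A"],
        "D": ["2017-10", "2017-10", "2017-11", "2017-11", "2018-01"],
    })

    # Row-wise timedelta, then the per-(C, D) sum repeated on every row.
    df["E"] = (df["B"] - df["A"]).groupby([df["C"], df["D"]]).transform("sum")
    print(df)
    # The two M/2017-10 rows both get 11 days, the B/2017-11 rows 4 days.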
