pandas | 易学教程

Pandas string extract all the matches

Question: I am learning regex operations with the pandas Series string methods. I was able to extract the first number from the string, but my regex is not matching the second number. How can I capture both numbers? Note that in the second row, the second element is NaN. CODE:

    import re
    import pandas as pd
    df = pd.DataFrame({'a': ["number 1.23 has 1.2 ", "number 12.2 has 12 "]})
    pat = r""".+\s+ (\d+\.\d+) .+ ((?:\d+\.\d+)?) .+"""
    df['a'].str.extract(pat, flags=re.X, expand=True)

Gives: 0 1 1.23 12.2
Expected: 0 1 1.23 1.2
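One possible approach, offered as a sketch rather than the accepted answer: instead of one pattern with two groups, match each decimal number separately with str.extractall and pivot the matches into columns. Note that this swaps str.extract for str.extractall; the names matches and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": ["number 1.23 has 1.2 ", "number 12.2 has 12 "]})

# Match every decimal number (digits, a dot, digits).
# The bare "12" in the second row has no dot, so it is not matched.
matches = df["a"].str.extractall(r"(\d+\.\d+)")[0]

# extractall indexes its result by (row, match number);
# unstacking the "match" level gives one column per match.
result = matches.unstack("match")
print(result)
```

Rows with fewer matches than the widest row are padded with NaN, which is what the question asks for in the second row.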

How to compare two dataframes ignoring column names?

Question: Suppose I want to compare the contents of two dataframes, but not the column names (or index names). Is it possible to achieve this without renaming the columns? For example:

    df = pd.DataFrame({'A': [1,2], 'B':[3,4]})
    df_equal = pd.DataFrame({'a': [1,2], 'b':[3,4]})
    df_diff = pd.DataFrame({'A': [1,2], 'B':[3,5]})

In this case, df is equal to df_equal but different from df_diff, because the values in df_equal have the same content but the ones in df_diff do not. Notice that the column names in df_equal are
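A minimal sketch of one way to do this, assuming that only the cell values and their positions matter: compare the underlying NumPy arrays, which carry no column or index labels. The helper name same_values is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_equal = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df_diff = pd.DataFrame({"A": [1, 2], "B": [3, 5]})


def same_values(left, right):
    """Return True when both frames hold the same values in the same positions."""
    # to_numpy() drops column and index labels; array_equal also checks the shape.
    return bool(np.array_equal(left.to_numpy(), right.to_numpy()))


print(same_values(df, df_equal))
print(same_values(df, df_diff))
```

In this sketch a NaN cell never compares equal to another NaN, so frames with missing values would need extra handling.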

How to split consecutive elements in a list into sublists

Question: I have the following list:

    indices_to_remove: [0,1,2,3,..,600,800,801,802,....,1200,1600,1601,1602,...,1800]

I basically have 3 subsets of consecutive indices: 0-600, 800-1200, and 1600-1800. I would like to create 3 separate smaller lists, each containing only consecutive numbers. Expected outcome:

    indices_to_remove_1 : [0,1,2,3,....,600]
    indices_to_remove_2 : [800,801,802,....,1200]
    indices_to_remove_3 : [1600,1601,1602,....., 1800]

P.S.: The numbers are arbitrary and random; moreover, I may
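A minimal sketch of one way to split such a list, assuming the input is sorted in ascending order: start a new sublist whenever a value is not exactly one more than the previous value. The function name split_consecutive and the shortened sample data are only illustrative.

```python
def split_consecutive(values):
    """Group a sorted list of integers into runs of consecutive numbers."""
    runs = []
    for value in values:
        if runs and value == runs[-1][-1] + 1:
            # Continues the current run.
            runs[-1].append(value)
        else:
            # Gap found (or first element): start a new run.
            runs.append([value])
    return runs


# Shortened stand-in for the ranges in the question.
indices_to_remove = list(range(0, 4)) + list(range(800, 803)) + list(range(1600, 1603))
print(split_consecutive(indices_to_remove))
# → [[0, 1, 2, 3], [800, 801, 802], [1600, 1601, 1602]]
```

Because the result is a list of lists, it works for any number of runs without needing separately numbered variables.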

Create columns from row with same ID

Question: I have a df like this:

    Id  username    age
    1   michael.    34
    6.  Mike.       65
    7.  Stephanie.  14
    1.  Mikael.     34
    6.  Mick.       65

As you can see, the usernames are not written the same way for the same Id. I would like to regroup all usernames for an Id into the same row, like this:

    Id  username    username_2  Age
    1   michael.    mikael.     34
    6.  Mike.       Mick.       65
    7.  Stephanie.              14

Thanks.

Answer 1: You can create a MultiIndex that counts the duplicated Id values with cumcount, then reshape with unstack, and finally clean up the result with add_prefix and reset_index: df1
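The steps named in the answer (cumcount, unstack, add_prefix, reset_index) can be sketched as follows. This is a reconstruction under assumptions, not the answerer's exact code, which is cut off in the excerpt. The sample data is modelled on the question with the stray dots removed, and the names counter and wide are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 6, 7, 1, 6],
    "username": ["michael", "Mike", "Stephanie", "Mikael", "Mick"],
    "age": [34, 65, 14, 34, 65],
})

# Number each repeated Id: 0 for its first row, 1 for its second, and so on.
counter = df.groupby("Id").cumcount()

wide = (
    df.set_index(["Id", "age", counter])["username"]  # MultiIndex: Id, age, counter
      .unstack()                                      # counter level becomes columns 0, 1, ...
      .add_prefix("username_")                        # columns become username_0, username_1, ...
      .reset_index()                                  # Id and age become ordinary columns again
)
print(wide)
```

An Id with only one username gets NaN in the extra columns. Including age in the index assumes it is the same for every row of an Id, as in the question's data.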

Transform pandas groupby result with subtotals to relative values

Question: I have come across a nice solution for inserting subtotals into a pandas groupby dataframe. However, I would now like to modify the result to show values relative to the subtotals instead of absolute values. This is the code that builds the groupby:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame( { "Category": np.random.choice(["Group A", "Group B"], 50), "Product": np.random.choice(["Product 1", "Product 2"], 50), "Units_Sold": np.random.randint(1, 100, size=(50)), "Date
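As a rough sketch of the underlying idea only (it does not reproduce the subtotal-insertion code the question refers to, which is cut off here), each product total can be divided by the subtotal of its category using a grouped transform. The fixed sample values and the names totals and share are assumptions for illustration.

```python
import pandas as pd

# Small fixed sample in place of the random data from the question.
df = pd.DataFrame({
    "Category": ["Group A", "Group A", "Group B", "Group B"],
    "Product": ["Product 1", "Product 2", "Product 1", "Product 2"],
    "Units_Sold": [30, 70, 20, 60],
})

# Absolute totals per (Category, Product).
totals = df.groupby(["Category", "Product"])["Units_Sold"].sum()

# Category subtotal, repeated on every row of that category,
# so the division lines up row by row.
subtotals = totals.groupby(level="Category").transform("sum")

share = totals / subtotals
print(share)
```

Within each category the shares sum to 1, so they can be multiplied by 100 for percentages.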

Convert data on reading csv in pandas

Question: I'm reading a .csv file into a pandas dataframe. The .csv file contains several columns. Column 'A' contains a string such as '20-989-98766'. Is it possible to read only the last 5 characters, '98766', from the string when loading the file?

    df = pd.read_csv("test_data2.csv", column={'A':read the last 5 characters})

output:

    A
    98766
    95476
    .....

Answer 1: You can define a function and pass it as an argument to the converters parameter of read_csv:

    import io
    import pandas as pd
    def func(x):
        return x[-5:]
    t="""column
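Since the answer's code is cut off in this excerpt, here is a minimal self-contained sketch of the same idea. The inline CSV text, the second column B, and the function name last_five are assumptions made for illustration.

```python
import io

import pandas as pd


def last_five(value):
    """Keep only the last five characters of the raw cell text."""
    return value[-5:]


# Inline stand-in for the CSV file from the question.
csv_text = "A,B\n20-989-98766,1\n11-222-95476,2\n"

# converters maps a column name to a function applied to each raw cell.
df = pd.read_csv(io.StringIO(csv_text), converters={"A": last_five})
print(df)
```

The converter runs while the file is parsed, so the full string never appears in the resulting dataframe.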

Matplotlib - Move labels into middle of pie chart

Question: I've got my pie chart working, but I noticed that the text labels on the chart don't seem to be displayed correctly. They are just clustered together, so I was wondering whether there is any way to move the labels into the middle, where the white circle is, with the matching colour beside each one?

    crimeTypes = dict(crimeData["Crime type"].value_counts())
    crimeType = []
    totalAmount = []
    numberOfCrimes = 14
    for key in sorted(crimeTypes, key=crimeTypes.get, reverse=True):
        crimeType.append(key)
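One way this might be done, sketched under assumptions: draw the pie without per-wedge labels and place a legend in the centre of the axes, so each name sits next to its colour swatch inside the hole. The category names and counts below are made up, since crimeData is not shown in the excerpt.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the crime types and their counts.
crime_types = ["Theft", "Burglary", "Other"]
totals = [50, 30, 20]

fig, ax = plt.subplots()

# A wedge width below 1 leaves the white circle in the middle.
# No labels are passed, so nothing is drawn around the outside.
wedges, _ = ax.pie(totals, wedgeprops={"width": 0.4})

# The legend pairs each wedge colour with its name and sits in the hole.
legend = ax.legend(wedges, crime_types, loc="center", frameon=False)
```

With many categories the legend may outgrow the hole, in which case a smaller fontsize or a narrower wedge width could help.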

Unable to drop a column from pandas dataframe [duplicate]

Question: This question already has answers here: Delete column from pandas DataFrame (17 answers). Closed 4 years ago. I have imported an Excel sheet into pandas. It has 7 numeric columns and 1 string column (a flag). After converting the flag to a categorical variable, I am trying to drop the string column from the pandas dataframe. However, I am not able to do it. Here's the code:

    [In] parts_median_temp.columns
    [Out] Index([u'PART_NBR', u'PRT_QTY', u'PRT_DOL', u'BTS_QTY', u'BTS
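The excerpt is cut off before the failing call, so the actual cause is unknown. One common pitfall, shown here purely as an assumption, is that DataFrame.drop returns a new dataframe and leaves the original untouched unless the result is assigned. The column name FLAG and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the imported sheet.
parts = pd.DataFrame({
    "PART_NBR": [1, 2],
    "PRT_QTY": [10, 20],
    "FLAG": ["Y", "N"],
})

# The result is discarded here, so parts still contains FLAG.
parts.drop(columns=["FLAG"])
still_there = "FLAG" in parts.columns

# Assigning the result keeps the trimmed dataframe.
trimmed = parts.drop(columns=["FLAG"])
print(list(trimmed.columns))
```

If the drop instead raises a KeyError, it is worth checking the exact column label for stray whitespace or a different case.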