pandas

Pandas string extract all the matches

旧城冷巷雨未停 submitted on 2021-02-16 14:39:25
Question: I am learning regex operations with the pandas Series string methods. I was able to extract the first number from the string, but my regex does not match the second number. How can I capture both numbers? Note that in the second row, the second element should be NaN.

Code:

    import re
    import pandas as pd

    df = pd.DataFrame({'a': ["number 1.23 has 1.2 ", "number 12.2 has 12 "]})
    pat = r""".+\s+ (\d+\.\d+) .+ ((?:\d+\.\d+)?) .+"""
    df['a'].str.extract(pat, flags=re.X, expand=True)

Gives:

    0     1
    1.23
    12.2

Expected:

    0     1
    1.23  1.2
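One way to get every match instead of only the first is `Series.str.extractall`, which returns one row per match with an extra `match` level in the index; unstacking that level spreads the matches into columns. A minimal sketch, assuming only numbers with a decimal point should count, so the `12` in the second row is not matched and that cell comes out as NaN, as in the expected output:

```python
import pandas as pd

df = pd.DataFrame({'a': ["number 1.23 has 1.2 ", "number 12.2 has 12 "]})

# One capture group: a number with a decimal point
pat = r"(\d+\.\d+)"

# extractall -> one row per match, indexed by (original row, match number)
matches = df['a'].str.extractall(pat)

# Move the match number into the columns; rows with fewer matches get NaN
wide = matches[0].unstack()
print(wide)
```

Because there is no leading or trailing `.+` in the pattern, greedy backtracking cannot swallow the second number, which is what happens with the pattern in the question.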

How to compare two dataframes ignoring column names?

孤街醉人 submitted on 2021-02-16 14:35:08
Question: Suppose I want to compare the content of two dataframes, but not the column names (or index names). Is it possible to do this without renaming the columns? For example:

    df = pd.DataFrame({'A': [1,2], 'B':[3,4]})
    df_equal = pd.DataFrame({'a': [1,2], 'b':[3,4]})
    df_diff = pd.DataFrame({'A': [1,2], 'B':[3,5]})

In this case, df should count as equal to df_equal but different from df_diff, because the values in df_equal are the same, while the ones in df_diff are not. Notice that the column names in df_equal are
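A minimal sketch that compares values only: `DataFrame.to_numpy()` drops the row and column labels, and `numpy.array_equal` compares shape and contents. By contrast, `DataFrame.equals` also compares the axis labels, so it reports `df` and `df_equal` as different. This assumes the columns are in the same order in both frames; it also does not compare dtypes, and NaN cells compare as unequal. The helper name `same_values` is hypothetical.

```python
import numpy as np
import pandas as pd

def same_values(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Compare cell values only, ignoring column and index labels."""
    # to_numpy() discards the labels; array_equal also checks the shape
    return bool(np.array_equal(left.to_numpy(), right.to_numpy()))

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_equal = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df_diff = pd.DataFrame({'A': [1, 2], 'B': [3, 5]})

print(same_values(df, df_equal))  # True
print(same_values(df, df_diff))   # False
```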

How to split consecutive elements in a list into sublists

无人久伴 submitted on 2021-02-16 14:28:26
Question: I have the following list:

    indices_to_remove: [0,1,2,3,..,600,800,801,802,....,1200,1600,1601,1602,...,1800]

I basically have three subsets of consecutive indices: 0-600, 800-1200 and 1600-1800. I would like to create three separate, smaller lists, each containing only consecutive numbers. Expected outcome:

    indices_to_remove_1 : [0,1,2,3,....,600]
    indices_to_remove_2 : [800,801,802,....,1200]
    indices_to_remove_3 : [1600,1601,1602,....., 1800]

P.S.: The numbers are arbitrary and random; moreover, I may
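A common way to do this is `itertools.groupby` keyed on "value minus position": inside a run of consecutive integers that difference stays constant, and it changes at every gap. A minimal sketch, assuming the list is sorted in ascending order with no duplicates; the helper `split_consecutive` is hypothetical and returns a list of runs, so it does not depend on there being exactly three:

```python
from itertools import groupby

def split_consecutive(values):
    """Split a sorted list of integers into runs of consecutive numbers."""
    runs = []
    # Within a run, value - position is constant, so it works as the group key
    for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        runs.append([value for _, value in group])
    return runs

indices_to_remove = list(range(0, 601)) + list(range(800, 1201)) + list(range(1600, 1801))
indices_to_remove_1, indices_to_remove_2, indices_to_remove_3 = split_consecutive(indices_to_remove)

print(split_consecutive([0, 1, 2, 5, 6, 9]))  # [[0, 1, 2], [5, 6], [9]]
```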

Create columns from rows with the same ID

给你一囗甜甜゛ submitted on 2021-02-16 14:25:06
Question: I have a df like this:

    Id  username   age
    1   michael    34
    6   Mike       65
    7   Stephanie  14
    1   Mikael     34
    6   Mick       65

As you can see, the username is not written the same way for the same Id. I would like to group all usernames for the same Id into one row, like this:

    Id  username   username_2  Age
    1   michael    mikael      34
    6   Mike       Mick        65
    7   Stephanie              14

Thanks.

Answer 1: You can create a MultiIndex that counts duplicated Id values with cumcount, then reshape with unstack, and finally do some cleanup with add_prefix and reset_index: df1
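A minimal sketch of the cumcount/unstack/add_prefix/reset_index steps the answer describes (not the answer's own code, which is cut off). It assumes the age is identical on every row of an Id, so it can sit in the index next to Id. The suffixes come from cumcount and start at 0, so the columns are `username_0` and `username_1`; rename them afterwards if `username`/`username_2` are preferred.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 6, 7, 1, 6],
    "username": ["michael", "Mike", "Stephanie", "Mikael", "Mick"],
    "age": [34, 65, 14, 34, 65],
})

# cumcount numbers the rows of each Id: 0 for the first occurrence, 1 for the second, ...
occurrence = df.groupby("Id").cumcount()

wide = (
    df.set_index(["Id", "age", occurrence])["username"]
    .unstack()                 # one column per occurrence number (0, 1, ...)
    .add_prefix("username_")   # 0, 1 -> username_0, username_1
    .reset_index()             # move Id and age back to ordinary columns
)
print(wide)
```

Ids with fewer spellings, such as 7 here, get NaN in the extra columns.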

Transform pandas groupby result with subtotals to relative values

懵懂的女人 submitted on 2021-02-16 13:54:11
Question: I have come across a nice solution for inserting subtotals into a pandas groupby dataframe. However, I would now like to modify the result to show values relative to the subtotals instead of the absolute values. This is the code that produces the groupby:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(
        {
            "Category": np.random.choice(["Group A", "Group B"], 50),
            "Product": np.random.choice(["Product 1", "Product 2"], 50),
            "Units_Sold": np.random.randint(1, 100, size=(50)),
            "Date
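The question's code is cut off, so this is only a hedged sketch of the "relative to subtotal" step, assuming the goal is each product's share of its category's total Units_Sold. It does not recreate the inserted subtotal rows. Dividing the grouped sums by `groupby(level="Category").transform("sum")` lines each row up with its own category total. A seeded `numpy.random.default_rng` stands in for `np.random` only to make the example reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
df = pd.DataFrame({
    "Category": rng.choice(["Group A", "Group B"], 50),
    "Product": rng.choice(["Product 1", "Product 2"], 50),
    "Units_Sold": rng.integers(1, 100, size=50),
})

# Absolute units per (Category, Product)
units = df.groupby(["Category", "Product"])["Units_Sold"].sum()

# Category subtotal, broadcast back to every (Category, Product) row
subtotal = units.groupby(level="Category").transform("sum")

# Share of each product within its category (0..1); multiply by 100 for percent
share = units / subtotal
print(share.round(3))
```

Within each category the shares add up to 1, so a subtotal row in the relative view would simply read 1.0 (or 100%).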

Convert data on reading csv in pandas

那年仲夏 submitted on 2021-02-16 13:17:07
Question: I'm reading a .csv file into a pandas dataframe. The .csv file contains several columns. Column 'A' contains strings such as '20-989-98766'. Is it possible to read only the last 5 characters ('98766') of the string when loading the file?

    df = pd.read_csv("test_data2.csv", column={'A':read the last 5 characters})

Output:

    A
    98766
    95476
    .....

Answer 1: You can define a function and pass it to the converters parameter of read_csv:

    In [57]:
    import io
    import pandas as pd
    def func(x):
        return x[-5:]
    t="""column
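A self-contained sketch of the converters approach: `read_csv(..., converters={"A": func})` calls the function on each raw field of column A while the file is parsed. An in-memory CSV (`io.StringIO`) replaces test_data2.csv here, and the second row and column B are hypothetical sample data.

```python
import io
import pandas as pd

def last_five(value):
    """Keep only the last 5 characters of the raw field."""
    return value[-5:]

# Hypothetical stand-in for test_data2.csv
csv_text = "A,B\n20-989-98766,1\n20-123-95476,2\n"

df = pd.read_csv(io.StringIO(csv_text), converters={"A": last_five})
print(df["A"])
```

Because the slicing happens during parsing, the full '20-989-...' strings are never stored in the dataframe.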

Matplotlib - Move labels into middle of pie chart

一个人想着一个人 submitted on 2021-02-16 11:21:15
Question: I've got my pie chart working, but I noticed that the text boxes on the chart don't seem to be working correctly; they are just clustered together. Is there any way to move the labels into the middle, where the white circle is, with the matching colour beside each one?

    crimeTypes = dict(crimeData["Crime type"].value_counts())
    crimeType = []
    totalAmount = []
    numberOfCrimes = 14
    for key in sorted(crimeTypes, key=crimeTypes.get, reverse=True):
        crimeType.append(key)
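One way to get this layout is to draw the wedges without labels and put a legend in the hole: `Axes.pie` returns the wedge patches, and `Axes.legend(wedges, labels, loc="center")` shows each label next to a swatch of its wedge's colour. A minimal sketch with hypothetical data standing in for `crimeType` and `totalAmount`; here the hole comes from `wedgeprops={"width": ...}` rather than from whatever white circle the question's (truncated) code draws.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical sample data standing in for crimeType / totalAmount
crime_type = ["Violence", "Anti-social behaviour", "Burglary", "Vehicle crime"]
total_amount = [120, 95, 40, 25]

fig, ax = plt.subplots(figsize=(6, 6))

# No labels on the wedges; a width below 1 leaves an empty ring in the middle
wedges, _ = ax.pie(total_amount, startangle=90, wedgeprops={"width": 0.35})
ax.set_aspect("equal")

# Legend in the hole: each entry shows its wedge colour beside the label
ax.legend(wedges, crime_type, loc="center", frameon=False, fontsize="small")
```

With all 14 crime types the legend may not fit; a smaller `fontsize` or a thinner ring (smaller `width`) leaves more room. Call `plt.show()` or `fig.savefig(...)` to display or save the chart.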

Unable to drop a column from pandas dataframe [duplicate]

人走茶凉 submitted on 2021-02-16 10:25:30
Question: This question already has answers here: Delete column from pandas DataFrame (17 answers). Closed 4 years ago.

I have imported an Excel sheet into pandas. It has 7 numeric columns and 1 string column (a flag). After converting the flag to a categorical variable, I am trying to drop the string column from the pandas dataframe, but I am not able to do it. Here's the code:

    [In]  parts_median_temp.columns
    [Out] Index([u'PART_NBR', u'PRT_QTY', u'PRT_DOL', u'BTS_QTY', u'BTS
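The question is cut off before the failing call, so this does not diagnose it. It is a hedged sketch of a frequent pitfall: `DataFrame.drop` returns a new frame by default, so the result must be assigned back (or `inplace=True` used). `FLAG` and the small frame below are hypothetical stand-ins for the question's data.

```python
import pandas as pd

# Hypothetical stand-in for parts_median_temp; "FLAG" is an assumed column name
parts_median_temp = pd.DataFrame({
    "PART_NBR": [101, 102],
    "PRT_QTY": [5, 7],
    "FLAG": ["Y", "N"],
})

# drop() returns a new DataFrame; without assignment the original keeps the column
parts_median_temp.drop(columns=["FLAG"])
print("FLAG" in parts_median_temp.columns)  # True

# Assign the result back to actually remove the column
parts_median_temp = parts_median_temp.drop(columns=["FLAG"])
print("FLAG" in parts_median_temp.columns)  # False
```

The `columns=` keyword needs pandas 0.21 or later; older versions use `drop('FLAG', axis=1)`. The column label must also match exactly, including case and any trailing spaces from the Excel header.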