pandas

How to delete words from a DataFrame column that are present in a dictionary in Pandas

僤鯓⒐⒋嵵緔, submitted 2021-02-08 03:45:28

Question: This is an extension of "Removing list of words from a string". I have the following DataFrame and I want to delete frequently occurring words from the df.name column.

df:

    name
    Bill Hayden
    Rock Clinton
    Bill Gates
    Vishal James
    James Cameroon
    Micky James
    Michael Clark
    Tony Waugh
    Tom Clark
    Tom Bill
    Avinash Clinton
    Shreyas Clinton
    Ramesh Clinton
    Adam Clark

I'm creating a new DataFrame of words and their frequencies with the following code:

    df = pd.DataFrame(data.name.str.split(expand=True).stack().value_counts())
    df
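The question does not say how frequent a word must be before it is removed. Below is a minimal sketch, assuming a hypothetical cut-off of "appears 3 or more times in the column" (`MIN_COUNT`). It counts words with `str.split().explode().value_counts()` and then rebuilds each name without the frequent words. The new `cleaned` column is also a hypothetical name.

```python
import pandas as pd

data = pd.DataFrame({"name": [
    "Bill Hayden", "Rock Clinton", "Bill Gates", "Vishal James",
    "James Cameroon", "Micky James", "Michael Clark", "Tony Waugh",
    "Tom Clark", "Tom Bill", "Avinash Clinton", "Shreyas Clinton",
    "Ramesh Clinton", "Adam Clark",
]})

# Word frequencies across the whole column (one row per word after explode).
counts = data["name"].str.split().explode().value_counts()

# Hypothetical cut-off: a word is "frequent" if it occurs 3 or more times.
MIN_COUNT = 3
frequent = set(counts[counts >= MIN_COUNT].index)

# Rebuild each name, keeping only the words that are not frequent.
data["cleaned"] = data["name"].apply(
    lambda s: " ".join(w for w in s.split() if w not in frequent)
)
print(data)
```

With this threshold, Clinton, Bill, James and Clark are dropped. "Tom" (2 occurrences) and "Tony Waugh" survive.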

How to count categorical values, including zero occurrences?

无人久伴, submitted 2021-02-08 03:38:05

Question: I want to count the number of codes per month. This is my example DataFrame:

       id     month  code
    0  sally  0      s_A
    1  sally  0      s_B
    2  sally  0      s_C
    3  sally  0      s_D
    4  sally  0      s_E
    5  sally  0      s_A
    6  sally  0      s_A
    7  sally  0      s_B
    8  sally  0      s_C
    9  sally  0      s_A

I transformed it into this Series using count():

    df.groupby(['id', 'code', 'month']).month.count()

    id     code  month  count
    sally  s_A   0      12
                 1      10
                 2      3
                 7      15

But I want to include zero occurrences, like this:

    id     code  month  count
    sally  s_A   0      12
                 1      10
                 2      3
                 3      0
                 4      0
                 5      0
                 6      0
                 7      15
                 8      0
                 9      0
                 10     0
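`groupby` only emits combinations that actually occur, so the zeros have to be added back afterwards. Below is a minimal sketch that uses a small hypothetical sample, because the question's full data is not shown. It builds every (id, code, month) combination with `pd.MultiIndex.from_product` and then uses `reindex(..., fill_value=0)`. The month range 0 to 10 is an assumption taken from the desired output. `size()` gives the same counts as `.month.count()` when `month` has no missing values.

```python
import pandas as pd

# Hypothetical sample; the question's full data is not shown.
df = pd.DataFrame({
    "id":    ["sally"] * 6,
    "month": [0, 0, 1, 1, 7, 7],
    "code":  ["s_A", "s_B", "s_A", "s_A", "s_A", "s_B"],
})

# Counts only for combinations that occur in the data.
counts = df.groupby(["id", "code", "month"]).size()

# Assumed month range 0..10, as in the desired output.
months = range(11)
full_index = pd.MultiIndex.from_product(
    [sorted(df["id"].unique()), sorted(df["code"].unique()), months],
    names=["id", "code", "month"],
)

# Combinations missing from the data get a count of 0.
counts = counts.reindex(full_index, fill_value=0)
print(counts)
```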

How to filter strings on multiple conditions in Python pandas

断了今生、忘了曾经, submitted 2021-02-08 03:32:50

Question: I have the following DataFrame:

    import pandas as pd
    data = ['5Star', 'FiveStar', 'five star', 'fiv estar']
    data = pd.DataFrame(data, columns=["columnName"])

When I filter with one condition it works fine:

    data[data['columnName'].str.contains("5")]

Output:

      columnName
    0      5Star

But it gives an error when I use multiple conditions. How do I filter for the conditions "five" and "5"? Expected output:

       columnName
    0       5Star
    2   five star

Answer 1: Use str.contains with a string whose values are separated by '|': print
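`str.contains` treats its pattern as a regular expression by default, so `"5|five"` matches either alternative. A common cause of errors with multiple conditions is joining two boolean masks with Python's `or`/`and` instead of `|`/`&`. That is an assumption here, since the question does not show the failing line. The sketch below shows both forms. Matching is case-sensitive, which is why `FiveStar` is excluded, as in the expected output.

```python
import pandas as pd

data = pd.DataFrame(['5Star', 'FiveStar', 'five star', 'fiv estar'],
                    columns=["columnName"])

# One regex with alternatives: "5" OR "five" (case-sensitive).
by_regex = data[data["columnName"].str.contains("5|five")]

# Equivalent: combine element-wise masks with | (not `or`), each in parentheses.
col = data["columnName"]
by_masks = data[(col.str.contains("5")) | (col.str.contains("five"))]

print(by_regex)
```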

Counting cumulative occurrences of values based on a date window in Pandas

隐身守侯, submitted 2021-02-08 03:32:16

Question: I have a DataFrame (df) that looks like the following:

    +----------+----+
    | dd_mm_yy | id |
    +----------+----+
    | 01-03-17 | A  |
    | 01-03-17 | B  |
    | 01-03-17 | C  |
    | 01-05-17 | B  |
    | 01-05-17 | D  |
    | 01-07-17 | A  |
    | 01-07-17 | D  |
    | 01-08-17 | C  |
    | 01-09-17 | B  |
    | 01-09-17 | B  |
    +----------+----+

This is the end result I would like to compute:

    +----------+----+-----------+
    | dd_mm_yy | id | cum_count |
    +----------+----+-----------+
    | 01-03-17 | A  | 1         |
    | 01-03-17 | B  | 1         |
    | 01-03-17 | C  | 1         |
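The question is cut off before the date window is defined. The sketch below therefore only covers the plain running count per id, which is consistent with the three result rows that are shown. A windowed version would need the window size, which is not recoverable here. The sketch parses `dd_mm_yy` as day-month-year (an assumption based on the column name) so rows can be ordered by real dates. It then uses `groupby("id").cumcount() + 1`. The helper column `date` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "dd_mm_yy": ["01-03-17", "01-03-17", "01-03-17", "01-05-17", "01-05-17",
                 "01-07-17", "01-07-17", "01-08-17", "01-09-17", "01-09-17"],
    "id": ["A", "B", "C", "B", "D", "A", "D", "C", "B", "B"],
})

# Parse day-month-year strings (assumed from the column name) into dates.
df["date"] = pd.to_datetime(df["dd_mm_yy"], format="%d-%m-%y")
df = df.sort_values("date", kind="stable")

# Running count of how often each id has appeared so far, starting at 1.
df["cum_count"] = df.groupby("id").cumcount() + 1
print(df[["dd_mm_yy", "id", "cum_count"]])
```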

How to replace the value at a specific index in each row with the corresponding value from a NumPy array

心已入冬, submitted 2021-02-08 03:29:07

Question: My DataFrame looks like this:

        datetime1  datetime2  datetime3  datetime4
    id
    1           5          6          5          5
    2           7          2          3          5
    3           4          2          3          2
    4           6          4          4          7
    5           7          3          8          9

and I have a NumPy array like this:

    index_arr = [3, 2, 0, 1, 2]

Each entry of this array is the column position, within the corresponding row, of the value I want to replace. The replacement values are in another NumPy array:

    replace_arr = [14, 12, 23, 17, 15]

so that the updated DataFrame looks like this:

        datetime1  datetime2  datetime3  datetime4
    id
    1           5          6          5         14
    2           7          2         12          5
    3
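NumPy integer-array indexing can pair each row with its own column in one step. `values[rows, cols] = replace_arr` writes `replace_arr[i]` at `(rows[i], cols[i])`. Below is a minimal sketch that rebuilds the question's frame, with `id` as the index, and treats `index_arr` as 0-based column positions. That reading agrees with the two expected rows shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "datetime1": [5, 7, 4, 6, 7],
        "datetime2": [6, 2, 2, 4, 3],
        "datetime3": [5, 3, 3, 4, 8],
        "datetime4": [5, 5, 2, 7, 9],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)
index_arr = np.array([3, 2, 0, 1, 2])
replace_arr = np.array([14, 12, 23, 17, 15])

# Row i gets replace_arr[i] written at column position index_arr[i].
values = df.to_numpy(copy=True)
values[np.arange(len(df)), index_arr] = replace_arr

# Wrap the modified array back into a frame with the original labels.
df = pd.DataFrame(values, index=df.index, columns=df.columns)
print(df)
```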

How to check a Boolean condition in a pandas DataFrame

陌路散爱, submitted 2021-02-08 02:36:21

Question: I have an Alcohol_df DataFrame in which qualification is a column. I have created a list as follows:

    Graduate_list = ['B.tech', 'b.tech', 'b-tech', 'Btech', 'BE', 'B.E', 'b.e',
                     'BACHELOR', 'bachelor', 'BSc', 'Bsc', 'bsc', 'BSC', 'BBM']

I ran

    Alcohol_df['qualification'].isin(Graduate_list)

to find which rows contain elements from the list. I want to perform some operation on the rows whose value is in the list. I did

    if ((Alcohol_df['qualification'].isin(Graduate_list)):

but I am getting this error
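The error text is cut off. However, `isin` returns one boolean per row, and putting a whole Series in an `if` raises `ValueError: The truth value of a Series is ambiguous`, which is the likely cause. The usual fix is to apply the operation through the mask, row by row. If a single yes/no answer is really wanted, reduce the mask explicitly with `.any()` or `.all()`. Below is a minimal sketch on hypothetical rows. The `education_level` column and its labels are also hypothetical stand-ins for "some operation".

```python
import numpy as np
import pandas as pd

Graduate_list = ['B.tech', 'b.tech', 'b-tech', 'Btech', 'BE', 'B.E', 'b.e',
                 'BACHELOR', 'bachelor', 'BSc', 'Bsc', 'bsc', 'BSC', 'BBM']

# Hypothetical rows; the real Alcohol_df is not shown in the question.
Alcohol_df = pd.DataFrame({"qualification": ["B.tech", "PhD", "bsc", "12th"]})

# One True/False per row, not a single bool.
is_graduate = Alcohol_df["qualification"].isin(Graduate_list)

# Row-wise if/else: "graduate" where the mask is True, "other" elsewhere.
# The column name and labels are hypothetical.
Alcohol_df["education_level"] = np.where(is_graduate, "graduate", "other")

# For one overall answer, reduce the mask explicitly before using `if`.
if is_graduate.any():
    print("at least one graduate row")
```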

Pandas count over groups

夙愿已清, submitted 2021-02-08 02:24:33

Question: I have a pandas DataFrame that looks as follows:

    ID  round  player1  player2
    1   1      A        B
    1   2      A        C
    1   3      B        D
    2   1      B        C
    2   2      C        D
    2   3      C        E
    3   1      B        C
    3   2      C        D
    3   3      C        A

The DataFrame contains sports match results. The ID column denotes a tournament, the round column denotes the round within each tournament, and the player1 and player2 columns contain the names of the players who played against each other in that round. I now want to cumulatively count the tournament participations of, say, player A. In
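The question is truncated, so the sketch below rests on two assumptions. First, a "participation" means the player appears in at least one round of a tournament. Second, tournaments are counted in ID order. The sketch flags matches involving the player on either side, collapses them to one flag per tournament with `groupby(...).any()`, and takes a running sum. The final column `participations_so_far` is a hypothetical name that maps the running total back onto every row.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":      [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "round":   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "player1": ["A", "A", "B", "B", "C", "C", "B", "C", "C"],
    "player2": ["B", "C", "D", "C", "D", "E", "C", "D", "A"],
})

player = "A"
# True for every match in which the player appears on either side.
played = df["player1"].eq(player) | df["player2"].eq(player)

# One flag per tournament: did the player play at least one round?
participated = played.groupby(df["ID"]).any()

# Running total of tournaments entered, in tournament-ID order.
cum_participations = participated.astype(int).cumsum()

# Hypothetical column: broadcast the per-tournament total back to each row.
df["participations_so_far"] = df["ID"].map(cum_participations)
print(cum_participations)
```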

Converting a Pandas DataFrame to a sparse matrix

吃可爱长大的小学妹, submitted 2021-02-08 02:15:43

Question: Here is my code:

    import numpy as np
    import pandas as pd
    import scipy.sparse as sp

    # dtype=int keeps 0/1 integers (newer pandas returns bool dummies)
    data = pd.get_dummies(data['movie_id'], dtype=int).groupby(data['user_id']).max()
    df = pd.DataFrame(data)
    replace = df.replace(0, np.nan)  # np.NaN no longer exists in NumPy 2.0
    t = replace.fillna(-1)
    sparse = sp.csr_matrix(t.values)

My data consists of two columns, movie_id and user_id:

    user_id  movie_id
    5        1000
    6        1007

I want to convert the data to a sparse matrix. I first created an interaction matrix where rows indicate user_id and columns indicate movie_id, with a positive interaction as +1 and a negative interaction as -1.
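Filling every non-interaction with -1 leaves no zeros. In that case `csr_matrix` stores every cell and gains nothing from sparsity. The get_dummies step also materializes the full dense user-by-movie table first. Below is a minimal sketch that builds only the positive (+1) interactions directly from the two columns with `pd.factorize` and the `(data, (row, col))` constructor. It assumes missing pairs can stay as implicit 0. How the question meant to encode -1 "negative" interactions is not recoverable from the truncated text. The sample rows are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Hypothetical interaction log in the question's two-column layout.
data = pd.DataFrame({"user_id": [5, 6, 5], "movie_id": [1000, 1007, 1007]})

# One row per (user, movie) pair, so every stored value is exactly 1.
pairs = data.drop_duplicates(["user_id", "movie_id"])

# Map raw ids to 0-based row/column positions; `users`/`movies` keep the labels.
user_pos, users = pd.factorize(pairs["user_id"], sort=True)
movie_pos, movies = pd.factorize(pairs["movie_id"], sort=True)

# Only the observed pairs are stored; all other cells are implicit zeros.
interactions = sp.csr_matrix(
    (np.ones(len(pairs), dtype=np.int8), (user_pos, movie_pos)),
    shape=(len(users), len(movies)),
)
print(interactions.toarray())
```

Row `i` of `interactions` corresponds to `users[i]` and column `j` to `movies[j]`.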
