pandas

Removing specific word in a string in pandas

有些话、适合烂在心里 posted on 2021-02-05 07:53:06
Question: I'm trying to remove several words from each value of a column, but nothing is happening.

    stop_words = ["and","lang","naman","the","sa","ko","na",
                  "yan","n","yang","mo","ung","ang","ako","ng",
                  "ndi","pag","ba","on","un","Me","at","to",
                  "is","sia","kaya","I","s","sla","dun","po","b","pro"]
    newdata['Verbatim'] = newdata['Verbatim'].replace(stop_words,'', inplace = True)

I'm trying to generate a word cloud from the result of the replacement, but I am getting the same words (that doesn't mean
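A likely cause, sketched here under assumptions: `Series.replace` with a list of plain strings matches whole cell values rather than words inside a longer string, and a call made with `inplace=True` has traditionally returned `None`, so assigning its result back overwrites the column. One possible fix is a whole-word regular expression passed to `Series.str.replace`. The sample rows below are hypothetical, and the stop-word list is shortened for brevity.

```python
import re

import pandas as pd

# Shortened version of the question's list, for brevity.
stop_words = ["and", "lang", "naman", "the", "sa", "ko", "na"]

# Hypothetical sample rows standing in for the real newdata['Verbatim'].
newdata = pd.DataFrame(
    {"Verbatim": ["ang ganda naman ng service", "slow and unreliable sa area ko"]}
)

# \b anchors each alternative to word boundaries, so "na" does not
# eat the first two letters of "naman".
pattern = r"\b(?:" + "|".join(map(re.escape, stop_words)) + r")\b"

newdata["Verbatim"] = (
    newdata["Verbatim"]
    .str.replace(pattern, "", regex=True)   # drop the stop words
    .str.replace(r"\s+", " ", regex=True)   # collapse leftover whitespace
    .str.strip()
)

print(newdata)
```

The match is case-sensitive, as written; passing `flags=re.IGNORECASE` to the first `str.replace` would be one way to relax that.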

Join groupby column with a comma in a Pandas DataFrame

纵饮孤独 posted on 2021-02-05 07:52:05
Question: I have a dataset like this:

    >>> df = pd.DataFrame({'id_sin':['s123','s123','s124','s124'],
    ...                    'raison':['first problem','second problem','album','dog'] })
    >>> df
      id_sin          raison
    0   s123   first problem
    1   s123  second problem
    2   s124           album
    3   s124             dog

This is the expected output:

      id_sin                         raison
    0   s123  first problem, second problem
    1   s124                     album, dog

What I tried:

    df['raison'] = df.groupby('id_sin')['raison'].apply(lambda x: ', '.join(x))

But it doesn't work. What am I missing? Thanks for the help!

Answer 1: Try
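One common way to get the expected output, offered as a sketch rather than as the truncated answer's method: aggregate each group with `', '.join` and reset the index. Assigning the grouped result back to `df['raison']` does not line up, because the grouped Series is indexed by `id_sin` rather than by the original row labels.

```python
import pandas as pd

df = pd.DataFrame({
    "id_sin": ["s123", "s123", "s124", "s124"],
    "raison": ["first problem", "second problem", "album", "dog"],
})

# One row per id_sin; the strings of each group are joined with ", ".
result = df.groupby("id_sin")["raison"].agg(", ".join).reset_index()

print(result)
```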

Merge 2 Different Data Frames - Python 3.6

試著忘記壹切 posted on 2021-02-05 07:51:51
Question: I want to merge two tables, and the blanks should be filled with the first table's rows.

DF1:

    Col1 Col2 Col3
    A    B    C

DF2:

    Col6 Col8
    1    2
    3    4
    5    6
    7    8
    9    10

I am expecting the result below:

    Col1 Col2 Col3 Col6 Col8
    A    B    C    1    2
    A    B    C    3    4
    A    B    C    5    6
    A    B    C    7    8
    A    B    C    9    10

Answer 1: Use assign, but then it is necessary to change the order of the columns:

    df = df2.assign(**df1.iloc[0])[df1.columns.append(df2.columns)]
    print (df)
      Col1 Col2 Col3  Col6  Col8
    0    A    B    C     1     2
    1    A    B    C     3     4
    2    A    B    C     5     6
    3    A    B    C     7     8
    4    A    B    C     9    10

Or concat and replace NaN s by
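As an alternative sketch to the `assign` approach, a cross join repeats the single row of `df1` for every row of `df2`. This assumes pandas 1.2 or later, where `merge` accepts `how="cross"`; the frames are rebuilt from the question's tables.

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": ["A"], "Col2": ["B"], "Col3": ["C"]})
df2 = pd.DataFrame({"Col6": [1, 3, 5, 7, 9], "Col8": [2, 4, 6, 8, 10]})

# Cartesian product: 1 row x 5 rows = 5 rows, df1's columns first.
result = df1.merge(df2, how="cross")

print(result)
```

Unlike the `assign` version, no column reordering is needed, since the left frame's columns come first in the output.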

pandas from_json method usage

左心房为你撑大大i posted on 2021-02-05 07:48:05
Question: I have a JSON file like the one below:

    {
      "A":1,
      "B":2,
      "C": [
        {"x":1,"y":2,"z":3},
        {"x":2,"y":7,"z":77}
      ]
    }

pandas.from_json returns a data frame with columns A, B and C. But I am actually looking for columns x, y and z. Is there a way to get them?

Answer 1: You can use json_normalize:

    json = { "A":1, "B":2, "C": [{"x":1,"y":2,"z":3 }, {"x":2,"y":7,"z":77}] }
    from pandas.io.json import json_normalize
    df = json_normalize(json, 'C')
    print (df)
       x  y   z
    0  1  2   3
    1  2  7  77

If need all columns: df = json
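In pandas 1.0 and later, `json_normalize` is also exposed at the top level as `pd.json_normalize`, and the `pandas.io.json` import path is deprecated. The sketch below uses the top-level function. The `meta` argument, which carries `A` and `B` onto each row, is one possible way to keep the outer fields and is not necessarily how the truncated answer continues. The variable is named `data` here (an illustrative choice) to avoid shadowing the standard `json` module.

```python
import pandas as pd

data = {
    "A": 1,
    "B": 2,
    "C": [{"x": 1, "y": 2, "z": 3}, {"x": 2, "y": 7, "z": 77}],
}

# record_path selects the nested list to expand into rows;
# meta repeats the listed top-level fields on every row.
df = pd.json_normalize(data, record_path="C", meta=["A", "B"])

print(df)
```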

Create a frequency matrix for bigrams from a list of tuples, using numpy or pandas

穿精又带淫゛_ posted on 2021-02-05 07:47:27
Question: I am very new to Python. I have a list of tuples, where I created bigrams. This question is pretty close to my needs.

    my_list = [('we', 'consider'), ('what', 'to'), ('use', 'the'), ('words', 'of')]

Now I am trying to convert this into a frequency matrix. The desired output is:

              consider  of  the  to  use  we  what  words
    consider         0   0    0   0    0   0     0      0
    of               0   0    0   0    0   0     0      0
    the              0   0    0   0    0   0     0      0
    to               0   0    0   0    0   0     0      0
    use              0   0    1   0    0   0     0      0
    we               1   0    0   0    0   0     0      0
    what             0   0    0   1    0   0     0      0
    words            0   1    0   0    0   0     0      0

How to do this,
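One possible approach, sketched with pandas: count the pairs with `pd.crosstab`, then reindex both axes to the full sorted vocabulary so that words never seen in a given position still get an all-zero row or column. The column names `first` and `second` are illustrative.

```python
import pandas as pd

my_list = [("we", "consider"), ("what", "to"), ("use", "the"), ("words", "of")]

pairs = pd.DataFrame(my_list, columns=["first", "second"])

# Every word that appears in either position, in alphabetical order.
vocab = sorted(set(pairs["first"]) | set(pairs["second"]))

# Rows are the first word of each bigram, columns the second.
matrix = pd.crosstab(pairs["first"], pairs["second"]).reindex(
    index=vocab, columns=vocab, fill_value=0
)

print(matrix)
```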

Use pandas pivot_table() to convert attribute-value pairs to table

旧城冷巷雨未停 posted on 2021-02-05 07:47:26
Question: I have a set of attribute,value pairs like this:

    date,01-01-2018
    product,eggs
    price, 5
    date,01-10-2018
    product,milk
    price,3

And I want to create a table like this:

    date,product,price
    01-01-2018,eggs,5
    01-10-2018,milk,3

I've tried adding headers 'attributes' and 'attribute_values', creating an arbitrary column "values" and using

    pd.pivot_table(av_pairs, index="value", columns=av_pairs.attributes, values=av_pairs.attribute_values)

The error is

    pandas.core.base.DataError: No numeric types to aggregate
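The error arises because `pivot_table` aggregates with the mean by default, which is undefined for strings. A possible workaround, sketched under the assumption that every record starts with a `date` row: number the records with a cumulative sum and pass `aggfunc="first"`. The column names `attribute`, `value` and `record` are illustrative, not the asker's.

```python
import io

import pandas as pd

raw = """date,01-01-2018
product,eggs
price, 5
date,01-10-2018
product,milk
price,3
"""

av_pairs = pd.read_csv(
    io.StringIO(raw), header=None, names=["attribute", "value"], dtype=str
)
av_pairs["value"] = av_pairs["value"].str.strip()  # " 5" -> "5"

# Assumption: each "date" row opens a new record, so a running count
# of "date" rows gives every pair its record number.
av_pairs["record"] = (av_pairs["attribute"] == "date").cumsum()

table = pd.pivot_table(
    av_pairs, index="record", columns="attribute", values="value", aggfunc="first"
)[["date", "product", "price"]].reset_index(drop=True)

print(table)
```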

Selecting rows in a dataframe based on the column names of another

那年仲夏 posted on 2021-02-05 07:46:57
Question: Say I have two dfs

    df = pd.DataFrame({'A': [1, 2, 3,4,5], 'B': [2, 4,2,4,5], 'C': [1, -1, 3,5,10],'D': [3, -4,3,7,-3]}, columns=['A', 'B', 'C', 'D'])
    df = df.set_index(['A'])
    df2 = pd.DataFrame({'A': [1, 2, 3,4,5], 'J': ['B', 'B','C','D','C']}, columns=['A', 'J'])
    df2 = df2.set_index(['A'])

and I would like to use df2 to select the columns of df row by row in order to get the following dataframe

       sel
    1    2
    2    4
    3    3
    4    7
    5   10

where the first two values are from the column B of df , the third from
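A sketch of one way to pick a different column per row: translate the row labels and the column names in `df2['J']` into integer positions, then index the underlying NumPy array with both. It assumes every label in `df2` exists in `df` and that the labels are unique.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5], "B": [2, 4, 2, 4, 5],
     "C": [1, -1, 3, 5, 10], "D": [3, -4, 3, 7, -3]}
).set_index("A")
df2 = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5], "J": ["B", "B", "C", "D", "C"]}
).set_index("A")

# Integer positions of each requested row and column in df.
rows = df.index.get_indexer(df2.index)
cols = df.columns.get_indexer(df2["J"].to_numpy())

# Paired integer indexing picks one cell per (row, col) pair.
sel = pd.DataFrame({"sel": df.to_numpy()[rows, cols]}, index=df2.index)

print(sel)
```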

Strip out months from two date columns

為{幸葍}努か posted on 2021-02-05 07:46:16
Question: I have a pandas dataframe that has contract start and end dates and a quantity. How would I strip out the individual months so they can be aggregated and graphed? For example:

    Start Date  End Date   Demanded  Customer
    1/1/2017    3/31/2017  100       A
    2/1/2017    3/31/2017  50        B

Strip out the months to get the following:

    Month     Demand  Customer
    1/1/2017  100     A
    2/1/2017  100     A
    3/1/2017  100     A
    2/1/2017  50      B
    3/1/2017  50      B

The end result is to pivot this and then graph it with months on the x-axis and total demand on the y-axis.

Answer 1: You
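A minimal sketch of one way to do this, not necessarily the truncated answer's: generate one row per month-start date between each contract's start and end with `pd.date_range(..., freq="MS")`, then sum per month. It assumes contracts begin on the first of a month, as in the example; a mid-month start would lose its first month with this frequency.

```python
import pandas as pd

contracts = pd.DataFrame({
    "Start Date": ["1/1/2017", "2/1/2017"],
    "End Date": ["3/31/2017", "3/31/2017"],
    "Demanded": [100, 50],
    "Customer": ["A", "B"],
})
for col in ["Start Date", "End Date"]:
    contracts[col] = pd.to_datetime(contracts[col], format="%m/%d/%Y")

# One output row for every month start covered by each contract.
rows = [
    {"Month": month, "Demand": demand, "Customer": customer}
    for start, end, demand, customer in zip(
        contracts["Start Date"], contracts["End Date"],
        contracts["Demanded"], contracts["Customer"],
    )
    for month in pd.date_range(start, end, freq="MS")
]
monthly = pd.DataFrame(rows)

# Total demand per month, ready for plotting (e.g. total.plot()).
total = monthly.groupby("Month")["Demand"].sum()

print(monthly)
print(total)
```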

pandas: add several new columns based on values from other columns at the same time?

不羁岁月 posted on 2021-02-05 07:45:14
Question: How can I add several new columns based on values from other columns at the same time? I only found examples to add a row one at a time. I am able to add 3 new columns, but this does not seem efficient since it has to go through all the rows 3 times. Is there a way to traverse the DF once?

    import pandas as pd
    from decimal import Decimal
    d = [
        {'A': 2, 'B': Decimal('628.00')},
        {'A': 1, 'B': Decimal('383.00')},
        {'A': 3, 'B': Decimal('651.00')},
        {'A': 2, 'B': Decimal('575.00')},
        {'A': 4, 'B':
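The excerpt cuts off before showing which columns the asker computes, so the sketch below is only an illustration of the single-pass idea: a hypothetical function `derive` returns all new values for one row, the rows are traversed once, and the results are joined back as three columns. The function, the three column names, and the arithmetic are assumptions; only the first four fully visible rows of `d` are used.

```python
from decimal import Decimal

import pandas as pd

d = [
    {"A": 2, "B": Decimal("628.00")},
    {"A": 1, "B": Decimal("383.00")},
    {"A": 3, "B": Decimal("651.00")},
    {"A": 2, "B": Decimal("575.00")},
]
df = pd.DataFrame(d)


def derive(a, b):
    """Hypothetical per-row computation returning three new values."""
    return b * a, b / a, b - a


# Single traversal: each row is visited once and yields all three values.
new_values = [derive(a, b) for a, b in zip(df["A"].tolist(), df["B"].tolist())]

new_cols = pd.DataFrame(
    new_values, columns=["B_times_A", "B_per_A", "B_minus_A"], index=df.index
)
df = df.join(new_cols)

print(df)
```

With plain numeric dtypes instead of `Decimal`, vectorised column arithmetic would usually be faster than any row-wise loop, even if it touches the data several times.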

How to compare two data frames of different size based on a column?

谁说胖子不能爱 posted on 2021-02-05 07:45:10
Question: I have two data frames of different sizes.

df1

          YearDeci  Year  Month  Day  ...  Magnitude    Lat    Lon
    0  1551.997260  1551     12   31  ...        7.5  34.00  74.50
    1  1661.997260  1661     12   31  ...        7.5  34.00  75.00
    2  1720.535519  1720      7   15  ...        6.5  28.37  77.09
    3  1734.997260  1734     12   31  ...        7.5  34.00  75.00
    4  1777.997260  1777     12   31  ...        7.7  34.00  75.00

and df2

          YearDeci  Year  Month  Day  Hour  ...  Seconds  Mb     Lat     Lon
    0  1669.510753  1669      6    4     0  ...        0 NaN  33.400  73.200
    1  1720.535519  1720      7   15     0  ...        0 NaN  28.700  77.200
    2  1780.000000  1780      0