pandas

Transpose dataframe based on column list

时光总嘲笑我的痴心妄想 submitted on 2021-02-16 10:06:15
Question: I have a dataframe with the following structure:

cNames  | cValues    | number
[a,b,c] | [1,2,3]    | 10
[a,b,d] | [55,66,77] | 20

I would like to create columns from the names in cNames. I can't manage this with transpose alone, because I want a column for each value in the list. The needed output:

a  | b  | c   | d   | number
1  | 2  | 3   | NaN | 10
55 | 66 | NaN | 77  | 20

How can I achieve this result? Thanks! The code to create the DF: d = {'cNames': [['a','b','c'], ['a','b','d']],
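One possible approach, sketched under assumptions: the snippet that creates the DF is cut off above, so the `cValues` and `number` entries below are reconstructed from the table in the question, and the names `wide` and `result` are purely illustrative. The idea is to turn each row into a dict that maps names to values and let pandas align the keys into columns.

```python
import pandas as pd

# Data reconstructed from the table in the question (the original snippet is truncated).
d = {
    "cNames": [["a", "b", "c"], ["a", "b", "d"]],
    "cValues": [[1, 2, 3], [55, 66, 77]],
    "number": [10, 20],
}
df = pd.DataFrame(d)

# One dict per row: {name: value}. pandas takes the union of the keys as
# columns and fills the missing combinations with NaN.
wide = pd.DataFrame(
    [dict(zip(names, values)) for names, values in zip(df["cNames"], df["cValues"])],
    index=df.index,
)

# Re-attach the scalar column that was not part of the lists.
result = wide.join(df["number"])
print(result)
```

Columns that receive a NaN (here `c` and `d`) become floats, which is how pandas normally represents missing numeric values.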

Plotly stacked bar chart pandas dataframe

可紊 submitted on 2021-02-16 09:26:32
Question: I have a dataframe with a varying number of columns, containing both positive and negative values. Is it possible to make a stacked bar chart with all of the columns in the dataframe?

    A    B    C
0  34   34  -12
1  34  223  -12
2  34   56  -12
3  34   86  -12
4  56   86  -12
5  56   43  -12
6  78   34  -12

Answer 1: (Updated answer for newer versions of plotly) Using px.bar will give you a stacked bar chart directly. If your bar chart is for some reason not stacked, try: fig.update_layout(barmode='stack') Complete code: import pandas as

How to remove stop phrases/stop ngrams (multi-word strings) using pandas/sklearn?

独自空忆成欢 submitted on 2021-02-16 09:14:31
Question: I want to prevent certain phrases from creeping into my models. For example, I want to prevent 'red roses' from entering my analysis. I understand how to add individual stop words, as described in "Adding words to scikit-learn's CountVectorizer's stop list", by doing: from sklearn.feature_extraction import text additional_stop_words=['red','roses'] However, this also results in other ngrams like 'red tulips' or 'blue roses' not being detected. I am building a TfidfVectorizer as part of my
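One way to handle this, sketched here as an assumption rather than as the approach the question settled on, is to strip the stop phrases from the raw text with pandas before any vectorizer sees it. The `stop_phrases` list and the sample sentences are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical multi-word stop phrases.
stop_phrases = ["red roses"]

# Made-up sample documents.
docs = pd.Series([
    "She bought red roses and red tulips",
    "Blue roses are rare",
])

# Escape each phrase, try longer phrases first, and require word
# boundaries so that partial words are never matched.
pattern = r"\b(?:" + "|".join(
    re.escape(p) for p in sorted(stop_phrases, key=len, reverse=True)
) + r")\b"

cleaned = (
    docs.str.replace(pattern, " ", case=False, regex=True)  # drop the phrases
        .str.replace(r"\s+", " ", regex=True)               # collapse leftover spaces
        .str.strip()
)
print(cleaned.tolist())
```

The cleaned series can then be passed to a vectorizer in place of the raw text, so 'red tulips' and 'blue roses' remain available as ngrams. One caveat: removing a phrase makes its neighbours adjacent, so new ngrams such as 'bought and' can appear.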

PyTorch: Dataloader for time series task

馋奶兔 submitted on 2021-02-16 08:35:42
Question: I have a pandas dataframe with n rows and k columns loaded into memory. I would like to get batches for a forecasting task where the first training example of a batch has shape (q, k), with q referring to a number of rows from the original dataframe (e.g. rows 0:128). The next example should cover rows 128:256, and so on. So, ultimately, one batch should have the shape (32, q, k), with 32 being the batch size. Since TensorDataset from data_utils does not work here, I am wondering
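The slicing described above can be sketched without any framework. The class below is a hypothetical map-style dataset (the name `WindowDataset` and the sizes are assumptions): it only implements `__len__` and `__getitem__`, which is the protocol PyTorch's map-style datasets follow, and stacking 32 items stands in for what a batching loader would do.

```python
import numpy as np
import pandas as pd


class WindowDataset:
    """Item i is the block of rows [i*q, (i+1)*q) of the dataframe."""

    def __init__(self, frame, q):
        self.values = frame.to_numpy(dtype=np.float32)
        self.q = q

    def __len__(self):
        # A trailing block shorter than q rows is dropped.
        return len(self.values) // self.q

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.values[i * self.q:(i + 1) * self.q]


# Illustrative sizes: n rows, k columns, windows of q rows.
n, k, q = 4096, 3, 128
frame = pd.DataFrame(np.arange(n * k).reshape(n, k))

dataset = WindowDataset(frame, q)

# Stand-in for a loader with batch size 32: stack 32 consecutive windows.
batch = np.stack([dataset[i] for i in range(32)])
print(batch.shape)
```

An object like this should be usable with `torch.utils.data.DataLoader(dataset, batch_size=32)`, whose default collation stacks the NumPy windows into a tensor, though that step is not exercised here.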

Transpose the data in a column every nth rows in PANDAS

心不动则不痛 submitted on 2021-02-16 07:52:31
Question: For a research project, I need to process every individual's information from a website into an Excel file. I have copied and pasted everything I need from the website into a single column of an Excel file, and I loaded that file using pandas. However, I need to present each individual's information horizontally instead of vertically, as it is now. For example, this is what I have right now: only one column of unorganized data. df= pd.read_csv("ior work.csv", encoding = "ISO-8859-1"
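A minimal sketch of one way to do this, assuming every individual occupies exactly `n` consecutive rows. The sample data, the column name `col`, and the field names are invented for illustration, since the question's file is not shown.

```python
import pandas as pd

# Made-up stand-in for the single unorganized column.
raw = pd.DataFrame({
    "col": [
        "Alice", "Biology", "alice@example.org",
        "Bob", "Physics", "bob@example.org",
    ]
})

n = 3  # assumed number of rows per individual
fields = ["name", "department", "email"]  # hypothetical field names

values = raw["col"].to_numpy()
if len(values) % n != 0:
    raise ValueError("row count is not a multiple of n; some records are incomplete")

# Each run of n consecutive rows becomes one horizontal record.
wide = pd.DataFrame(values.reshape(-1, n), columns=fields)
print(wide)
```

If some individuals have missing lines, the fixed-size reshape no longer applies and the records would first need a marker that identifies where each one starts.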

Pandas: extract hour from timedelta

有些话、适合烂在心里 submitted on 2021-02-16 06:25:59
Question: This answer explains how to convert integers to hourly timesteps in pandas. I need to do the opposite. My dataframe df1:

          A
0  02:00:00
1  01:00:00
2  02:00:00
3  03:00:00

My expected dataframe df1:

          A  B
0  02:00:00  2
1  01:00:00  1
2  02:00:00  2
3  03:00:00  3

What I am trying: df1['B'] = df1['A'].astype(int) This fails because: TypeError: cannot astype a timedelta from [timedelta64[ns]] to [int32] What is the best way to do this? EDIT: If I try df['B'] = df['A'].dt.hour , then I get: AttributeError:
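Two options that should work for a timedelta column are sketched below, with the sample values taken from the question. The column name `B_component` is only there to show the second variant side by side.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": pd.to_timedelta(["02:00:00", "01:00:00", "02:00:00", "03:00:00"])}
)

# Option 1: floor-divide by one hour. This counts whole hours across the
# entire duration, so 1 day 2 hours would give 26.
df1["B"] = (df1["A"] // pd.Timedelta(hours=1)).astype(int)

# Option 2: take the hours component only. This ignores whole days,
# so 1 day 2 hours would give 2.
df1["B_component"] = df1["A"].dt.components["hours"]

print(df1)
```

For durations under one day, as in the question, both options give the same result.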

Python: Creating bar plot from pivot table pandas data frame

心不动则不痛 submitted on 2021-02-16 05:31:09
Question: I'm new to Python and was wondering how to create a bar plot from this data, which I created using the pivot table function.

#Create a pivot table for handicaps count calculation for no-show people based on their gender
pv = pd.pivot_table(df_main, values=['hipertension','diabetes','alcoholism'], columns='status',index='gender',aggfunc=np.sum)
#Reshape the pivot table for easier calculation
data_pv = pv.unstack().unstack('status').reset_index().rename(columns={'level_0':'category','No-Show':'no_show',
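A hedged sketch of one route to a plottable table: the rows of `df_main` below are toy data invented for illustration (only the column names and the 'No-Show' label come from the question; 'Show-Up' is an assumed second status). Selecting one status from the pivot table leaves a plain gender-by-category frame, which pandas can draw as grouped bars.

```python
import pandas as pd

# Toy data; the real df_main is not shown in the question.
df_main = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "F", "M"],
    "status": ["No-Show", "Show-Up", "No-Show", "Show-Up", "No-Show", "No-Show"],
    "hipertension": [1, 0, 1, 1, 0, 0],
    "diabetes": [0, 1, 0, 1, 1, 0],
    "alcoholism": [0, 0, 1, 0, 0, 1],
})

# Same pivot as in the question, with the aggregation given by name.
pv = pd.pivot_table(
    df_main,
    values=["hipertension", "diabetes", "alcoholism"],
    index="gender",
    columns="status",
    aggfunc="sum",
)

# Keep only the No-Show counts: rows are genders, columns are categories.
no_show = pv.xs("No-Show", axis=1, level="status")
print(no_show)

# With matplotlib installed, this draws one group of bars per gender:
# ax = no_show.plot.bar()
```

Passing `stacked=True` to `plot.bar` would stack the categories within each gender instead of placing them side by side.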