data-analysis

Data column values are not changing to float

牧云@^-^@ posted on 2019-12-10 23:13:58
Question: I have a dataframe, df:

     Name    Stage  Description
0    sri     1      sri is one of the good singer in this two
1    nan     2      thanks for reading
2    ram     1      ram is two of the good cricket player
3    ganesh  1      one driver
4    nan     2      good buddies

I tried df["Stage"]=pd.to_numeric(df["Stage"],downcast="float"), but the values are still the same.

Answer 1: You can use df.Stage.astype(float):

    In [6]: df.Stage.astype(float)
    Out[6]:
    0    1.0
    1    2.0
    2    1.0
    3    1.0
    4    2.0
    Name: Stage, dtype: float64

    In [7]: df.Stage.astype(float)

Using pd.to_numeric is
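
A minimal sketch of the conversion discussed above. The sample frame is rebuilt by hand from the question, and Stage is stored as strings here, which is an assumption made so that the parsing step is visible. Both pd.to_numeric and astype return a new Series, so the result has to be assigned back to the column.

```python
import pandas as pd

# Sample frame rebuilt from the question; Stage holds numeric strings
# (an assumption, so that the parsing step is shown as well).
df = pd.DataFrame({
    "Name": ["sri", None, "ram", "ganesh", None],
    "Stage": ["1", "2", "1", "1", "2"],
})

# Parse to numbers (unparseable entries become NaN), cast to float64,
# and assign the result back to the column.
df["Stage"] = pd.to_numeric(df["Stage"], errors="coerce").astype(float)

print(df["Stage"].dtype)
print(df["Stage"].tolist())
```

If the column already holds integers, the astype(float) call alone is enough.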

Filling a column in a dataframe based on a column in another dataframe in r

混江龙づ霸主 posted on 2019-12-10 19:00:32
Question: I have a dataframe of comments which looks like this (df1):

Comments
Apple laptops are really good for work,we should buy them
Apple Iphones are too costly,we can resort to some other brands
Google search is the best search engine
Android phones are great these days
I lost my visa card today

I have another dataframe of merchant names which looks like this (df2):

Merchant_Name
Google
Android
Geoni
Visa
Apple
MC
WallMart

If a merchant_name in df2 appears in a Comment in df1, append that merchant

Retrieving matching word count on a datacolumn using pandas in python

浪尽此生 posted on 2019-12-10 18:47:11
Question: I have a df,

Name   Description
Ram    Ram is one of the good cricketer
Sri    Sri is one of the member
Kumar  Kumar is a keeper

and a list, my_list=["one","good","ravi","ball"]. I am trying to get the rows which have at least one keyword from my_list. I tried:

    mask=df["Description"].str.contains("|".join(my_list),na=False)

I am getting the output_df,

Name   Description
Ram    Ram is one of ONe crickete
Sri    Sri is one of the member
Ravi   Ravi is a player, ravi is playing
Kumar  there is a BALL

I also
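
One way to count keyword matches per row, sketched under the assumption that matching should be case-insensitive and limited to whole words. The column name match_count is made up for this illustration, and the frame is the first one shown in the question.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Whole-word alternation; re.escape guards against regex metacharacters.
pattern = r"\b(?:" + "|".join(re.escape(word) for word in my_list) + r")\b"

# Number of keyword hits in each description, ignoring case.
# "match_count" is a hypothetical column name.
df["match_count"] = df["Description"].str.count(pattern, flags=re.IGNORECASE)

# Keep only the rows with at least one hit.
output_df = df[df["match_count"] > 0]
print(output_df)
```

Dropping the \b anchors would also count keywords that occur inside longer words.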

Alternative to scipy.cluster.hierarchy.cut_tree()

六月ゝ 毕业季﹏ posted on 2019-12-10 17:12:12
Question: I was doing an agglomerative hierarchical clustering experiment in Python 3 and I found that scipy.cluster.hierarchy.cut_tree() does not return the requested number of clusters for some input linkage matrices. So by now I know there is a bug in the cut_tree() function (as described here). However, I need to be able to get a flat clustering with an assignment of k different labels to my datapoints. Do you know the algorithm to get a flat clustering with k labels from an arbitrary input linkage
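
A possible alternative, offered as a sketch and not as the answer from the original thread: scipy.cluster.hierarchy.fcluster with criterion="maxclust" cuts a linkage matrix into at most k flat clusters. The toy points below are invented for illustration. Because the criterion only promises "no more than k", it can return fewer clusters when several merges happen at the same height.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data: three well-separated pairs of points (made up for illustration).
points = np.array([
    [0.0, 0.0], [0.0, 1.0],
    [10.0, 0.0], [10.0, 1.5],
    [20.0, 0.0], [20.0, 2.0],
])

# Build the linkage matrix with average linkage.
Z = linkage(points, method="average")

k = 3
# criterion="maxclust" requests at most k flat clusters; labels start at 1.
labels = fcluster(Z, t=k, criterion="maxclust")
print(labels)
```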

SWAB segmentation algorithm on time series data

断了今生、忘了曾经 posted on 2019-12-10 14:38:10
Question: I'm trying to understand how to do segmentation on a set of time series data (daily stock prices, temperatures, etc.) and came across a book that explains the SWAB (sliding-window and bottom-up) segmentation algorithm, but I don't quite understand it. This segmentation is part of a sonification algorithm. The following text is from "Multimedia Data Mining and Analytics: Disruptive Innovation". The SWAB segmentation algorithm gets four parameters: the input file (time series data), the
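
To make the "bottom-up" half of the name concrete, here is a rough sketch of plain bottom-up piecewise-linear segmentation. It is not the book's SWAB procedure: the sliding-window buffer is left out, and since the excerpt cuts off before listing the parameters, the names fit_error, bottom_up and max_error are assumptions chosen for this illustration.

```python
import numpy as np


def fit_error(values, start, end):
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    segment = values[start:end]
    if len(segment) < 3:
        return 0.0  # a line fits one or two points exactly
    x = np.arange(len(segment))
    slope, intercept = np.polyfit(x, segment, 1)
    residuals = segment - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


def bottom_up(values, max_error):
    """Merge adjacent segments while the cheapest merge stays within max_error."""
    values = np.asarray(values, dtype=float)
    # Start from the finest split: segments of two points each.
    # bounds[i] and bounds[i + 1] delimit segment i as a half-open range.
    bounds = list(range(0, len(values), 2)) + [len(values)]
    while len(bounds) > 2:
        # Cost of merging each pair of neighbouring segments.
        costs = [
            fit_error(values, bounds[i], bounds[i + 2])
            for i in range(len(bounds) - 2)
        ]
        best = int(np.argmin(costs))
        if costs[best] > max_error:
            break
        # Removing the shared boundary merges the two segments.
        del bounds[best + 1]
    return list(zip(bounds[:-1], bounds[1:]))


# A rise followed by a fall should end up as two linear segments.
series = [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]
segments = bottom_up(series, max_error=0.01)
print(segments)
```

As generally described, SWAB applies this kind of merging inside a buffer that slides along the series rather than over the whole series at once.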

Remove past Matplotlib plots in the same cell in Jupyter Notebook involving interactive widgets

*爱你&永不变心* posted on 2019-12-10 13:52:55
Question: This is just a small problem that has been bugging me for a while. I have a pandas dataframe consisting of all continuous variables. I want to draw a scatter plot (using matplotlib) for any chosen pair of variables, making use of the interactive widgets in Jupyter as well. Let's say the data has 3 numeric columns: 'a', 'b', and 'c'. So far I have these lines of code:

    def g(x,y):
        plt.scatter(x, y)
    interactive_plot = interactive(g, x=['a','b','c'], y=['a','b','c'])
    interactive_plot

And they

Pandas - Groupby and create new DataFrame?

こ雲淡風輕ζ posted on 2019-12-10 13:33:48
Question: This is my situation:

    In[1]: data
    Out[1]:
         Item    Type
    0    Orange  Edible, Fruit
    1    Banana  Edible, Fruit
    2    Tomato  Edible, Vegetable
    3    Laptop  Non Edible, Electronic

    In[2]: type(data)
    Out[2]: pandas.core.frame.DataFrame

What I want to do is create a data frame of only Fruits, so I need to group in such a way that Fruit exists in Type. I've tried doing this:

    grouped = data.groupby(lambda x: "Fruit" in x, axis=1)

I don't know if that's the way of doing it, I'm having a little tough time understanding
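
Selecting the rows whose Type mentions "Fruit" does not strictly need groupby; a boolean mask is one common way to do it. The sketch below rebuilds the frame from the question, and the names is_fruit and fruits are made up for the example.

```python
import pandas as pd

data = pd.DataFrame({
    "Item": ["Orange", "Banana", "Tomato", "Laptop"],
    "Type": [
        "Edible, Fruit",
        "Edible, Fruit",
        "Edible, Vegetable",
        "Non Edible, Electronic",
    ],
})

# Boolean mask: True where the Type string contains the literal text "Fruit".
is_fruit = data["Type"].str.contains("Fruit", regex=False)

# New DataFrame holding only the matching rows.
fruits = data[is_fruit].reset_index(drop=True)
print(fruits)
```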

Tensorflow gradient and hessian evaluation

99封情书 posted on 2019-12-10 10:55:27
Question: I have found a problem in the evaluation of the TensorFlow r1.2 gradients and hessian functions. In particular, I take for granted that a gradient is evaluated numerically at the current values of the defined variables, probing the response of the placeholder function. However, I am now trying to evaluate the hessian function (and thus the gradients) before and after training the model, and I always get the same results (probably according to the feeding placeholders). I use the

Removing duplicates with ignoring case sensitive and adding the next column values with the first one in pandas dataframe in python

点点圈 posted on 2019-12-10 10:44:29
Question: I have a df,

Name   Count
Ram    1
ram    2
raM    1
Arjun  3
arjun  4

My desired output df,

Name   Count
Ram    4
Arjun  7

I tried groupby but I cannot achieve the desired output, please help.

Answer 1: Use agg, grouping by the values of Name converted to lowercase, with first and sum:

    df = (df.groupby(df['Name'].str.lower(), as_index=False, sort=False)
            .agg({'Name':'first', 'Count':'sum'}))
    print (df)
        Name  Count
    0    Ram      4
    1  Arjun      7

Detail:

    print (df['Name'].str.lower())
    0      ram
    1      ram
    2      ram
    3    arjun
    4    arjun
    Name: Name, dtype: object

Answer 2:
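
Separately from the answers quoted in this excerpt (the second one is cut off), the same case-insensitive grouping can be sketched with named aggregation. The grouping key is given its own name, name_key, which is a made-up label, so that it cannot clash with the Name column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "ram", "raM", "Arjun", "arjun"],
    "Count": [1, 2, 1, 3, 4],
})

# Lower-cased grouping key under a separate, hypothetical name.
key = df["Name"].str.lower().rename("name_key")

result = (
    df.groupby(key, sort=False)
      # Keep the first spelling seen in each group and add up the counts.
      .agg(Name=("Name", "first"), Count=("Count", "sum"))
      .reset_index(drop=True)
)
print(result)
```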

Group by two columns and count the occurrences of each combination in pandas

我的梦境 posted on 2019-12-09 16:28:43
Question: I have the following data frame:

    data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'],
                         'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})

product_id  user_id
p1          a1
p1          a1
p2          a1
p1          a2
p1          a2
p1          a2
p2          a3
p2          a3
p3          a3

In the real case there might be some other columns as well, but what I need to do is group the data frame by the product_id and user_id columns, count the occurrences of each combination, and add the count as a new column in a new data frame. The output should be
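
A common way to count each (product_id, user_id) combination is groupby followed by size. This is a sketch using the frame from the question; the output column name count is an assumption, since the expected output is cut off in the excerpt.

```python
import pandas as pd

data = pd.DataFrame({
    "user_id": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3"],
    "product_id": ["p1", "p1", "p2", "p1", "p1", "p1", "p2", "p2", "p3"],
})

counts = (
    data.groupby(["product_id", "user_id"])
        .size()                      # number of rows in each combination
        .reset_index(name="count")   # back to a flat DataFrame; "count" is an assumed name
)
print(counts)
```

Any extra columns in the real data are ignored, because size counts rows per group and does not aggregate individual columns.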