data-analysis | 易学教程

Data column values are not changing to float


Question: I have a dataframe, df:

       Name    Stage  Description
    0  sri     1      sri is one of the good singer in this two
    1  nan     2      thanks for reading
    2  ram     1      ram is two of the good cricket player
    3  ganesh  1      one driver
    4  nan     2      good buddies

I tried

    df["Stage"] = pd.to_numeric(df["Stage"], downcast="float")

but the values are still the same.

Answer 1: You can use df.Stage.astype(float):

    In [6]: df.Stage.astype(float)
    Out[6]:
    0    1.0
    1    2.0
    2    1.0
    3    1.0
    4    2.0
    Name: Stage, dtype: float64

    In [7]: df.Stage.astype(float)

Using pd.to_numeric is
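A minimal sketch of the astype approach from Answer 1, assuming the Stage column holds integers as in the question's sample. The frame below is rebuilt by hand, and representing the missing names as None is an assumption. Because astype returns a new Series instead of modifying the column in place, the result is assigned back.

```python
import pandas as pd

# Hand-built sample modelled on the question; None for missing names is an assumption.
df = pd.DataFrame({
    "Name": ["sri", None, "ram", "ganesh", None],
    "Stage": [1, 2, 1, 1, 2],
})

# astype does not modify df in place, so assign the converted column back.
df["Stage"] = df["Stage"].astype(float)

print(df["Stage"].dtype)
```

Calling df.Stage.astype(float) without the assignment only displays the converted values and leaves the stored column unchanged.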

Filling a column in a dataframe based on a column in another dataframe in r


Question: I have a dataframe of comments (df1) which looks like this:

    Comments
    Apple laptops are really good for work,we should buy them
    Apple Iphones are too costly,we can resort to some other brands
    Google search is the best search engine
    Android phones are great these days
    I lost my visa card today

I have another dataframe of merchant names (df2) which looks like this:

    Merchant_Name
    Google
    Android
    Geoni
    Visa
    Apple
    MC
    WallMart

If a merchant_name in df2 appears in a Comment in df1, append that merchant

Retrieving matching word count on a datacolumn using pandas in python


Question: I have a df,

    Name   Description
    Ram    Ram is one of the good cricketer
    Sri    Sri is one of the member
    Kumar  Kumar is a keeper

and a list,

    my_list = ["one", "good", "ravi", "ball"]

I am trying to get the rows which have at least one keyword from my_list. I tried

    mask = df["Description"].str.contains("|".join(my_list), na=False)

I am getting the output_df,

    Name   Description
    Ram    Ram is one of ONe crickete
    Sri    Sri is one of the member
    Ravi   Ravi is a player, ravi is playing
    Kumar  there is a BALL

I also
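The excerpt is cut off, so the exact output the asker wanted is unknown. Going by the title, one plausible reading is a per-row count of keyword matches. The sketch below assumes whole-word, case-insensitive matching and uses a hypothetical column name, match_count; neither assumption comes from the original post.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Build one alternation pattern; \b keeps "one" from matching inside longer words.
pattern = r"\b(?:" + "|".join(re.escape(word) for word in my_list) + r")\b"

# Count keyword occurrences per row, ignoring case (match_count is a hypothetical name).
df["match_count"] = df["Description"].str.count(pattern, flags=re.IGNORECASE)

# Keep only rows with at least one keyword.
output_df = df[df["match_count"] > 0]
print(output_df)
```

Dropping the \b anchors reproduces the substring behaviour of the original str.contains call, where "one" would also match inside a word such as "money".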

Alternative to scipy.cluster.hierarchy.cut_tree()


Question: I was doing an agglomerative hierarchical clustering experiment in Python 3 and I found that scipy.cluster.hierarchy.cut_tree() does not return the requested number of clusters for some input linkage matrices. So I now know there is a bug in the cut_tree() function (as described here). However, I need to be able to get a flat clustering with an assignment of k different labels to my datapoints. Do you know the algorithm to get a flat clustering with k labels from an arbitrary input linkage
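One possible approach, sketched in plain Python and not taken from SciPy, is to replay only the first n - k merges recorded in the linkage matrix and then label the resulting groups. The function name flat_clusters is hypothetical. The sketch assumes Z follows the SciPy linkage layout (row i merges clusters Z[i][0] and Z[i][1] into a new cluster with id n + i) and that rows are ordered by non-decreasing merge distance. The small matrix below is hand-written for illustration.

```python
def flat_clusters(Z, k):
    """Return a list of n labels forming exactly k clusters (hypothetical helper).

    Assumes Z has n - 1 rows in SciPy linkage layout, ordered by merge distance.
    """
    n = len(Z) + 1
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Ids 0..n-1 are the original points; ids n..2n-2 are merged clusters.
    parent = list(range(2 * n - 1))

    def find(i):
        # Follow parent links to the current root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each merge reduces the cluster count by one, so n - k merges leave k clusters.
    for step in range(n - k):
        a, b = int(Z[step][0]), int(Z[step][1])
        new_id = n + step
        parent[find(a)] = new_id
        parent[find(b)] = new_id

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    root_to_label = {}
    return [root_to_label.setdefault(find(i), len(root_to_label)) for i in range(n)]


# Hand-written linkage for 4 points: (0,1) merge first, then (2,3), then both groups.
Z = [[0, 1, 1.0, 2], [2, 3, 2.0, 2], [4, 5, 5.0, 4]]
print(flat_clusters(Z, 2))
```

If several merges share the same distance, this sketch breaks the tie by row order, which may not match the behaviour expected from a distance-threshold cut.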

SWAB segmentation algorithm on time series data


Question: I'm trying to understand how to do segmentation on a set of time series data (daily stock prices, temperatures etc.) and came across a book that explains the SWAB (sliding-window and bottom-up) segmentation algorithm, but I don't quite understand it. This segmentation is part of a sonification algorithm. The following text is from "Multimedia Data Mining and Analytics: Disruptive Innovation". The SWAB segmentation algorithm gets four parameters: the input file (time series data), the

Remove past Matplotlib plots in the same cell in Jupyter Notebook involving interactive widgets


Question: This is just a small problem that has been bugging me for a while. I have a pandas dataframe consisting of all continuous variables. I want to draw a scatter plot (using matplotlib) for any chosen pair of variables, making use of the interactive widgets in Jupyter as well. Let's say the data has 3 numeric columns: 'a', 'b', and 'c'. So far I have these lines of code:

    def g(x, y):
        plt.scatter(x, y)

    interactive_plot = interactive(g, x=['a','b','c'], y=['a','b','c'])
    interactive_plot

And they

Pandas - Groupby and create new DataFrame?


Question: This is my situation:

    In[1]: data
    Out[1]:
         Item                    Type
    0  Orange           Edible, Fruit
    1  Banana           Edible, Fruit
    2  Tomato       Edible, Vegetable
    3  Laptop  Non Edible, Electronic

    In[2]: type(data)
    Out[2]: pandas.core.frame.DataFrame

What I want to do is create a data frame of only Fruits, so I need to group by in such a way that Fruit exists in Type. I've tried doing this:

    grouped = data.groupby(lambda x: "Fruit" in x, axis=1)

I don't know if that's the way of doing it, I'm having a little tough time understanding
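Since the goal is a frame containing only the fruit rows, one option is to skip groupby and filter with a boolean mask. This is a swapped-in technique, not the approach the asker attempted, and the variable name fruits is hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    "Item": ["Orange", "Banana", "Tomato", "Laptop"],
    "Type": ["Edible, Fruit", "Edible, Fruit", "Edible, Vegetable", "Non Edible, Electronic"],
})

# True for rows whose Type text contains the literal substring "Fruit".
mask = data["Type"].str.contains("Fruit", regex=False)

# Boolean indexing keeps only the matching rows (fruits is a hypothetical name).
fruits = data[mask]
print(fruits)
```

A plain substring test is enough for this sample; if Type could hold values such as "Fruit Juice", splitting on ", " and comparing whole tokens would be a stricter check.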

Tensorflow gradient and hessian evaluation


Question: I have a problem with the evaluation of the TensorFlow r1.2 gradients and hessian functions. In particular, I take for granted that a gradient is evaluated numerically at the current values of the defined variables, probing the response of the placeholder function. However, I am now trying to evaluate the hessian function (and thus the gradients) before and after training the model, and I always get the same results (probably depending on the values fed to the placeholders). I use the

Removing duplicates with ignoring case sensitive and adding the next column values with the first one in pandas dataframe in python


Question: I have a df,

    Name   Count
    Ram    1
    ram    2
    raM    1
    Arjun  3
    arjun  4

My desired output df,

    Name   Count
    Ram    4
    Arjun  7

I tried groupby but I cannot achieve the desired output, please help.

Answer 1: Group by the values of Name converted to lowercase, then use agg with first and sum:

    df = (df.groupby(df['Name'].str.lower(), as_index=False, sort=False)
            .agg({'Name':'first', 'Count':'sum'}))
    print (df)
        Name  Count
    0    Ram      4
    1  Arjun      7

Detail:

    print (df['Name'].str.lower())
    0      ram
    1      ram
    2      ram
    3    arjun
    4    arjun
    Name: Name, dtype: object

Answer 2:

Group by two columns and count the occurrences of each combination in pandas


Question: I have the following data frame:

    data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'],
                         'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})

    product_id  user_id
    p1          a1
    p1          a1
    p2          a1
    p1          a2
    p1          a2
    p1          a2
    p2          a3
    p2          a3
    p3          a3

In the real case there might be some other columns as well, but what I need to do is group the data frame by the product_id and user_id columns, count the occurrences of each combination, and add that count as a new column in a new data frame. The output should be
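The expected output is cut off in the excerpt, so the following is only a sketch of one common way to count combinations: groupby on both columns, size, then reset_index to turn the counts into a column. The column name count and the variable name counts are assumptions.

```python
import pandas as pd

data = pd.DataFrame({
    "user_id": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3"],
    "product_id": ["p1", "p1", "p2", "p1", "p1", "p1", "p2", "p2", "p3"],
})

# size() counts rows per (product_id, user_id) pair; reset_index turns the
# grouped result into a regular frame with the counts in a new column.
counts = (
    data.groupby(["product_id", "user_id"])
        .size()
        .reset_index(name="count")  # "count" is an assumed column name
)
print(counts)
```

Extra columns in the real data do not affect the result, because size counts rows per group regardless of what the other columns hold.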