data-science

Time difference within group by objects in Python Pandas

半城伤御伤魂 submitted on 2020-01-12 12:14:38

Question: I have a dataframe that looks like this:

```
from  to  datetime             other
------------------------------------
11    1   2016-11-06 22:00:00  -
11    1   2016-11-06 20:00:00  -
11    1   2016-11-06 15:45:00  -
11    12  2016-11-06 15:00:00  -
11    1   2016-11-06 12:00:00  -
11    18  2016-11-05 10:00:00  -
11    12  2016-11-05 10:00:00  -
12    1   2016-10-05 10:00:59  -
12    3   2016-09-06 10:00:34  -
```

I want to group by the "from" and then the "to" columns, then sort "datetime" in descending order, and finally calculate the […]
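The excerpt cuts off before stating exactly what should be calculated, but the usual version of this task is the time difference between consecutive events within each (from, to) group. A minimal sketch of that, using a subset of the rows from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "from": [11, 11, 11, 11],
    "to":   [1, 1, 1, 12],
    "datetime": pd.to_datetime([
        "2016-11-06 22:00:00", "2016-11-06 20:00:00",
        "2016-11-06 15:45:00", "2016-11-06 15:00:00",
    ]),
})

# Sort newest-first, then take consecutive differences inside
# each (from, to) group; diff(-1) is "this row minus the next",
# which yields positive gaps under a descending sort.
df = df.sort_values(["from", "to", "datetime"], ascending=[True, True, False])
df["gap"] = df.groupby(["from", "to"])["datetime"].diff(-1)
print(df)
```

The last row of each group has no successor, so its gap is `NaT`.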

filter dataframe in pandas by a date column

爷,独闯天下 submitted on 2020-01-06 08:14:27

Question: The data is at the following link: http://www.fdic.gov/bank/individual/failed/banklist.html

I want only the banks which closed in 2017. How can I do this in Pandas?

```
failed_banks = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
failed_banks[0]
```

What should I do after these lines of code to extract the desired result?

Answer 1: Ideally you would use

```
# assuming pandas successfully parsed this column as a datetime object
# and pandas version >= 0.16
failed_banks = pd.read_html(
```
[…]
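The answer is truncated, but the filtering step it is building toward can be sketched with a small stand-in for the parsed FDIC table ("Closing Date" is the relevant column name on that page; the sample rows here are made up):

```python
import pandas as pd

# Stand-in for failed_banks[0], the table pd.read_html() returns.
banks = pd.DataFrame({
    "Bank Name": ["A Bank", "B Bank", "C Bank"],
    "Closing Date": ["May 26, 2017", "March 3, 2016", "December 15, 2017"],
})

# Parse the column as datetimes, then keep only rows whose year is 2017.
banks["Closing Date"] = pd.to_datetime(banks["Closing Date"])
closed_2017 = banks[banks["Closing Date"].dt.year == 2017]
print(closed_2017)
```

If `read_html` already parsed the column as datetimes, the `to_datetime` call is a no-op and only the `.dt.year` filter is needed.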

R: Something is wrong; all the Accuracy metric values are missing

徘徊边缘 submitted on 2020-01-05 05:39:05

Question: When running rpart, I am getting an error message saying:

```
Something is wrong; all the Accuracy metric values are missing:
```

The dataset can be found here and has no NAs. Can someone help?

```
> rf.5.cv.1
# Random Forest
# 891 samples
#   6 predictor
#   2 classes: '0', '1'
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times)
# Summary of sample sizes: 802, 802, 803, 801, 801, 802, ...
# Resampling results across tuning parameters:
#   mtry  Accuracy   Kappa
#   2     0.8383655  0
```
[…]

Python Pandas: how to convert a list of pair mappings to a row-vector format?

纵饮孤独 submitted on 2020-01-04 10:04:13

Question: I have a 2-column DataFrame: column 1 is the customer and column 2 is a city this customer has visited. The DataFrame looks like the following:

```
print(df)
  customer visited_city
0     John       London
1     Mary    Melbourne
2    Steve        Paris
3     John     New_York
4    Peter     New_York
5     Mary       London
6     John    Melbourne
7     John     New_York
```

I would like to convert the above DataFrame into a row-vector format, such that each row represents a unique user, with the row vector indicating the cities visited.

```
print(wide
```
[…]
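The expected `wide` output is cut off, but a standard way to get one row per customer with one indicator column per city is `pd.crosstab` (shown here as a sketch; whether the question wants 0/1 indicators or visit counts is an assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["John", "Mary", "Steve", "John", "Peter", "Mary", "John", "John"],
    "visited_city": ["London", "Melbourne", "Paris", "New_York",
                     "New_York", "London", "Melbourne", "New_York"],
})

# One row per customer, one column per city. clip(upper=1)
# collapses repeat visits (John -> New_York twice) into a
# 0/1 indicator; drop it if raw counts are wanted instead.
wide = pd.crosstab(df["customer"], df["visited_city"]).clip(upper=1)
print(wide)
```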

Plot scikit-learn (sklearn) SVM decision boundary / surface

元气小坏坏 submitted on 2020-01-04 08:06:07

Question: I am currently performing multi-class SVM with a linear kernel using Python's scikit-learn library. The sample training and testing data are given below:

Model data:

```
x = [[20,32,45,33,32,44,0],
     [23,32,45,12,32,66,11],
     [16,32,45,12,32,44,23],
     [120,2,55,62,82,14,81],
     [30,222,115,12,42,64,91],
     [220,12,55,222,82,14,181],
     [30,222,315,12,222,64,111]]
y = [0,0,0,1,1,2,2]
```

I want to plot the decision boundary and visualize the datasets. Can someone please help plot this type of data? The data given above […]
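A decision surface can only be drawn in two dimensions, and the points above have seven features, so some 2-D projection is needed first (PCA is one common choice here, an assumption on my part). A sketch of the usual meshgrid approach, with the actual matplotlib calls left as comments:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

x = np.array([[20,32,45,33,32,44,0], [23,32,45,12,32,66,11],
              [16,32,45,12,32,44,23], [120,2,55,62,82,14,81],
              [30,222,115,12,42,64,91], [220,12,55,222,82,14,181],
              [30,222,315,12,222,64,111]])
y = np.array([0, 0, 0, 1, 1, 2, 2])

# Project the 7-D points down to 2-D, then fit the linear SVM there.
x2 = PCA(n_components=2).fit_transform(x)
clf = SVC(kernel="linear").fit(x2, y)

# Evaluate the classifier on a grid covering the projected data...
xx, yy = np.meshgrid(
    np.linspace(x2[:, 0].min() - 1, x2[:, 0].max() + 1, 200),
    np.linspace(x2[:, 1].min() - 1, x2[:, 1].max() + 1, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# ...then matplotlib would shade the class regions and overlay the points:
#   plt.contourf(xx, yy, Z, alpha=0.3)
#   plt.scatter(x2[:, 0], x2[:, 1], c=y)
```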

How do I calculate a grouped z score in R using dplyr?

佐手、 submitted on 2020-01-03 16:36:21

Question: Using the iris dataset, I'm trying to calculate a z-score for each of the variables. I have the data in tidy format, produced by the following:

```
library(reshape2)
library(dplyr)
test <- iris
test <- melt(iris, id.vars = 'Species')
```

That gives me the following:

```
  Species     variable value
1  setosa Sepal.Length   5.1
2  setosa Sepal.Length   4.9
3  setosa Sepal.Length   4.7
4  setosa Sepal.Length   4.6
5  setosa Sepal.Length   5.0
6  setosa Sepal.Length   5.4
```

But when I try to create a z-score column for each group (e[…]
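The question is about dplyr, but the grouped z-score computation itself is easy to sketch; here is the same idea in pandas (illustration only, with a small made-up tidy frame rather than the full melted iris data):

```python
import pandas as pd

tidy = pd.DataFrame({
    "Species":  ["setosa"] * 3 + ["virginica"] * 3,
    "variable": ["Sepal.Length"] * 6,
    "value":    [5.1, 4.9, 4.7, 6.3, 5.8, 7.1],
})

# Within each (Species, variable) group: subtract the group
# mean and divide by the group standard deviation.
grp = tidy.groupby(["Species", "variable"])["value"]
tidy["z"] = (tidy["value"] - grp.transform("mean")) / grp.transform("std")
print(tidy)
```

The dplyr analogue would be `group_by(Species, variable) %>% mutate(z = (value - mean(value)) / sd(value))`.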

R - Employee Reporting Structure

|▌冷眼眸甩不掉的悲伤 submitted on 2020-01-03 03:48:10

Question: Background: I am using R along with some packages to pull JSON data from a ticketing system. I'm pulling all the users and want to build a reporting structure. I have a data set that contains employees and their managers; the columns are named "Employee" and "Manager". I am trying to build a tree of the reporting structure that goes up to the root. We are in an IT organization, but I am pulling all employee data, so this would look something like:

Company -> Business Unit -> Executive […]
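The question is about R, but the core operation, walking an Employee → Manager edge list up to the root, is the same in any language. A sketch with made-up names (the real data would come from the ticketing-system JSON):

```python
# Employee -> Manager pairs; None marks the root of the organization.
reports_to = {
    "alice": "carol",
    "bob":   "carol",
    "carol": "erin",
    "erin":  None,
}

def chain_to_root(employee):
    """Return the management chain from an employee up to the root."""
    chain = [employee]
    while reports_to.get(chain[-1]) is not None:
        chain.append(reports_to[chain[-1]])
    return chain

print(chain_to_root("alice"))  # ['alice', 'carol', 'erin']
```

Repeating this for every employee (or inverting the mapping into manager → direct-reports lists) yields the full tree.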

Converting a pandas crosstab into a stacked dataframe (a regular table)

半世苍凉 submitted on 2020-01-02 09:41:27

Question: Given a pandas crosstab, how do you convert it back into a stacked dataframe (a regular table)? Assume you start with a stacked dataframe and first convert it into a crosstab; now I would like to revert to the original stacked dataframe. I searched for an existing question that addresses this requirement, but could not find one that hits it exactly; in case I have missed any, please leave a note in the comments. I would like to document the best practice here, so thank you for your support. I know that[…]
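One plausible reading of "revert": stack the crosstab back into long form and repeat each row by its count to recover one row per original observation. A sketch with a tiny made-up frame (the column names are assumptions):

```python
import pandas as pd

stacked = pd.DataFrame({
    "a": ["x", "x", "y", "y"],
    "b": ["p", "q", "p", "q"],
})

ct = pd.crosstab(stacked["a"], stacked["b"])

# stack() turns the count matrix into a long (a, b, count) table;
# repeating each row by its count restores the original
# one-row-per-observation shape.
long = ct.stack().rename("count").reset_index()
restored = (long.loc[long.index.repeat(long["count"]), ["a", "b"]]
            .reset_index(drop=True))
print(restored)
```

If the original row order matters, it cannot be recovered: the crosstab only keeps counts, so `restored` matches the input up to reordering.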

Why doesn't ml_create_dummy_variables show new dummy variable columns in sparklyr

落爺英雄遲暮 submitted on 2020-01-02 08:43:07

Question: I'm trying to create a model matrix in sparklyr. There is a function, ml_create_dummy_variables(), for creating dummy variables for one categorical variable at a time. As far as I can tell, there is no model.matrix() equivalent for creating a model matrix in one step. ml_create_dummy_variables() is easy to use, but I don't understand why the new dummy variables aren't stored in the Spark dataframe. Consider this example:

```
### create dummy data to figure out how model matrix formulas work in
```
[…]
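The question concerns sparklyr, but the one-step expansion it asks for, turning every categorical column into 0/1 dummy columns at once, is the same idea as pandas' `get_dummies` (shown here purely to illustrate what a one-step model matrix produces; the sample columns are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "red"],
    "size":  ["S", "M", "L"],
})

# Expand all listed categorical columns into indicator columns
# in one call, roughly what model.matrix() does in R.
mm = pd.get_dummies(df, columns=["color", "size"])
print(mm.columns.tolist())
```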