data-science

SVC classifier taking too much time for training

你说的曾经没有我的故事 submitted on 2020-01-24 01:05:03
Question: I am using an SVC classifier with a linear kernel to train my model. Training data: 42,000 records.
    model = SVC(probability=True)
    model.fit(self.features_train, self.labels_train)
    y_pred = model.predict(self.features_test)
    train_accuracy = model.score(self.features_train, self.labels_train)
    test_accuracy = model.score(self.features_test, self.labels_test)
It takes more than 2 hours to train my model. Am I doing something wrong? Also, what can be done to reduce the training time? Thanks in advance. Answer 1: There are
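Two things in the posted code probably dominate the runtime. First, `SVC()` with no `kernel=` argument uses the RBF kernel, not a linear one, and kernel SVC fit time grows at least quadratically with the number of samples, so tens of thousands of rows is already slow. Second, `probability=True` makes scikit-learn run an internal 5-fold cross-validation to calibrate probabilities, which adds several extra fits. The sketch below is a minimal illustration under stated assumptions: it assumes a linear model is really what is wanted and that the features are numeric, and it uses synthetic data (`make_classification`) as a stand-in for `self.features_train` / `self.labels_train`. It swaps in `LinearSVC` (liblinear), which scales much better, and adds `CalibratedClassifierCV` only to recover `predict_proba`.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 42,000-record training set (assumption: numeric features).
X, y = make_classification(n_samples=42_000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC (liblinear) scales far better with n_samples than kernel SVC (libsvm).
# dual=False is the usual choice when n_samples > n_features; scaling helps convergence.
linear_svm = make_pipeline(StandardScaler(), LinearSVC(dual=False))

# Only needed if predict_proba is required: calibrate with 3-fold cross-validation.
model = CalibratedClassifierCV(linear_svm, cv=3)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
proba = model.predict_proba(X_test)
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train={train_accuracy:.3f} test={test_accuracy:.3f}")
```

If probabilities are not needed, fitting `linear_svm` directly and dropping the calibration wrapper is cheaper still. If the RBF kernel is actually required, removing `probability=True` alone should already cut the training time substantially.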

Split Column into Unknown Number of Columns by Delimiter Pandas

℡╲_俬逩灬. submitted on 2020-01-23 03:16:04
Question: I am trying to split a column into multiple columns based on comma/space separation. My dataframe currently looks like this:
          Item  Colors
       0  ID-1  Red, Blue, Green
       1  ID-2  Red, Blue
       2  ID-3  Blue, Green
       3  ID-4  Blue
       4  ID-5  Red
I would like to transform the 'Colors' column into Red, Blue and Green columns like this:
          Item  Red  Blue  Green
       0  ID-1    1     1      1
       1  ID-2    1     1      0
       2  ID-3    0     1      1
       3  ID-4    0     1      0
       4  ID-5    1     0      0
I really have no idea how to do this. Any help would be greatly appreciated. Answer 1: You can use get_dummies pd
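As a minimal sketch of the `get_dummies` approach, pandas' `Series.str.get_dummies(sep=...)` splits each string on the given separator and builds one 0/1 column per distinct value. The sample frame below is rebuilt from the question; the separator `", "` assumes every color is followed by exactly a comma and a space. The dummy columns come out in alphabetical order, so they are reordered to match the requested layout.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Item": ["ID-1", "ID-2", "ID-3", "ID-4", "ID-5"],
    "Colors": ["Red, Blue, Green", "Red, Blue", "Blue, Green", "Blue", "Red"],
})

# One indicator column per distinct color (columns are sorted alphabetically).
dummies = df["Colors"].str.get_dummies(sep=", ")

# Keep Item and put the colors in the requested order.
result = pd.concat([df[["Item"]], dummies[["Red", "Blue", "Green"]]], axis=1)
print(result)
```

If the spacing after commas is inconsistent, normalizing it first (for example with `df["Colors"].str.replace(r"\s*,\s*", ",", regex=True)` and then `sep=","`) avoids spurious columns such as `" Blue"`.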

Neural network is not giving the expected output after training in Python

人盡茶涼 submitted on 2020-01-22 20:39:31
Question: My neural network is not giving the expected output after training in Python. Is there an error in the code? Is there any way to reduce the mean squared error (MSE)? I tried training the network (running the program) repeatedly, but it is not learning; instead, it gives the same MSE and output every time. Here is the data I used: https://drive.google.com/open?id=1GLm87-5E_6YhUIPZ_CtQLV9F9wcGaTj2 Here is my code:
    # load and evaluate a saved model
    from numpy import loadtxt
    from tensorflow.keras.models

How can I merge two dictionaries while adding their values when the keys match?

一曲冷凌霜 submitted on 2020-01-16 19:31:28
Question: I have data that looks like this: [image: current]. I wrote code that returns a dictionary like this: [image: history]. I have another dictionary that looks almost the same but with more nesting, like this: [image: latest]. Given these two dictionaries, I want to merge them such that if:
    dict1 = {201: {'U': {'INR': 10203, 'SGD': 10203, 'USD': 10203, 'YEN': 10203}, 'V': {'INR': 10203, 'SGD': 10203, 'USD': 10203, 'YEN': 10203}}}
and
    dict2 = {201: {'X': {'GBP': 10203, 'SGD': 10203, 'USD': 10203, 'YEN': 10203},
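One way to express "merge, and add the values when keys match" for arbitrarily nested dictionaries is a small recursive function. This is a minimal sketch, assuming leaf values are numbers and that a key never maps to a dict in one input and a number in the other (that case raises `TypeError` from `+`). Because the original `dict2` is truncated, the `dict2` below is a hypothetical example, and `merge_add` is an illustrative helper name, not a library function.

```python
from typing import Any, Dict


def merge_add(a: Dict[Any, Any], b: Dict[Any, Any]) -> Dict[Any, Any]:
    """Return a new dict combining a and b.

    Keys only in one input are copied over; matching keys holding dicts are
    merged recursively; matching keys holding numbers are added together.
    """
    merged = dict(a)  # shallow copy, so neither input dict is mutated
    for key, b_val in b.items():
        if key not in merged:
            merged[key] = b_val
        elif isinstance(merged[key], dict) and isinstance(b_val, dict):
            merged[key] = merge_add(merged[key], b_val)
        else:
            merged[key] = merged[key] + b_val
    return merged


dict1 = {201: {'U': {'INR': 10203, 'SGD': 10203, 'USD': 10203, 'YEN': 10203},
               'V': {'INR': 10203, 'SGD': 10203, 'USD': 10203, 'YEN': 10203}}}
# Hypothetical second dict (the original dict2 is cut off in the question).
dict2 = {201: {'X': {'GBP': 10203, 'SGD': 10203}, 'U': {'INR': 5, 'GBP': 7}}}

merged = merge_add(dict1, dict2)
print(merged)
```

Sub-dicts that exist in only one input (here `'V'` and `'X'`) are shared by reference with the inputs; wrapping the result in `copy.deepcopy` gives a fully independent copy if that matters.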

Date Difference based on matching values in two columns - Pandas

做~自己de王妃 submitted on 2020-01-16 02:04:12
Question: I have a dataframe and am struggling to create a column based on other columns. Here is the problem with some sample data:
              Date   Target1       Close
     0  2018-05-25  198.0090  188.580002
     1  2018-05-25  197.6835  188.580002
     2  2018-05-25  198.0090  188.580002
     3  2018-05-29  196.6230  187.899994
     4  2018-05-29  196.9800  187.899994
     5  2018-05-30  197.1375  187.500000
     6  2018-05-30  196.6965  187.500000
     7  2018-05-30  196.8750  187.500000
     8  2018-05-31  196.2135  186.869995
     9  2018-05-31  196.2135  186.869995
    10  2018-05-31  196

Randomly reassign participants to groups such that participants originally from same group don't end up in same group

若如初见. submitted on 2020-01-15 10:23:01
Question: I'm basically trying to do a Monte Carlo-style analysis where I randomly reassign the participants in my experiment to new groups and then reanalyze the data given the random new groups. Here's what I want to do: participants are originally grouped into eight groups of four participants each. I want to randomly reassign each participant to a new group, but I don't want any participant to end up in a new group with another participant from the same original group. Here is how far
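A simple way to draw such a reassignment is rejection sampling: shuffle all participants, cut the shuffled list into groups of the original sizes, and retry until no new group contains two people from the same original group. Since every shuffle is equally likely, the accepted assignments are uniformly distributed over all valid ones, which is usually what a Monte Carlo permutation analysis needs. This is a sketch, assuming participant IDs are unique and that the new groups should have the same sizes as the old ones; `reassign_groups` and the `g{g}p{i}` labels are hypothetical names for illustration.

```python
import random
from typing import Hashable, List, Optional, Sequence


def reassign_groups(original_groups: Sequence[Sequence[Hashable]],
                    rng: Optional[random.Random] = None,
                    max_tries: int = 100_000) -> List[List[Hashable]]:
    """Randomly regroup all participants into groups of the original sizes so
    that no new group holds two members of the same original group.
    Uses rejection sampling: shuffle, cut into groups, retry until valid."""
    if rng is None:
        rng = random.Random()
    # Map each participant to the index of their original group.
    origin = {p: g for g, members in enumerate(original_groups) for p in members}
    people = list(origin)
    sizes = [len(members) for members in original_groups]
    for _ in range(max_tries):
        rng.shuffle(people)
        new_groups, start = [], 0
        for size in sizes:
            new_groups.append(people[start:start + size])
            start += size
        # Valid only if every member of each new group comes from a distinct original group.
        if all(len({origin[p] for p in grp}) == len(grp) for grp in new_groups):
            return new_groups
    raise RuntimeError("no valid assignment found; try a larger max_tries")


# Eight original groups of four, labelled g<group>p<person> (hypothetical IDs).
groups = [[f"g{g}p{i}" for i in range(4)] for g in range(8)]
shuffled = reassign_groups(groups, rng=random.Random(42))
print(shuffled)
```

For eight groups of four, only a small fraction of shuffles pass the check (roughly 1 in 200), but each attempt is cheap, so this stays fast even when repeated thousands of times in a Monte Carlo loop.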

Plotting the count of occurrences per date

为君一笑 submitted on 2020-01-14 06:36:28
Question: I'm very new to pandas. I have a data frame with a datetime column and a column containing a string of text (headlines); each headline is a separate row. I need to plot the date on the x-axis, and the y-axis needs to show how many headlines occur on each date. For example, one date may contain 3 headlines. What's the simplest way to do this? I can't figure out how to do it at all. Maybe add another column with a '1' for each row? If so, how would you do this? Please point me in
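A helper column of 1s is not needed: grouping the rows by calendar day and calling `size()` counts the rows (headlines) in each group directly. The sketch below assumes the columns are named `date` and `headline`; those names, the sample rows, and the helper name `headlines_per_date` are hypothetical.

```python
import pandas as pd


def headlines_per_date(df: pd.DataFrame, date_col: str = "date") -> pd.Series:
    """Count rows (headlines) per calendar day, sorted by date."""
    days = pd.to_datetime(df[date_col]).dt.normalize()  # drop the time-of-day part
    return df.groupby(days).size().sort_index()


# Hypothetical sample data: three headlines on one day, one on the next.
df = pd.DataFrame({
    "date": ["2020-01-10 08:00", "2020-01-10 12:30",
             "2020-01-10 18:45", "2020-01-11 09:15"],
    "headline": ["A", "B", "C", "D"],
})

counts = headlines_per_date(df)
print(counts)
```

With matplotlib installed, `counts.plot(kind="bar")` (or `kind="line"` for a long date range) puts the dates on the x-axis and the number of headlines on the y-axis.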

Trip Advisor Scraping 'moreLink'

不问归期 submitted on 2020-01-14 04:06:05
Question: I've been building a web scraper with BS4 and have gotten stuck. I am using TripAdvisor as a test for other data I will be going after, but I am not able to isolate the tag containing the full ('entire') reviews. Here is an example: https://www.tripadvisor.com/Restaurant_Review-g56010-d470148-Reviews-Chez_Nous-Humble_Texas.html Notice that in the first review there is an icon below "the wine list is...". I am able to easily isolate the partial reviews, but have not been able to figure out a way to get BS4 to pull

R: knnImputation giving an error

回眸只為那壹抹淺笑 submitted on 2020-01-13 05:18:13
Question: I am getting the error below in my R code. My Brand_X.xlsx dataset has a few NA values, which I am trying to impute using KNN imputation, but I get the error below. What's wrong here? Thanks!
    > library(readxl)
    > Brand_X <- read_excel("Brand_X.xlsx")
    > str(Brand_X)
    Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 101 obs. of 8 variables:
     $ Rel_price_lag5: num 108 111 105 103 109 104 110 114 103 108 ...
     $ Rel_price_lag1: num 110 109 217 241 855 271 234 297 271 999 ...
     $ Rel_Price : num 122 110 109 217