data-science

sklearn partial_fit() not showing results as accurate as fit()

て烟熏妆下的殇ゞ submitted on 2019-12-02 05:49:56
I am training 3 lists of data: L1, L2, L3. First I train on all of them with SGDClassifier fit(), and later instance by instance with partial_fit(). I then test with L4 and L5. [The data in the lists is image data, and the L4, L5 images are the same as L2.] The predictions with fit() are correct and are what I expect from partial_fit() as well. However, the output of the code below shows that the two behave differently, even with 10,000 iterations of partial_fit(). Output:

    fit
    [1]  # Tested L1. Predicts label as 1 correctly
    [2]  # Tested L2. Predicts label as 2 correctly
    [3]  # Tested L3. Predicts label as
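Not part of the original post, but a minimal sketch of the usual pattern, with made-up placeholder data: partial_fit() needs the full list of classes on its first call, and because fit() runs many epochs internally while each partial_fit() call is a single pass, several shuffled passes are usually needed before the two give similar predictions.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(300, 20)              # placeholder features standing in for the image data
    y = np.repeat([1, 2, 3], 100)      # placeholder labels for L1, L2, L3

    clf = SGDClassifier(random_state=0)
    classes = np.unique(y)             # partial_fit must see every class label up front
    for epoch in range(10):            # several epochs, shuffled each time
        order = rng.permutation(len(X))
        for i in order:
            clf.partial_fit(X[i:i + 1], y[i:i + 1], classes=classes)

    print(clf.predict(X[:1]))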

How to transform a key/value string into distinct rows?

二次信任 submitted on 2019-12-02 01:49:08
I have an R dataset with key/value strings that looks like this:

    quest <- data.frame(
      city = c("Atlanta", "New York", "Atlanta", "Tampa"),
      key_value = c("rev=63;qty=1;zip=45987",
                    "rev=10.60|34;qty=1|2;zip=12686|12694",
                    "rev=12;qty=1;zip=74268",
                    "rev=3|24|8;qty=1|6|3;zip=33684|36842|30254"))

which translates to:

      city     key_value
    1 Atlanta  rev=63;qty=1;zip=45987
    2 New York rev=10.60|34;qty=1|2;zip=12686|12694
    3 Atlanta  rev=12;qty=1;zip=74268
    4 Tampa    rev=3|24|8;qty=1|6|3;zip=33684|36842|30254

Based on the above data frame, how can I create a new data frame that looks like this:

      city    rev  qty zip
    1 Atlanta 63.0
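The question is about R, but as a hedged illustration of the same reshape, here is a sketch in Python/pandas (the language used by most entries in this listing): split the pipe-separated values into parallel lists and explode them into one row per value. The frame and column names mirror the question; multi-column explode assumes pandas 1.3 or newer.

    import pandas as pd

    quest = pd.DataFrame({
        "city": ["Atlanta", "New York", "Atlanta", "Tampa"],
        "key_value": [
            "rev=63;qty=1;zip=45987",
            "rev=10.60|34;qty=1|2;zip=12686|12694",
            "rev=12;qty=1;zip=74268",
            "rev=3|24|8;qty=1|6|3;zip=33684|36842|30254",
        ],
    })

    # Turn each "k=v1|v2" pair into a dict of lists, one key per column.
    pairs = quest["key_value"].str.split(";").apply(
        lambda kvs: {k: v.split("|") for k, v in (kv.split("=") for kv in kvs)}
    )
    wide = pd.concat([quest[["city"]], pd.DataFrame(list(pairs))], axis=1)

    # Explode the parallel lists into one row per value (pandas >= 1.3).
    long = wide.explode(["rev", "qty", "zip"]).astype({"rev": float, "qty": int})
    print(long)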

rvest: how to get NA values from html_nodes when creating data tables

情到浓时终转凉″ submitted on 2019-12-01 21:11:27
So I'm trying to make a data table of some information on a website. This is what I've done so far:

    library(rvest)

    url <- 'https://uws-community.symplicity.com/index.php?s=student_group'
    page <- html_session(url)

    name_nodes <- html_nodes(page, ".grpl-name a")
    name_text <- html_text(name_nodes)
    df <- data.frame(matrix(unlist(name_text)), stringsAsFactors = FALSE)

    library(tidyverse)
    df <- df %>% mutate(id = row_number())

    desc_nodes <- html_nodes(page, ".grpl-purpose")
    desc_text <- html_text(desc_nodes)
    df <- left_join(df, data.frame(matrix(unlist(desc_text)), stringsAsFactors = FALSE) %>% mutate
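Not the asker's rvest code, but a sketch of the underlying strategy in Python/BeautifulSoup for consistency with the other entries: walk one group container at a time and record a missing value when a child node is absent, instead of extracting each field as a separate flat vector that can end up with a different length. The ".grpl" container class is an assumption; only ".grpl-name a" and ".grpl-purpose" appear in the question.

    from bs4 import BeautifulSoup
    import pandas as pd

    # Tiny made-up page: the second group has no purpose element.
    html = """
    <div class="grpl"><span class="grpl-name"><a>Chess Club</a></span>
      <span class="grpl-purpose">Weekly games</span></div>
    <div class="grpl"><span class="grpl-name"><a>Book Club</a></span></div>
    """

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for group in soup.select(".grpl"):           # one record per group container
        name = group.select_one(".grpl-name a")
        desc = group.select_one(".grpl-purpose")
        rows.append({
            "name": name.get_text(strip=True) if name else None,
            "desc": desc.get_text(strip=True) if desc else None,  # NA when missing
        })

    df = pd.DataFrame(rows)
    print(df)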

InvalidArgumentError: Expected dimension in the range [-1, 1) but got 1

左心房为你撑大大i submitted on 2019-12-01 16:31:26
I'm not sure what this error means. It occurs when I try to calculate acc:

    acc = accuracy.eval(feed_dict = {x: batch_images, y: batch_labels, keep_prob: 1.0})

I've tried looking up solutions, but I couldn't find any online. Any ideas on what's causing my error? Here's a link to my full code.

I had a similar error, but the problem for me was that I was trying to use argmax on a 1-dimensional vector. The shape of my labels was (50,) and I was trying to do tf.argmax(y, 1) on that when evaluating. The solution reference is Tensorflow: I get something wrong in accuracy The source code
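A minimal sketch of the shape mismatch the answer describes, in TensorFlow 1.x style to match the feed_dict usage above (shapes and names are illustrative): when the labels are a 1-D vector of class indices, there is no axis 1 to argmax over, so compare them with the predicted indices directly.

    import tensorflow as tf  # TensorFlow 1.x API

    logits = tf.placeholder(tf.float32, [None, 10])  # model outputs, shape (batch, classes)
    labels = tf.placeholder(tf.int64, [None])        # 1-D class indices, shape (batch,)

    # tf.argmax(labels, 1) would fail here because labels has no axis 1.
    # Compare the predicted class index with the label vector directly instead:
    predictions = tf.argmax(logits, 1)
    correct = tf.equal(predictions, labels)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))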

How do I extract the date/year/month from pandas dataframe?

痞子三分冷 submitted on 2019-12-01 04:44:20
Question: I'm trying to extract the year/date/month info from the 'date' column in a pandas dataframe. Here is my sample code:

    from datetime import datetime

    def date_split(calendar):
        for row in calendar:
            new_calendar = {}
            listdate = datetime.strptime(row['date'], '%Y-%M-%D')

I haven't finished the complete code, but when I test-run this part I keep getting an error like this:

    ----> 7 listdate=datetime.strptime(row['date'],'%Y-%M-%D')
    TypeError: string indices must be integers

Does anyone have any idea? Btw, this is the
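A minimal sketch of the usual pandas route, with made-up sample values: convert the column once with to_datetime and use the .dt accessor, so no row loop is needed. (The TypeError above comes from iterating over the DataFrame itself, which yields column names as strings, not rows.)

    import pandas as pd

    calendar = pd.DataFrame({"date": ["2019-12-01", "2019-12-15", "2020-01-03"]})
    calendar["date"] = pd.to_datetime(calendar["date"], format="%Y-%m-%d")

    # The .dt accessor exposes year/month/day for the whole column at once.
    calendar["year"] = calendar["date"].dt.year
    calendar["month"] = calendar["date"].dt.month
    calendar["day"] = calendar["date"].dt.day
    print(calendar)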

Why did PCA reduce the performance of Logistic Regression?

佐手、 submitted on 2019-11-30 23:45:10
I performed logistic regression on a binary classification problem with data of dimensions 50000 x 370 and got an accuracy of about 90%. But when I did PCA + logistic regression on the same data, my accuracy dropped to 10%. I was very shocked to see this result. Can anybody explain what could have gone wrong?

There is no guarantee that PCA will ever help, or not harm, the learning process. In particular, if you use PCA to reduce the number of dimensions, you are removing information from your data, so anything can happen: if the removed data was redundant, you will probably get better scores; if it was an important
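Not from the original post, but a hedged sketch on synthetic data of how such a comparison is usually set up: scale the features, let PCA keep enough components for, say, 95% of the variance, and score both pipelines the same way, so the information loss the answer warns about stays small and measurable.

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for the 50000 x 370 data (smaller so it runs quickly).
    X, y = make_classification(n_samples=5000, n_features=370, random_state=0)

    baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    with_pca = make_pipeline(StandardScaler(),
                             PCA(n_components=0.95),   # keep components for 95% of the variance
                             LogisticRegression(max_iter=1000))

    print("no PCA  :", cross_val_score(baseline, X, y, cv=3).mean())
    print("with PCA:", cross_val_score(with_pca, X, y, cv=3).mean())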

ValueError: Must pass DataFrame with boolean values only

流过昼夜 submitted on 2019-11-30 19:42:19
Question: In this datafile, the United States is broken up into four regions using the "REGION" column. Create a query that finds the counties that belong to regions 1 or 2, whose name starts with 'Washington', and whose POPESTIMATE2015 was greater than their POPESTIMATE2014. This function should return a 5x2 DataFrame with the columns ['STNAME', 'CTYNAME'] and the same index ID as census_df (sorted ascending by index).

Code:

    def answer_eight():
        counties = census_df[census_df['SUMLEV'] == 50]
        regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
        washingtons
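A sketch of the indexing fix on a tiny stand-in for census_df (only the columns named in the question): build boolean Series and combine them with & and |, rather than nesting counties[counties['REGION'] == 1] inside another indexer, which hands pandas a DataFrame of rows where it expects a boolean mask.

    import pandas as pd

    # Made-up stand-in for the course's census_df, with the columns used below.
    census_df = pd.DataFrame({
        "SUMLEV": [50, 50, 40],
        "REGION": [1, 2, 1],
        "STNAME": ["Wisconsin", "Minnesota", "Iowa"],
        "CTYNAME": ["Washington County", "Washington County", "Polk County"],
        "POPESTIMATE2014": [100, 200, 300],
        "POPESTIMATE2015": [150, 180, 310],
    })

    counties = census_df[census_df["SUMLEV"] == 50]
    mask = (
        counties["REGION"].isin([1, 2])
        & counties["CTYNAME"].str.startswith("Washington")
        & (counties["POPESTIMATE2015"] > counties["POPESTIMATE2014"])
    )
    result = counties.loc[mask, ["STNAME", "CTYNAME"]].sort_index()
    print(result)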

How to get predicted class labels in TensorFlow's MNIST example?

六月ゝ 毕业季﹏ submitted on 2019-11-30 16:43:20
I am new to neural networks and went through the MNIST example for beginners. I am currently trying to use this example on another dataset from Kaggle that does not have test labels. If I run the model on the test data set without corresponding labels, and therefore cannot compute the accuracy like in the MNIST example, I would still like to be able to see the predictions. Is it possible to access the observations and their predicted labels somehow and print them out nicely?

Aske Doerge: I think you just need to evaluate your output tensor as stated in the tutorial:

    accuracy = tf.reduce_mean(tf.cast
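A self-contained sketch in the style of the TensorFlow 1.x beginners tutorial (placeholder data, training omitted): the predicted label is just the argmax of the output tensor, evaluated with a feed of the unlabeled test images.

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API

    # Minimal stand-in for the tutorial's model: x, W, b, y as in the MNIST example.
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)

    predicted_labels = tf.argmax(y, 1)   # class index with the highest probability

    test_images = np.random.rand(5, 784).astype(np.float32)  # placeholder for the unlabeled test set

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # (training omitted; an untrained W just illustrates the mechanics)
        labels = sess.run(predicted_labels, feed_dict={x: test_images})
        for i, label in enumerate(labels):
            print("image", i, "->", label)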

How to optimize MAPE code in Python?

隐身守侯 submitted on 2019-11-30 15:25:19
I need to have a MAPE function; however, I was not able to find it in standard packages ... Below is my implementation of this function:

    def mape(actual, predict):
        tmp, n = 0.0, 0
        for i in range(0, len(actual)):
            if actual[i] <> 0:
                tmp += math.fabs(actual[i]-predict[i])/actual[i]
                n += 1
        return (tmp/n)

I don't like it; it is far from optimal in terms of speed. How can I rewrite the code in a more Pythonic way and boost the speed?

Here's one vectorized approach with masking:

    def mape_vectorized(a, b):
        mask = a <> 0
        return (np.fabs(a[mask] - b[mask])/a[mask]).mean()

Probably a faster one with masking
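A hedged Python 3 variant of the same masked, vectorized idea (the snippets above use Python 2's '<>' operator); inputs are assumed to be numeric array-likes.

    import numpy as np

    def mape_vectorized(actual, predict):
        actual = np.asarray(actual, dtype=float)
        predict = np.asarray(predict, dtype=float)
        mask = actual != 0                  # skip zero actuals, as in the loop version
        return np.mean(np.abs(actual[mask] - predict[mask]) / actual[mask])

    print(mape_vectorized([1.0, 2.0, 0.0, 4.0], [1.1, 1.8, 5.0, 4.4]))  # -> 0.1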