scikits

Forecasting using Pandas OLS

我与影子孤独终老i submitted on 2019-12-09 00:19:19
Question: I have been using the scikits.statsmodels OLS predict function to forecast fitted data but would now like to switch to pandas. The documentation refers to OLS as well as to a function called y_predict, but I can't find any documentation on how to use it correctly. By way of example: exogenous = { "1998": "4760","1999": "5904","2000": "4504","2001": "9808","2002": "4241","2003": "4086","2004": "4687","2005": "7686","2006": "3740","2007": "3075","2008": "3753","2009": "4679","2010": "5468",

Scikit - 3D feature array for SVM

人盡茶涼 submitted on 2019-12-08 20:55:30
I am trying to train an SVM in scikit. I am following the example and tried to adapt it to my 3D feature vectors. I tried the example from the page http://scikit-learn.org/stable/modules/svm.html and it ran through. While debugging I went back to the tutorial setup and found this: X = [[0, 0], [1, 1],[2,2]] y = [0, 1,1] clf = svm.SVC() clf.fit(X, y) works, while X = [[0, 0,0], [1, 1,1],[2,2,2]] y = [0, 1,1] clf = svm.SVC() clf.fit(X, y) fails with: ValueError: X.shape[1] = 2 should be equal to 3, the number of features at training time. What is wrong here? It's only one additional dimension...
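The error text says that an array with 2 columns reached a model trained on 3 features. A minimal sketch of that situation follows, assuming the mismatch arises at prediction time (for example, a 2D test point left over from the tutorial); this is a guess about the cause, not something the question confirms. The exact wording of the message varies between scikit-learn versions.

```python
from sklearn import svm

# Train on three features per sample, as in the question.
X = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
y = [0, 1, 1]
clf = svm.SVC()
clf.fit(X, y)

# Predicting with three features per sample matches the training shape.
ok = clf.predict([[2, 2, 2]])

# Predicting with two features (e.g. a leftover tutorial sample) does not.
try:
    clf.predict([[2, 2]])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("feature mismatch:", err)
```

Every array passed to `predict` needs the same number of columns as the array passed to `fit`.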

Cannot get scikit-learn installed on OS X

五迷三道 submitted on 2019-12-08 17:39:57
Question: I would like to use scikit-learn on an upcoming project and I absolutely cannot install it. I can install other packages, either by building them from source or through pip, without a problem. For scikit-learn, I've tried cloning the project on GitHub and installing via pip, without success. Can anyone please help? Here is part of my pip.log: Downloading/unpacking scikit-learn Running setup.py egg_info for package scikit-learn Warning: Assuming default configuration (scikits/learn/{setup

Is there a way to convert nltk featuresets into a scipy.sparse array?

故事扮演 submitted on 2019-12-08 08:04:40
Question: I'm trying to use scikit.learn, which needs numpy/scipy arrays for input. The featureset generated in nltk consists of unigram and bigram frequencies. I could do it manually, but that would be a lot of effort, so I'm wondering if there's a solution I've overlooked. Answer 1: Not that I know of, but note that scikit-learn can do n-gram frequency counting itself. Assuming word-level n-grams: from sklearn.feature_extraction.text import CountVectorizer, WordNGramAnalyzer v = CountVectorizer(analyzer
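As a complement, here is a minimal sketch of converting dict-based featuresets directly, using scikit-learn's `DictVectorizer`. It assumes the nltk featuresets are a list of `(feature_dict, label)` pairs; the feature names and labels below are hypothetical and only illustrate the shape of the data.

```python
from scipy import sparse
from sklearn.feature_extraction import DictVectorizer

# Hypothetical nltk-style featuresets: (feature dict, label) pairs
# holding unigram and bigram counts.
featuresets = [
    ({"the": 2, "cat": 1, "the cat": 1}, "pos"),
    ({"the": 1, "dog": 1, "the dog": 1}, "neg"),
]

feature_dicts = [features for features, _ in featuresets]
labels = [label for _, label in featuresets]

# DictVectorizer maps each distinct key to a column and returns a
# scipy.sparse matrix by default.
vec = DictVectorizer()
X = vec.fit_transform(feature_dicts)

print(sparse.issparse(X), X.shape)
```

The fitted `vec.vocabulary_` maps each feature name to its column index, so later data can be passed through `vec.transform` to get the same column layout.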

Scikit - 3D feature array for SVM

时光毁灭记忆、已成空白 submitted on 2019-12-08 07:04:42
Question: I am trying to train an SVM in scikit. I am following the example and tried to adapt it to my 3D feature vectors. I tried the example from the page http://scikit-learn.org/stable/modules/svm.html and it ran through. While debugging I went back to the tutorial setup and found this: X = [[0, 0], [1, 1],[2,2]] y = [0, 1,1] clf = svm.SVC() clf.fit(X, y) works, while X = [[0, 0,0], [1, 1,1],[2,2,2]] y = [0, 1,1] clf = svm.SVC() clf.fit(X, y) fails with: ValueError: X.shape[1] = 2 should be equal

scikit-learn roc_auc_score() returns accuracy values

守給你的承諾、 submitted on 2019-12-06 00:52:45
Question: I am trying to compute the area under the ROC curve with sklearn.metrics.roc_auc_score, using the following call: roc_auc = sklearn.metrics.roc_auc_score(actual, predicted) where actual is a binary vector of ground-truth classification labels and predicted is a binary vector of the classification labels that my classifier has predicted. However, the value of roc_auc that I am getting is EXACTLY the same as the accuracy (the proportion of samples whose labels are correctly predicted). This is not

How to load CSV data in scikit and use it for Naive Bayes classification

ε祈祈猫儿з submitted on 2019-12-04 12:31:34
Question: I am trying to load custom data to perform Naive Bayes (NB) classification in scikit. I need help loading the sample data into scikit and then running NB. How do I load categorical values for the target? Should I use the same data for training and testing, or keep a complete set just for testing?
Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
2,70000002,Bennie Ariana,Pennsylvania,6
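A rough sketch of one way to approach this, assuming pandas is available and that the target is the `Insurance Plan ID` column (the question does not say which column is the target). The rows below are invented for illustration and are not the asker's data; only a few of the categorical columns are used.

```python
import io

import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical rows reusing a few column names from the question.
csv_text = """Location,Gender,Marital Status,Insurance Plan ID
New York,M,Single,2002
Pennsylvania,F,Married,2001
New York,F,Single,2002
Pennsylvania,M,Married,2001
New York,M,Married,2002
Pennsylvania,F,Single,2001
"""

df = pd.read_csv(io.StringIO(csv_text))  # use a file path for real data

# Encode categorical features as integer codes 0..k-1 per column.
feature_cols = ["Location", "Gender", "Marital Status"]
X = OrdinalEncoder().fit_transform(df[feature_cols]).astype(int)

# Encode the (assumed) target column as integer class labels.
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(df["Insurance Plan ID"])

clf = CategoricalNB()
clf.fit(X, y)
pred = clf.predict(X)
print(target_encoder.inverse_transform(pred))
```

Predicting on the training rows, as done here for brevity, only shows that the pipeline runs. For an honest estimate, hold out part of the data (for instance with `sklearn.model_selection.train_test_split`) and evaluate on rows the model has not seen.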

scikit-learn roc_auc_score() returns accuracy values

醉酒当歌 submitted on 2019-12-04 06:53:09
I am trying to compute the area under the ROC curve with sklearn.metrics.roc_auc_score, using the following call: roc_auc = sklearn.metrics.roc_auc_score(actual, predicted) where actual is a binary vector of ground-truth classification labels and predicted is a binary vector of the classification labels that my classifier has predicted. However, the value of roc_auc that I am getting is EXACTLY the same as the accuracy (the proportion of samples whose labels are correctly predicted). This is not a one-off thing. I try my classifier on various values of the parameters and every time I get the same
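A likely explanation, sketched below: when hard 0/1 labels are passed as the second argument, the ROC curve has a single operating point and its area reduces to (TPR + TNR) / 2, which coincides with accuracy whenever the two classes are the same size. Passing continuous scores (probabilities or decision values) instead gives the usual AUC. The labels and scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth with balanced classes, plus classifier scores.
actual = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9])

# Hard labels obtained by thresholding the scores at 0.5.
predicted = (scores >= 0.5).astype(int)

accuracy = np.mean(predicted == actual)
auc_from_labels = roc_auc_score(actual, predicted)  # one-point ROC curve
auc_from_scores = roc_auc_score(actual, scores)     # full ROC curve

print(accuracy, auc_from_labels, auc_from_scores)
```

With these numbers the label-based value matches the accuracy (0.75), while the score-based AUC is 0.6875. In practice the scores would come from something like `clf.predict_proba(X)[:, 1]` or `clf.decision_function(X)`, depending on what the estimator provides.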

text classification with SciKit-learn and a large dataset

落花浮王杯 submitted on 2019-12-03 21:08:01
First of all, I started with Python yesterday. I'm trying to do text classification with SciKit and a large dataset (250.000 tweets). For the algorithm, every tweet will be represented as a 4000 x 1 vector, so the input is 250.000 rows by 4000 columns. When I try to construct this in Python, I run out of memory after 8500 tweets (when working with a list and appending to it), and when I preallocate the memory I just get the error: MemoryError (np.zeros(4000,2500000)). Is SciKit not able to work with datasets this large? Am I doing something wrong (as it is my second day with Python)?
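A back-of-the-envelope sketch of why the dense array is so large, and of a sparse alternative. It assumes 64-bit floats for the dense case and, purely for illustration, about 20 non-zero features per tweet; the real density depends on the feature extraction. Note also that `np.zeros` takes the shape as a single tuple, e.g. `np.zeros((250000, 4000))`.

```python
import numpy as np
from scipy import sparse

n_tweets, n_features = 250_000, 4_000

# Dense float64 storage: 8 bytes per cell, whether zero or not.
dense_bytes = n_tweets * n_features * np.dtype(np.float64).itemsize
dense_gb = dense_bytes / 1e9
print(f"dense: {dense_gb:.1f} GB")

# Sparse storage keeps only the non-zero cells.
# Assumption for illustration: about 20 non-zero features per tweet.
rng = np.random.default_rng(0)
nnz_per_row = 20
rows = np.repeat(np.arange(n_tweets), nnz_per_row)
cols = rng.integers(0, n_features, size=rows.size)
vals = np.ones(rows.size, dtype=np.float32)

# Duplicate (row, col) pairs are summed during construction.
X = sparse.csr_matrix((vals, (rows, cols)), shape=(n_tweets, n_features))
sparse_mb = (X.data.nbytes + X.indices.nbytes + X.indptr.nbytes) / 1e6
print(f"sparse: {sparse_mb:.0f} MB")
```

Many scikit-learn estimators and the text vectorizers in `sklearn.feature_extraction.text` accept or produce scipy.sparse matrices, so the dense array often does not need to exist at all.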

Numpy: How to randomly split a matrix into n different matrices

不问归期 submitted on 2019-12-03 12:50:59
Question: I have a numpy matrix with shape (4601, 58). I want to split the matrix randomly into 60%, 20%, and 20% parts by number of rows, for a machine learning task. Is there a numpy function that randomly selects rows? Answer 1: You can use numpy.random.shuffle: import numpy as np N = 4601 data = np.arange(N*58).reshape(-1, 58) np.random.shuffle(data) a = data[:int(N*0.6)] b = data[int(N*0.6):int(N*0.8)] c = data[int(N*0.8):] Answer 2: A complement to HYRY's answer if you want to shuffle
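A variant of the same idea, sketched with a permutation of row indices instead of an in-place shuffle. This leaves the original array untouched, and the same index array can be reused to split a matching label vector. The seed and variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the split is reproducible

data = np.arange(4601 * 58).reshape(-1, 58)
n_rows = data.shape[0]

# Random ordering of the row indices; `data` itself is not modified.
idx = rng.permutation(n_rows)

# Cut points at 60% and 80% of the rows.
cut_60 = int(0.6 * n_rows)
cut_80 = int(0.8 * n_rows)

train, validation, test = np.split(data[idx], [cut_60, cut_80])
print(train.shape, validation.shape, test.shape)
```

Because the cut points are truncated to integers, the three parts hold 2760, 920, and 921 rows, which together cover all 4601 rows exactly once.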