svm

How to obtain the training error in svm of Scikit-learn?

Question: How do I obtain the training error in the svm module (SVC class)? I am trying to plot the error on the training set and on the test set against the number of training samples used (or against other settings such as C / gamma). However, according to the SVM documentation, there is no exposed attribute or method that returns this. I did find that RandomForestClassifier exposes an oob_score_, though.

Answer 1: Just compute the score on the training data:

    >>> model.fit(X_train, y_train).score(X_train, y_train)
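To flesh that out, here is a minimal sketch of the train/test error curve the question describes, on synthetic data (the dataset and hyperparameters are illustrative, not from the original post):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Toy data standing in for the asker's dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    for frac in (0.2, 0.4, 0.6, 0.8, 1.0):
        k = int(frac * len(X_train))
        model = SVC(C=1.0, gamma="scale").fit(X_train[:k], y_train[:k])
        train_err = 1.0 - model.score(X_train[:k], y_train[:k])  # error = 1 - accuracy
        test_err = 1.0 - model.score(X_test, y_test)
        print(f"n={k}: train error {train_err:.3f}, test error {test_err:.3f}")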

LibSVM turns all my training vectors into support vectors, why?

Question: I am trying to use SVM for news article classification. I created a table that contains the features (unique words found in the documents) as rows, and built weight vectors mapped to these features: if an article contains a word that is in the feature table, that position is marked as 1, otherwise 0. For example, a generated training sample looks like:

    1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:1 19:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1 27:1 28:1 29:1 30:1
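The excerpt ends before an answer. A frequent cause of "every training point becomes a support vector" is an RBF kernel whose gamma is far too large for sparse, high-dimensional binary features; this is a hedged guess, not the original thread's accepted answer. A sketch on hypothetical bag-of-words-style data, using scikit-learn's SVC (which wraps LibSVM) to inspect the support-vector count per kernel:

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical sparse binary features standing in for the asker's data.
    rng = np.random.RandomState(0)
    X = (rng.rand(200, 500) > 0.9).astype(float)
    y = (X[:, :10].sum(axis=1) > 1).astype(int)  # synthetic labels with some structure

    for kernel in ("linear", "rbf"):
        clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X, y)
        # n_support_ holds the number of support vectors per class.
        print(kernel, ":", clf.n_support_.sum(), "of", len(X), "points are support vectors")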

Multiprocessing on a model with data frame as input

Question: I want to use multiprocessing to get predictions from a model, using a data frame as input. I have the following code:

    def perform_model_predictions(model, dataFrame, cores=4):
        try:
            with Pool(processes=cores) as pool:
                result = pool.map(model.predict, dataFrame)
                return result
            # return model.predict(dataFrame)
        except AttributeError:
            logging.error("AttributeError occurred", exc_info=True)

The error I'm getting is:

    raise TypeError("sparse matrix length is ambiguous; use getnnz()"
    TypeError: sparse matrix length is ambiguous; use getnnz()
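pool.map iterates over whatever it is handed one element at a time, so passing a DataFrame (or a sparse matrix) feeds model.predict fragments it cannot interpret. One common fix is to split the input into row chunks first; a minimal sketch, with illustrative data and model standing in for the asker's (the chunking approach is standard practice, not taken from the original thread):

    import numpy as np
    import pandas as pd
    from multiprocessing import Pool
    from sklearn.linear_model import LogisticRegression

    def perform_model_predictions(model, data_frame, cores=4):
        # Split the frame into row chunks so each worker gets a proper 2-D block.
        chunks = np.array_split(data_frame, cores)
        with Pool(processes=cores) as pool:
            results = pool.map(model.predict, chunks)
        return np.concatenate(results)

    if __name__ == "__main__":
        X = pd.DataFrame(np.random.randn(100, 4))
        y = (X.iloc[:, 0] > 0).astype(int)
        model = LogisticRegression().fit(X, y)
        print(perform_model_predictions(model, X, cores=2))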

Time series prediction using support vector regression

Question: I've been trying to implement a time series prediction tool using support vector regression in Python. I use the SVR module from scikit-learn for non-linear support vector regression. But I have a serious problem with predicting future events: the regression line fits the original function well (on known data), but as soon as I try to predict future steps, it returns the value from the last known step. My code looks like this:

    import numpy as np
    from matplotlib import pyplot as plt
    from sklearn.svm import SVR
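An RBF-kernel SVR trained on the time index alone cannot extrapolate: far from the training inputs the kernel values vanish and the prediction flattens out to a near-constant, which matches the behaviour described. A common workaround is to regress on lagged values of the series instead and predict recursively. A minimal sketch on a toy series (the lag setup and hyperparameters are illustrative, not the asker's original code):

    import numpy as np
    from sklearn.svm import SVR

    y = np.sin(np.linspace(0, 20, 200))  # toy series standing in for the real data
    lags = 5

    # Build (lagged values -> next value) training pairs.
    X = np.column_stack([y[i:len(y) - lags + i] for i in range(lags)])
    t = y[lags:]
    model = SVR(kernel="rbf", C=10.0, gamma="scale").fit(X, t)

    # Predict future steps recursively, feeding predictions back in as inputs.
    window = list(y[-lags:])
    future = []
    for _ in range(20):
        nxt = model.predict(np.array(window[-lags:]).reshape(1, -1))[0]
        future.append(nxt)
        window.append(nxt)
    print(np.round(future[:5], 3))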

Python: convert matrix to positive semi-definite

Question: I'm currently working on kernel methods, and at some point I needed to turn a non-positive-semi-definite matrix (i.e. a similarity matrix) into a PSD matrix. I tried this approach:

    def makePSD(mat):
        # make symmetric
        k = (mat + mat.T) / 2
        # make PSD
        min_eig = np.min(np.real(linalg.eigvals(mat)))
        e = np.max([0, -min_eig + 1e-4])
        mat = k + e * np.eye(mat.shape[0])
        return mat

but it fails if I test the resulting matrix with the following function:

    def isPSD(A, tol=1e-8):
        E, V = linalg.eigh(A)
        return np.all(E > -tol)
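One likely culprit: makePSD takes the minimum eigenvalue of the original mat rather than of the symmetrized k, so the shift can come out too small. A hedged sketch of a corrected version using eigvalsh on the symmetric part (a standard eigenvalue-shift construction, not necessarily the thread's accepted answer):

    import numpy as np
    from scipy import linalg

    def make_psd(mat, eps=1e-8):
        k = (mat + mat.T) / 2                 # symmetric part
        min_eig = np.min(linalg.eigvalsh(k))  # note: k, not the original mat
        if min_eig < eps:
            k = k + (eps - min_eig) * np.eye(k.shape[0])
        return k

    mat = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1 and 3: not PSD
    print(np.all(linalg.eigvalsh(make_psd(mat)) >= 0))  # True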

Speed of SVM Kernels? Linear vs RBF vs Poly

Question: I'm using scikit-learn in Python to create some SVM models while trying different kernels. The code is pretty simple, and follows the form of:

    from sklearn import svm
    clf = svm.SVC(kernel='rbf', C=1, gamma=0.1)
    clf = svm.SVC(kernel='linear', C=1, gamma=0.1)
    clf = svm.SVC(kernel='poly', C=1, gamma=0.1)

    t0 = time()
    clf.fit(X_train, y_train)
    print "Training time:", round(time() - t0, 3), "s"
    pred = clf.predict(X_test)

The data is 8 features and a little over 3000 observations. I was surprised to
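The excerpt cuts off, but a common finding with data of this size is that SVC(kernel='linear') is much slower than expected, because it still runs through libsvm; LinearSVC, backed by liblinear, is usually far faster when a linear model is all you need. A rough timing sketch on synthetic data of the stated shape (illustrative, not the asker's dataset):

    from time import time
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC, LinearSVC

    # Synthetic stand-in: ~3000 observations, 8 features.
    X, y = make_classification(n_samples=3000, n_features=8, random_state=0)

    for name, clf in [("SVC(kernel='linear')", SVC(kernel="linear", C=1)),
                      ("LinearSVC", LinearSVC(C=1, max_iter=5000))]:
        t0 = time()
        clf.fit(X, y)
        print(name, "training time:", round(time() - t0, 3), "s")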

Building an SVM with Tensorflow

Question: I currently have two numpy arrays:

    X - (157, 128) - 157 sets of 128 features
    Y - (157) - classifications of the feature sets

This is the code I have written to attempt to build a linear classification model of these features. First of all I adapted the arrays to a Tensorflow dataset:

    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": X},
        y=Y,
        num_epochs=None,
        shuffle=True)

I then tried to fit an SVM model:

    svm = tf.contrib.learn.SVM(
        example_id_column='example_id',  # not sure why
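tf.contrib.learn.SVM was removed along with the rest of tf.contrib in TensorFlow 2.x. One alternative is to express a linear SVM directly as a single dense layer trained with hinge loss plus L2 regularization, which is the same soft-margin objective. A minimal sketch assuming TF 2.x and binary labels (the toy arrays stand in for the asker's X and Y):

    import numpy as np
    import tensorflow as tf

    # Toy stand-ins matching the stated shapes.
    X = np.random.randn(157, 128).astype("float32")
    Y = np.random.randint(0, 2, size=157)
    y_signed = 2 * Y - 1  # hinge loss expects labels in {-1, +1}

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.l2(0.01))
    ])
    model.compile(optimizer="adam", loss="hinge")
    model.fit(X, y_signed, epochs=10, batch_size=32, verbose=0)

    # The sign of the linear output is the predicted class.
    pred = np.sign(model.predict(X, verbose=0).ravel())
    print(pred[:10])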

Understanding SVM (Part 1): Getting Started with SVM and a Code Implementation

Understanding SVM. In this post we try to understand SVM. Many excellent writers have already covered the theory well; for example, july's well-known "three levels of understanding" SVM post on CSDN: http://blog.csdn.net/v_july_v/article/details/7624837 . Here I will set aside the heavier derivations and present the core ideas of SVM, together with a Python implementation and my own explanatory comments.

1. The core idea of SVM

The classification idea behind SVM is essentially similar to linear classification with logistic regression (LR): find a set of weights such that a linear combination of the features separates the classes. We first train the SVM's weights on a training set, and can then classify a test set.

Put more grandly: SVM trains a separating hyperplane, and that hyperplane becomes the decision boundary; the points on its two sides form the two classes. Clearly, the classic SVM algorithm only applies to binary classification; with suitable extensions, SVM can also handle multi-class problems.

We want the points closest to the separating hyperplane to be as far from it as possible. The distance from a point to the separating hyperplane is called the margin, and we want this margin to be as large as possible. The support vectors are precisely the points closest to the separating hyperplane, and we maximize their distance to it.

To achieve this, we must solve the following problem: how do we compute the distance from a point to the separating hyperplane?
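The excerpt stops at that question; for completeness, the standard answer is the point-to-hyperplane distance formula. For a hyperplane $w^\top x + b = 0$, the distance of a point $x$ to it is

$$ d(x) = \frac{\lvert w^\top x + b \rvert}{\lVert w \rVert}, $$

and for a labelled sample $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ the geometric margin is $\gamma_i = \frac{y_i (w^\top x_i + b)}{\lVert w \rVert}$, which is the quantity the SVM maximizes for the closest points.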

BAT Machine Learning Interview 1000-Question Series

A few notes:

1. The content of this post comes entirely from the "BAT Machine Learning Interview 1000 Questions" series published by 七月在线 (July Online);
2. Italicized text is content I added myself; please point out any errors;
3. Some links in the original are dead, so I added new ones (also marked in italics); corrections are welcome;
4. Since some answers are copied verbatim from other blogs, I only give a link to the answer; this saves space and keeps the layout clean. Click a question to jump to its answer.

Finally, I have reorganized this post's layout: formulas are written in LaTeX syntax and links now jump directly to the relevant pages, which should improve the reading experience. If my editing introduced any mistakes, please point them out so we can all improve together!

1. Briefly introduce SVM.

SVM, short for support vector machine (支持向量机), is a data-oriented classification algorithm whose goal is to determine a separating hyperplane that divides the different classes of data.

Extension: support vector machine methods form a progression of models from simple to complex: the linearly separable SVM, the linear SVM, and the nonlinear SVM. When the training data are linearly separable, hard-margin maximization learns a linear classifier, the linearly separable SVM, also called the hard-margin SVM. When the training data are approximately linearly separable, soft-margin maximization likewise learns a linear classifier, the linear SVM, also called the soft-margin SVM. When the training data are linearly inseparable, the kernel trick together with soft-margin maximization learns a nonlinear SVM.
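To make the hard-margin case concrete, the standard primal optimization problem (the textbook formulation, added here for reference rather than part of the original answer) is

$$ \min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,N. $$

The soft-margin version introduces slack variables $\xi_i \ge 0$ and a penalty term $C \sum_i \xi_i$, relaxing the constraints to $y_i(w^\top x_i + b) \ge 1 - \xi_i$.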

One-class SVM probability estimates, and what is the difference between a one-class SVM and clustering?

I have a set of images. I would like to learn a one-class SVM (OC-SVM) to model the distribution of a particular (positive) class, as I don't have enough examples to represent the other (negative) classes. What I understood about the OC-SVM is that it tries to separate the data from the origin, or in other words it tries to learn a hypersphere that fits the one-class data. My questions are:

1. If I want to use the output of the OC-SVM as a probability estimate, how can I do it?
2. What is the difference between the OC-SVM and a clustering algorithm (e.g. k-means)?

If you want a probability estimate, don't
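The answer above is cut off. For what it's worth, scikit-learn's OneClassSVM exposes no predict_proba; one ad-hoc workaround is to squash its decision_function (a signed distance to the boundary) through a sigmoid, giving a confidence-like score rather than a calibrated probability. A minimal sketch (the toy data and the sigmoid mapping are illustrative assumptions, not the original answer):

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Toy positive-class data standing in for the image features.
    rng = np.random.RandomState(0)
    X_pos = rng.randn(200, 2)

    oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_pos)

    def pseudo_probability(model, X):
        # decision_function > 0 means "inside" the learned region; the sigmoid
        # maps it to (0, 1) as a rough confidence, not a true probability.
        scores = model.decision_function(X)
        return 1.0 / (1.0 + np.exp(-scores))

    print(np.round(pseudo_probability(oc, X_pos[:5]), 3))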