svm

How to obtain the training error in svm of Scikit-learn?

Question: How do I obtain the training error in the svm module (SVC class)? I am trying to plot the error on the training set and on the test set against the number of training samples used (or against other settings such as C / gamma). However, according to the SVM documentation, there is no exposed attribute or method that returns this. I did find that RandomForestClassifier exposes an oob_score_, though.

Answer 1: Just compute the score on the training data:

    >>> model.fit(X_train, y_train).score(X_train, y_train)
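To flesh that out, here is a minimal sketch of the train/test error curve the question describes, on synthetic data (the dataset and hyperparameters are illustrative, not from the original post):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Toy data standing in for the asker's dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    for frac in (0.2, 0.4, 0.6, 0.8, 1.0):
        k = int(frac * len(X_train))
        model = SVC(C=1.0, gamma="scale").fit(X_train[:k], y_train[:k])
        train_err = 1.0 - model.score(X_train[:k], y_train[:k])  # error = 1 - accuracy
        test_err = 1.0 - model.score(X_test, y_test)
        print(f"n={k}: train error {train_err:.3f}, test error {test_err:.3f}")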

LibSVM turns all my training vectors into support vectors, why?

Question: I am trying to use SVM for news article classification. I created a table that contains the features (unique words found in the documents) as rows, and built weight vectors mapped to these features: if an article contains a word that is in the feature table, that position is marked as 1, otherwise 0. For example, a generated training sample looks like:

    1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:1 19:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1 27:1 28:1 29:1 30:1
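The excerpt ends before an answer. A frequent cause of "every training point becomes a support vector" is an RBF kernel whose gamma is far too large for sparse, high-dimensional binary features; this is a hedged guess, not the original thread's accepted answer. A sketch on hypothetical bag-of-words-style data, using scikit-learn's SVC (which wraps LibSVM) to inspect the support-vector count per kernel:

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical sparse binary features standing in for the asker's data.
    rng = np.random.RandomState(0)
    X = (rng.rand(200, 500) > 0.9).astype(float)
    y = (X[:, :10].sum(axis=1) > 1).astype(int)  # synthetic labels with some structure

    for kernel in ("linear", "rbf"):
        clf = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X, y)
        # n_support_ holds the number of support vectors per class.
        print(kernel, ":", clf.n_support_.sum(), "of", len(X), "points are support vectors")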

Multiprocessing on a model with data frame as input

Question: I want to use multiprocessing to get predictions from a model, using a data frame as input. I have the following code:

    def perform_model_predictions(model, dataFrame, cores=4):
        try:
            with Pool(processes=cores) as pool:
                result = pool.map(model.predict, dataFrame)
                return result
            # return model.predict(dataFrame)
        except AttributeError:
            logging.error("AttributeError occurred", exc_info=True)

The error I'm getting is:

    raise TypeError("sparse matrix length is ambiguous; use getnnz()"
    TypeError: sparse matrix length is ambiguous; use getnnz()
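pool.map iterates over whatever it is handed one element at a time, so passing a DataFrame (or a sparse matrix) feeds model.predict fragments it cannot interpret. One common fix is to split the input into row chunks first; a minimal sketch, with illustrative data and model standing in for the asker's (the chunking approach is standard practice, not taken from the original thread):

    import numpy as np
    import pandas as pd
    from multiprocessing import Pool
    from sklearn.linear_model import LogisticRegression

    def perform_model_predictions(model, data_frame, cores=4):
        # Split the frame into row chunks so each worker gets a proper 2-D block.
        chunks = np.array_split(data_frame, cores)
        with Pool(processes=cores) as pool:
            results = pool.map(model.predict, chunks)
        return np.concatenate(results)

    if __name__ == "__main__":
        X = pd.DataFrame(np.random.randn(100, 4))
        y = (X.iloc[:, 0] > 0).astype(int)
        model = LogisticRegression().fit(X, y)
        print(perform_model_predictions(model, X, cores=2))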

Time series prediction using support vector regression

Question: I've been trying to implement a time series prediction tool using support vector regression in Python. I use the SVR module from scikit-learn for non-linear support vector regression. But I have a serious problem with predicting future events: the regression line fits the original function well (on known data), but as soon as I try to predict future steps, it returns the value from the last known step. My code looks like this:

    import numpy as np
    from matplotlib import pyplot as plt
    from sklearn.svm import SVR
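An RBF-kernel SVR trained on the time index alone cannot extrapolate: far from the training inputs the kernel values vanish and the prediction flattens out to a near-constant, which matches the behaviour described. A common workaround is to regress on lagged values of the series instead and predict recursively. A minimal sketch on a toy series (the lag setup and hyperparameters are illustrative, not the asker's original code):

    import numpy as np
    from sklearn.svm import SVR

    y = np.sin(np.linspace(0, 20, 200))  # toy series standing in for the real data
    lags = 5

    # Build (lagged values -> next value) training pairs.
    X = np.column_stack([y[i:len(y) - lags + i] for i in range(lags)])
    t = y[lags:]
    model = SVR(kernel="rbf", C=10.0, gamma="scale").fit(X, t)

    # Predict future steps recursively, feeding predictions back in as inputs.
    window = list(y[-lags:])
    future = []
    for _ in range(20):
        nxt = model.predict(np.array(window[-lags:]).reshape(1, -1))[0]
        future.append(nxt)
        window.append(nxt)
    print(np.round(future[:5], 3))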

Python: convert matrix to positive semi-definite

Question: I'm currently working on kernel methods, and at some point I needed to turn a non-positive-semi-definite matrix (i.e. a similarity matrix) into a PSD matrix. I tried this approach:

    def makePSD(mat):
        # make symmetric
        k = (mat + mat.T) / 2
        # make PSD
        min_eig = np.min(np.real(linalg.eigvals(mat)))
        e = np.max([0, -min_eig + 1e-4])
        mat = k + e * np.eye(mat.shape[0])
        return mat

but it fails if I test the resulting matrix with the following function:

    def isPSD(A, tol=1e-8):
        E, V = linalg.eigh(A)
        return np.all(E > -tol)
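One likely culprit: makePSD takes the minimum eigenvalue of the original mat rather than of the symmetrized k, so the shift can come out too small. A hedged sketch of a corrected version using eigvalsh on the symmetric part (a standard eigenvalue-shift construction, not necessarily the thread's accepted answer):

    import numpy as np
    from scipy import linalg

    def make_psd(mat, eps=1e-8):
        k = (mat + mat.T) / 2                 # symmetric part
        min_eig = np.min(linalg.eigvalsh(k))  # note: k, not the original mat
        if min_eig < eps:
            k = k + (eps - min_eig) * np.eye(k.shape[0])
        return k

    mat = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1 and 3: not PSD
    print(np.all(linalg.eigvalsh(make_psd(mat)) >= 0))  # True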

Speed of SVM Kernels? Linear vs RBF vs Poly

Question: I'm using scikit-learn in Python to create some SVM models while trying different kernels. The code is pretty simple, and follows the form of:

    from sklearn import svm
    clf = svm.SVC(kernel='rbf', C=1, gamma=0.1)
    clf = svm.SVC(kernel='linear', C=1, gamma=0.1)
    clf = svm.SVC(kernel='poly', C=1, gamma=0.1)

    t0 = time()
    clf.fit(X_train, y_train)
    print "Training time:", round(time() - t0, 3), "s"
    pred = clf.predict(X_test)

The data is 8 features and a little over 3000 observations. I was surprised to
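The excerpt cuts off, but a common finding with data of this size is that SVC(kernel='linear') is much slower than expected, because it still runs through libsvm; LinearSVC, backed by liblinear, is usually far faster when a linear model is all you need. A rough timing sketch on synthetic data of the stated shape (illustrative, not the asker's dataset):

    from time import time
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC, LinearSVC

    # Synthetic stand-in: ~3000 observations, 8 features.
    X, y = make_classification(n_samples=3000, n_features=8, random_state=0)

    for name, clf in [("SVC(kernel='linear')", SVC(kernel="linear", C=1)),
                      ("LinearSVC", LinearSVC(C=1, max_iter=5000))]:
        t0 = time()
        clf.fit(X, y)
        print(name, "training time:", round(time() - t0, 3), "s")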

Building an SVM with Tensorflow

Question: I currently have two numpy arrays:

    X - (157, 128) - 157 sets of 128 features
    Y - (157) - classifications of the feature sets

This is the code I have written to attempt to build a linear classification model of these features. First of all I adapted the arrays to a Tensorflow dataset:

    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": X},
        y=Y,
        num_epochs=None,
        shuffle=True)

I then tried to fit an SVM model:

    svm = tf.contrib.learn.SVM(
        example_id_column='example_id',  # not sure why
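tf.contrib.learn.SVM was removed along with the rest of tf.contrib in TensorFlow 2.x. One alternative is to express a linear SVM directly as a single dense layer trained with hinge loss plus L2 regularization, which is the same soft-margin objective. A minimal sketch assuming TF 2.x and binary labels (the toy arrays stand in for the asker's X and Y):

    import numpy as np
    import tensorflow as tf

    # Toy stand-ins matching the stated shapes.
    X = np.random.randn(157, 128).astype("float32")
    Y = np.random.randint(0, 2, size=157)
    y_signed = 2 * Y - 1  # hinge loss expects labels in {-1, +1}

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.l2(0.01))
    ])
    model.compile(optimizer="adam", loss="hinge")
    model.fit(X, y_signed, epochs=10, batch_size=32, verbose=0)

    # The sign of the linear output is the predicted class.
    pred = np.sign(model.predict(X, verbose=0).ravel())
    print(pred[:10])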

Understanding SVM (Part 1): Getting Started with SVM and a Code Implementation

Understanding SVM. In this post we try to understand SVM. Many excellent writers have already covered the theory well; for example, july's well-known "three levels of understanding" SVM post on CSDN: http://blog.csdn.net/v_july_v/article/details/7624837 . Here I will set aside the heavier derivations and present the core ideas of SVM, together with a Python implementation and my own explanatory comments.

1. The core idea of SVM

The classification idea behind SVM is essentially similar to linear classification with logistic regression (LR): find a set of weights such that a linear combination of the features separates the classes. We first train the SVM's weights on a training set, and can then classify a test set.

Put more grandly: SVM trains a separating hyperplane, and that hyperplane becomes the decision boundary; the points on its two sides form the two classes. Clearly, the classic SVM algorithm only applies to binary classification; with suitable extensions, SVM can also handle multi-class problems.

We want the points closest to the separating hyperplane to be as far from it as possible. The distance from a point to the separating hyperplane is called the margin, and we want this margin to be as large as possible. The support vectors are precisely the points closest to the separating hyperplane, and we maximize their distance to it.

To achieve this, we must solve the following problem: how do we compute the distance from a point to the separating hyperplane?
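The excerpt stops at that question; for completeness, the standard answer is the point-to-hyperplane distance formula. For a hyperplane $w^\top x + b = 0$, the distance of a point $x$ to it is

$$ d(x) = \frac{\lvert w^\top x + b \rvert}{\lVert w \rVert}, $$

and for a labelled sample $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ the geometric margin is $\gamma_i = \frac{y_i (w^\top x_i + b)}{\lVert w \rVert}$, which is the quantity the SVM maximizes for the closest points.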

BAT Machine Learning Interview 1000-Question Series

A few notes:

1. The content of this post comes entirely from the "BAT Machine Learning Interview 1000 Questions" series published by 七月在线 (July Online);
2. Italicized text is content I added myself; please point out any errors;
3. Some links in the original are dead, so I added new ones (also marked in italics); corrections are welcome;
4. Since some answers are copied verbatim from other blogs, I only give a link to the answer; this saves space and keeps the layout clean. Click a question to jump to its answer.

Finally, I have reorganized this post's layout: formulas are written in LaTeX syntax and links now jump directly to the relevant pages, which should improve the reading experience. If my editing introduced any mistakes, please point them out so we can all improve together!

1. Briefly introduce SVM.

SVM, short for support vector machine (支持向量机), is a data-oriented classification algorithm whose goal is to determine a separating hyperplane that divides the different classes of data.

Extension: support vector machine methods form a progression of models from simple to complex: the linearly separable SVM, the linear SVM, and the nonlinear SVM. When the training data are linearly separable, hard-margin maximization learns a linear classifier, the linearly separable SVM, also called the hard-margin SVM. When the training data are approximately linearly separable, soft-margin maximization likewise learns a linear classifier, the linear SVM, also called the soft-margin SVM. When the training data are linearly inseparable, the kernel trick together with soft-margin maximization learns a nonlinear SVM.
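To make the hard-margin case concrete, the standard primal optimization problem (the textbook formulation, added here for reference rather than part of the original answer) is

$$ \min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,N. $$

The soft-margin version introduces slack variables $\xi_i \ge 0$ and a penalty term $C \sum_i \xi_i$, relaxing the constraints to $y_i(w^\top x_i + b) \ge 1 - \xi_i$.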

One-class SVM probability estimates, and what is the difference between a one-class SVM and clustering?

I have a set of images. I would like to learn a one-class SVM (OC-SVM) to model the distribution of a particular (positive) class, as I don't have enough examples to represent the other (negative) classes. What I understood about the OC-SVM is that it tries to separate the data from the origin, or in other words it tries to learn a hypersphere that fits the one-class data. My questions are:

1. If I want to use the output of the OC-SVM as a probability estimate, how can I do it?
2. What is the difference between the OC-SVM and a clustering algorithm (e.g. k-means)?

If you want a probability estimate, don't
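The answer above is cut off. For what it's worth, scikit-learn's OneClassSVM exposes no predict_proba; one ad-hoc workaround is to squash its decision_function (a signed distance to the boundary) through a sigmoid, giving a confidence-like score rather than a calibrated probability. A minimal sketch (the toy data and the sigmoid mapping are illustrative assumptions, not the original answer):

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Toy positive-class data standing in for the image features.
    rng = np.random.RandomState(0)
    X_pos = rng.randn(200, 2)

    oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_pos)

    def pseudo_probability(model, X):
        # decision_function > 0 means "inside" the learned region; the sigmoid
        # maps it to (0, 1) as a rough confidence, not a true probability.
        scores = model.decision_function(X)
        return 1.0 / (1.0 + np.exp(-scores))

    print(np.round(pseudo_probability(oc, X_pos[:5]), 3))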