supervised-learning

How to use machine learning to calculate a graph of states from a sequence of data?

你。 submitted on 2019-12-22 09:36:33
Question: Generic formulation. I have a dataset consisting of a sequence of points with 12 features each. I am interested in detecting an event in this data. In the training data I know the moments the event occurred. When the event occurs I can see an observable pattern in the sequence of points before the event. The pattern is formed from about 300 consecutive points. I am interested in detecting when the event occurred in an infinite sequence of points. The analysis happens post factum. I am not
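
One common way to frame a setup like this, sketched here as an assumption rather than as the approach the thread settles on, is to cut the sequence into fixed-length windows and label each window by whether the event follows it. The helper name `make_windows` and the toy data are hypothetical; the window length of 300 comes from the question.

```python
def make_windows(points, event_indices, window=300):
    """Pair each run of `window` consecutive points with a 0/1 label.

    The label is 1 when an event occurs at the index right after the window.
    """
    events = set(event_indices)
    samples = []
    for t in range(window, len(points)):
        features = points[t - window:t]   # the `window` points before index t
        label = 1 if t in events else 0
        samples.append((features, label))
    return samples


# Toy data: 10 points with 2 features each (the question has 12), one event at index 5.
toy_points = [[float(i), float(i) * 0.5] for i in range(10)]
toy_samples = make_windows(toy_points, event_indices=[5], window=3)
```

Each `(features, label)` pair can then be handed to any supervised classifier; for an unbounded stream the same slicing would be applied to a rolling buffer instead of a full list.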

String Subsequence Kernel and SVM using Python

孤人 submitted on 2019-12-21 05:17:16
Question: How can I use the Subsequence String Kernel (SSK) [Lodhi 2002] to train an SVM (Support Vector Machine) in Python? Answer 1: This is an update to gcedo's answer to work with the current version of shogun (Shogun 6.1.3). Working example:

    import numpy as np
    from shogun import StringCharFeatures, RAWBYTE
    from shogun import BinaryLabels
    from shogun import SubsequenceStringKernel
    from shogun import LibSVM

    strings = ['cat', 'doom', 'car', 'boom', 'caboom', 'cartoon', 'cart']
    test = ['bat', 'soon', 'it is your

Plot SVM with Matplotlib?

笑着哭i submitted on 2019-12-18 17:38:07
Question: I have some interesting user data. It gives some information on the timeliness of certain tasks the users were asked to perform. I am trying to find out if late, which tells me whether users are on time (0), a little late (1), or quite late (2), is predictable/explainable. I generate late from a column giving traffic light information (green = not late, red = super late). Here is what I do:

    #imports
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import

Training Tagger with Custom Tags in NLTK

无人久伴 submitted on 2019-12-18 05:03:16
Question: I have a document with tagged data in the format: Hi here's my [KEYWORD phone number], let me know when you wanna hangout: [PHONE 7802708523]. I live in a [PROP_TYPE condo] in [CITY New York]. I want to train a model based on a set of this type of tagged document, and then use my model to tag new documents. Is this possible in NLTK? I have looked at chunking and NLTK-Trainer scripts, but these have a restricted set of tags and corpora, while my dataset has custom tags. Answer 1: As

Purpose of test data in supervised learning?

纵然是瞬间 submitted on 2019-12-12 03:22:59
Question: This question may seem a little stupid, but I couldn't wrap my head around it. What is the purpose of test data? Is it only to calculate the accuracy of the classifier? I'm using Naive Bayes for sentiment analysis of tweets. Once I train my classifier using training data, I use test data just to calculate the accuracy of the classifier. How can I use the test data to improve the classifier's performance? Answer 1: In general supervised machine learning, the test data set plays a critical role in
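
A common convention, sketched here independently of the truncated answer, is to keep the test set untouched for the final accuracy estimate and to carve out a separate validation set for tuning. The function name `three_way_split` and the 60/20/20 proportions are illustrative assumptions.

```python
import random


def three_way_split(n_samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle sample indices and cut them into train / validation / test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded so the split is reproducible
    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)
    test_idx = indices[:n_test]                 # used once, for the final report
    val_idx = indices[n_test:n_test + n_val]    # used to compare models and settings
    train_idx = indices[n_test + n_val:]        # used to fit the classifier
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(1000)
```

Under this convention, improving the classifier means iterating against the validation indices; an accuracy figure that was tuned against the test indices would no longer be an unbiased estimate.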

Improve flow Python classifier and combine features

本秂侑毒 submitted on 2019-12-11 04:42:01
Question: I am trying to create a classifier to categorize websites. I am doing this for the very first time, so it's all quite new to me. Currently I am trying to do some Bag of Words on a couple of parts of the web page (e.g. title, text, headings). It looks like this:

    from sklearn.feature_extraction.text import CountVectorizer
    countvect_text = CountVectorizer(encoding="cp1252", stop_words="english")
    countvect_title = CountVectorizer(encoding="cp1252", stop_words="english")
    countvect_headings =

How is input dataset fed into neural network?

六月ゝ 毕业季﹏ submitted on 2019-12-11 00:47:25
Question: If I have 1000 observations in my dataset with 15 features and 1 label, how is the data fed to the input neurons for the forward pass and backpropagation? Is it fed row-wise for the 1000 observations (one at a time), with the weights updated after each observation, or is the full data given as an input matrix, with the network learning the corresponding weight values over a number of epochs? Also, if it is fed one at a time, what is an epoch in that case? Thanks. Answer 1: Assuming that the data is formatted into
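
The two options in the question are the extremes of one scheme: the dataset is cut into batches, each batch triggers one weight update, and an epoch is one full pass over all rows regardless of batch size. The sketch below only counts updates and does no training; the function names and the batch size of 32 are illustrative assumptions.

```python
def minibatch_slices(n_samples, batch_size):
    """Yield (start, stop) row ranges that cover the dataset once, i.e. one epoch."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def updates_per_epoch(n_samples, batch_size):
    """Count weight updates in one epoch when each batch triggers one update."""
    return sum(1 for _ in minibatch_slices(n_samples, batch_size))


n_observations = 1000  # from the question
online = updates_per_epoch(n_observations, batch_size=1)         # one row at a time
full_batch = updates_per_epoch(n_observations, batch_size=1000)  # whole matrix at once
mini_batch = updates_per_epoch(n_observations, batch_size=32)    # hypothetical batch size
```

So even when rows are fed one at a time, an epoch still means that all 1000 rows have been seen once; it simply contains 1000 updates instead of one.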

Calculate sklearn.roc_auc_score for multi-class

℡╲_俬逩灬. submitted on 2019-12-09 09:07:07
Question: I would like to calculate AUC, precision, and accuracy for my classifier. I am doing supervised learning. Here is my working code. This code works fine for binary classes, but not for multi-class. Please assume that you have a dataframe with binary classes:

    sample_features_dataframe = self._get_sample_features_dataframe()
    labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
    labeled_sample_features_dataframe, binary_class_series, multi_class_series =
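
For the multi-class case, one widely used definition is the macro-averaged one-vs-rest AUC: compute a binary AUC for each class against all the others, then average. The dependency-free sketch below illustrates that definition; it is not the poster's code, and the labels and scores are made up. In scikit-learn 0.22 and later, `roc_auc_score` accepts `multi_class="ovr"` together with per-class probability scores for the same purpose.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_ovr_auc(y_true, y_score, classes):
    """Average the one-vs-rest AUC of every class (unweighted, i.e. macro)."""
    aucs = []
    for k, cls in enumerate(classes):
        binary_labels = [1 if y == cls else 0 for y in y_true]
        class_scores = [row[k] for row in y_score]   # column k holds class k's score
        aucs.append(binary_auc(binary_labels, class_scores))
    return sum(aucs) / len(aucs)


# Made-up labels and per-class scores, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_score = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
auc = macro_ovr_auc(y_true, y_score, classes=[0, 1, 2])
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be the usual choice for larger data.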

Why Gaussian radial basis function maps the examples into an infinite-dimensional space?

瘦欲@ submitted on 2019-12-09 05:04:22
Question: I've just run through the Wikipedia page about SVMs, and this line caught my eye: "If the kernel used is a Gaussian radial basis function, the corresponding feature space is a Hilbert space of infinite dimensions." http://en.wikipedia.org/wiki/Support_vector_machine#Nonlinear_classification In my understanding, if I apply a Gaussian kernel in an SVM, the resulting feature space will be m-dimensional (where m is the number of training samples), as you choose your landmarks to be your training
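
One standard way to see where the infinite dimension comes from is to expand the kernel as a power series. The derivation below is a sketch for scalar inputs x and y with a bandwidth parameter γ > 0; this notation is assumed here and does not appear in the excerpt.

```latex
\begin{aligned}
k(x, y) &= \exp\!\left(-\gamma (x - y)^2\right)
         = e^{-\gamma x^2}\, e^{-\gamma y^2}\, e^{2\gamma x y} \\
        &= e^{-\gamma x^2}\, e^{-\gamma y^2} \sum_{n=0}^{\infty} \frac{(2\gamma)^n}{n!}\, x^n y^n
         = \sum_{n=0}^{\infty} \phi_n(x)\, \phi_n(y), \\
\phi_n(x) &= \sqrt{\frac{(2\gamma)^n}{n!}}\; x^n e^{-\gamma x^2}, \qquad n = 0, 1, 2, \dots
\end{aligned}
```

The feature map has one coordinate for every n, so it is infinite-dimensional. The m values obtained by evaluating the kernel against the m training points are how the decision function is computed in practice; they are not the coordinates of that feature space.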

Creating Neural Network for un-encountered inputs

自古美人都是妖i submitted on 2019-12-08 01:10:37
Question: I am creating a simple multi-layered feed-forward neural network using the AForge.net NN library. My NN is a 3-layered Activation Network trained with a supervised learning approach using the Back Propagation Learning algorithm. Following are my initial settings:

    //learning rate
    learningRate=0.1;
    //momentum value
    momentum=0;
    //alpha value for bipolar sigmoid activation function
    sigmoidAlphaValue=2.0;
    //number of inputs to network
    inputSize=5;
    //number of outputs from network
    predictionSize=1;
    /