classification

Using Naive Bayes Classification to Identify a Twitter User's Gender [closed]

99封情书 submitted on 2019-12-11 12:16:19
Question: I have become part of a project at school that has been a lot of fun so far, and it just got a little bit more interesting. I have roughly 600,000 tweets in my possession (each contains screen name, geo location, text, etc.) and my goal is to try to classify each user as either

SVM for one Vs all acoustic signal classification

╄→尐↘猪︶ㄣ submitted on 2019-12-11 12:09:40
Question: My aim is to classify an impulsive audio signal as either a gunshot or not a gunshot. I am trying to detect the gunshot event in MATLAB using the svmtrain and svmclassify functions. To evaluate the classification accuracy, balloon burst and clapping signals are used as the other classes. While the accuracy of differentiating between a gunshot and either one of the two classes is good, the differentiation between a gunshot and the combined balloon+clap signal is poor. Please guide me on how I may use SVM to

probability with Gaussian mixture Model

孤街浪徒 submitted on 2019-12-11 11:29:07
Question: I have two classes with labels that I want to classify with a Gaussian mixture model in MATLAB, but I don't know how to calculate the probability of my test data for the two classes.

    clear all
    clc
    train_class0_data = load('train-class0.data');
    train_class0_label = load('train-class0.label');
    train_class1_data = load('train-class1.data');
    train_class1_label = load('train-class1.label');
    test_data = load('test.data');
    test_label = load('test.label');
    GMMObject_Class0 = gmdistribution.fit(train_class0_data,2,'Regularize

AttributeError: 'NoneType' object has no attribute 'dtype'

China☆狼群 submitted on 2019-12-11 10:23:06
Question: I'm trying to implement a simple neural network using TensorFlow. It is a binary classification problem. The shapes are X_train: (batch_size, 70) and Y_train: (batch_size, 2). I'm reading the data with the csv module. Here is my code. I'm running this on Python 3.6.0.

    import numpy as np
    import csv
    import tensorflow as tf

    with open('criminal_train.csv') as fp:
        reader = csv.reader(fp, delimiter=',', quotechar='"')
        train_data = np.array([row for row in reader])
    data_X = train_data[1:, 1:-1]
    data_Y = train_data

How to determine whether a stream is text or binary in Python?

与世无争的帅哥 submitted on 2019-12-11 10:04:41
Question: Is there a way to determine (test, check or classify) whether a file (or a bytestream, or other file-like object) is text or binary, similar to the file command's magic in Unix, in a practical majority of cases? Motivation: Although guesswork should be avoided, where Python can determine this, I'd like to utilize the capability. One could cover a useful number of cases and handle the exceptions. Preference would be given to cross-platform or pure-Python methods. One way is python-magic
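One common pure-Python heuristic for the question above can be sketched as follows. This is not the approach of python-magic or of the Unix file command; the function names looks_binary and stream_looks_binary, the 30% threshold, and the 8192-byte sample size are all assumptions chosen for illustration. The idea: a NUL byte suggests binary, valid UTF-8 suggests text, and otherwise the share of unusual control bytes decides.

```python
import codecs
import io

# Bytes treated as "texty": a few common control codes (bell, backspace, tab,
# newline, form feed, carriage return, escape), printable ASCII, and all high
# bytes (so legacy 8-bit encodings such as Latin-1 are not flagged).
_TEXT_BYTES = bytes({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x7F)) | set(range(0x80, 0x100)))


def looks_binary(chunk, threshold=0.30):
    """Heuristic guess: True if chunk is probably binary. Not a guarantee."""
    if not chunk:
        return False  # assumption: empty input is treated as text
    if b"\x00" in chunk:
        return True  # NUL rarely occurs in 8-bit text (UTF-16 text would be misjudged)
    try:
        # final=False tolerates a multi-byte character cut off at the chunk end
        codecs.getincrementaldecoder("utf-8")().decode(chunk, final=False)
        return False
    except UnicodeDecodeError:
        pass
    # Not valid UTF-8: fall back to the share of unusual control bytes
    suspicious = chunk.translate(None, _TEXT_BYTES)
    return len(suspicious) / len(chunk) > threshold


def stream_looks_binary(stream, sample_size=8192):
    """Apply the heuristic to the first sample_size bytes of a binary stream."""
    return looks_binary(stream.read(sample_size))


print(stream_looks_binary(io.BytesIO(b"plain text\n")))
print(stream_looks_binary(io.BytesIO(b"\x89PNG\r\n\x1a\n\x00\x00")))
```

Only a leading sample is inspected, so a file that starts with text and carries binary data later would be classified as text; that trade-off is typical for this kind of guess.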

Extract a subset of tree from random forest model for prediction

风流意气都作罢 submitted on 2019-12-11 09:33:48
Question: From Liaw's classification and regression by RF paper: "The best way to determine how many trees are necessary is to compare predictions made by a forest to predictions made by a subset of forest". I am wondering if there is a way to extract a subset of trees for prediction with R's randomForest package. getTree seems to print out the structure. Any suggestion would be greatly appreciated.
Answer 1: Try this one in randomForest: predict(rf, dat, predict.all=TRUE). You can get predictions from all the sub
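Once per-tree predictions are available (in R, predict.all=TRUE returns an individual matrix with one row per sample and one column per tree), comparing a subset of the forest to the full forest is a matter of re-voting over fewer columns. The sketch below illustrates that step in plain Python rather than R; the matrix values and the helper names vote and agreement are made up for illustration.

```python
from collections import Counter


def vote(per_tree_preds, n_trees):
    """Majority vote per sample using only the first n_trees tree columns."""
    return [Counter(row[:n_trees]).most_common(1)[0][0] for row in per_tree_preds]


def agreement(per_tree_preds, n_trees):
    """Fraction of samples where the sub-forest agrees with the full forest."""
    full = vote(per_tree_preds, len(per_tree_preds[0]))
    subset = vote(per_tree_preds, n_trees)
    return sum(f == s for f, s in zip(full, subset)) / len(full)


# Hypothetical per-tree predictions: 3 samples (rows) x 5 trees (columns)
individual = [
    ["a", "a", "b", "a", "b"],
    ["b", "a", "b", "b", "b"],
    ["a", "b", "b", "a", "a"],
]

print(vote(individual, 5))       # full forest
print(vote(individual, 3))       # first three trees only
print(agreement(individual, 3))  # share of samples with the same label
```

Plotting the agreement against the number of trees shows where adding more trees stops changing the predictions. Using an odd number of trees avoids ties in the two-class case.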

Binary Classification vs. Multi Class Classification

五迷三道 submitted on 2019-12-11 09:23:09
Question: I have a machine learning classification problem with 3 possible classes (Class A, Class B and Class C). Which of these would be the better approach? - Split the problem into two binary classifications: first identify whether it is Class A or Class 'Not A'. Then, if it is Class 'Not A', run another binary classification to classify it as Class B or Class C
Answer 1: Binary classification may use a sigmoid function at the end (it goes smoothly from 0 to 1). This is how we will know how to classify
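To make the two options concrete, here is a minimal sketch in plain Python. It assumes the two binary classifiers each output a raw score that is passed through a sigmoid, and it contrasts that chain with a single softmax over three scores. The function name two_stage_probs and the example scores are hypothetical; no real model is trained.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Map one raw score per class to probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_probs(score_a, score_b):
    """Chain two binary decisions: A vs not-A, then B vs C given not-A."""
    p_a = sigmoid(score_a)
    p_b_given_not_a = sigmoid(score_b)
    return {
        "A": p_a,
        "B": (1.0 - p_a) * p_b_given_not_a,
        "C": (1.0 - p_a) * (1.0 - p_b_given_not_a),
    }


print(two_stage_probs(0.0, 0.0))  # chained binary classifiers
print(softmax([0.0, 0.0, 0.0]))   # single multi-class output
```

With all scores at zero the chained version gives A a probability of 0.5 and B and C 0.25 each, while the softmax gives every class one third. The split therefore builds in an asymmetry between Class A and the other two classes, which is worth keeping in mind when choosing between the approaches.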

Changing multiple column values to binary values

戏子无情 submitted on 2019-12-11 09:16:57
Question: I've asked this question before, but the answer I got didn't quite work out as I thought it had, so here I am. Previous question: Defining a function for changing column values and creating new datasets. I am trying to define a function that takes a dataframe and changes the values in a column to create multiple new dataframes. As an example, from df1 looking like:

    df1:
       class colB colC
    0      1   1b   1c
    1      2   2b   2c
    2      3   3b   3c
    3      1   4b   4c
    4      2   5b   5c

I am trying to create multiple binary classes to
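The excerpt is cut off before the desired output is shown, so the following is only one plausible reading: a one-vs-rest relabelling that produces one new dataset per class value, with that class set to 1 and every other class set to 0. It uses plain Python dictionaries in place of a pandas DataFrame, and the helper name one_vs_rest is hypothetical.

```python
# Rows mirror the df1 example from the question (plain dicts instead of pandas)
rows = [
    {"class": 1, "colB": "1b", "colC": "1c"},
    {"class": 2, "colB": "2b", "colC": "2c"},
    {"class": 3, "colB": "3b", "colC": "3c"},
    {"class": 1, "colB": "4b", "colC": "4c"},
    {"class": 2, "colB": "5b", "colC": "5c"},
]


def one_vs_rest(rows, target, column="class"):
    """Return a copy of rows where column is 1 for target and 0 otherwise."""
    return [{**row, column: 1 if row[column] == target else 0} for row in rows]


# One relabelled dataset per distinct class value (assumed goal, see above)
datasets = {value: one_vs_rest(rows, value) for value in sorted({r["class"] for r in rows})}

for value, data in datasets.items():
    print(value, [row["class"] for row in data])
```

Each relabelled dataset is a new list of new dictionaries, so the original rows stay unchanged and can be reused for the next class.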

'Multiclass-multioutput is not supported' Error in Scikit learn for Knn classifier

佐手、 submitted on 2019-12-11 09:04:45
Question: I have two variables X and Y. The structure of X (an np.array):

    [[26777 24918 26821 ... -1 -1 -1]
     [26777 26831 26832 ... -1 -1 -1]
     [26777 24918 26821 ... -1 -1 -1]
     ...
     [26811 26832 26813 ... -1 -1 -1]
     [26830 26831 26832 ... -1 -1 -1]
     [26830 26831 26832 ... -1 -1 -1]]

The structure of Y:

    [[1252, 26777, 26831],
     [1252, 26777, 26831],
     [1252, 26777, 26831],
     [1252, 26777, 26831],
     [1252, 26777, 26831],
     [1252, 26777, 26831],
     [25197, 26777, 26781],
     [25197, 26777, 26781],
     [25197, 26777, 26781],

Machine learning algorithm score changes without any change in data or step

允我心安 submitted on 2019-12-11 08:55:56
Question: I am new to machine learning and getting started with the Titanic problem on Kaggle. I have written a simple algorithm to predict the result on test data. My confusion is that every time I execute the algorithm with the same dataset and the same steps, the score value changes (last statement in the code). I am not able to understand this behaviour. Code:

    # imports
    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # load data
    train = pd.read_csv('train