unsupervised-learning

Is train/test-Split in unsupervised learning necessary/useful?

喜欢而已 submitted on 2019-12-07 15:50:35
Question: In supervised learning I have the typical train/test split to learn the algorithm, e.g. regression or classification. Regarding unsupervised learning, my question is: is a train/test split necessary and useful? If yes, why?

Answer 1: Well, this depends on the problem, the form of the dataset, and the class of unsupervised algorithm used to solve the particular problem. Roughly: dimensionality reduction techniques are usually tested by calculating the reconstruction error, so there we can use a k-fold cross-validation procedure. For clustering algorithms, however, I would suggest statistical testing in order to test …
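
A minimal sketch of the answer's reconstruction-error idea, assuming scikit-learn and the Iris data (neither is named in the thread): fit PCA on each training fold, then measure how well it reconstructs the held-out fold.

# Hedged sketch: k-fold cross-validated PCA reconstruction error.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

X = load_iris().data
errors = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    pca = PCA(n_components=2).fit(X[train_idx])            # learn components on the training fold only
    X_test = X[test_idx]
    X_rec = pca.inverse_transform(pca.transform(X_test))   # project and reconstruct held-out rows
    errors.append(np.mean((X_test - X_rec) ** 2))          # mean squared reconstruction error
print("mean held-out reconstruction MSE:", np.mean(errors))

A model that merely memorised its training fold would show a gap between in-fold and held-out reconstruction error, which is exactly what the split exposes.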

Does or will H2O provide any pretrained vectors for use with h2o word2vec?

夙愿已清 submitted on 2019-12-06 08:24:07
Question: H2O recently added word2vec to its API. It is great to be able to easily train your own word vectors on a corpus you provide yourself. However, even greater possibilities exist from using big data and big computers, of the type that software vendors like Google or H2O.ai may have access to, but not so many end-users of H2O, due to network bandwidth and compute power limitations. Word embeddings can be seen as a type of unsupervised learning. As such, great value can be had in a data science …
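
For reference, a minimal sketch of the local training step with H2O's Python API (the corpus path and parameters are placeholders, not from the question):

# Hedged sketch: training word2vec in H2O on a user-supplied corpus.
import h2o
from h2o.estimators.word2vec import H2OWord2vecEstimator

h2o.init()
corpus = h2o.import_file("corpus.txt")            # placeholder path: one text per row
words = corpus.tokenize(" ")                      # break rows into a single token column
w2v = H2OWord2vecEstimator(vec_size=100, epochs=5)
w2v.train(training_frame=words)
print(w2v.find_synonyms("learning", count=5))     # sanity-check the learned vectors

The question's point is that this local step is cheap, while the Google-scale corpora and compute that would make the vectors strong are not; pretrained vectors would let end-users skip it.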

How to programmatically determine the column indices of principal components using FactoMineR package?

ぃ、小莉子 submitted on 2019-12-06 03:40:17
Question: Given a data frame containing mixed variables (i.e. both categorical and continuous) like,

digits = 0:9
# set seed for reproducibility
set.seed(17)
# function to create a random string
createRandString <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}
df <- data.frame(ID = c(1:10),
                 name = sample(letters[1:10]),
                 studLoc = sample(createRandString(10)),
                 finalmark = sample(c(0:100), 10),
                 subj1mark = sample(c(0:100), 10),
                 subj2mark = sample(c(0:100), 10))

I perform unsupervised feature selection …

Affinity Propagation preferences initialization

爷,独闯天下 submitted on 2019-12-05 16:57:48
Question: I need to perform clustering without knowing the number of clusters in advance. The number of clusters may be from 1 to 5, since I may find cases where all the samples belong to the same instance, or to a limited number of groups. I thought affinity propagation could be my choice, since I could control the number of clusters by setting the preference parameter. However, if I have a single artificially generated cluster and I set the preference to the minimal Euclidean distance among nodes (to minimize the number of clusters), I get terrible over-clustering.
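
A small sketch of the behaviour being described, using scikit-learn's AffinityPropagation (the question names no library; note that sklearn's similarities are negative squared Euclidean distances, so more negative preference values mean fewer clusters):

# Hedged sketch: effect of the preference parameter on cluster count.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))                      # one artificial blob: ideally one cluster

for pref in (None, -50.0):                        # None lets sklearn default to the median similarity
    ap = AffinityPropagation(preference=pref, random_state=0).fit(X)
    print("preference =", pref, "->", len(ap.cluster_centers_indices_), "clusters")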

Unsupervised pre-training for convolutional neural network in theano

孤者浪人 submitted on 2019-12-04 07:46:04
Question: I would like to design a deep net with one (or more) convolutional layers (CNN) and one or more fully connected hidden layers on top. For deep networks with fully connected layers, there are methods in theano for unsupervised pre-training, e.g., using denoising auto-encoders or RBMs. My question is: how can I implement (in theano) an unsupervised pre-training stage for convolutional layers? I do not expect a full implementation as an answer, but I would appreciate a link to a good tutorial or a …
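
The question asks for theano specifically; as an illustration of the idea only, here is a tiny convolutional autoencoder in Keras (my substitution, not theano) whose trained encoder filters could initialise the supervised CNN:

# Hedged sketch: unsupervised pre-training of a conv layer via a convolutional autoencoder.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))
encoded = layers.Conv2D(16, 3, activation="relu", padding="same", name="conv1")(inputs)
pooled = layers.MaxPooling2D(2)(encoded)
upsampled = layers.UpSampling2D(2)(pooled)
decoded = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(upsampled)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
X = np.random.rand(64, 28, 28, 1).astype("float32")   # stand-in for real unlabeled images
autoencoder.fit(X, X, epochs=1, batch_size=16, verbose=0)

conv1_weights = autoencoder.get_layer("conv1").get_weights()  # reuse to initialise the CNN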

How to do clustering when the shape of data is (x,y,z)?

落爺英雄遲暮 submitted on 2019-12-04 07:05:48
Question: Suppose I have 10 individual observations, each of size (125, 59). I want to group these 10 observations based on their 2-D feature matrices ((125, 59)). Is this possible without flattening every observation into a 125*59 1-D vector? I can't even implement PCA or LDA for feature extraction because the data is highly variant. Please note that I am trying to implement clustering through self-organizing maps or neural networks. Deep learning and neural networks are completely related to the question asked …
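
One hedged way to avoid flattening each observation into a 125*59 feature vector (my sketch, not from the thread) is to compute pairwise distances directly between the 2-D matrices and hand the distance matrix to a clustering algorithm:

# Hedged sketch: cluster matrix-shaped observations via precomputed pairwise distances.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
obs = rng.random((10, 125, 59))                       # 10 observations, each a (125, 59) matrix

# Frobenius distance between every pair of observations
D = np.linalg.norm(obs[:, None, :, :] - obs[None, :, :, :], axis=(2, 3))

# on scikit-learn < 1.2 the keyword is affinity= instead of metric=
labels = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                 linkage="average").fit_predict(D)
print(labels)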

Implementing Face Recognition using Local Descriptors (Unsupervised Learning)

二次信任 submitted on 2019-12-03 16:40:27
I'm trying to implement a face recognition algorithm using Python. I want to be able to receive a directory of images and compute pairwise distances between them, where short distances should hopefully correspond to images belonging to the same person. The ultimate goal is to cluster images and perform some basic face identification tasks (unsupervised learning). Because of the unsupervised setting, my approach to the problem is to calculate a "face signature" (a vector in R^d for some integer d) and then figure out a metric in which two faces belonging to the same person will indeed have a …
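
The pairwise-distance step the question describes could look like this minimal sketch (the signatures array is a placeholder for whatever local descriptor is chosen, and the threshold is hypothetical):

# Hedged sketch: pairwise distances between face "signature" vectors.
import numpy as np
from scipy.spatial.distance import pdist, squareform

signatures = np.random.rand(6, 128)        # placeholder: one d=128 vector per face image
D = squareform(pdist(signatures, metric="euclidean"))
same_person = D < 0.6                      # hypothetical threshold; tune on real data
print(same_person)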