unsupervised-learning

Unsupervised high-dimensional clustering

对着背影说爱祢 submitted on 2020-01-05 07:16:34
Question: I have a dataset of records where each record carries 5 labels, and the importance of each label differs. I know the labels' order of importance but not the size of the differences, so the distance between two records looks like a*dist(label1) + b*dist(label2) + c*dist(label3), such that a + b + c = 1. The dataset contains around 3000 records and I want to cluster it (I don't know the number of clusters) in some way. I thought about DBSCAN but it is not really good with high …
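
A minimal sketch of one way to attack this, assuming the records reduce to per-label distance components and using hierarchical clustering (which accepts an arbitrary metric and needs no preset cluster count); the weights, data, and cut height are assumptions, not from the question:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(3000, 3)      # stand-in: per-record values for label1..label3
w = np.array([0.5, 0.3, 0.2])    # a >= b >= c with a + b + c = 1 (assumed weights)

def weighted_dist(u, v):
    # a*dist(label1) + b*dist(label2) + c*dist(label3)
    return np.dot(w, np.abs(u - v))

Z = linkage(pdist(X, metric=weighted_dist), method='average')
labels = fcluster(Z, t=0.25, criterion='distance')  # cut height is a tunable assumption
print(len(np.unique(labels)), 'clusters')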

AttributeError: 'NoneType' object has no attribute '_inbound_nodes'

狂风中的少年 submitted on 2020-01-04 07:47:12
Question: I want to implement the loss function defined here. I use an FCN-VGG16 to obtain a map x and add an activation layer (x is the output of the FCN-VGG16 net), and then just some operations to get the extracted features:

co_map = Activation('sigmoid')(x)
# add mean values
img = Lambda(AddMean, name='addmean')(img_input)
# img map multiply
img_o = Lambda(HighLight, name='highlightlayer1')([img, co_map])
img_b = Lambda(HighLight, name='highlightlayer2')([img, 1-co_map])
extractor = ResNet50(weights = …
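
Not from the original post, but the usual cause of this error is that 1-co_map is a bare backend tensor rather than a Keras layer output, so it has no _inbound_nodes. A common fix, sketched against the snippet above, is to wrap the inversion in its own Lambda:

from keras.layers import Lambda

# Wrap the element-wise inversion so it becomes a real layer
co_map_inv = Lambda(lambda t: 1.0 - t, name='invertmap')(co_map)
img_b = Lambda(HighLight, name='highlightlayer2')([img, co_map_inv])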

What is the difference between supervised learning and unsupervised learning?

感情迁移 submitted on 2019-12-28 03:15:08
Question: In terms of artificial intelligence and machine learning, what is the difference between supervised and unsupervised learning? Can you provide a basic, easy explanation with an example? Answer 1: Since you ask this very basic question, it seems worth specifying what machine learning itself is. Machine learning is a class of algorithms that is data-driven, i.e. unlike "normal" algorithms, it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine-learning …
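
To make the distinction concrete, a minimal scikit-learn contrast (an illustration added here, not part of the original answer): the supervised estimator is fit on data plus answers, the unsupervised one on data alone.

import numpy as np
from sklearn.linear_model import LogisticRegression   # supervised
from sklearn.cluster import KMeans                    # unsupervised

X = np.random.rand(100, 2)
y = (X[:, 0] > 0.5).astype(int)   # the "good answers" supplied with the data

LogisticRegression().fit(X, y)    # learns a mapping from examples to labels
KMeans(n_clusters=2).fit(X)       # finds structure with no labels at all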

Semi-supervised Naive Bayes with NLTK [closed]

橙三吉。 submitted on 2019-12-21 04:55:15
Question: Closed 7 years ago. I have built a semi-supervised version of NLTK's Naive Bayes in Python based on EM (the expectation-maximization algorithm). However, in some iterations …
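
A sketch of the kind of loop such a semi-supervised Naive Bayes runs, using NLTK's documented train/classify API; this is the hard-EM (self-training) variant, and the featuresets and iteration count are assumptions:

from nltk.classify import NaiveBayesClassifier

def em_naive_bayes(labeled, unlabeled, n_iter=10):
    # labeled: list of (featureset, label) pairs; unlabeled: list of featuresets
    clf = NaiveBayesClassifier.train(labeled)   # seed on labeled data only
    for _ in range(n_iter):
        # E-step: label the unlabeled data with the current model
        pseudo = [(fs, clf.classify(fs)) for fs in unlabeled]
        # M-step: retrain on labeled plus pseudo-labeled data
        clf = NaiveBayesClassifier.train(labeled + pseudo)
    return clf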

How does the setting of steps_per_epoch and epochs affect the training result in Keras?

我只是一个虾纸丫 submitted on 2019-12-13 05:26:51
Question: My generator always yields two images from my dataset at random, and I then calculate the loss using these two samples. Say I set steps_per_epoch=40 and epochs=5; what's the difference if I set steps_per_epoch=5 and epochs=40 (I use Adam for my optimizer)? Answer 1: The epochs argument (also called iteration) refers to the number of full passes over the whole training data. The steps_per_epoch argument refers to the number of batches generated during one epoch. Therefore we have steps_per_epoch = n…
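
Not from the answer above, but a quick arithmetic check of the trade-off: both settings perform the same total number of weight updates; what changes is how often epoch-end bookkeeping runs.

updates_a = 40 * 5   # steps_per_epoch=40, epochs=5  -> 200 updates
updates_b = 5 * 40   # steps_per_epoch=5,  epochs=40 -> 200 updates
assert updates_a == updates_b == 200
# Epoch-end hooks (validation, callbacks, LR schedules) fire 5 times
# in the first setting and 40 times in the second.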

Cutting dendrogram at highest level of purity

坚强是说给别人听的谎言 submitted on 2019-12-11 23:34:03
Question: I am trying to create a program that clusters documents using hierarchical agglomerative clustering, and the output of the program depends on cutting the dendrogram at the level that gives maximum purity. The following is the algorithm I am working on right now:

create dendrogram for the documents in the dataset
purity = 0
final_clusters
for all the levels, lvl, in the dendrogram:
    clusters = cut dendrogram at lvl
    new_purity = calculate_purity_of(clusters)
    if new_purity > purity:
        purity = new…
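
A minimal sketch of that level sweep with scipy's hierarchy module; calculate_purity is a hypothetical helper standing in for the poster's purity function:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def best_cut(X, labels, calculate_purity):
    Z = linkage(X, method='average')   # build the dendrogram
    best_purity, best_clusters = 0.0, None
    for lvl in Z[:, 2]:                # merge heights = candidate cut levels
        clusters = fcluster(Z, t=lvl, criterion='distance')
        p = calculate_purity(clusters, labels)
        if p > best_purity:
            best_purity, best_clusters = p, clusters
    return best_purity, best_clusters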

R: how to select several rows to make a new dataframe

♀尐吖头ヾ submitted on 2019-12-11 16:41:31
Question: I have a dataframe of more than 5000 observations. In my attempt to analyse my data using hierarchical clustering, I have 8 clusters, where some clusters contain either a few thousand or a few hundred observations.

# Cut tree into 8 groups
cutree_hclust <- cutree(hclust.unsupervised, k = 8)
# Number of members in each cluster
table(cutree_hclust)
cutree_hclust
  1   2   3   4   5   6   7   8
486  61  14   3  15   2   9   5

To get a view of what variable combination there is for each observation in the different clusters, I thought …
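
For illustration only, a pandas analog of the subsetting step (the question itself is about R; all names and data here are hypothetical stand-ins):

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.rand(20), 'y': np.random.rand(20)})
df['cluster'] = np.random.randint(1, 9, size=20)   # stand-in for cutree labels

cluster4_df = df[df['cluster'] == 4]                     # rows of one cluster only
per_cluster = {k: g for k, g in df.groupby('cluster')}   # one frame per cluster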

Getting the learned representation of the data from the unsupervised learning in pylearn2

余生长醉 submitted on 2019-12-11 08:13:00
Question: We can train an autoencoder in pylearn2 using the YAML file below (along with pylearn2/scripts/train.py):

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis: 784,
        nhid: 500,
        irange: 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null, # Linear activation on the decoder side.
    },
    algorithm: …
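
One way to get at the learned representation afterwards, sketched under the assumption that the trained model was pickled and that pylearn2's autoencoder exposes encode(); the path and batch below are hypothetical:

import numpy as np
import theano
from pylearn2.utils import serial

model = serial.load('dae_model.pkl')              # hypothetical save path
X = model.get_input_space().make_theano_batch()   # symbolic input batch
encode = theano.function([X], model.encode(X))    # compile the encoder

batch = np.random.rand(10, 784).astype('float32') # stand-in for MNIST rows
hidden = encode(batch)                            # shape (10, 500): learned features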

Implementing Face Recognition using Local Descriptors (Unsupervised Learning)

孤者浪人 submitted on 2019-12-09 13:38:00
Question: I'm trying to implement a face recognition algorithm using Python. I want to be able to receive a directory of images and compute pair-wise distances between them, where short distances should hopefully correspond to images belonging to the same person. The ultimate goal is to cluster the images and perform some basic face identification tasks (unsupervised learning). Because of the unsupervised setting, my approach to the problem is to calculate a "face signature" (a vector in R^d for some …
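
A minimal sketch of the distance-plus-clustering part, assuming each image has already been reduced to a d-dimensional signature (the signatures below are random stand-ins, and the cut threshold is a tunable assumption):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

signatures = np.random.rand(30, 128)     # stand-in: 30 face signatures in R^128

dists = pdist(signatures, metric='euclidean')       # condensed pairwise distances
Z = linkage(dists, method='average')                # agglomerative clustering
labels = fcluster(Z, t=0.9, criterion='distance')   # same person = same cluster, ideally

print(squareform(dists).shape, labels)   # (30, 30) distance matrix, one id per image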

Label Propagation - Array is too big

笑着哭i submitted on 2019-12-08 08:32:21
Question: I am using label propagation in scikit-learn for semi-supervised classification. I have 17,000 data points with 7 dimensions, and I am unable to use label propagation on this data set: it throws a numpy "array is too big" error. However, it works fine on a relatively small data set, say 200 points. Can anyone suggest a fix?

label_prop_model.fit(np.array(data), labels)
  File "/usr/lib/pymodules/python2.7/sklearn/semi_supervised/mylabelprop.py", line 58, in fit
    graph_matrix = self._build_graph()
  File "/usr…
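
Not from the original thread, but a common workaround: the default rbf kernel materializes a dense n-by-n graph (17,000 squared entries here), while the knn kernel keeps it sparse. A sketch with scikit-learn's documented parameters; the data and label counts are stand-ins:

import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.random.rand(17000, 7)                   # stand-in for the 17,000 x 7 data
y = np.full(17000, -1)                         # -1 marks unlabeled points
y[:200] = np.random.randint(0, 2, size=200)    # a few labeled examples

model = LabelPropagation(kernel='knn', n_neighbors=7)  # avoids the dense rbf graph
model.fit(X, y)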