scikit-learn

Plotting the KMeans Cluster Centers for every iteration in Python

六月ゝ 毕业季﹏ submitted on 2021-01-05 07:22:26
Question: I created a dataset with 6 clusters, visualized it with the code below, and found the cluster center points for every iteration. Now I want to visualize how the KMeans algorithm updates the cluster centroids. The demonstration should cover the first four iterations in a figure with a 2×2 grid of axes. I found the points but I can't plot them. Could you please check my code and help me write the scatter-plot logic? Here is my code so far: import seaborn as sns
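A minimal sketch of one way to get such a 2×2 figure, not the asker's code (which is cut off above). It assumes synthetic data from make_blobs and a fixed set of starting centroids, then refits KMeans with max_iter set to 1, 2, 3 and 4 so that each panel shows the centroids after that many iterations. The names X, init_centers and fit_for are placeholders chosen for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed stand-in data: 6 blobs in 2-D.
X, _ = make_blobs(n_samples=600, centers=6, cluster_std=0.8, random_state=0)

# Fix the starting centroids so every refit follows the same trajectory.
rng = np.random.default_rng(0)
init_centers = X[rng.choice(len(X), size=6, replace=False)]

def fit_for(n_iter):
    """Run KMeans from the fixed start, stopping after at most n_iter iterations."""
    return KMeans(n_clusters=6, init=init_centers, n_init=1, max_iter=n_iter).fit(X)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for n_iter, ax in enumerate(axes.ravel(), start=1):
    km = fit_for(n_iter)
    # Points coloured by their current cluster, centroids drawn as black crosses.
    ax.scatter(X[:, 0], X[:, 1], c=km.labels_, s=10, cmap="tab10")
    ax.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
               c="black", marker="x", s=120)
    ax.set_title(f"Iteration {n_iter}")
fig.tight_layout()
# Call plt.show() or fig.savefig("kmeans_iterations.png") to view the result.
```

Refitting from the same start is wasteful but keeps the sketch short; a hand-written KMeans loop would avoid the repeated work.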

Cyclical Loop Between OneHotEncoder and KNNImpute in Scikit-learn

强颜欢笑 submitted on 2021-01-04 05:50:31
Question: I'm working with a really simple dataset. It has some missing values, both in categorical and numeric features. Because of this, I'm trying to use sklearn.impute.KNNImputer to get the most accurate imputation I can. However, when I run the following code: imputer = KNNImputer(n_neighbors=120) imputer.fit_transform(x_train) I get the error: ValueError: could not convert string to float: 'Private' That makes sense; it obviously can't handle categorical data. But when I try to run
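One possible way out of the loop, sketched under stated assumptions: encode each categorical column as numeric codes while keeping missing values as NaN, run KNNImputer on the numeric frame, then round the imputed codes and map them back to labels. The toy DataFrame and the column names age and workclass are invented for this illustration. Averaging category codes is a crude approximation for nominal data, and unscaled numeric columns will dominate the distance, so treat this as a starting point only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data with gaps in both a numeric and a categorical column.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38],
    "workclass": ["Private", "State-gov", "Private", np.nan, "Private", "State-gov"],
})

# Encode categories as float codes and keep missing entries as NaN.
cat = df["workclass"].astype("category")
codes = cat.cat.codes.astype("float64").where(cat.notna())

encoded = pd.DataFrame({"age": df["age"], "workclass": codes})
imputed = KNNImputer(n_neighbors=2).fit_transform(encoded)
result = pd.DataFrame(imputed, columns=encoded.columns)

# Round imputed codes to the nearest valid category and decode them.
categories = np.asarray(cat.cat.categories)
idx = result["workclass"].round().astype(int).clip(0, len(categories) - 1)
result["workclass"] = categories[idx.to_numpy()]
print(result)
```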

TfidfVectorizer - How can I check out processed tokens?

♀尐吖头ヾ submitted on 2021-01-04 05:40:43
Question: How can I check the strings tokenized inside TfidfVectorizer()? If I don't pass any arguments, TfidfVectorizer() will tokenize the string with some predefined methods. I want to observe how it tokenizes strings so that I can more easily tune my model. from sklearn.feature_extraction.text import TfidfVectorizer corpus = ['This is the first document.', 'This document is the second document.', 'And this is the third one.', 'Is this the first document?'] vectorizer = TfidfVectorizer
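A short sketch of one way to inspect the tokens, using the corpus from the question. build_analyzer() returns the callable the vectorizer applies to each document (preprocessing, tokenization and n-gram generation), so calling it directly shows the tokens before any weighting. The variable names tokens_per_doc and vocab are chosen for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

vectorizer = TfidfVectorizer()

# The analyzer is the exact preprocessing + tokenization pipeline in use.
analyzer = vectorizer.build_analyzer()
tokens_per_doc = [analyzer(doc) for doc in corpus]
for doc, tokens in zip(corpus, tokens_per_doc):
    print(doc, "->", tokens)

# After fitting, vocabulary_ maps each kept token to its column index.
vectorizer.fit(corpus)
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
```

With the default settings the analyzer lowercases the text and keeps only tokens of two or more word characters, which is why punctuation disappears.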

How to Manually Optimize Neural Network Models (with Links)

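。_饼干妹妹 submitted on 2021-01-02 03:00:38
Translation: 陈丹. Proofreading: 车前子. This article is about 5,400 characters long, with a suggested reading time of 15 minutes. It is a basic tutorial on how to optimize neural network models, with concrete working code for readers to study and practice. Deep learning neural networks are fit to training data using the stochastic gradient descent optimization algorithm. The model's weights are updated using the backpropagation of error algorithm. This combination of optimization and weight update algorithms was carefully chosen and is the most efficient approach currently known for fitting neural networks. Nevertheless, it is possible to use alternative optimization algorithms to fit a neural network model to a training dataset. This is a useful exercise for learning more about how neural networks work and about how central optimization is in applied machine learning. It may also be required for neural networks with unconventional architectures and non-differentiable transfer functions. In this tutorial, you will discover how to manually optimize the weights of neural network models. After completing this tutorial, you will know: how to develop the forward inference pass for neural network models from scratch; how to optimize the weights of a Perceptron model for binary classification; how to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing. Let's get started. Image source: Bureau of Land Management, rights reserved by the owner. Tutorial overview: this tutorial is divided into three parts: Optimize Neural Networks; Optimize a Perceptron Model; Optimize a Multilayer Perceptron. Optimize Neural Networks: deep learning, or neural networks, is a flexible type of machine learning. These are models composed of nodes and layers, inspired by the structure and function of the brain. A neural network model works by propagating a given input vector through one or more layers to produce a numeric output that can be used for classification or regression predictive modeling. By repeatedly exposing the model to examples of input and output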
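The idea of fitting Perceptron weights without gradients can be sketched roughly as follows. This is an illustrative sketch, not the tutorial's own code: it assumes a synthetic binary dataset from make_classification, a step activation, and a simple stochastic hill climber that keeps a perturbed weight vector whenever training accuracy does not get worse. The function names and the values of n_iter and step_size are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification

def predict(X, weights):
    """Forward pass of a Perceptron: weighted sum plus bias, then a step function."""
    activation = X @ weights[:-1] + weights[-1]  # last weight acts as the bias
    return (activation >= 0.0).astype(int)

def accuracy(X, y, weights):
    return float(np.mean(predict(X, weights) == y))

def hill_climb(X, y, n_iter=1000, step_size=0.05, seed=1):
    """Stochastic hill climbing over the weight vector."""
    rng = np.random.default_rng(seed)
    solution = rng.random(X.shape[1] + 1)
    score = accuracy(X, y, solution)
    for _ in range(n_iter):
        # Propose a small Gaussian perturbation of the current weights.
        candidate = solution + rng.normal(size=solution.shape) * step_size
        candidate_score = accuracy(X, y, candidate)
        # Keep the candidate if it is at least as good as the current solution.
        if candidate_score >= score:
            solution, score = candidate, candidate_score
    return solution, score

X, y = make_classification(n_samples=1000, n_features=5, n_informative=2,
                           n_redundant=1, random_state=1)
weights, score = hill_climb(X, y)
print(f"training accuracy: {score:.3f}")
```

Accepting candidates with an equal score lets the search drift across plateaus, which matters because accuracy under a step function changes only in discrete jumps.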

Oversampling after splitting the dataset - Text classification

本秂侑毒 submitted on 2021-01-01 13:33:30
Question: I am having some issues with the steps to follow for oversampling a dataset. What I have done is the following: # Separate input features and target y_up = df.Label X_up = df.drop(columns=['Date','Links', 'Paths'], axis=1) # setting up testing and training sets X_train_up, X_test_up, y_train_up, y_test_up = train_test_split(X_up, y_up, test_size=0.30, random_state=27) class_0 = X_train_up[X_train_up.Label==0] class_1 = X_train_up[X_train_up.Label==1] # upsample minority class_1_upsampled =
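The order of operations described here (split first, then upsample only the training portion) could look roughly like the sketch below. It is not the asker's full code, which is cut off above; the toy DataFrame, its Text column, and the 10% minority rate are invented for illustration. Only the Label column name and the split parameters come from the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical imbalanced data: 10 positive rows out of 100.
df = pd.DataFrame({
    "Text": [f"doc {i}" for i in range(100)],
    "Label": [1 if i % 10 == 0 else 0 for i in range(100)],
})

# Split before any resampling so the test set keeps the original distribution.
train, test = train_test_split(df, test_size=0.30, random_state=27,
                               stratify=df["Label"])

majority = train[train.Label == 0]
minority = train[train.Label == 1]

# Upsample the minority class (with replacement) within the training set only.
minority_up = resample(minority, replace=True, n_samples=len(majority),
                       random_state=27)
train_up = pd.concat([majority, minority_up]).sample(frac=1, random_state=27)

print(train_up.Label.value_counts())
print(test.Label.value_counts())
```

Because duplicated rows are created only after the split, no copy of a test row can end up in the training data.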

Why is Tensorflow's Gradient Tape returning None when trying to find the gradient of loss wrt input?

拜拜、爱过 submitted on 2021-01-01 09:28:50
Question: I have a CNN model built in Keras which uses an SVM in its last layer. I get the prediction of this SVM by feeding an input into the CNN model, extracting the relevant features, and then passing those features to my SVM to get an output prediction. I have named this entire process predict_DNR_tensor in the code below. This works fine and I am able to get a correct prediction. I am now trying to get the gradient of the squared hinge loss of this prediction from my SVM with respect to the original input,
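A gradient tape can only trace TensorFlow operations, so a prediction that passes through a scikit-learn SVM is not connected to the input as far as the tape is concerned. As a rough sketch of the SVM stage alone, and assuming a linear SVM with weight vector w and bias b (the question does not say which kernel is used), the squared hinge loss and its gradient with respect to the features fed into the SVM can be written by hand. The names and numbers below are hypothetical, and chaining this through the CNN back to the raw input is not covered.

```python
import numpy as np

def squared_hinge(features, y, w, b):
    """Squared hinge loss max(0, 1 - y * f(x))^2 for a linear SVM, y in {-1, +1}."""
    margin = 1.0 - y * (features @ w + b)
    return max(0.0, margin) ** 2

def squared_hinge_grad(features, y, w, b):
    """Analytic gradient of the squared hinge loss with respect to the features."""
    margin = 1.0 - y * (features @ w + b)
    return -2.0 * y * max(0.0, margin) * w

# Hypothetical SVM parameters and a single feature vector.
w = np.array([0.5, -1.0, 0.25])
b = 0.1
features = np.array([0.2, 0.4, -0.3])
y = 1.0

loss = squared_hinge(features, y, w, b)
grad = squared_hinge_grad(features, y, w, b)
print(loss, grad)
```

To obtain a gradient with respect to the original input, the same loss would need to be expressed with TensorFlow operations on the CNN's feature tensor so that the tape can follow it end to end.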

ValueError: Found input variables with inconsistent numbers of samples

匆匆过客 submitted on 2021-01-01 07:01:09
Question: There are plenty of examples of this error in which the problem is related to the dimensions of the array or to how a dataframe is read. However, I'm using just a Python list for both X and y. I'm trying to split my data into training and test sets with train_test_split. My code is this: X, y = file2vector(corpus_dir) assert len(X) == len(y) # both lists same length print(type(X)) print(type(y)) seed = 123 labels = list(set(y)) print(len(labels)) print(labels) cont = {} for l in y: if not l in cont: cont[l]
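For reference, a small sketch showing that train_test_split accepts plain Python lists as long as every argument has the same length. The data here is made up, since file2vector is the asker's own helper and its output is not shown. Counter reproduces the per-label counting loop from the question, and check_consistent_length raises the same ValueError when the lengths differ.

```python
from collections import Counter

from sklearn.model_selection import train_test_split
from sklearn.utils import check_consistent_length

# Hypothetical stand-ins for the lists returned by file2vector.
X = [[0.1 * i, 1.0] for i in range(10)]
y = ["a", "b"] * 5

# Raises ValueError("Found input variables with inconsistent numbers of samples")
# if the lengths differ.
check_consistent_length(X, y)

# Per-label counts, equivalent to the manual dictionary loop.
label_counts = Counter(y)
print(label_counts)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)
print(len(X_train), len(X_test))
```

Every array passed to the same call has to share one length, so a useful debugging step is to run check_consistent_length on exactly the arguments of the call that fails.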

How can I create an instance of a multi-layer perceptron network to use in a bagging classifier?

只愿长相守 submitted on 2021-01-01 06:44:21
Question: I am trying to create an instance of a multi-layer perceptron network to use in a bagging classifier. But I don't understand how to fix them. Here is my code: My task is: 1. To apply a bagging classifier (with or without replacement) with eight base classifiers created at the previous step. It would be really great if you could show me how I can implement this in my algorithm. I searched but couldn't find a way to do that. Answer 1: To train your BaggingClassifier: from sklearn.datasets import load
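A minimal sketch of the setup the question describes, not the truncated answer's code: an MLPClassifier instance is passed as the base estimator of a BaggingClassifier with eight estimators. The synthetic dataset, the hidden layer size and max_iter are assumptions made for this example. The base estimator is passed positionally because its keyword name differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumed toy data in place of the asker's dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One multi-layer perceptron instance serves as the template for all members.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)

bagging = BaggingClassifier(
    mlp,
    n_estimators=8,   # eight base classifiers
    bootstrap=True,   # sample with replacement; set False to sample without
    random_state=0,
)
bagging.fit(X_train, y_train)
test_accuracy = bagging.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

With bootstrap=False each base classifier is trained on a sample drawn without replacement, so max_samples would need to be below 1.0 for the members to see different data.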
