iris | 易学教程

Machine learning pipeline summary

# -*- coding: utf-8 -*-
"""scikit-learn introduction
Automatically generated by Colaboratory.
Original file is located at
    https://colab.research.google.com/drive/1quaJafg43SN7S6cNwKFr0_WYn2ELt4Ph
Official scikit-learn site: https://scikit-learn.org/stable/
Module imports
"""
from sklearn import datasets
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import numpy as np
"""# Classification:
- SVM (support vector machine)
- svm.SVC()
### The iris dataset
- iris features: sepal length, sepal width, petal length, petal width
- iris labels: setosa, versicolor, virginica
"""
iris = datasets.load_iris()
print('iris feature\n', iris.data[0:5])
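The excerpt names svm.SVC() but is cut off before any model is trained. The following is only a minimal sketch of what such a step could look like, assuming default SVC hyperparameters and a stratified 75/25 hold-out split; neither choice appears in the original notebook.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Assumption: a stratified 75/25 split with a fixed seed, chosen for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target)

# Assumption: default hyperparameters (RBF kernel); the original does not say which were used.
clf = svm.SVC()
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correctly classified test samples
print("test accuracy: {:.3f}".format(accuracy))
```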

Common NumPy APIs (Part 3)

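Common NumPy APIs (Part 1)
0. np.delete() is a non-mutating operation: it is called for its return value and does not modify the original data.
    from sklearn.datasets import load_iris
    import numpy as np
    iris = load_iris()  # the dataset must be loaded before it is indexed
    test_idx = [0, 50, 100]  # one sample per class; valid row indices are 0..149
    X_train, y_train = np.delete(iris.data, test_idx, axis=0), np.delete(iris.target, test_idx)
    X_test, y_test = iris.data[test_idx], iris.target[test_idx]
1. Module-level functions: a module-level function np.function_name() is generally non-mutating. Because it does not mutate its argument, it has to return a value; otherwise the call would be pointless.
2. np.zeros()
    >>> np.zeros(())
    array(0.0)  # i.e. it accepts an empty tuple as the shape
np.logaddexp
numpy.logaddexp(x1, x2[, out]) computes log(exp(x1) + exp(x2)). How to use logaddexp to compute $\log(x+y)$ (when $x,\,y$ both contain exponential terms),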
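The use of logaddexp for log(x + y) can be sketched as follows. The input values are hypothetical and were chosen so that x and y underflow in float64 while their logarithms stay representable; the point is only to show why working in log space helps.

```python
import numpy as np

# Hypothetical inputs: we only hold log(x) and log(y), because x = exp(-1000)
# and y = exp(-1001) both underflow to 0.0 in float64.
log_x, log_y = -1000.0, -1001.0

# Naive route: exponentiate, add, take the log. Both exponentials underflow,
# so this evaluates log(0.0) and returns -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.exp(log_x) + np.exp(log_y))

# Stable route: logaddexp(a, b) = log(exp(a) + exp(b)), evaluated without
# ever forming exp(a) or exp(b) directly.
stable = np.logaddexp(log_x, log_y)

# Closed form for comparison: log(x + y) = log_x + log(1 + exp(log_y - log_x)).
expected = log_x + np.log1p(np.exp(log_y - log_x))

print(naive, stable, expected)
```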

Machine learning notes: iris classification (Part 2)

· Training and test data
To check whether a model works, the collected labelled data is usually split into two parts: one part is used to build the machine learning model and is called the training data; the rest is used to evaluate it and is called the test data. By default, scikit-learn's train_test_split function puts 75% of the data into the training set and 25% into the test set. Splitting the data with train_test_split:
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(
        iris_dataset['data'], iris_dataset['target'], random_state=0)  # random_state=0 fixes the random seed
    print("X_train shape: {}".format(X_train.shape))
    print("y_train shape: {}".format(y_train.shape))
    print("X_test shape: {}".format(X_test.shape))
    print("y_test shape: {}".format(y_test.shape))
Output:
X_train shape:
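The default 75/25 behaviour can be checked with a small self-contained sketch. It loads the dataset itself (the excerpt assumes iris_dataset already exists) and relies on the default test_size of train_test_split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()  # 150 samples, 4 features each

# No test_size is given, so the default 25% test share applies;
# random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# With 150 samples the test set is rounded up to 38 rows, leaving 112 for training.
for name, array in [("X_train", X_train), ("y_train", y_train),
                    ("X_test", X_test), ("y_test", y_test)]:
    print("{} shape: {}".format(name, array.shape))
```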

Machine learning notes: iris classification

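Iris classification is a classic machine learning application. Suppose a plant enthusiast has observed many iris flowers and recorded measurements for each one (petal length and width, sepal length and width), and it is known that every flower belongs to one of three species: setosa, versicolor, or virginica. The task is to predict the species from the recorded measurements. Because this is a classic dataset, it ships with scikit-learn's datasets module. We can call the load_iris function to load it:
    from sklearn.datasets import load_iris
    iris_dataset = load_iris()
    print("Keys of iris_dataset: \n{}".format(iris_dataset.keys()))
Output:
    Keys of iris_dataset:
    dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
Next we print the data stored under each of these keys, starting with data. Since data holds many rows, we print only the first five:
    print("first 5 data: \n{}".format(iris_dataset['data'][:5]))
The output is shown below; the columns correspond to sepal length, sepal width, petal length, and petal width, in that order:
first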
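Which column holds which measurement does not have to be guessed, because the dataset object carries its own column and class names. A minimal sketch, assuming only the feature_names and target_names keys listed in the output above:

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Pair each column name with the value in the first row of the data matrix.
for name, value in zip(iris_dataset.feature_names, iris_dataset.data[0]):
    print("{}: {}".format(name, value))

# The three species, in the order used by the integer labels in 'target'.
print(list(iris_dataset.target_names))
```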

A--Implementing decision trees with Scikit-Learn

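Decision tree classification is fast and needs little data preparation, so it is often a good first-pass classifier to try before moving on to more complex models.
Choosing the model class
In Scikit-Learn, the classification tree algorithms live in the tree module; the class that implements them is DecisionTreeClassifier.
    In [1]: from sklearn.tree import DecisionTreeClassifier
    In [2]: DecisionTreeClassifier?  # show the parameters and output attributes
Selected parameters:
Parameter | Description
criterion: the impurity measure used to choose splits. Besides the default 'gini' index, 'entropy' can be passed to use information gain.
splitter: the split strategy. The default, 'best', splits on the candidate that reduces impurity the most.
max_depth: the maximum depth of the tree. Setting it forces a stopping condition (how many levels the tree may grow). The default is None, meaning nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split: the minimum number of samples a node must contain to be split further. The default is 2; nodes with fewer samples are not split.
min_samples_leaf: the minimum number of samples in a leaf. The default is 1; a split that would leave a leaf with fewer samples is not made.
Selected attributes:
Attribute | Description
classes_: the class labels of the dataset
feature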
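The parameters listed above can be tried out with a short sketch. The dataset (iris), the value max_depth=3, and the fixed random_state are illustrative assumptions, not settings taken from the original article.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Illustrative settings: entropy as the impurity measure and a depth cap of 3;
# the remaining arguments are spelled out with their default values.
tree = DecisionTreeClassifier(
    criterion="entropy",
    splitter="best",
    max_depth=3,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=0,
)
tree.fit(iris.data, iris.target)

print(tree.classes_)                        # class labels seen during fit
print(tree.get_depth())                     # actual depth, at most max_depth
print(tree.score(iris.data, iris.target))   # accuracy on the training data
```

Scoring on the training data only shows that the tree fits; a held-out test set would be needed to estimate how well it generalizes.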

Sharing the personal project that got me into Alibaba's middleware team

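Author: vangoleo
Website: http://www.vangoleo.com/iris-java/
Background
Time flies: it has been almost two years since I joined the Alibaba middleware team. During that time I was lucky enough to serve on the problem-setting group for the 4th Middleware Performance Challenge and prepared the preliminary-round problem themed "Dubbo Mesh"; I ran offline Dubbo meetups together with the team; and I helped bring the middleware infrastructure best practices and methodology that Alibaba accumulated over years of Double 11 to developers and enterprises through Alibaba Cloud's commercial products. I am grateful for such a memorable experience. Looking back, my joining the middleware team had something to do with a GitHub project of mine. Today I am sharing that project with you.
Q: What is the middleware team?
A: The Alibaba Middleware Technology Department is one of the world's top Java engineering teams. It grew out of the Taobao platform architecture group, alongside Alibaba's e-commerce business and Double 11, and deals with complex business scenarios, rapid business growth, high-concurrency traffic peaks during big promotions, and a constant stream of stability problems. Its products span several areas, including a distributed RPC service framework, highly reliable distributed messaging middleware, a distributed data layer, massive data storage, real-time computing, system performance optimization, and high-availability architecture. These products support all transactional and non-transactional business systems of Alibaba Group (Taobao, Tmall, Juhuasuan, 1688, Cainiao) and carried them smoothly through the Double 11 challenge of 91.7 billion yuan in transactions. Our open-source middleware components, such as Dubbo, RocketMQ, Nacos, Tengine, and Seata, are used by many companies and individuals.
An invitation from the middleware team
In 2017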

The KNN algorithm and its implementation

KNN relies on Euclidean distance. The drawback of KNN shown below can easily cause misclassification (for example, the black point below). Below are three demo examples of the KNN algorithm; the first is implemented directly from the algorithm's principle.
    import matplotlib.pyplot as plt
    import numpy as np
    import operator
    # data with known classes
    x1 = np.array([3,2,1])
    y1 = np.array([104,100,81])
    x2 = np.array([101,99,98])
    y2 = np.array([10,5,2])
    scatter1 = plt.scatter(x1,y1,c='r')
    scatter2 = plt.scatter(x2,y2,c='b')
    # unknown data point
    x = np.array([18])
    y = np.array([90])
    scatter3 = plt.scatter(x,y,c='k')
    # draw the legend
    plt.legend(handles=[scatter1,scatter2,scatter3],labels=['labelA','labelB','X'],loc='best')
    plt.show()
    # data with known classes
    x_data = np.array([[3,104], [2,100], [1,81], [101,10], [99,5], [81,2]])
    y_data =
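The excerpt stops before the classification step. A minimal sketch of a from-scratch KNN vote over the points above might look like this; the function name knn_predict, the label values "A" and "B", and k=3 are assumptions made for illustration, since the original y_data and the rest of the demo are cut off.

```python
from collections import Counter

import numpy as np


def knn_predict(query, points, labels, k):
    """Return the majority label among the k points closest to query."""
    # Euclidean distance from the query to every known point.
    distances = np.sqrt(((points - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Known points, copied from the excerpt.
x_data = np.array([[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [81, 2]])
# Assumed labels: the first three points form one class, the last three the other.
labels = ["A", "A", "A", "B", "B", "B"]

# The unknown point from the excerpt.
query = np.array([18, 90])
prediction = knn_predict(query, x_data, labels, k=3)
print(prediction)
```

With these assumed labels the three nearest neighbours of (18, 90) all belong to the first group, so the vote is unanimous.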

Randomly sample groups

Question: Given a dataframe df with a column called group, how do you randomly sample k groups from it in dplyr? It should return all rows from k groups (given there are at least k unique values in df$group), and every group in df should be equally likely to be returned.
Answer 1: Just use sample() to choose some number of groups:
    iris %>% filter(Species %in% sample(levels(Species),2))
Answer 2: Though why you'd want to do this in dplyr makes no sense to me:
    library(microbenchmark)
    microbenchmark(dplyr= iris %>% filter(Species %in% sample(levels(Species),2)),

R: ggfortify: “Objects of type prcomp not supported by autoplot”

Question: I am trying to use ggfortify to visualize the results of a PCA I did using prcomp. Sample code:
    iris.pca <- iris[c(1, 2, 3, 4)]
    autoplot(prcomp(iris.pca))
    Error: Objects of type prcomp not supported by autoplot. Please use qplot() or ggplot() instead.
What is odd is that autoplot is specifically designed to handle the results of prcomp - ggplot and qplot can't handle objects like this. I'm running R version 3.2 and just downloaded ggfortify off of GitHub this AM. Can anyone explain this message?
Answer 1: I'm guessing that you didn't load the

AttributeError: module 'tensorflow.contrib.learn' has no attribute 'TensorFlowDNNClassifier'

Question: This is the ML TensorFlow code I am trying to execute -
    import tensorflow.contrib.learn as skflow
    from sklearn import datasets, metrics
    iris = datasets.load_iris()
    classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 20, 10], n_classes=3)
    classifier.fit(iris.data, iris.target)
    score = metrics.accuracy_score(iris.target, classifier.predict(iris.data))
    print("Accuracy: %f" % score)
It gives the following error -
    Traceback (most recent call last):
      File "C:\Users\admin\test3.py", line 5, in
        classifier = skflow.TensorFlowDNNClassifier
