estimator | 易学教程

[TensorFlow] Pitfall Notes

1. tf.estimator.BoostedTreesRegressor

Error:

    For now, only support Numeric column with shape less than 2, but column `estimator_features` got: (1, 234)

Environment: tf.__version__ == '1.14.0'

Source: /tensorflow_estimator/python/estimator/canned/boosted_trees.py

    def _get_transformed_features_and_merge_with_previously_transformed(...):
        ...
        for column in all_sorted_columns:
            ...
            elif isinstance(column, (feature_column_lib.NumericColumn, fc_old._NumericColumn)):
                source_name = column.name
                tensor = transformed_features[column]
                # ...
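One possible reading of this error is that the feature carries a rank-2 shape, (1, 234), while the check only accepts numeric columns of rank below 2. Assuming that is the cause, a workaround would be to drop the leading singleton dimension so that each example is a flat vector of 234 values. The sketch below is pure Python and does not call TensorFlow; the helper names `shape_of` and `flatten_example` and the sample data are hypothetical.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def flatten_example(example):
    """Collapse a (1, n) nested list into a flat list of length n."""
    if len(example) != 1:
        raise ValueError("expected a leading dimension of size 1")
    return list(example[0])


# Hypothetical example with the shape from the error message: (1, 234).
example = [[float(i) for i in range(234)]]
flat = flatten_example(example)

print(shape_of(example))
print(shape_of(flat))
```

With the data flattened this way, the numeric column would be declared with a rank-1 shape instead of (1, 234); whether that fully resolves the error depends on how the input function is written.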

General steps for training a model in machine learning

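General steps

I. Obtain the dataset and preprocess it

1. Split the dataset (train_test_split). Use train_test_split on the dataset you obtained; the basic form is:

    from sklearn.model_selection import train_test_split
    X_train, X_test, Y_train, Y_test = train_test_split(X_original, Y_original, test_size=0.2)

2. Standardize the dataset. Use StandardScaler to rescale the features so that their ranges do not differ too much. The basic form is:

    from sklearn.preprocessing import StandardScaler
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # fit on the training set only
    X_test = scaler.transform(X_test)        # reuse the training-set statistics

II. Choose a machine learning algorithm and decide on the model

First, decide the goal: classification or regression. Next, decide whether the output is multiclass, multilabel, or single-output; how to decide is covered in another article of mine. Then, depending on whether the task is classification or regression, choose the estimator and the scoring metric (such as MSE or accuracy). There are many algorithms, and different algorithms correspond to different estimators ...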
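The two preprocessing steps can be sketched without any library, which makes it easier to see why the scaler is fitted on the training set and only applied to the test set. This is a simplified stand-in, not scikit-learn's implementation; the function names and the toy data are hypothetical.

```python
import random
import statistics


def train_test_split_simple(rows, test_size=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


def fit_scaler(values):
    """Learn the mean and population standard deviation of one feature."""
    return statistics.mean(values), statistics.pstdev(values)


def apply_scaler(values, mean, std):
    """Standardize values using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


data = [float(v) for v in range(1, 11)]        # toy single-feature dataset
train, test = train_test_split_simple(data, test_size=0.2)

mean, std = fit_scaler(train)                  # fit on the training set only
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)    # reuse the training statistics

print(len(train), len(test))
```

Fitting the scaler on the test set as well would leak information about the test distribution into preprocessing, which is why the test rows only go through `apply_scaler`.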

Saving and loading a linear regression model

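XI. Saving and loading models

1. The sklearn API for saving and loading models

    import joblib  # formerly: from sklearn.externals import joblib (removed in scikit-learn 0.23)

Save: joblib.dump(estimator, 'test.pkl')
Load: estimator = joblib.load('test.pkl')

2. Example: saving and loading a linear regression model

    def load_dump_demo():
        '''Save and load a model.'''
        # 1. Load the data (note: load_boston was removed in scikit-learn 1.2)
        data = load_boston()
        # 2. Split the dataset
        x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)
        # 3. Feature engineering: standardization
        transfer = StandardScaler()
        x_train = transfer.fit_transform(x_train)
        x_test = transfer.transform(x_test)  # reuse the training-set statistics
        # 4. Machine learning: linear regression (ridge regression)
        # # 4.1 Train the model
        # estimator = Ridge(alpha=1)
        # estimator.fit(x_train, y_train)
        #
        # # 4.2 Save the model
        # ...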
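The save-then-load round trip can be illustrated with the standard library alone. The sketch below swaps in plain one-feature least squares for the article's ridge regression and uses `pickle` where the article uses joblib; the function names and the toy data are hypothetical.

```python
import os
import pickle
import tempfile


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the fitted line to a single input value."""
    return model["slope"] * x + model["intercept"]


# Toy data generated from y = 2x + 1.
model = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

path = os.path.join(tempfile.mkdtemp(), "test.pkl")
with open(path, "wb") as f:      # save, analogous to joblib.dump
    pickle.dump(model, f)
with open(path, "rb") as f:      # load, analogous to joblib.load
    restored = pickle.load(f)

print(predict(restored, 4.0))
```

The restored parameters give the same predictions as the original ones, which is the property the article's demo checks by loading the saved estimator and predicting again.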

[Machine learning model tuning] GridSearchCV, a handy tool for hyperparameter tuning

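Import from the sklearn.model_selection module:

    from sklearn.model_selection import GridSearchCV

GridSearchCV performs grid-search hyperparameter tuning with cross-validation: it iterates over every combination of the parameter values passed in and uses cross-validation to return the evaluation score for each combination. Its signature is:

    class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, iid='deprecated', refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)

Parameters, following the official documentation:

estimator: a scikit-learn estimator (the model to tune);
param_grid: dictionary of the parameters to search over;
scoring: the evaluation metric, for example AUC, RMSE, or log loss (in scikit-learn these are spelled 'roc_auc', 'neg_root_mean_squared_error', and 'neg_log_loss');
n_jobs: number of jobs to run in parallel; -1 uses all of the machine's processors, and more parallel jobs help shorten tuning time;
iid: if set to True ...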
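The "every combination" behaviour described above can be sketched with `itertools.product`. This is only an illustration of the enumeration, not GridSearchCV itself: the score function is supplied by the caller, where a real search would cross-validate a model for each combination. The grid, `toy_score`, and the helper names are hypothetical.

```python
from itertools import product


def iter_param_grid(param_grid):
    """Yield one dict per combination of the values in param_grid."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(score_fn, param_grid):
    """Return (best_params, best_score) using a caller-supplied score function."""
    scored = [(score_fn(params), params) for params in iter_param_grid(param_grid)]
    best_score, best_params = max(scored, key=lambda pair: pair[0])
    return best_params, best_score


# Hypothetical grid: 3 values of alpha times 2 values of fit_intercept.
param_grid = {"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]}


def toy_score(params):
    """Stand-in for a cross-validated score; peaks at alpha=1.0 with an intercept."""
    return -abs(params["alpha"] - 1.0) + (0.5 if params["fit_intercept"] else 0.0)


combos = list(iter_param_grid(param_grid))
best_params, best_score = grid_search(toy_score, param_grid)
print(len(combos), best_params, best_score)
```

The number of combinations is the product of the list lengths, so the cost of a grid search grows quickly as parameters are added.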

Spark MLlib

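Spark MLlib

Contents:
I. Spark MLlib: model selection and tuning (CrossValidator, TrainValidationSplit); MLlib directory structure; MLlib processing flow; MLlib components (data types, mathematical and statistical library, machine learning pipelines, machine learning algorithms)
II. Spark MLlib algorithm library: 2.1 Recommendation algorithm, Alternating Least Squares (ALS); 2.2 ALS: Scala

Part of this content comes from a Juejin post by the Meitu data team: "From Spark MLlib to Meitu's machine learning framework in practice".

I. Spark MLlib

The Spark website shows a performance comparison of logistic regression running on Spark and on Hadoop; according to the chart there, MLlib is about 100 times faster than MapReduce.

Spark MLlib mainly covers the following areas:

Learning algorithms: classification, regression, clustering, and collaborative filtering;
Feature processing: feature extraction, transformation, dimensionality reduction, and selection;
Pipelines: tools for constructing, evaluating, and tuning machine learning pipelines;
Persistence: saving and loading algorithms, models, and pipelines;
Utilities: linear algebra, statistics, optimization, tuning, and so on.

A typical Spark MLlib workflow is:

Construct the training dataset
Build each Stage
Compose the Stages into a Pipeline
Start model training
Evaluate the model
Compute predictions

With a Pipeline ...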

Cross-validation in sklearn

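I. Introduction

When training a machine learning model, the dataset D is split into a training set and a test set, because training and testing on the same data cannot tell you how well the model performs. Common splitting methods include k-fold cross-validation, p-times-repeated k-fold cross-validation, hold-out, leave-one-out, leave-P-out, random splitting, and bootstrapping. Training also usually involves tuning: when there are many candidate parameters, a similar cross-validation procedure can build a model with each candidate in turn and pick the best one. In sklearn this is mainly done with the classes in the sklearn.model_selection package, described in detail below.

II. Cross-validation methods for datasets

1. Hold-out

The hold-out method is simple: split the dataset D into two parts, one used as the training set and the other as the test set; typically 70% of the data is used for training. The corresponding function is:

    sklearn.model_selection.train_test_split(*arrays, **options)

*arrays: one or more arrays, for example x and y together, or x, y, and z. Accepted types are lists, numpy arrays, scipy-sparse matrices, and pandas dataframes.
test_size: if a float, the proportion of the data used as the test set; if None, it defaults to 0.25 (when train_size is also None); if an int, the number of test samples.
train_size: if a float, the proportion of the data used as the training set ...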
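Of the splitting methods listed, k-fold is the one most often combined with tuning, so a small index-level sketch may help. It mirrors the unshuffled case, where the folds are contiguous and the first `n_samples % n_splits` folds get one extra sample; the function name is hypothetical and this is not sklearn's own code.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for each of n_splits contiguous folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, n_splits=3))
for train, test in folds:
    print(test)
```

Every sample lands in exactly one test fold, so each sample is used for evaluation once and for training in the remaining rounds.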

Prediction from model saved with `tf.estimator.Estimator` in Tensorflow

Question: I am using tf.estimator.Estimator to train a model:

    def model_fn(features, labels, mode, params, config):
        input_image = features["input_image"]
        eval_metric_ops = {}
        predictions = {}
        # Create model
        with tf.name_scope('Model'):
            W = tf.Variable(tf.zeros([784, 10]), name="W")
            b = tf.Variable(tf.zeros([10]), name="b")
            logits = tf.nn.softmax(tf.matmul(input_image, W, name="MATMUL") + b, name="logits")
        loss = None
        train_op = None
        if mode != tf.estimator.ModeKeys.PREDICT:
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, ...

How to use a decaying learning rate with an estimator in tensorflow?

Question: I am trying to use a LinearClassifier with a GradientDescentOptimizer with a decaying learning rate. My code:

    def main():
        # load data
        features = np.load('data/feature_data.npz')
        tx = features['arr_0']
        y = features['arr_1']

        ## Prepare logistic regression
        n_point, n_feat = tx.shape

        # Input functions
        def get_input_fn_from_numpy(tx, y, num_epochs=None, shuffle=True):
            # Preprocess data
            return tf.estimator.inputs.numpy_input_fn(
                x={"x": tx}, y=y, num_epochs=num_epochs, shuffle=shuffle, batch_size=128
            )

        cols_label = "x"
        feature_cols = [tf.contrib ...
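The question's code is cut off before the optimizer is built, so the sketch below does not show how to wire a schedule into LinearClassifier. It only shows the schedule itself, using the formula documented for `tf.train.exponential_decay`: the learning rate is multiplied by `decay_rate` once every `decay_steps` steps. It is plain Python, and the numeric values are hypothetical.

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Return initial_lr * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps  # decay in discrete jumps
    return initial_lr * decay_rate ** exponent


for step in (0, 500, 1000, 2000):
    print(step, exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.5))
```

With `staircase=True` the rate stays constant within each block of `decay_steps` steps and then drops, instead of shrinking a little at every step.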

GridSearchCV

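GridSearchCV overview. Commonly used parameters:

estimator: the classifier to use, for example estimator=RandomForestClassifier(min_samples_split=100, min_samples_leaf=20, max_depth=8, max_features='sqrt', random_state=10); pass in every parameter except the ones whose best values are being searched for. Each classifier needs either a scoring argument or a score method.

param_grid: a dict or a list giving the candidate values of the parameters to optimize, for example param_grid=param_test1 with param_test1 = {'n_estimators': range(10, 71, 10)}.

scoring: the evaluation criterion. The default is None, in which case the estimator's own score method is used; otherwise pass something like scoring='roc_auc'. The appropriate criterion depends on the chosen model. The value can be a string (the name of a scorer) or a callable with the signature scorer(estimator, X, y). For the available scoring options see: http://scikit-learn.org/stable/modules/model_evaluation.html

iid: defaults to True; when True, the samples are assumed to be identically distributed across folds ...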
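The callable form of `scoring` is easy to get wrong, so here is a minimal sketch of a function with the scorer(estimator, X, y) signature. It runs without scikit-learn: `ThresholdClassifier` is a hypothetical stand-in for a fitted estimator, and the data is made up.

```python
class ThresholdClassifier:
    """Hypothetical stand-in for a fitted estimator: predicts 1 when x >= threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [1 if x >= self.threshold else 0 for x in X]


def accuracy_scorer(estimator, X, y):
    """Scorer with the scorer(estimator, X, y) signature; higher is better."""
    predictions = estimator.predict(X)
    correct = sum(1 for p, t in zip(predictions, y) if p == t)
    return correct / len(y)


X = [0.2, 0.4, 0.6, 0.8]
y = [0, 1, 1, 1]
print(accuracy_scorer(ThresholdClassifier(0.5), X, y))
```

A scorer is expected to return a single number where larger means better, which is why loss-style metrics are exposed in negated form (for example 'neg_log_loss').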

TensorFlow API explained: tf.estimator.Estimator

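class Estimator(builtins.object)

The Estimator class is used to train and evaluate TensorFlow models.

An Estimator object wraps a model specified by model_fn, which, given inputs and parameters, returns the ops needed for training, evaluation, or prediction.

All outputs (checkpoints, event files, and so on) are written to model_dir or one of its subdirectories. If model_dir is not set, a temporary directory is used.

The config argument is a tf.estimator.RunConfig object that holds information about the execution environment. If config is not passed, Estimator instantiates one with the default configuration.

params holds the hyperparameters. Estimator only passes them along and does not inspect them, so the structure of params is entirely up to the developer.

None of Estimator's methods can be overridden by subclasses (the constructor enforces this). Subclasses should use model_fn to configure the base class, and may add methods that implement specialized functionality.

Estimator does not support eager execution (eager execution lets you use Python debugging tools, data structures, and control flow; it needs no placeholders or sessions, and results are available immediately).

1. __init__(self, model_fn, model_dir=None, config=None, ...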
