mean | 易学教程

Box plot showing mean as a line

Question: Is it possible to create a boxplot that shows both mean and median as a line with the standard boxplot function of R? My current solution displays the mean as a cross: set.seed(1234) values <- runif(10,0,1) boxplot(values) points(mean(values),col="red",pch=4,lwd = 4) Answer 1: For the sake of completeness, you could also overplot: set.seed(753) df <- data.frame(y=rt(100, 4), x=gl(5, 20)) bx.p <- boxplot(y~x, df) bx.p$stats[3, ] <- unclass(with(df, by(y, x, FUN = mean))) bxp(bx.p, add=T, boxfill=

"Python for Data Analysis" reading notes: Chapter 9, Data Aggregation and Group Operations (Part 1)

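http://www.cnblogs.com/batteryhp/p/5046450.html Grouping data and applying a function to each group is a key part of data analysis. Once the data is prepared, the usual task is to compute group statistics or build pivot tables. The groupby function processes data efficiently and supports slicing, dicing, and summarizing. This is clearly close to SQL, but many more functions are available. In this chapter you will learn how to: split a pandas object by one or more keys (functions, arrays, or DataFrame column names); compute group summary statistics such as count, mean, and standard deviation, or apply a user-defined function; apply a variety of functions to the columns of a DataFrame; apply within-group transformations or other operations such as normalization, linear regression, ranking, or subset selection; compute pivot tables and cross-tabulations; perform quantile analysis and other group analyses. Aggregation of time series data, also called resampling, is covered in Chapter 10. 1. GroupBy mechanics. Many data processing tasks follow a "split-apply-combine" pattern: group by one or more keys, apply a function to each group, then combine the results. Group keys can take several forms: a list or array with the same length as the axis being grouped; a value naming a column of the DataFrame; a dict or Series giving the correspondence between values on the axis being grouped and group names; a function applied to the axis index or to the individual labels in the index. Examples follow. A simple example: # -*- encoding: utf-8 -*- # grouping example import numpy as np import pandas as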
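The "split-apply-combine" pattern described in this excerpt can be sketched without pandas. The snippet below is a minimal illustration using only the standard library; the `records` data and the `group_mean` helper are hypothetical and not taken from the original notes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (group key, value) pairs, invented for illustration.
records = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)]

def group_mean(pairs):
    """Split by key, apply mean to each group, combine into a dict."""
    groups = defaultdict(list)
    for key, value in pairs:          # split: collect values per key
        groups[key].append(value)
    # apply + combine: one mean per group, gathered into a single result
    return {key: mean(values) for key, values in groups.items()}

result = group_mean(records)
print(result)
```

In pandas the same three steps are usually written as a single groupby expression on a DataFrame; the plain-Python version only makes the individual steps visible.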

Machine learning regression problems (linear regression, ridge regression, stepwise regression)

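1. Linear regression. Linear regression multiplies each input by a constant and sums the results to produce the output. Suppose the input data is stored in a matrix X and the regression coefficients in a vector w. The prediction is then given by Y = X^T w. The core of fitting a linear regression model is therefore solving for w. How? We want the error between the predicted values and the actual values to be as small as possible, so we can judge w by the difference between the predicted value and the actual value. Because this difference can be positive or negative, we use the squared error (that is, least squares) so that positive and negative errors do not cancel. The squared error can also be called the loss function. We now minimize the loss function with respect to w: differentiating with respect to w and setting the derivative to zero yields the formula for w. Locally weighted linear regression. One problem with linear regression is that it may underfit, because it seeks the unbiased estimate with minimum mean squared error. Clearly, an underfit model will not predict well. Some methods therefore allow a little bias in the estimate in order to reduce the mean squared error of the prediction. One such method is locally weighted linear regression. In this algorithm, each point near the point to be predicted is given a weight, and ordinary regression based on minimum mean squared error is then performed on this subset. The basic idea: when designing the cost function, points near the point to be predicted get higher weights, and the weight shrinks as the distance grows, which is where the "local" and "weighted" in the name come from. How the weights are obtained: the difference is that the cost function now contains a weight function W, and W must ensure that the closer a point is to the query point, the larger its weight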
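As a rough sketch of the two ideas in this excerpt: setting the derivative of the squared-error loss to zero gives the normal equations (X^T X) w = X^T y, and the locally weighted variant solves (X^T W X) w = X^T W y for each query point. The excerpt is cut off before it names a weight function, so the Gaussian kernel and its width `k` below are assumptions, as are the function names and the toy data. Rows of `X` are samples here.

```python
import numpy as np

def ols_weights(X, y):
    """Ordinary least squares: solve the normal equations (X^T X) w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def lwlr_predict(x_query, X, y, k=1.0):
    """Locally weighted prediction at x_query.

    Assumption: a Gaussian kernel of width k, so that points closer to
    x_query receive larger weights.
    """
    sq_dist = np.sum((X - x_query) ** 2, axis=1)
    W = np.diag(np.exp(-sq_dist / (2.0 * k ** 2)))
    w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ w

# Hypothetical noise-free data on the line y = 1 + 2x, with a bias column of ones.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w = ols_weights(X, y)                                   # close to [1, 2]
pred = lwlr_predict(np.array([1.0, 0.5]), X, y, k=0.3)  # close to 2.0
print(w, pred)
```

On exactly linear data both fits recover the same line; the local version only starts to differ when the relationship bends, and a smaller `k` makes the fit more local.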

column vector with row means — with std::accumulate?

Question: In an effort to be as lazy as possible I read in a matrix as vector< vector<double> > data ( rows, vector<double> ( columns ) ); and try to use as many STL goodies as I can. One thing I need to do next is to compute the row means. In C-style programming that would be vector<double> rowmeans( data.size() ); for ( int i=0; i<data.size(); i++ ) for ( int j=0; j<data[i].size(); j++ ) rowmeans[i] += data[i][j]/data[i].size(); In In C++, how to compute the mean of a vector of integers using a

Computing the mean of a tensor in TensorFlow

https://www.tensorflow.org/versions/r0.12/api_docs/python/math_ops.html#reduce_mean tf.reduce_mean(input_tensor, axis=None, keep_dims=False, name=None, reduction_indices=None) Computes the mean of the elements across the dimensions of a tensor. The tensor is reduced along the dimensions given in axis. If keep_dims is false, the rank of the tensor is reduced by 1 for each reduced dimension. If axis is not specified, the mean of all elements is returned. Example: # 'x' is [[1., 1.] # [2., 2.]] tf.reduce_mean(x) ==> 1.5 tf.reduce_mean(x, 0) ==> [1.5, 1.5] tf.reduce_mean(x, 1) ==> [1., 2.] Source: https://www.cnblogs.com/huangshiyu13/p/6534264.html
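The axis behaviour in this example can be checked without TensorFlow. The sketch below uses NumPy's `np.mean` as a stand-in for `tf.reduce_mean`, which is an assumption made for illustration; note that NumPy spells the keyword `keepdims`, while the r0.12 signature quoted here uses `keep_dims`.

```python
import numpy as np

# Same values as the 'x' in the example.
x = np.array([[1.0, 1.0],
              [2.0, 2.0]])

mean_all = np.mean(x)                            # mean over every element
mean_axis0 = np.mean(x, axis=0)                  # reduce rows: one mean per column
mean_axis1 = np.mean(x, axis=1)                  # reduce columns: one mean per row
mean_keep = np.mean(x, axis=1, keepdims=True)    # reduced axis kept with length 1

print(mean_all, mean_axis0, mean_axis1, mean_keep.shape)
```

Keeping the reduced dimension is mainly useful when the result has to broadcast against the original array, for example when subtracting each row's mean from that row.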

How to create mean and s.d. columns in data.table

Question: The following code/outcome baffles me as to why data.table returns NA for the mean functions and not the sd function. library(data.table) test <- data.frame('id'=c(1,2,3,4,5), 'A'=seq(2,9,length=5), 'B'=seq(3,9,length=5), 'C'=seq(4,9,length=5), 'D'=seq(5,9,length=5)) test <- as.data.table(test) test[,`:=`(mean_test = mean(.SD), sd_test = sd(.SD)),by=id,.SDcols=c('A','B','C','D')] > test id A B C D mean_test sd_test 1: 1 2.00 3.0 4.00 5 NA 1.2909944 2: 2 3.75 4.5 5.25 6 NA 0.9682458 3: 3 5.50

scikit-learn: 3.3. Model evaluation: quantifying the quality of predictions

Reference: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter There are three ways to evaluate the quality of a model's predictions: Estimator score method: estimators have a score method that serves as the default evaluation criterion; it is not covered in this section, see each estimator's documentation for details. Scoring parameter: Model-evaluation tools using cross-validation (such as cross_validation.cross_val_score and grid_search.GridSearchCV) rely on an internal scoring strategy. This is discussed in "The scoring parameter: defining model evaluation rules" (see section 1). Metric functions: the metrics module allows a more comprehensive assessment of prediction quality; this section discusses Classification metrics, Multilabel ranking metrics, Regression metrics and Clustering metrics (see sections 2, 3, 4 and 5).

compute mean in python for a generator

Question: I'm doing some statistics work, I have a (large) collection of random numbers to compute the mean of, I'd like to work with generators, because I just need to compute the mean, so I don't need to store the numbers. The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this? It would be nice if I could say "sum(values)/len(values)", but len doesn't work for generators, and sum
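One way to approach the problem in this question is a single pass that tracks the running total and the count, so the numbers never need to be stored. This is a minimal sketch, not the answer given on the original page; `running_mean` is a hypothetical helper name.

```python
def running_mean(values):
    """Mean of any iterable (including a generator) in one pass, O(1) memory."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    if count == 0:
        raise ValueError("mean of an empty iterable is undefined")
    return total / count

# A generator expression is consumed lazily; no list is ever built.
result = running_mean(x * x for x in range(1, 5))   # (1 + 4 + 9 + 16) / 4
print(result)
```

For very long streams of floats a plain running sum can accumulate rounding error; an incremental update of the mean or compensated summation are common alternatives.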

Finding the mean and standard deviation of a timedelta object in pandas df

Question: I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with two columns shown below. When I run the code (also shown below) I get the below error: pandas.core.base.DataError: No numeric types to aggregate My dataframe: bank diff Bank of Japan 0 days 00:00:57.416000 Reserve Bank of Australia 0 days 00:00:21.452000 Reserve Bank of New Zealand 55 days 12:39:32.269000 U.S. Federal Reserve 8 days 13:27:11.387000 My code: means = dropped.groupby('bank')

using mean with .SD and .SDcols in data.table

Question: I am writing a very simple function to summarize columns of data.tables. I am passing one column at a time to the function, and then doing some diagnostics to figure out the options for summarization, and then doing the summarization. I am doing this in data.table to allow for some very large datasets. So, I am using .SDcols to pass in the column to summarize, and using functions on .SD in the j part of a data.table expression. Since I am passing in one column at a time, I am not using lapply
