mean

Wrong result from mean(x, na.rm = TRUE)

独自空忆成欢 submitted on 2019-12-02 00:30:39
I want to compute the mean, min and max of a series of managers' returns, as follows:
ManagerRet <- data.frame(diff(Managerprices)/lag(Managerprices, k = -1))
I then replace returns equal to 0 with NaN, since the data are extracted from a database and not all dates are populated:
ManagerRet = replace(ManagerRet, ManagerRet == 0, NaN)
I have the following 3 functions:
> min(ManagerRet, na.rm = TRUE)
[1] -0.0091716
> max(ManagerRet, na.rm = TRUE)
[1] 0.007565
> mean(ManagerRet, na.rm = TRUE)*252
[1] NaN
Why does the mean function return NaN while min and max perform the calculation properly? Below you can find the
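For comparison, the same pipeline can be sketched in Python with NumPy. This version makes the NaN handling explicit: the nan-aware reductions skip NaN, while a plain mean propagates it. This is a Python analogue, not an answer to the R question. The price series is hypothetical, and 252 is the annualisation factor used above.

```python
import numpy as np

# Hypothetical daily prices; repeated values produce zero returns.
prices = np.array([100.0, 100.0, 101.0, 100.0, 100.0, 102.0])

# Simple returns (p[t] - p[t-1]) / p[t-1], like diff(x) / lag(x, k = -1) in the question.
returns = np.diff(prices) / prices[:-1]

# Treat zero returns as missing, mirroring replace(ManagerRet, ManagerRet == 0, NaN).
returns[returns == 0] = np.nan

ret_min = np.nanmin(returns)                   # ignores NaN
ret_max = np.nanmax(returns)                   # ignores NaN
ret_mean_annual = np.nanmean(returns) * 252    # NaN-aware mean, annualised

naive_mean = np.mean(returns)                  # plain mean propagates NaN
print(ret_min, ret_max, ret_mean_annual, naive_mean)
```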

Making a custom window type for pandas rolling mean

放肆的年华 submitted on 2019-12-01 23:02:10
Question: I understand that rolling allows you to specify the window type used for calculating the rolling mean. The docs list a variety of window type options available here. However, I am trying to use a symmetrically weighted window of length 4, which is not available as a built-in, defined as:
(a + 2*b + 2*c + d)/6
where a, b, c and d are the four elements of the rolling window at any given time, and [1/6, 2/6, 2/6, 1/6] are the associated weights. If I go by the default window type
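One way to get such a custom weighting is to apply the weights yourself inside rolling().apply(). Below is a minimal sketch, assuming a plain numeric Series of hypothetical values. It is not a built-in window type. With raw=True each window arrives as a NumPy array, so a dot product with the weights gives (a + 2*b + 2*c + d)/6.

```python
import numpy as np
import pandas as pd

# Symmetric weights from the question: [1/6, 2/6, 2/6, 1/6].
weights = np.array([1, 2, 2, 1]) / 6

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # hypothetical data

# raw=True passes each length-4 window as a NumPy array, so np.dot computes
# a*w[0] + b*w[1] + c*w[2] + d*w[3] for every window.
weighted = s.rolling(window=4).apply(lambda w: np.dot(w, weights), raw=True)
print(weighted)  # first three entries are NaN (incomplete windows)
```

Because these weights are symmetric, np.convolve(s, weights, mode="valid") yields the same three complete-window values without a Python call per window.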

means and SD for columns in a dataframe with NA values

喜欢而已 submitted on 2019-12-01 20:46:22
I'm trying to calculate the mean and standard deviation of several columns (all except the first) in a data.frame with NA values. I've tried colMeans, sapply, etc., to create a loop that runs through the data.frame and stores the means and standard deviations in a separate table, but I keep getting a "FUN" error. Any help would be great. Thanks.
sapply(df, function(cl) list(means=mean(cl,na.rm=TRUE), sds=sd(cl,na.rm=TRUE)))
      col1     col2     col3     col4     col5
means 3        8        12.5     18.25    22.5
sds   1.581139 1.581139 1.290994 1.707825 1.290994
as.data.frame( t(sapply(df, function(cl) list(means=mean(cl,na.rm
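For comparison, the same per-column summary in pandas takes a single agg call. This sketch uses a hypothetical data frame with a non-numeric first column and one missing value. pandas skips NaN by default (similar to na.rm = TRUE), and its std() uses the sample formula (ddof=1), as R's sd() does.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a non-numeric first column plus numeric columns with a missing value.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e"],
    "col1": [1, 2, 3, 4, 5],
    "col2": [6, 7, 8, 9, 10],
    "col3": [11, 12, np.nan, 13, 14],
})

# Drop the first column, then compute mean and sample standard deviation per column.
# Rows of the result are "mean" and "std"; NaN values are skipped.
stats = df.iloc[:, 1:].agg(["mean", "std"])
print(stats)
```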

ch09-GroupBy

断了今生、忘了曾经 submitted on 2019-12-01 18:28:17
split -> apply -> combine
import numpy as np
import pandas as pd
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
df
  key1 key2     data1     data2
0    a  one  1.587125 -0.517650
1    a  two  0.206854  1.503013
2    b  one  1.074688 -1.310088
3    b  two  0.306591  2.236456
4    a  one  0.462624  0.643336
grouped = df['data1'].groupby(df['key1'])
grouped  # a GroupBy object
<pandas.core.groupby.groupby.SeriesGroupBy object at 0x10637ba20>
grouped.mean()
key1
a    0.752201
b    0.690639
Name: data1, dtype:
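To make the three steps visible, here is a plain-Python sketch of what grouped.mean() does for this data. It uses the key1 and data1 values from the table above. It illustrates the split-apply-combine idea and is not how pandas implements groupby internally.

```python
from collections import defaultdict

# key1 and data1 values copied from the DataFrame printed above.
key1 = ["a", "a", "b", "b", "a"]
data1 = [1.587125, 0.206854, 1.074688, 0.306591, 0.462624]

# Split: collect the data1 values belonging to each key1 group.
groups = defaultdict(list)
for key, value in zip(key1, data1):
    groups[key].append(value)

# Apply (mean of each group) and combine (one result per group key).
group_means = {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}
print(group_means)  # about {'a': 0.752201, 'b': 0.6906395}
```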

05_Linear Regression

穿精又带淫゛_ submitted on 2019-12-01 16:41:03
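5-1 Simple linear regression
Linear regression solves regression problems. The idea is simple and it is easy to implement. It is the foundation of many powerful nonlinear models, its results are highly interpretable, and it embodies many important ideas in machine learning. Compare this with classification problems, where both axes of the plot are features.
(Slides: summary; notes; problem statement of linear regression; approach.) Nearly all parametric learning algorithms follow this same pattern.
5-2 Least squares
Derivation: (1) take the partial derivative with respect to b; (2) take the partial derivative with respect to a.
5-3 Implementing simple linear regression
First, look at a simple example:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 2., 3., 5.])
plt.scatter(x, y)
plt.axis([0, 6, 0, 6])
plt.show()
Next, compute the relevant quantities:
x_mean = np.mean(x)
y_mean = np.mean(y)
num = 0.0
d = 0.0
for x_i, y_i in zip(x, y):
    num += (x_i - x_mean) * (y_i - y_mean)
    d += (x_i - x_mean) ** 2
a = num/d
b = y_mean - a * x_mean
print("(a, b): ", (a, b))  # (a, b): (0.8,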
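Setting both least-squares partial derivatives to zero gives b = y_mean - a * x_mean and a = sum((x_i - x_mean) * (y_i - y_mean)) / sum((x_i - x_mean) ** 2). These are exactly what the loop accumulates in num and d. Below is a sketch of the same computation with the loop replaced by dot products of the centred vectors. The vectorised form is a common NumPy idiom, not necessarily the course's next step.

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 2., 3., 5.])

x_mean, y_mean = x.mean(), y.mean()

# The two sums from the loop become dot products of centred vectors.
a = np.dot(x - x_mean, y - y_mean) / np.dot(x - x_mean, x - x_mean)
b = y_mean - a * x_mean
print("(a, b): ", (a, b))

# Predictions on the fitted line y_hat = a * x + b.
y_hat = a * x + b
```

np.polyfit(x, y, 1) fits the same degree-1 least-squares line and can serve as a cross-check.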

(Translated) What does the explicit keyword mean?

半腔热情 submitted on 2019-12-01 16:28:35
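Original answer taken from Stack Overflow: What does the explicit keyword mean?
When an argument is passed to a function and its type does not match the parameter type, the C++ compiler is allowed to perform one implicit conversion to satisfy the parameter. Such an implicit conversion can involve calling a single-argument constructor of the parameter's type. Here is an example of an implicit conversion:
class Foo {
public:
    Foo(int foo) : m_foo(foo) { }
    int GetFoo() { return m_foo; }
private:
    int m_foo;
};
A function takes a Foo object as its parameter:
void DoBar(Foo foo) {
    int i = foo.GetFoo();
}
Now we call DoBar like this:
int main() {
    DoBar(42);
}
Clearly the argument is not a Foo but an int. However, Foo has a single-argument constructor that accepts an int, so the compiler implicitly calls this constructor to convert the int into a Foo.
Declaring this constructor explicit tells the compiler that we do not want this implicit conversion. The compiler then forbids it, and calling DoBar(42) produces a compile error.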

http_load usage in detail

久未见 submitted on 2019-12-01 15:35:51
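1. What is http_load
http_load is a Linux-based performance testing tool for web servers. It measures a web server's throughput and load capacity, as well as the performance of web pages.
2. Installing http_load
1) Download:
wget http://www.acme.com/software/http_load/http_load-12mar2006.tar.gz
2) Install:
tar xzvf http_load-12mar2006.tar.gz
cd http_load-12mar2006
make
make install
3. Using http_load
1) Create a file with vi urls and write the domain names or IP addresses of the servers to test into it, for example http://www.baidu.com/ or something like 192.168.0.1.
2) Example:
./http_load -rate 5 -seconds 10 urls
-parallel (short form -p): number of concurrent user processes.
-fetches (short form -f): total number of fetches.
-rate (short form -r): number of fetches per second.
-seconds (short form -s): total duration of the test.
Result: this runs a test lasting 10 seconds at a rate of 5 fetches per second.
49 fetches, 2 max parallel, 289884 bytes, in 10.0148 seconds
5916 mean bytes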
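The summary line already contains what is needed to check the derived figures. Below is a small sketch that parses the line shown above and computes averages from it. The regular expression is an assumption based only on this one sample line, not on a documented output format. 289884 bytes over 49 fetches is exactly the reported 5916 mean bytes. About 4.89 fetches per second is close to the requested -rate 5.

```python
import re

# Summary line copied from the http_load run above.
summary = "49 fetches, 2 max parallel, 289884 bytes, in 10.0148 seconds"

# Pattern assumed from this single sample line.
match = re.match(
    r"(\d+) fetches, (\d+) max parallel, (\d+) bytes, in ([\d.]+) seconds", summary
)
fetches = int(match.group(1))
max_parallel = int(match.group(2))
total_bytes = int(match.group(3))
seconds = float(match.group(4))

mean_bytes = total_bytes / fetches        # bytes per fetch
fetches_per_sec = fetches / seconds       # achieved request rate
bytes_per_sec = total_bytes / seconds     # throughput in bytes per second
print(mean_bytes, round(fetches_per_sec, 2), round(bytes_per_sec))
```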

Use mean in ggplot boxplots instead of median

社会主义新天地 submitted on 2019-12-01 14:53:03
Is it possible to use the mean in a ggplot boxplot instead of the median? The reason I ask is that in my data the median = 0.0 and the mean = 0.40, and I am interested in the mean.
From the help ?geom_boxplot:
library(ggplot2)
# It's possible to draw a boxplot with your own computations if you
# use stat = "identity":
y <- rnorm(100)
df <- data.frame(
  x = 1,
  y0 = min(y),
  y25 = quantile(y, 0.25),
  y50 = median(y),  # <=== replace by mean
  y75 = quantile(y, 0.75),
  y100 = max(y)
)
ggplot(df, aes(x)) +
  geom_boxplot(
    aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
    stat = "identity"
  )
So
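The same idea of precomputing the box statistics and putting the mean in the middle slot can be sketched in Python. The sample is hypothetical, chosen so that its median is 0.0 and its mean is 0.4, as in the question. The key names follow the statistics dicts documented for matplotlib's Axes.bxp, but this sketch only computes the numbers and does not draw anything.

```python
import numpy as np

# Hypothetical sample whose median is 0.0 but whose mean is 0.4.
y = np.array([0.0, 0.0, 0.0, 0.0, 2.0])

# Box statistics with the mean in the middle slot instead of the median.
box_stats = {
    "whislo": y.min(),              # lower whisker
    "q1": np.quantile(y, 0.25),     # lower hinge
    "med": y.mean(),                # mean replaces the median
    "q3": np.quantile(y, 0.75),     # upper hinge
    "whishi": y.max(),              # upper whisker
}
print(box_stats)
```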

Why are `colMeans()` and `rowMeans()` functions faster than using the mean function with `lapply()`?

懵懂的女人 submitted on 2019-12-01 09:17:10
What I want to ask is: algorithmically, what do the rowMeans() and colMeans() functions do to optimize speed?
Consider what lapply() does. It sets up repeated calls to the function mean(). So, as well as the overhead of actually computing a mean (which is done in fast C code), the lapply() version repeatedly incurs the overhead of the sanity-checking code and method dispatch associated with mean(). rowMeans() and colMeans() incur only a single set of sanity checks, because internally their C code is optimised to loop over the rows/columns directly rather than through separate R calls.
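The same per-call overhead argument can be illustrated in NumPy. This is an analogue, not R. Calling np.mean once per column repeats the Python-level argument handling for every column, while mat.mean(axis=0) makes one call and loops over columns in compiled code. The matrix is hypothetical random data. Timings depend on the machine, so no numbers are claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
m = rng.standard_normal((1000, 200))  # hypothetical 1000 x 200 matrix

def col_means_loop(mat):
    # One np.mean call per column: call overhead and checks repeat 200 times.
    return np.array([np.mean(mat[:, j]) for j in range(mat.shape[1])])

def col_means_vectorised(mat):
    # A single call; the loop over columns runs inside compiled code.
    return mat.mean(axis=0)

# Both approaches give the same column means (up to floating-point rounding).
assert np.allclose(col_means_loop(m), col_means_vectorised(m))

t_loop = timeit.timeit(lambda: col_means_loop(m), number=20)
t_vec = timeit.timeit(lambda: col_means_vectorised(m), number=20)
print(f"loop: {t_loop:.4f}s  vectorised: {t_vec:.4f}s")
```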