statistics

How to calculate the Euclidean norm of a vector in R?

一笑奈何 submitted on 2019-12-20 09:47:49
Question: I tried norm, but I think it gives the wrong result: the norm of c(1, 2, 3) is sqrt(1*1+2*2+3*3), but it returns 6.

    x1 <- 1:3
    norm(x1)
    # Error in norm(x1) : 'A' must be a numeric matrix
    norm(as.matrix(x1))
    # [1] 6
    as.matrix(x1)
    #      [,1]
    # [1,]    1
    # [2,]    2
    # [3,]    3
    norm(as.matrix(x1))
    # [1] 6

Does anyone know which function calculates the norm of a vector in R?

Answer 1: This is a trivial function to write yourself:

    norm_vec <- function(x) sqrt(sum(x^2))

Answer 2:

    norm(c(1,1), type="2") # 1
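
For context on the 6: R's norm() defaults to type = "O", the one norm (maximum absolute column sum), so a one-column matrix holding 1, 2, 3 gives 1 + 2 + 3 = 6. The arithmetic behind both numbers can be checked with a small sketch, written here in Python rather than R. The function names are made up for illustration and are not part of any library.

```python
import math

def one_norm_of_column(x):
    """One norm of a single-column matrix: the sum of absolute values."""
    return sum(abs(v) for v in x)

def euclidean_norm(x):
    """Euclidean (2-) norm: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))

x1 = [1, 2, 3]
print(one_norm_of_column(x1))  # the value norm(as.matrix(x1)) reported
print(euclidean_norm(x1))      # sqrt(14), the value the question expected
```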

How to get the correlation between two timeseries using Pandas

我的梦境 submitted on 2019-12-20 09:38:03
Question: I have two sets of temperature data, which have readings at regular (but different) time intervals. I'm trying to get the correlation between these two sets of data. I've been playing with Pandas to try to do this. I've created two timeseries, and am using TimeSeriesA.corr(TimeSeriesB). However, if the times in the two timeseries do not match up exactly (they're generally off by seconds), I get Null as an answer. I could get a decent answer if I could: a) Interpolate/fill missing times in each
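
The interpolation idea raised in the question can be sketched without any library: resample one series onto the other's timestamps, then correlate the aligned values. This is a minimal illustration in plain Python, not the pandas API; the helper names and the readings are invented. In pandas the same idea would typically go through reindexing onto a shared index and Series.interpolate before calling Series.corr.

```python
from bisect import bisect_left
from statistics import mean

def interpolate(times, values, t):
    """Linearly interpolate the series (times, values) at time t.

    times must be sorted ascending; t is clamped to the observed range.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)          # times[i-1] < t <= times[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical readings: timestamps in seconds, offset by a few seconds.
times_a = [0, 60, 120, 180, 240]
temps_a = [20.0, 20.5, 21.0, 21.5, 22.0]
times_b = [3, 58, 125, 182, 238]
temps_b = [15.1, 15.6, 16.2, 16.7, 17.2]

# Resample B onto A's timestamps, then correlate the aligned values.
b_on_a = [interpolate(times_b, temps_b, t) for t in times_a]
print(pearson(temps_a, b_on_a))
```

Clamping at the ends is one choice among several; dropping timestamps outside the overlap of the two series would be an equally reasonable alternative.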

Per-process disk read/write statistics in Mac OS X

萝らか妹 submitted on 2019-12-20 09:23:25
Question: How do I programmatically get per-process disk I/O statistics in Mac OS X? In the 'Activity Monitor' application or the 'top' command we can only get whole-system disk I/O statistics. For reference, a similar question was asked for PC.

Answer 1: Use iotop (as root), for example:

    iotop -C 3 10

But the best way (for me) is:

    sudo fs_usage -f filesys

Answer 2: Since there isn't an answer here about how to do this programmatically, here's some more info. You can get some info out of libproc if you can use C/C++

How to determine what is the probability distribution function from a numpy array?

依然范特西╮ submitted on 2019-12-20 09:17:05
Question: I have searched around and to my surprise it seems that this question has not been answered. I have a NumPy array containing 10000 values from measurements. I have plotted a histogram with Matplotlib, and by visual inspection the values seem to be normally distributed. However, I would like to validate this. I have found a normality test implemented under scipy.stats.mstats.normaltest, but the result says otherwise. I get this output: (masked_array(data = [1472.8855375088663], mask = [False],

Scikit-learn is returning coefficient of determination (R^2) values less than -1

混江龙づ霸主 submitted on 2019-12-20 09:04:29
Question: I'm doing a simple linear model. I have

    fire = load_data()
    regr = linear_model.LinearRegression()
    scores = cross_validation.cross_val_score(regr, fire.data, fire.target, cv=10, scoring='r2')
    print scores

which yields

    [ 0.00000000e+00  0.00000000e+00 -8.27299054e+02 -5.80431382e+00
     -1.04444147e-01 -1.19367785e+00 -1.24843536e+00 -3.39950443e-01
      1.95018287e-02 -9.73940970e-02]

How is this possible? When I do the same thing with the built-in diabetes data, it works perfectly fine, but for my data
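
On why such values can occur at all: the coefficient of determination is defined as 1 - SS_res / SS_tot, which is not a squared quantity and has no lower bound. Any model that predicts held-out data worse than the mean of that data scores below zero. A minimal sketch of the arithmetic follows; the function name r_squared and the numbers are illustrative and this is not scikit-learn's implementation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]
print(r_squared(y_true, [2.0, 2.0, 2.0]))     # predicting the mean
print(r_squared(y_true, [10.0, 10.0, 10.0]))  # far-off predictions
```

Predicting the mean gives exactly 0, and the far-off predictions give a large negative score, the same kind of value as the -8.27e+02 fold in the question.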

Is there a good R API for accessing Google Docs?

核能气质少年 submitted on 2019-12-20 08:59:32
Question: I'm using R for data analysis, and I'm sharing some data with collaborators via Google Docs. Is there a simple interface that I can use to move an R data.frame object to and from a Google Docs spreadsheet? If not, is there a similar API in other languages?

Answer 1: There are two packages: RGoogleDocs on Omegahat: the package allows you to get a list of the documents and details about each of them, download the contents of a document, remove a document, and upload a document, even binary files.

Trend lines (regression, curve fitting) Java library

情到浓时终转凉″ submitted on 2019-12-20 08:52:55
Question: I'm trying to develop an application that would compute the same trend lines that Excel does, but for larger datasets. But I'm not able to find any Java library that calculates such regressions. For the linear model I'm using Apache Commons Math, and for the others there was a great numerical library from Michael Thomas Flanagan, but since January it is no longer available: http://www.ee.ucl.ac.uk/~mflanaga/java/ Do you know any other libraries, code repositories to calculate these regressions

Pointwise mutual information on text

痴心易碎 submitted on 2019-12-20 08:40:48
Question: I was wondering how one would calculate the pointwise mutual information for text classification. To be more exact, I want to classify tweets into categories. I have a dataset of tweets (which are annotated), and I have a dictionary per category of words which belong to that category. Given this information, how is it possible to calculate the PMI for each category per tweet, to classify a tweet into one of these categories?

Answer 1: PMI is a measure of association between a feature (in your case a
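
As a rough illustration of the quantity being discussed, PMI between a word w and a category c is commonly defined as log( p(w, c) / (p(w) p(c)) ). The sketch below estimates those probabilities from tweet-level counts. The tweets, the category labels, and the choice of counting each word once per tweet are all assumptions made for the example, not something taken from the answer.

```python
import math
from collections import Counter

# Hypothetical annotated tweets: (text, category).
tweets = [
    ("great goal in the match", "sports"),
    ("the match was boring", "sports"),
    ("new phone released today", "tech"),
    ("phone battery died in the match", "tech"),
]

word_count = Counter()   # number of tweets containing the word
cat_count = Counter()    # number of tweets in the category
joint_count = Counter()  # number of tweets in the category containing the word
for text, cat in tweets:
    cat_count[cat] += 1
    for word in set(text.split()):
        word_count[word] += 1
        joint_count[word, cat] += 1

def pmi(word, cat):
    """PMI in bits; -inf when the word never occurs in the category."""
    n = len(tweets)
    joint = joint_count[word, cat] / n
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((word_count[word] / n) * (cat_count[cat] / n)))

print(pmi("match", "sports"))  # log2(4/3): "match" leans towards sports
print(pmi("phone", "tech"))    # 1 bit: "phone" only occurs in tech tweets
```

One possible way to use these scores for classification, again only as an assumption, is to sum the PMI of a tweet's words for each category and pick the category with the highest total, with some smoothing to avoid the -inf case.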

Implementing a Kolmogorov Smirnov test in python scipy

╄→尐↘猪︶ㄣ submitted on 2019-12-20 08:38:53
Question: I have a data set of N numbers that I want to test for normality. I know scipy.stats has a kstest function, but there are no examples of how to use it or how to interpret the results. Is anyone here familiar with it who can give me some advice? According to the documentation, using kstest returns two numbers, the KS test statistic D and the p-value. If the p-value is greater than the significance level (say 5%), then we cannot reject the hypothesis that the data come from the given
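
To make the statistic D concrete: it is the largest vertical gap between the empirical CDF of the sample and the CDF of the reference distribution. The sketch below computes only D, in plain Python with the standard library's NormalDist. It is not scipy's implementation and it does not compute the p-value; the function name ks_statistic and the sample are invented for illustration.

```python
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """One-sample KS statistic D: the largest gap between the empirical
    CDF of the sample and the reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample tested against the standard normal N(0, 1).
sample = [-1.0, 0.0, 1.0]
print(ks_statistic(sample, NormalDist(0.0, 1.0).cdf))
```

A small D means the sample tracks the reference distribution closely; the p-value that kstest reports alongside D expresses how surprising a gap of that size would be if the data really came from that distribution.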