statistics

How to calculate percentage change from different rows over different spans

早过忘川 submitted on 2020-01-21 10:02:49
Question: I am trying to calculate the percentage change in price for quarterly data of companies identified by a gvkey (1001, 1384, etc.) and its corresponding quarterly stock price, PRCCQ.

       gvkey   PRCCQ
    1   1004  23.750
    2   1004  13.875
    3   1004  11.250
    4   1004  10.375
    5   1004  13.600
    6   1004  14.000
    7   1004  17.060
    8   1004   8.150
    9   1004   7.400
    10  1004  11.440
    11  1004   6.200
    12  1004   5.500
    13  1004   4.450
    14  1004   4.500
    15  1004   8.010

What I am trying to do is add 8 columns showing 1 quarter return, 2 quarter return, etc.
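A minimal sketch of one way to build such columns, written in plain Python. The function name `add_span_returns`, the column names `ret_1q`, `ret_2q`, ..., and the choice to express returns in percent are assumptions made for illustration; the question itself does not specify them. Rows are assumed to be in chronological order within each gvkey, and spans never reach across companies.

```python
from collections import defaultdict


def add_span_returns(rows, max_span=8):
    """Add k-quarter percentage changes for k = 1..max_span.

    rows: iterable of (gvkey, price) pairs, chronological within each gvkey.
    Returns a list of dicts; a return is None when fewer than k earlier
    quarters exist for that gvkey.
    """
    history = defaultdict(list)  # prices already seen, per gvkey
    out = []
    for gvkey, price in rows:
        past = history[gvkey]
        record = {"gvkey": gvkey, "PRCCQ": price}
        for k in range(1, max_span + 1):
            if len(past) >= k:
                prev = past[-k]  # price k quarters earlier, same company
                record[f"ret_{k}q"] = (price - prev) / prev * 100.0
            else:
                record[f"ret_{k}q"] = None
        past.append(price)
        out.append(record)
    return out


# First five quarters from the question, all for gvkey 1004.
prices = [23.750, 13.875, 11.250, 10.375, 13.600]
rows = [(1004, p) for p in prices]
table = add_span_returns(rows, max_span=2)
for record in table:
    print(record)
```

Keeping a separate price history per gvkey is what prevents the first quarters of one company from being compared with the last quarters of the previous one.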

R cluster analysis and dendrogram with correlation matrix

℡╲_俬逩灬. submitted on 2020-01-21 09:21:50
Question: I have to perform a cluster analysis on a large amount of data. Since I have a lot of missing values, I made a correlation matrix.

    corloads = cor(df1[,2:185], use = "pairwise.complete.obs")

Now I am unsure how to proceed. I have read a lot of articles and examples, but nothing really works for me. How can I find out how many clusters are appropriate? I already tried this:

    dissimilarity = 1 - corloads
    distance = as.dist(dissimilarity)
    plot(hclust(distance), main="Dissimilarity = 1 - Correlation",

How to get both MSE and R2 from a sklearn GridSearchCV?

五迷三道 submitted on 2020-01-21 05:38:45
Question: I can use a GridSearchCV on a pipeline and specify scoring to be either 'MSE' or 'R2'. I can then access gridsearchcv.best_score_ to recover the one I specified. How do I also get the other score for the solution found by GridSearchCV? If I run GridSearchCV again with the other scoring parameter, it might not find the same solution, and so the score it reports might not correspond to the same model as the one for which we have the first value. Maybe I can extract the parameters and supply
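One possible approach, sketched here under assumptions, is scikit-learn's multi-metric scoring: passing a dict to `scoring` and naming one metric in `refit` makes a single search record both scores for every candidate. The synthetic data, the `Ridge` model, the parameter grid, and the scorer labels `"mse"` and `"r2"` are all illustrative choices that do not come from the question.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    # Both metrics are evaluated on the same folds for every candidate.
    scoring={"mse": "neg_mean_squared_error", "r2": "r2"},
    refit="r2",  # the metric used to pick the best candidate
    cv=5,
)
search.fit(X, y)

best = search.best_index_
best_r2 = search.cv_results_["mean_test_r2"][best]
# The built-in scorer reports negated MSE, so flip the sign back.
best_mse = -search.cv_results_["mean_test_mse"][best]

print(search.best_params_, best_r2, best_mse)
```

Because both numbers are read at the same `best_index_`, they describe the same parameter setting, which avoids the mismatch that a second, separate search could introduce.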

Z3 real arithmetic and statistics

耗尽温柔 submitted on 2020-01-20 08:43:24
Question: Given a problem that is encoded using Z3's reals, which of the statistics that Z3 /smt2 /st produces might be helpful in order to judge if the reals engine "has problems/does lots of work"? In my case, I have two mostly equivalent encodings of the problem, both using reals. The "small" difference in the encoding, however, makes a big difference in runtime: encoding A takes 2:30 min and encoding B 13 min. The Z3 statistics show that conflicts and quant-instantiations are mostly

one way ANOVA with repeated measurements - Not within-subjects

[亡魂溺海] submitted on 2020-01-17 07:45:32
Question: I'm trying to conduct a one-way ANOVA with repeated measurements; however, the repeated measurements are independent: they do not represent a measurement of a subject under different conditions, but simply a replication of the same conditions. This means that if I obtain two measurements for one subject, for example, and they are different, it's only due to randomness. I looked around and there seems to be a within-subjects ANOVA, but that assumes that the measurements per subject are correlated,

one way ANOVA and TUKEY in R with conditions

前提是你 submitted on 2020-01-16 16:29:09
Question: I am trying to find the mean differences between the levels of my variable stim_ending_t, which contains the following 6 factor levels: 1, 1.5, 2, 2.5, 3, 3.5. You can access the df here.

      stim_ending_t visbility soundvolume Opening_text               m    sd coefVar
              <dbl>     <dbl>       <dbl> <chr>                  <dbl> <dbl>   <dbl>
    1             1         0           0 Now focus on the Image  1.70 1.14    0.670
    2             1         0           0 Now focus on the Sound  1.57 0.794   0.504
    3             1         0           1 Now focus on the Image  1.55 1.09    0.701
    4             1         0           1 Now focus on the Sound  1.77 0.953   0.540
    5             1         1           0 Now focus on the Image  1.38 0

Computing autocorrelation of vectors with numpy

末鹿安然 submitted on 2020-01-16 14:32:42
Question: I'm struggling to come up with a non-obfuscated, efficient way of using numpy to compute a self correlation function in a set of 3D vectors. I have a set of vectors in a 3D space, saved in an array

    a = array([[ 0.24463039, 0.58350592, 0.77438803],
               [ 0.30475903, 0.73007075, 0.61165238],
               [ 0.17605543, 0.70955876, 0.68229821],
               [ 0.32425896, 0.57572195, 0.7506    ],
               [ 0.24341381, 0.50183697, 0.83000565],
               [ 0.38364726, 0.62338687, 0.68132488]])

their self correlation function is defined as in case
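The definition is cut off in this excerpt, so the sketch below rests on an assumption: it uses a common form of the vector autocorrelation, C(k) = average over i of a[i] · a[i+k], taken over all pairs of vectors that are k steps apart. The function name `vector_autocorrelation` is hypothetical. If the intended definition differs (for example in normalization), the averaging step would need to change accordingly.

```python
import numpy as np


def vector_autocorrelation(a):
    """Return C(k) for k = 0..n-1, assuming C(k) = mean_i(a[i] . a[i+k])."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    gram = a @ a.T  # gram[i, j] is the dot product a[i] . a[j]
    # The k-th superdiagonal holds the dot products of pairs k steps apart.
    return np.array([np.diagonal(gram, offset=k).mean() for k in range(n)])


a = np.array([[0.24463039, 0.58350592, 0.77438803],
              [0.30475903, 0.73007075, 0.61165238],
              [0.17605543, 0.70955876, 0.68229821],
              [0.32425896, 0.57572195, 0.7506],
              [0.24341381, 0.50183697, 0.83000565],
              [0.38364726, 0.62338687, 0.68132488]])

print(vector_autocorrelation(a))
```

Building the full n-by-n matrix of dot products costs O(n²) memory, which is fine for short series; for long ones an FFT-based approach per component may be preferable.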

Loop through a .csv file in R, computing relative frequencies?

穿精又带淫゛_ submitted on 2020-01-16 12:36:12
Question: I'm new to R and I'm trying to create a .R script that will open a .csv file of mine and compute some frequencies. The file has headers, and the values under them are either 1, 0, NA, or -4. What I want to do is go through each column and compute the frequencies of its values. I'm sure this is an easy script, but I'm not sure how the syntax of R works yet. Can anyone get me started on this, please?

Answer 1: The exact script is going to vary based on your input and what
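As a rough illustration of the per-column computation, here is a sketch in Python rather than R, using only the standard library. The function name `relative_frequencies`, the column names `q1` and `q2`, and the inline sample data are made up for the example; values are treated as plain strings, so "NA" is counted as a category like any other.

```python
import csv
import io
from collections import Counter


def relative_frequencies(csv_file):
    """Map each column name to {value: share of rows holding that value}."""
    reader = csv.DictReader(csv_file)
    counts = {name: Counter() for name in reader.fieldnames}
    for row in reader:
        for name in counts:
            counts[name][row[name]] += 1
    result = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        result[name] = {value: n / total for value, n in counter.items()}
    return result


# Stand-in for an opened file; with a real file, use open(path, newline="").
sample = io.StringIO("q1,q2\n1,0\n0,NA\n1,-4\n1,0\n")
freqs = relative_frequencies(sample)
for column, shares in freqs.items():
    print(column, shares)
```

Whether NA and -4 should count toward the denominator depends on what those codes mean in the data, which the question does not say; here every row counts.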