statistics

R computing mean, median, variance from file with frequency distribution

隐身守侯 submitted on 2019-12-22 08:08:38
Question: I am very new to R, and my question might be a little too obvious. I have a file that contains the following data:
    Score  Frequency
    100    10
    200    30
    300    40
How do I read this file and compute the mean, median, variance and standard deviation? If the table held just raw scores without any frequency information, I could do this:
    x <- scan(file="scores.txt", what = integer())
    median(x)
and so on, but I do not understand how to do these computations when given a frequency table.
Answer 1:
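A minimal sketch of the computation being asked about, written in Python rather than R and not taken from the original answer. It assumes the frequency table has already been read into a dictionary (the name `freq` is hypothetical), expands it back into raw scores, and then applies the ordinary sample statistics from the standard library. The variance and standard deviation use the n - 1 denominator.

```python
import statistics

# Frequency table from the question: score -> frequency (hypothetical name).
freq = {100: 10, 200: 30, 300: 40}

# Expand the table into raw scores: each score repeated `count` times.
scores = [score for score, count in freq.items() for _ in range(count)]

mean = statistics.mean(scores)
median = statistics.median(scores)
variance = statistics.variance(scores)  # sample variance, n - 1 denominator
std_dev = statistics.stdev(scores)      # square root of the sample variance
```

Expanding the table is the simplest route for small counts. For very large frequencies, the same quantities can be computed from weighted sums and cumulative counts without building the full list.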

partial correlation coefficient in pandas dataframe python

二次信任 submitted on 2019-12-22 07:59:26
Question: I have data in a pandas dataframe like this:
    df =
       X1  X2  X3  Y
    0  1   2   10  5.077
    1  2   2   9   32.330
    2  3   3   5   65.140
    3  4   4   4   47.270
    4  5   2   9   80.570
and I want to do multiple regression analysis. Here Y is the dependent variable and X1, X2 and X3 are the independent variables. The correlation between each independent variable and the dependent variable is given by df.corr():
        X1         X2         X3         Y
    X1  1.000000   0.353553   -0.409644  0.896626
    X2  0.353553   1.000000   -0.951747  0.204882
    X3  -0.409644  -0.951747  1.000000   -0.389641
    Y   0.896626   0.204882   -0

Distribution plot of an array

江枫思渺然 submitted on 2019-12-22 07:18:08
Question: I have a numpy array containing float values in [-10..10]. I would like to plot a distribution graph of the values, like this (here it is done for a binomial random variable): For example, I would like bars counting the number of elements in each interval [-10, -9.5], [-9.5, -9], ..., [9.5, 10]. How can I prepare such a distribution plot with Python?
Answer 1: Indeed, matplotlib can do this; more precisely, you'll find code samples corresponding to what you are after at: http://matplotlib.org/examples/pylab
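The counting step behind such a plot can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from the linked matplotlib examples: the function name `bin_counts` and its parameters are hypothetical, bins are half-open `[edge, edge + width)`, and the last bin also includes the upper limit. In practice, `numpy.histogram` and `matplotlib.pyplot.hist` both accept a `bins` argument that can be a sequence of edges and do this counting for you.

```python
def bin_counts(values, lo=-10.0, hi=10.0, width=0.5):
    """Count how many values fall into each bin of the given width on [lo, hi]."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for v in values:
        if v < lo or v > hi:
            continue  # ignore values outside the plotted range
        # Values equal to `hi` are placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts
```

The resulting list has one count per interval, in order, and can be passed to any bar-plotting routine together with the left bin edges.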

nls - Convergence failure: singular convergence (7)

。_饼干妹妹 submitted on 2019-12-22 06:52:39
Question: The following nls code throws the error "Convergence failure: singular convergence (7)" for fm2 (fitted to Data2), but the same code works fine on a similar dataset (fm1 for Data1). Any help figuring out this problem would be highly appreciated. Thanks.
Works fine for this data set:
    Data1 <- structure(list(D = c(0L, 0L, 0L, 0L, 5L, 5L, 5L, 5L, 10L, 10L, 10L, 10L, 15L, 15L, 15L, 15L, 20L, 20L, 20L, 20L), Y = c(11.6, 9.3, 10.7, 9.2, 7.8, 8, 8.6, 7.9, 7.7, 7.6, 7.5, 7.5, 7.2, 7.3, 7, 6.5, 6.3,

How can I take multiple vectors and recode their datatypes in R?

痴心易碎 submitted on 2019-12-22 06:25:17
Question: I'm looking for an elegant way to change multiple vectors' datatypes in R. I'm working with an educational dataset: 426 students' answers to eight multiple-choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course. As it stands, my data is sitting pretty in data.df, like this:
    str(data.df)
    'data.frame': 426 obs. of 9 variables:
     $ ques01: int 1 1 1 1 1 1 0 0 0 1 ...
     $ ques02: int 0 0 1 1 1 1 1 1 1 1 ...
     $ ques03: int 0 0 1 1 0

Overall Title for Plotting Window

拟墨画扇 submitted on 2019-12-22 05:35:52
Question: If I create a plotting window in R with m rows and n columns, how can I give the "overall" graphic a main title? For example, I might have three scatterplots showing the relationship between GPA and SAT score for 3 different schools. How could I give one master title to all three plots, such as "SAT score vs. GPA for 3 schools in CA"?
Answer 1: The most obvious methods that come to my mind are to use either lattice or ggplot2. Here's an example using lattice:
    library(lattice)
    depthgroup<-equal

How are Structured and Unstructured data distinguished?

孤街醉人 submitted on 2019-12-22 05:25:13
Question: What are the differences between structured data and unstructured data? How does that difference affect the respective data mining approaches?
Answer 1: The terms I am familiar with are structured and unstructured data (the same as in your question except for the suffix). I work with both types of data in machine learning and I am not aware of any formal definition; however, I suspect that nearly everyone whose work requires a distinction between these two types of data has no trouble distinguishing them.

Tracking down the assumptions made by SciPy's `ttest_ind()` function

此生再无相见时 submitted on 2019-12-22 05:15:31
Question: I'm trying to write my own Python code to compute t-statistics and p-values for one- and two-tailed independent t-tests. I can use the normal approximation, but for the moment I am trying to use just the t-distribution. I've been unsuccessful in matching the results of SciPy's stats library on my test data. I could use a fresh pair of eyes to see if I'm just making a dumb mistake somewhere. Note, this is cross-posted from Cross-Validated because it's been up for a while over there with no
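As a point of comparison for hand-written code, here is a minimal sketch of the two usual forms of the independent two-sample t statistic. It is not the asker's code, and the excerpt does not say what caused the mismatch. The function name `two_sample_t` is hypothetical; the `equal_var` flag mirrors the keyword of SciPy's `ttest_ind`, which defaults to `equal_var=True` (pooled variance) and reports a two-sided p-value by default. Only the statistic and degrees of freedom are computed here, since the t-distribution CDF is not in the standard library.

```python
import math
import statistics

def two_sample_t(a, b, equal_var=True):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # n - 1 denominator
    if equal_var:
        # Student's t: pool the two sample variances.
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t: keep the variances separate and use the
        # Welch-Satterthwaite approximation for the degrees of freedom.
        q1, q2 = v1 / n1, v2 / n2
        se = math.sqrt(q1 + q2)
        df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df
```

Common things to check when results differ are the variance denominator (n versus n - 1), pooled versus Welch standard error, and whether a one-sided p-value is being compared against a two-sided one.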

Javascript: remove outlier from an array?

会有一股神秘感。 submitted on 2019-12-22 04:42:30
Question:
    values = [8160,8160,6160,22684,0,0,60720,1380,1380,57128]
How can I remove outliers like 0, 57218, 60720 and 22684? Is there a library which can do this?
Answer 1: This all depends on your interpretation of what an "outlier" is. A common approach:
    High outliers are anything beyond the 3rd quartile + 1.5 * the inter-quartile range (IQR)
    Low outliers are anything beneath the 1st quartile - 1.5 * IQR
This is also the approach described by Wolfram's MathWorld. This is easily wrapped up in a function:
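The quartile rule described above can be sketched in Python as follows. This is not the answer's original JavaScript function, which is cut off in this excerpt; the name `remove_outliers` and its parameters are hypothetical. Quartiles come from `statistics.quantiles` (Python 3.8+), and the choice of its `method` argument is an assumption that changes the result on small samples.

```python
import statistics

def remove_outliers(values, k=1.5, method="inclusive"):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Array from the question.
values = [8160, 8160, 6160, 22684, 0, 0, 60720, 1380, 1380, 57128]
kept = remove_outliers(values)
```

With the "inclusive" quartile convention, this drops only 57128 and 60720 from the question's array; 0 and 22684 stay inside the fences. With `method="exclusive"` the fences are wider and nothing is dropped, which illustrates the answer's point that the outcome depends on how "outlier" is defined.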

McNemar's test in Python and comparison of classification machine learning models [closed]

倾然丶 夕夏残阳落幕 submitted on 2019-12-22 03:46:06
Question: Is there a good McNemar's test implemented in Python? I don't see it anywhere in scipy.stats or scikit-learn. I may have overlooked some other good packages. Please recommend one. McNemar's test is almost THE test for comparing two classification algorithms/models given a holdout test set (not through K-fold or
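For reference, the exact (binomial) form of McNemar's test is short enough to sketch with the standard library. This is an illustration, not a recommendation of a specific package; the function name `mcnemar_exact` is hypothetical. It assumes `b` is the number of test cases the first model classified correctly and the second incorrectly, and `c` is the reverse. Under the null hypothesis the discordant cases split evenly, so the two-sided p-value is twice the lower binomial tail, capped at 1. It needs Python 3.8+ for `math.comb`.

```python
import math

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Cases that both models get right, or both get wrong, do not enter the test; only the two discordant counts matter.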