statistics | 易学教程

R computing mean, median, variance from file with frequency distribution

Question: I am very new to R, and my question may be a little too obvious. I have a file that contains the following data:

    Score Frequency
    100   10
    200   30
    300   40

How do I read this file and compute the mean, median, variance, and standard deviation? If the table held only raw scores without any frequency information, I could do this:

    x <- scan(file="scores.txt", what = integer())
    median(x)

and so on, but I do not understand how to do these computations when given a frequency table.

Answer 1:
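The answer text is not part of this excerpt. As a rough illustration of one common approach, the sketch below expands each score by its frequency and then uses ordinary summary functions. It is written in Python rather than R (the same idea applies in R via `rep()`), the helper names `read_frequency_table` and `expand_frequency_table` are made up for this example, and the file layout (one header line, whitespace-separated columns) is an assumption.

```python
import statistics


def read_frequency_table(path):
    """Read 'Score Frequency' rows from a whitespace-separated text file.

    Assumes the first line is a header; this layout is a guess at the
    file described in the question.
    """
    rows = []
    with open(path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            if line.strip():
                score, freq = line.split()
                rows.append((int(score), int(freq)))
    return rows


def expand_frequency_table(rows):
    """Repeat each score by its frequency to recover the raw observations."""
    values = []
    for score, freq in rows:
        values.extend([score] * freq)
    return values


# The table from the question, typed in directly instead of read from disk.
table = [(100, 10), (200, 30), (300, 40)]
scores = expand_frequency_table(table)

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "variance": statistics.variance(scores),  # sample variance (divides by n - 1)
    "stdev": statistics.stdev(scores),        # sample standard deviation
}
print(summary)
```

Expanding the table is simple but uses memory proportional to the total frequency; for very large counts, weighted formulas would avoid the expansion.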

partial correlation coefficient in pandas dataframe python

Question: I have data in a pandas DataFrame like this:

    df =
       X1  X2  X3       Y
    0   1   2  10   5.077
    1   2   2   9  32.330
    2   3   3   5  65.140
    3   4   4   4  47.270
    4   5   2   9  80.570

and I want to do a multiple regression analysis. Here Y is the dependent variable and X1, X2 and X3 are the independent variables. The correlation between each independent variable and the dependent variable is:

    df.corr():
              X1        X2        X3         Y
    X1  1.000000  0.353553 -0.409644  0.896626
    X2  0.353553  1.000000 -0.951747  0.204882
    X3 -0.409644 -0.951747  1.000000 -0.389641
    Y   0.896626  0.204882 -0
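The excerpt stops before the actual request, so the following is only a guess at what the title asks for: a minimal sketch of partial correlation coefficients computed from the inverse of the correlation matrix, where each pair is controlled for all remaining columns. The function name `partial_corr_matrix` is hypothetical, and the data are the five rows shown in the question.

```python
import numpy as np


def partial_corr_matrix(data):
    """Partial correlation of every column pair, controlling for all other columns.

    `data` is a 2-D array with observations in rows and variables in columns.
    Uses the identity rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the
    inverse of the correlation matrix.
    """
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


# Columns: X1, X2, X3, Y (values copied from the question).
data = np.array([
    [1, 2, 10, 5.077],
    [2, 2, 9, 32.330],
    [3, 3, 5, 65.140],
    [4, 4, 4, 47.270],
    [5, 2, 9, 80.570],
])

partial = partial_corr_matrix(data)
print(np.round(partial, 3))
```

With a pandas DataFrame, `df.to_numpy()` can be passed in directly. With only five observations and strongly correlated predictors (X2 and X3), the estimates are unstable, so the output is best read as illustrative.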

Distribution plot of an array

Question: I have a numpy array containing float values in [-10..10]. I would like to plot a distribution graph of the values, like this (here it is done for a binomial random variable): For example, I would like bars counting the number of elements in each interval [-10, -9.5], [-9.5, -9], ..., [9.5, 10]. How do I prepare such a distribution plot with Python?

Answer 1: Use matplotlib; more precisely, you'll find code samples corresponding to what you are after at: http://matplotlib.org/examples/pylab
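As a hedged sketch of the binning the question describes (not the code behind the truncated link), `numpy.histogram` can count values per 0.5-wide interval, and matplotlib's `bar` can draw the counts. The function names `interval_counts` and `plot_distribution` are invented for this example.

```python
import numpy as np


def interval_counts(values, low=-10.0, high=10.0, width=0.5):
    """Count how many values fall into each interval of size `width`.

    Bins are half-open [a, b) except the last one, which also includes
    its right edge (this is numpy.histogram's convention).
    """
    edges = np.arange(low, high + width / 2, width)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


def plot_distribution(values):
    """Draw the interval counts as a bar chart (requires matplotlib)."""
    import matplotlib.pyplot as plt

    counts, edges = interval_counts(values)
    plt.bar(edges[:-1], counts, width=np.diff(edges), align="edge", edgecolor="black")
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()


# Small illustrative input; real data would be the array from the question.
sample = np.array([-10.0, -9.75, 0.1, 10.0])
counts, edges = interval_counts(sample)
```

Calling `plot_distribution(sample)` would open the chart; `plt.hist(values, bins=edges)` is a shorter alternative when the counts themselves are not needed.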

nls - Convergence failure: singular convergence (7)

Question: The following nls code throws the error "Convergence failure: singular convergence (7)" for fm2 (for Data2), but the same code works fine for a similar dataset (fm1 for Data1). Any help figuring out this problem would be highly appreciated. Thanks.

Works fine for this data set:

    Data1 <- structure(list(D = c(0L, 0L, 0L, 0L, 5L, 5L, 5L, 5L, 10L, 10L, 10L, 10L, 15L, 15L, 15L, 15L, 20L, 20L, 20L, 20L), Y = c(11.6, 9.3, 10.7, 9.2, 7.8, 8, 8.6, 7.9, 7.7, 7.6, 7.5, 7.5, 7.2, 7.3, 7, 6.5, 6.3,

How can I take multiple vectors and recode their datatypes in R?

Question: I'm looking for an elegant way to change multiple vectors' datatypes in R. I'm working with an educational dataset: 426 students' answers to eight multiple choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course. As it stands, my data is sitting pretty in data.df, like this:

    str(data.df)
    'data.frame': 426 obs. of 9 variables:
     $ ques01: int 1 1 1 1 1 1 0 0 0 1 ...
     $ ques02: int 0 0 1 1 1 1 1 1 1 1 ...
     $ ques03: int 0 0 1 1 0

Overall Title for Plotting Window

Question: If I create a plotting window in R with m rows and n columns, how can I give the "overall" graphic a main title? For example, I might have three scatterplots showing the relationship between GPA and SAT score for 3 different schools. How could I give one master title to all three plots, such as "SAT score vs. GPA for 3 schools in CA"?

Answer 1: The most obvious methods that come to my mind are to use either lattice or ggplot2. Here's an example using lattice:

    library(lattice)
    depthgroup<-equal

How are Structured and Unstructured data distinguished?

Question: What are the differences between structured data and unstructured data? How does that difference affect the respective data mining approaches?

Answer 1: The terms I am familiar with are structured and unstructured data (same as what's in your question except for the suffix). I work with both types of data in machine learning and I am not aware of any formal definition; however, I suspect that nearly everyone whose work requires a distinction between these two types of data has no trouble distinguishing them.

Tracking down the assumptions made by SciPy's `ttest_ind()` function

Question: I'm trying to write my own Python code to compute t-statistics and p-values for one- and two-tailed independent t-tests. I can use the normal approximation, but for the moment I am trying to use just the t-distribution. I've been unsuccessful in matching the results of SciPy's stats library on my test data. I could use a fresh pair of eyes to see if I'm just making a dumb mistake somewhere. Note, this is cross-posted from Cross Validated because it's been up for a while over there with no
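For context, `scipy.stats.ttest_ind` by default assumes equal population variances (`equal_var=True`), pools the two sample variances, and reports a two-sided p-value. The sketch below reproduces only the pooled t-statistic and degrees of freedom with the standard library; it is not the asker's code, and the name `pooled_t_statistic` is made up for this example.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t-statistic with a pooled variance estimate.

    Returns (t, degrees_of_freedom). Uses sample variances (n - 1 in the
    denominator), matching the equal-variance form of the test.
    """
    n1, n2 = len(a), len(b)
    mean1, mean2 = statistics.fmean(a), statistics.fmean(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Illustrative samples, not data from the question.
t_value, dof = pooled_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_value, dof)
```

A two-sided p-value would then be `2 * scipy.stats.t.sf(abs(t_value), dof)`. Common sources of mismatch when hand-rolling this test include using population variance (dividing by n) instead of sample variance, or comparing a one-tailed p-value against SciPy's two-sided default.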

Javascript: remove outlier from an array?

Question:

    values = [8160,8160,6160,22684,0,0,60720,1380,1380,57128]

How can I remove outliers like 0, 57128, 60720 and 22684? Is there a library which can do this?

Answer 1: This all depends on your interpretation of what an "outlier" is. A common approach:

- High outliers are anything beyond the 3rd quartile + 1.5 * the interquartile range (IQR)
- Low outliers are anything beneath the 1st quartile - 1.5 * IQR

This is also the approach described by Wolfram's MathWorld. This is easily wrapped up in a function:
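The answer's own JavaScript function is not included in this excerpt. A minimal sketch of the same 1.5 * IQR rule, written in Python instead, might look like the following; `remove_outliers_iqr` is a hypothetical name, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, which may differ from the quartile convention the original answer used.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Drop values outside [Q1 - k * IQR, Q3 + k * IQR], keeping input order."""
    if len(values) < 4:
        return list(values)  # too few points for meaningful quartiles
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Illustrative data, not the array from the question.
readings = [10, 12, 11, 13, 12, 11, 100]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

The result is sensitive to how quartiles are defined and to how spread out the data already is: on a small, widely dispersed sample, the fences can be wide enough that nothing is removed, so the threshold `k` and the quartile method are worth checking against the data at hand.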

McNemar's test in Python and comparison of classification machine learning models [closed]

Question: Is there a good McNemar's test implemented in Python? I don't see it anywhere in scipy.stats or scikit-learn. I may have overlooked some other good packages. Please recommend. McNemar's test is almost THE test for comparing two classification algorithms/models given a holdout test set (not through K-fold or
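For reference, the test only needs the two discordant counts: `b` (cases model A got right and model B got wrong) and `c` (the reverse). The sketch below is a standard-library illustration under that notation, with made-up function names `mcnemar_exact` and `mcnemar_chi2`; it is not taken from any answer to the question. The statsmodels package also offers `statsmodels.stats.contingency_tables.mcnemar` for production use.

```python
import math


def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b, c, correction=True):
    """Chi-square McNemar statistic (1 degree of freedom) and its p-value.

    With `correction=True`, applies the usual continuity correction
    (|b - c| - 1)^2 / (b + c).
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: A right / B wrong = 5, A wrong / B right = 15.
stat, p_value = mcnemar_chi2(5, 15)
p_exact = mcnemar_exact(5, 15)
print(stat, p_value, p_exact)
```

The exact version is generally preferred when the number of discordant pairs is small, since the chi-square approximation is unreliable there.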