statistics

Plot many categories

南楼画角 submitted on 2019-12-13 07:45:53
Question: I have data as follows: each experiment leads to the appearance of a composition, and each composition belongs to one or more categories. I want to plot the number of occurrences of each composition: DF <- read.table(text = " Comp Category Comp1 1 Comp2 1 Comp3 4,2 Comp4 1,3 Comp1 1,2 Comp3 3 ", header = TRUE) barplot(table(DF$Comp)) This worked perfectly for me. However, since a composition can belong to one or more categories, the categories are separated by commas. I want to barplot the compositions on X and
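The excerpt above is cut off, so the exact plot the asker wants is unknown. As a minimal sketch, assuming the goal is to count occurrences per composition and also per (composition, category) pair, the comma-separated categories can be split before counting. The rows are copied from the question's read.table call; the function names are made up for illustration.

```python
from collections import Counter

# Rows copied from the read.table call in the question: (Comp, Category).
rows = [
    ("Comp1", "1"),
    ("Comp2", "1"),
    ("Comp3", "4,2"),
    ("Comp4", "1,3"),
    ("Comp1", "1,2"),
    ("Comp3", "3"),
]


def count_compositions(rows):
    """Count how many times each composition occurs (like table(DF$Comp))."""
    return Counter(comp for comp, _ in rows)


def count_by_category(rows):
    """Count (composition, category) pairs after splitting on commas."""
    counts = Counter()
    for comp, categories in rows:
        for category in categories.split(","):
            counts[(comp, category.strip())] += 1
    return counts


comp_counts = count_compositions(rows)
pair_counts = count_by_category(rows)
```

The pair counts are the kind of table a grouped or stacked bar plot would be drawn from; the plotting step itself is left out here.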

Using BigQuery to find outliers with standard deviation results combined with WHERE clause

こ雲淡風輕ζ submitted on 2019-12-13 07:26:29
Question: Standard deviation analysis can be a useful way to find outliers. Is there a way to incorporate the result of this query (finding the value four standard deviations above the mean)... SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) as high FROM [publicdata:samples.natality]; result = 12.721342001626912 ...into another query that produces information about which states and dates have the most babies born heavier than 4 standard deviations from the average? SELECT state, year,
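The two-step logic in the question (compute a threshold, then filter on it) can be sketched outside SQL. This is only an illustration in Python, not a BigQuery answer: the rows are made up, the helper names are hypothetical, and the sketch assumes the sample standard deviation is the one wanted.

```python
import statistics


def high_threshold(values, k=4.0):
    """Return mean + k * sample standard deviation of the values."""
    return statistics.mean(values) + k * statistics.stdev(values)


def rows_above(rows, threshold):
    """Keep only the rows whose weight exceeds the threshold."""
    return [row for row in rows if row["weight_pounds"] > threshold]


# Made-up rows for illustration; not taken from the natality table.
typical = [7.0, 7.5, 6.5, 8.0] * 6
rows = [{"state": "AA", "year": 2000, "weight_pounds": w} for w in typical]
rows.append({"state": "BB", "year": 2001, "weight_pounds": 15.0})

# Step 1: the threshold is computed over all rows, like the first query.
threshold = high_threshold([row["weight_pounds"] for row in rows])

# Step 2: the threshold feeds the filter, like a WHERE clause would.
outliers = rows_above(rows, threshold)
```

In SQL the same shape usually means computing the threshold in a subquery and comparing against it in the outer query.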

Having issues using order function in R

ⅰ亾dé卋堺 submitted on 2019-12-13 06:39:28
Question: My data.frame is stateData, and when I execute stateData[order(stateData$"heart failure"),] , with heart failure being a column name, I get my data frame back with the heart failure column in increasing order like this: 10.0, 10.1, 10.3, 10.7, 15.0, 15.1, 15.9, 8.1, 8.3, 8.9, 9.0, 9.1 Here are the details: dput(head(stateData)) heart failure = structure(c(97L, 44L, 25L, 6L, 52L, 57L ), .Label = c("10.0", "7.2", "7.3", "7.4", "7.5", "7.6", "7.7", "7.8", "7.9", "8.0", "8.1", "8.2", "8.3",
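The ordering shown in the question (10.0 ... 15.9 before 8.1 ... 9.1) looks like text ordering, not numeric ordering, and the .Label attribute in the dput output suggests the column holds strings and not numbers. Assuming that is the cause, the effect can be reproduced in a few lines. The values below are a subset of those quoted in the question, and Python stands in for R here.

```python
# A few of the values from the question, stored as text.
values = ["8.1", "10.0", "15.9", "9.0", "10.7", "8.9"]

# Sorting text compares character by character, so "10.0" < "8.1".
as_text = sorted(values)

# Converting to numbers for the comparison gives the expected order.
as_numbers = sorted(values, key=float)

print(as_text)
print(as_numbers)
```

The usual remedy is the same in any language: convert the column to a numeric type before sorting.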

Statistical analysis on Bell shaped (Gaussian) curve

无人久伴 submitted on 2019-12-13 06:25:43
Question: In my application I am getting images (captured by a high-speed camera) containing projections of some light sources on the screen. 1. My first task is to plot a PDF or intensity distribution for the light intensity, which should be bell-shaped (Gaussian), since the light intensity is at its maximum at the center and diminishes toward the edges. Like this (just an example, not my exact case): In the worst case I will have a series of light sources illuminated

R: How to pivot and count data.frame (ex: list of medical conditions and the number of patients with each)

夙愿已清 submitted on 2019-12-13 05:53:40
Question: I'm trying to get better with dplyr and tidyr but I'm not used to "thinking in R". An example may be best. The table I've generated from my data in SQL looks like this:
╔═══════════╦════════════╦═════╦════════╦══════════════╦══════════╦══════════════╗
║ patientid ║ had_stroke ║ age ║ gender ║ hypertension ║ diabetes ║ estrogen HRT ║
╠═══════════╬════════════╬═════╬════════╬══════════════╬══════════╬══════════════╣
║ 934988    ║ 1          ║ 65  ║ M      ║ 1            ║ 1        ║ 0            ║
║ 94044     ║ 0          ║ 69  ║ F      ║ 1            ║ 0        ║ 0            ║
║ 689348
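The title asks for the number of patients with each condition. A minimal sketch of that count, using only the two complete rows visible in the excerpt: which columns count as "conditions" is an assumption, and the function name is made up. Python is used here in place of dplyr/tidyr.

```python
# The two complete rows visible in the excerpt.
rows = [
    {"patientid": 934988, "had_stroke": 1, "age": 65, "gender": "M",
     "hypertension": 1, "diabetes": 1, "estrogen HRT": 0},
    {"patientid": 94044, "had_stroke": 0, "age": 69, "gender": "F",
     "hypertension": 1, "diabetes": 0, "estrogen HRT": 0},
]

# Assumption: these 0/1 columns are the medical conditions of interest.
conditions = ["hypertension", "diabetes", "estrogen HRT"]


def count_conditions(rows, conditions):
    """Sum each 0/1 condition column to get the number of patients."""
    return {name: sum(row[name] for row in rows) for name in conditions}


counts = count_conditions(rows, conditions)
```

This is the "wide to long, then count" idea collapsed into one step: each condition column becomes one row of the result.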

Plotting Confidence intervals to multiple lines in ggplot2

北城余情 submitted on 2019-12-13 05:23:36
Question: I am fairly new to R and I am working on analyzing some data in ggplot2. I have one set of data that has hormone values for a type of animal. The animals came from two sites (Control, New). I analyzed the data using an ANCOVA and plotted the predicted regression lines based on the model. What I would really like to do is plot dotted confidence interval lines around both lines on my graph. I can't seem to figure out how to do this using the ggplot2 package. I moved to the package

Directly plot a statistical model with ggplots

感情迁移 submitted on 2019-12-13 05:17:57
Question: The package visreg allows plotting a statistical model directly, which I find very convenient for checking whether anything has gone wrong and whether we correctly understand what the estimates mean. I would love to combine the functionality of visreg with the incredible flexibility of ggplot . I'd like to be able to call the model directly in a line of ggplot code. Is this feasible (possibly by directly modifying the visreg function)? For example: require(visreg) require(ggplot2) y = c(rnorm(40,10,1),

How do I subset/split this table based on the values of one column in R?

牧云@^-^@ submitted on 2019-12-13 05:16:02
Question: The data can be found here: https://www.dropbox.com/s/l7pc11hhiwr8zzn/data.csv?dl=0 , or else as nlschools in the library MASS. I'd like to split this table based on the value of nlschools$SES, dividing it into tables where SES is <= 30 , 30 < SES <= 40 and > 40 , with all the columns retained. I have tried using cut with intervals like 0:30 , but the result is very confusing and does not keep the complete set of columns. I hope what I'm trying to achieve
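The three-way split described above can be sketched as a binning function plus a grouping step that keeps whole rows, so no columns are lost. The rows below are made up for illustration (they are not values from nlschools), and the function and label names are hypothetical. Python is used here in place of R.

```python
def ses_bin(ses):
    """Map an SES value to one of the three intervals from the question."""
    if ses <= 30:
        return "<=30"
    if ses <= 40:
        return "30-40"
    return ">40"


def split_by_ses(rows):
    """Group whole rows by SES interval, keeping every column."""
    groups = {"<=30": [], "30-40": [], ">40": []}
    for row in rows:
        groups[ses_bin(row["SES"])].append(row)
    return groups


# Made-up rows for illustration only.
rows = [
    {"lang": 46, "SES": 23},
    {"lang": 45, "SES": 30},
    {"lang": 33, "SES": 35},
    {"lang": 50, "SES": 40},
    {"lang": 41, "SES": 47},
]

groups = split_by_ses(rows)
```

The boundaries follow the question's wording: 30 falls in the first group and 40 in the second.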

Is there a python t test for difference? [duplicate]

末鹿安然 submitted on 2019-12-13 05:07:18
Question: This question already has answers here: How to calculate the statistics "t-test" with numpy (3 answers). Closed 5 years ago. Is there a t-test in a Python package where you can test for a difference? If there is, how do you use it? For example, test two vectors: a=np.random.randn(5, 7) b=np.random.randn(5, 7) I have found the t-test ttest_ind in statsmodels. However, I would like to specify the difference to test for, and this cannot be passed into the ttest_ind function. Does anybody know another
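Testing for a specified difference amounts to subtracting the hypothesized difference from the observed difference in means before dividing by the standard error. A minimal sketch of the pooled-variance two-sample t statistic using only the standard library follows. The function name and sample values are made up, the inputs are flat lists instead of the 5x7 arrays in the question, and the p-value lookup (which needs a t-distribution) is left out.

```python
import math
import statistics


def t_statistic_for_difference(a, b, diff=0.0):
    """Pooled two-sample t statistic for H0: mean(a) - mean(b) == diff.

    Returns (t, degrees_of_freedom). Assumes equal variances.
    """
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance
    var_b = statistics.variance(b)
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b) - diff) / std_err
    return t, dof


# Made-up samples whose means differ by about 2.
a = [5.1, 4.9, 5.0, 5.2, 4.8]
b = [3.0, 3.1, 2.9, 3.2, 2.8]

t_zero, dof = t_statistic_for_difference(a, b, diff=0.0)
t_two, _ = t_statistic_for_difference(a, b, diff=2.0)
```

Equivalently, subtracting the hypothesized difference from every value of the first sample and running an ordinary two-sample test gives the same statistic.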

Ranking algorithm with missing values and bias

对着背影说爱祢 submitted on 2019-12-13 05:00:12
Question: The problem is: A set of 5 independent users were asked to rate 50 products given to them. All 50 products would have been used by the users at some point in time. Some users are biased towards certain products. One user did not truly complete the survey and gave random values. Users are not required to rate all the products. Now, given a 4 sample dataset, rank the products based on ratings. Dataset: product #user1 #user2 #user3 #user4 #user5 0 29 - 10 90 12 1 - - - - 7 2 - -
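As a naive baseline only, products can be ranked by the mean of whatever ratings they have, skipping missing entries. This sketch does not address user bias or the random rater mentioned in the question; it only shows the missing-value handling. The two rows are the complete ones visible in the excerpt, with None standing in for "-", and the function names are made up.

```python
# Ratings for the two complete products in the excerpt; None means "-".
ratings = {
    0: [29, None, 10, 90, 12],
    1: [None, None, None, None, 7],
}


def mean_available(values):
    """Mean of the non-missing ratings, or None if all are missing."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


def rank_products(ratings):
    """Return product ids sorted by mean available rating, best first."""
    scored = {p: mean_available(v) for p, v in ratings.items()}
    rated = [p for p, score in scored.items() if score is not None]
    return sorted(rated, key=lambda p: scored[p], reverse=True)


ranking = rank_products(ratings)
```

A product with a single rating gets the same weight as one with many, which is one reason this baseline is weak; dealing with that and with rater bias would need a method beyond this sketch.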