data-manipulation

Calculating effect sizes between 3 groups for a set of variables in a dataset

一笑奈何 submitted on 2019-12-07 19:35:55
Question: I would like to calculate the effect sizes of 3 treatments on 3 variables (x1, x2, x3). Suppose I have the following dataset:

    set.seed(1234)
    data <- data.frame(
      dose = factor(c(rep(1, 25), rep(2, 35), rep(3, 40)), labels = c("low", "middle", "high")),
      x1 = rnorm(100, 0, 2),
      x2 = rnorm(100, 3, 3),
      x3 = rnorm(100, 9, 4)
    )

Now I would like to calculate the effect size for each combination of treatments. I have found this function to calculate Cohen's d:

    cohens_d <- function(x, y) {
      lx <- length(x)

spatial data / compute metrics on neighbors in R

谁说我不能喝 submitted on 2019-12-07 11:58:50
Question: I have 2D spatial data in the form (xBin, yBin, value), e.g.:

    DT = data.table(x = c(rep(1, 3), rep(2, 3), rep(3, 3)), y = rep(c(1, 2, 3), 3), value = 100 * c(1:9))

For each bin I want to compute the sum of the variable "value" over all neighboring bins. A bin is considered a neighbor if both of its indices, x and y, are within one unit of the current bin. For example, for x=2, y=2, I want to compute:

    valueNeighbors(x=2,y=2) = value(x=1,y=1)+value(1,2)+value(1,3) +value(2,1)+value(2,3) +value(3,1)+value(3,2)+value(3,3)
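The neighbor rule described in the question can be sketched in base R as follows. This is only an illustration under stated assumptions: it uses a plain data.frame instead of the question's data.table so that it runs without extra packages, and it scans every bin for every bin, which is fine for a small grid but would be slow for a large one.

```r
# Same grid as in the question, as a base R data.frame (assumption: the
# question itself uses data.table).
DT <- data.frame(
  x = rep(1:3, each = 3),
  y = rep(1:3, times = 3),
  value = 100 * (1:9)
)

# For each bin, sum `value` over all other bins whose x and y indices are
# both within one unit of the current bin.
DT$valueNeighbors <- vapply(seq_len(nrow(DT)), function(i) {
  is_neighbor <- abs(DT$x - DT$x[i]) <= 1 & abs(DT$y - DT$y[i]) <= 1
  is_neighbor[i] <- FALSE  # a bin is not its own neighbor
  sum(DT$value[is_neighbor])
}, numeric(1))

print(DT)
```

For the center bin (x=2, y=2) this sums the eight surrounding values, which matches the expression written out in the question.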

How to Detect and Mark Change within a Column in Another Column

女生的网名这么多〃 submitted on 2019-12-07 03:06:29
Question: I'm trying to mark when a process starts and ends. The code needs to detect when the change begins and when it ends, and mark it in another column.

Example data:

    date process
    2007 0
    2008 1
    2009 1
    2010 1
    2011 1
    2012 1
    2013 0

Goal:

    date process Status
    2007 0 NA
    2008 1 Process_START
    2009 1 NA
    2010 1 NA
    2011 1 NA
    2012 1 Process_END
    2013 0 NA

Answer 1: Maybe by calculating diff and lagging it in both directions:

    dif <- diff(df1$process)
    df1$Status <- factor(c(NA, dif) - 2 * c(dif, NA), levels = -3:3)
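Because the answer's code is cut off in this listing, here is a separate minimal sketch of the same idea. It is not the answer's method: it compares each row with its previous and next row directly. The data frame name df1 comes from the answer; the helper names prev_process and next_process are hypothetical.

```r
df1 <- data.frame(date = 2007:2013, process = c(0, 1, 1, 1, 1, 1, 0))

n <- nrow(df1)
prev_process <- c(NA, df1$process[-n])  # value in the previous row
next_process <- c(df1$process[-1], NA)  # value in the next row

df1$Status <- NA_character_
# A run starts where process is 1 and the previous row is 0.
df1$Status[which(df1$process == 1 & prev_process == 0)] <- "Process_START"
# A run ends where process is 1 and the next row is 0.
df1$Status[which(df1$process == 1 & next_process == 0)] <- "Process_END"

print(df1)
```

Two limitations of this sketch: a run of length one ends up labeled only Process_END because the second assignment overwrites the first, and a run that touches the first or last row gets no label on that side because there is no neighboring 0.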

Appending a row of sums for each level of a factor

痞子三分冷 submitted on 2019-12-06 15:01:27
I want to append a row of sums for each Reg, like this:

    Reg Res Pop
    1 Total 1000915
    2 A Urban 500414
    3 A Rural 500501
    4 Total 999938
    5 B Urban 499922
    6 B Rural 500016
    7 Total 1000912
    8 C Urban 501638
    9 C Rural 499274
    10 Total 999629
    11 D Urban 499804
    12 D Rural 499825
    13 Total 1000303
    14 E Urban 499917
    15 E Rural 500386

MWE is below:

    Reg <- rep(LETTERS[1:5], each = 2)
    Res <- rep(c("Urban", "Rural"), times = 5)
    set.seed(12345)
    Pop <- rpois(n = 10, lambda = 500000)
    df <- data.frame(Reg, Res, Pop)
    df
    Reg Res Pop
    1 A Urban 500414
    2 A Rural 500501
    3 B Urban 499922
    4 B Rural 500016
    5 C Urban 501638
    6
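One possible way to build such a table in base R is sketched below, using aggregate() for the per-region sums and order() to place each Total row ahead of its region. Two assumptions: the Pop values are typed in from the desired output shown in the question rather than regenerated with rpois(), and the Total rows keep their region letter in Reg, whereas the desired output appears to leave that cell blank.

```r
df <- data.frame(
  Reg = rep(LETTERS[1:5], each = 2),
  Res = rep(c("Urban", "Rural"), times = 5),
  Pop = c(500414, 500501, 499922, 500016, 501638,
          499274, 499804, 499825, 499917, 500386),
  stringsAsFactors = FALSE
)

# One row of sums per region, shaped like the original data frame.
totals <- aggregate(Pop ~ Reg, data = df, FUN = sum)
totals$Res <- "Total"
totals <- totals[, c("Reg", "Res", "Pop")]

# Stack totals and detail rows, then sort by region with the Total row
# first; the sort is stable, so Urban stays ahead of Rural.
combined <- rbind(totals, df)
combined <- combined[order(combined$Reg, combined$Res != "Total"), ]
rownames(combined) <- NULL

print(combined)
```

If the region label should be hidden on the Total rows, it can be blanked afterwards with `combined$Reg[combined$Res == "Total"] <- ""`.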

create OLAP cube in R programming language

流过昼夜 submitted on 2019-12-06 13:57:17
Question: Hi, I have the following data:

      Function SB    `Country Region` `+1 Function` `+1 SB` `+1 Country Region`
      <chr>    <chr> <chr>            <chr>         <chr>   <chr>
    1 ENG      SB10  AMER             ENG           SB10    AMER
    2 IT       SB07  EMEA             IT            SB07    EMEA
    3 QLT      SB05  EMEA             QLT           SB05    EMEA
    4 MFG      SB07  EMEA             MFG           SB07    EMEA
    5 MFG      SB04  EMEA             MFG           SB05    EMEA
    6 SCM      SB08  EMEA             SCM           SB08    EMEA

I want to create a 3-dimensional OLAP cube in which the columns Function, SB, and Country Region are in the rows, and +1 Function, +1 SB, and +1 Country Region are in the columns. The output should be of

R: merge two data frames when either of two criteria matches

我是研究僧i submitted on 2019-12-06 13:01:01
Question: Say I have two data frames like the following:

    n = c(2, 3, 5, 5, 6, 7)
    s = c("aa", "bb", "cc", "dd", "ee", "ff")
    b = c(2, 4, 5, 4, 3, 2)
    df = data.frame(n, s, b)
    #  n  s b
    #1 2 aa 2
    #2 3 bb 4
    #3 5 cc 5
    #4 5 dd 4
    #5 6 ee 3
    #6 7 ff 2

    n2 = c(5, 6, 7, 6)
    s2 = c("aa", "bb", "cc", "ll")
    b2 = c("hh", "nn", "ff", "dd")
    df2 = data.frame(n2, s2, b2)
    #  n2 s2 b2
    #1  5 aa hh
    #2  6 bb nn
    #3  7 cc ff
    #4  6 ll dd

I want to merge them to achieve the following result:

    #n s b n2 s2 b2
    #2 aa 2 5 aa hh
    #3 bb 4 6 bb nn
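The matching rule itself is not visible in this truncated listing. The sketch below assumes, purely as a guess from the sample data, that a row of df should be paired with a row of df2 when s equals either s2 or b2. Under that assumption, a logical match matrix built with outer() gives the row pairs directly.

```r
df <- data.frame(
  n = c(2, 3, 5, 5, 6, 7),
  s = c("aa", "bb", "cc", "dd", "ee", "ff"),
  b = c(2, 4, 5, 4, 3, 2),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  n2 = c(5, 6, 7, 6),
  s2 = c("aa", "bb", "cc", "ll"),
  b2 = c("hh", "nn", "ff", "dd"),
  stringsAsFactors = FALSE
)

# ASSUMED rule (not stated in the visible text): s matches s2 OR s matches b2.
# match_mat[i, j] is TRUE when row i of df pairs with row j of df2.
match_mat <- outer(df$s, df2$s2, "==") | outer(df$s, df2$b2, "==")

# Row/column positions of the TRUE cells, ordered by the row of df.
idx <- which(match_mat, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

merged <- cbind(df[idx[, "row"], ], df2[idx[, "col"], ])
rownames(merged) <- NULL

print(merged)
```

The first two rows of this result agree with the two result rows visible in the question. Whether the remaining rows agree depends on the assumed rule being the one the asker intended.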

Calculating effect sizes between 3 groups for a set of variables in a dataset

蓝咒 submitted on 2019-12-06 12:56:31
I would like to calculate the effect sizes of 3 treatments on 3 variables (x1, x2, x3). Suppose I have the following dataset:

    set.seed(1234)
    data <- data.frame(
      dose = factor(c(rep(1, 25), rep(2, 35), rep(3, 40)), labels = c("low", "middle", "high")),
      x1 = rnorm(100, 0, 2),
      x2 = rnorm(100, 3, 3),
      x3 = rnorm(100, 9, 4)
    )

Now I would like to calculate the effect size for each combination of treatments. I have found this function to calculate Cohen's d:

    cohens_d <- function(x, y) {
      lx <- length(x) - 1
      ly <- length(y) - 1
      md <- abs(mean(x) - mean(y))
      csd <- lx * var(x) + ly * var(y)
      csd <- csd/(lx +
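Since the quoted function is cut off, the sketch below substitutes the usual pooled-standard-deviation form of Cohen's d. That is an assumption about where the original was heading, not a reconstruction of it. It then applies the function to every pair of dose levels for each variable, using combn() to enumerate the pairs. The names vars, dose_pairs, and effect_sizes are hypothetical.

```r
# Cohen's d with a pooled standard deviation (standard textbook form;
# assumed here because the function quoted above is truncated).
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  abs(mean(x) - mean(y)) / sqrt(pooled_var)
}

set.seed(1234)
data <- data.frame(
  dose = factor(c(rep(1, 25), rep(2, 35), rep(3, 40)),
                labels = c("low", "middle", "high")),
  x1 = rnorm(100, 0, 2),
  x2 = rnorm(100, 3, 3),
  x3 = rnorm(100, 9, 4)
)

vars <- c("x1", "x2", "x3")
dose_pairs <- combn(levels(data$dose), 2)  # one column per pair of levels

# Rows are dose pairs, columns are variables.
effect_sizes <- sapply(vars, function(v) {
  apply(dose_pairs, 2, function(p) {
    cohens_d(data[data$dose == p[1], v], data[data$dose == p[2], v])
  })
})
rownames(effect_sizes) <- apply(dose_pairs, 2, paste, collapse = " vs ")

print(effect_sizes)
```

Because the function takes the absolute difference of the means, as the quoted code does, the sign (which group is larger) is not preserved in the result.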

How to separate one column to multiple column (complex column)

时光总嘲笑我的痴心妄想 submitted on 2019-12-06 08:39:35
Question: I am trying to separate the column "Grade" into multiple columns according to subject and grade:

    grade <- read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv", sep = ";")
    # Rename the column names
    names(grade) <- c("Student_ID", "Name", "Venue", "Grade")
    head(grade)
    # Separate `Grade` into `subject` variables and corresponding `Grade` columns
    library(tidyverse)
    df <- grade %>% separate(Grade, paste("V", 1:7, sep = "_"), sep = ":")
    head(df)
    # It still is not separating

Tcl/Tk write in a specific line

倖福魔咒の submitted on 2019-12-06 07:02:51
I want to write to a specific line in a text document, but there is a problem with my code and I don't know where the bug is.

    set fp [open C:/Users/user/Desktop/tst/settings.txt w]
    set count 0
    while {[gets $fp line]!=-1} {
        incr count
        if {$count==28} {
            break
        }
    }
    puts $fp "TEST"
    close $fp

The file only contains TEST. Does anybody have an idea?

You are using 'w' as the access argument, which truncates the file, so you lose all the data in the file as soon as it is opened. Read more about the open command. You can use 'r+' or 'a+' instead. Also, to write after a particular line, you can move the file pointer to the desired location.

    set fp

R - Calculate difference (similarity measure) between similar datasets

时间秒杀一切 submitted on 2019-12-06 06:08:41
I have seen many questions that touch on this topic but haven't yet found an answer. If I have missed a question that does answer this one, please mark this and point us to it.

Scenario: We have a benchmark dataset and two imputation methods. We systematically delete values from the benchmark and fill them back in with each method. Thus we have a benchmark, imputedData1, and imputedData2.

Question: Is there a function that can produce a number that represents the difference between the benchmark and imputedData1 and/or the difference between the benchmark and imputedData2?
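One common way to reduce such a comparison to a single number is the root mean squared error (RMSE) over the cells that were deleted. The sketch below is only an illustration under stated assumptions: the data are numeric, the benchmark is random toy data, and the two "imputation methods" are column-mean and column-median filling, chosen just to have something to compare. The helper names impute_with and rmse are hypothetical.

```r
set.seed(1)
benchmark <- matrix(rnorm(50), nrow = 10, ncol = 5)

# Delete two known cells per column (rows j and j + 5 in column j).
missing_mask <- matrix(FALSE, nrow = 10, ncol = 5)
missing_mask[cbind(1:10, rep(1:5, times = 2))] <- TRUE
incomplete <- benchmark
incomplete[missing_mask] <- NA

# Toy imputation: fill each column's gaps with a summary of that column.
impute_with <- function(m, f) {
  for (j in seq_len(ncol(m))) {
    m[is.na(m[, j]), j] <- f(m[, j], na.rm = TRUE)
  }
  m
}
imputedData1 <- impute_with(incomplete, mean)
imputedData2 <- impute_with(incomplete, median)

# RMSE restricted to the deleted cells; lower means closer to the benchmark.
rmse <- function(truth, imputed, mask) {
  sqrt(mean((truth[mask] - imputed[mask])^2))
}

rmse1 <- rmse(benchmark, imputedData1, missing_mask)
rmse2 <- rmse(benchmark, imputedData2, missing_mask)
print(c(imputedData1 = rmse1, imputedData2 = rmse2))
```

Restricting the error to the masked cells matters because every other cell is identical to the benchmark and would only dilute the score. For categorical columns a different measure, such as the proportion of incorrectly imputed cells, would be needed.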