data-manipulation | 易学教程

Calculating effect sizes between 3 groups for a set of variables in a dataset

Question: I would like to calculate the effect sizes of 3 treatments on 3 variables (x1, x2, x3). Suppose I have the following dataset:

set.seed(1234)
data <- data.frame(
  dose = factor(c(rep(1, 25), rep(2, 35), rep(3, 40)),
                labels = c("low", "middle", "high")),
  x1 = rnorm(100, 0, 2),
  x2 = rnorm(100, 3, 3),
  x3 = rnorm(100, 9, 4)
)

Now I would like to calculate the effect size for each pair of treatments. I have found this function to calculate Cohen's d:

cohens_d <- function(x, y) {
  lx <- length(x) - 1
  ly <- length(y) - 1
  md <- abs(mean(x) - mean(y))
  csd <- lx * var(x) + ly * var(y)
  csd <- csd/(lx +
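
The function above is cut off in this excerpt. Below is a minimal sketch that assumes it ends with the usual pooled-standard-deviation denominator (lx + ly = n1 + n2 - 2). `pairwise_d()` is a hypothetical helper that is not from the original post. It applies the function to every pair of `dose` levels for each variable, using base R's `combn()`.

```r
# Cohen's d with a pooled standard deviation (textbook formula; the
# denominator lx + ly is an assumption about how the truncated function ends).
cohens_d <- function(x, y) {
  lx <- length(x) - 1
  ly <- length(y) - 1
  md <- abs(mean(x) - mean(y))                            # absolute mean difference
  pooled_var <- (lx * var(x) + ly * var(y)) / (lx + ly)   # pooled variance
  md / sqrt(pooled_var)
}

# Hypothetical helper: Cohen's d for every pair of group levels, per variable.
pairwise_d <- function(df, group, vars) {
  pairs <- combn(levels(df[[group]]), 2, simplify = FALSE)
  rows <- lapply(vars, function(v) {
    do.call(rbind, lapply(pairs, function(p) {
      data.frame(
        variable = v,
        group1 = p[1],
        group2 = p[2],
        d = cohens_d(df[df[[group]] == p[1], v], df[df[[group]] == p[2], v])
      )
    }))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

set.seed(1234)
data <- data.frame(
  dose = factor(c(rep(1, 25), rep(2, 35), rep(3, 40)),
                labels = c("low", "middle", "high")),
  x1 = rnorm(100, 0, 2),
  x2 = rnorm(100, 3, 3),
  x3 = rnorm(100, 9, 4)
)
effects <- pairwise_d(data, "dose", c("x1", "x2", "x3"))
print(effects)
```

With three dose levels and three variables this gives 3 × 3 = 9 rows. Because `md` uses `abs()`, every d is non-negative. Remove `abs()` if the direction of the effect matters.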

spatial data / compute metrics on neighbors in R

Question: I have 2D spatial data in the form (xBin, yBin, value), e.g.:

DT = data.table(x = c(rep(1, 3), rep(2, 3), rep(3, 3)), y = rep(c(1, 2, 3), 3), value = 100 * c(1:9))

For each bin I want to compute the sum of the variable "value" over all neighboring bins. A bin is considered a neighbor if both of its indices, x and y, are within one unit of the current bin. For example, for x=2, y=2, I want to compute

valueNeighbors(x=2,y=2) = value(x=1,y=1)+value(1,2)+value(1,3)
                          +value(2,1)+value(2,3)
                          +value(3,1)+value(3,2)+value(3,3)
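
Below is a brute-force base-R sketch. It is a swapped-in approach, not a data.table-specific idiom. For each bin it selects the bins whose x and y both lie within one unit, sums their values, and subtracts the bin's own value. `neighbor_sum()` is a hypothetical name, and the data frame mirrors the question's `DT`. With data.table loaded, the same helper can be called as `DT[, valueNeighbors := neighbor_sum(x, y, value)]`.

```r
# Sum of `value` over the 8 surrounding bins (|dx| <= 1 and |dy| <= 1),
# excluding the bin itself. Assumes each (x, y) bin appears only once.
# Compares every bin with every other bin, so it is O(n^2).
neighbor_sum <- function(x, y, value) {
  vapply(seq_along(x), function(i) {
    nb <- abs(x - x[i]) <= 1 & abs(y - y[i]) <= 1   # includes bin i itself
    sum(value[nb]) - value[i]                       # so subtract its own value
  }, numeric(1))
}

DT <- data.frame(x = rep(1:3, each = 3), y = rep(1:3, 3), value = 100 * (1:9))
DT$valueNeighbors <- neighbor_sum(DT$x, DT$y, DT$value)
print(DT)
```

For the bin (x=2, y=2) this gives 4000, the sum of the other eight bins. For a corner bin such as (1, 1), only three neighbors exist, giving 200 + 400 + 500 = 1100.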

How to Detect and Mark Change within a Column in Another Column

Question: I'm trying to mark when a process starts and ends. The code needs to detect when the change begins and when it ends, and mark this in another column.

Example data:

date  process
2007  0
2008  1
2009  1
2010  1
2011  1
2012  1
2013  0

Goal:

date  process  Status
2007  0        NA
2008  1        Process_START
2009  1        NA
2010  1        NA
2011  1        NA
2012  1        Process_END
2013  0        NA

Answer 1: Maybe by calculating diff and lagging it in both directions:

dif <- diff(df1$process)
df1$Status <- factor(c(NA, dif) - 2 * c(dif, NA), levels = -3:3)
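
An alternative sketch writes the labels directly instead of recoding a factor. It compares each row with the row before it and the row after it. `mark_process()` is a hypothetical name. `df1` rebuilds the example data and assumes `process` is coded 0/1 and the rows are sorted by date.

```r
# Label the first and last row of each run of process == 1.
mark_process <- function(process) {
  prev <- c(0, head(process, -1))   # previous row's value (0 before the first row)
  nxt  <- c(tail(process, -1), 0)   # next row's value (0 after the last row)
  status <- rep(NA_character_, length(process))
  status[process == 1 & prev != 1] <- "Process_START"
  status[process == 1 & nxt != 1]  <- "Process_END"   # a one-row run ends up as END
  status
}

df1 <- data.frame(date = 2007:2013, process = c(0, 1, 1, 1, 1, 1, 0))
df1$Status <- mark_process(df1$process)
print(df1)
```

If a process can last a single year, that row matches both conditions and keeps only the END label. Add a separate label for that case if it matters.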

Appending a row of sums for each level of a factor

I want to append a row of sums for each Reg, like this:

Reg Res Pop
1 Total 1000915
2 A Urban 500414
3 A Rural 500501
4 Total 999938
5 B Urban 499922
6 B Rural 500016
7 Total 1000912
8 C Urban 501638
9 C Rural 499274
10 Total 999629
11 D Urban 499804
12 D Rural 499825
13 Total 1000303
14 E Urban 499917
15 E Rural 500386

The MWE is below:

Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df <- data.frame(Reg, Res, Pop)
df
  Reg   Res    Pop
1   A Urban 500414
2   A Rural 500501
3   B Urban 499922
4   B Rural 500016
5   C Urban 501638
6
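
Below is a base-R sketch that uses `split()`, `rbind()` and `do.call()`. The excerpt does not show which column holds the "Total" label in the desired output. This sketch therefore assumes "Total" goes in `Res` and keeps the region code in `Reg`. `add_group_totals()` is a hypothetical name.

```r
# Insert a "Total" row above each Reg group, preserving the original order.
# Assumes rows of the same Reg are contiguous, as in the MWE.
add_group_totals <- function(df) {
  groups <- split(df, factor(df$Reg, levels = unique(as.character(df$Reg))))
  pieces <- lapply(groups, function(g) {
    total <- data.frame(Reg = as.character(g$Reg[1]), Res = "Total", Pop = sum(g$Pop))
    rbind(total, g)                  # total row first, then the group's own rows
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

Reg <- rep(LETTERS[1:5], each = 2)
Res <- rep(c("Urban", "Rural"), times = 5)
set.seed(12345)
Pop <- rpois(n = 10, lambda = 500000)
df <- data.frame(Reg, Res, Pop)
print(add_group_totals(df))
```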

create OLAP cube in R programming language

Question: Hi, I have the following data:

  Function SB    `Country Region` `+1 Function` `+1 SB` `+1 Country Region`
  <chr>    <chr> <chr>            <chr>         <chr>   <chr>
1 ENG      SB10  AMER             ENG           SB10    AMER
2 IT       SB07  EMEA             IT            SB07    EMEA
3 QLT      SB05  EMEA             QLT           SB05    EMEA
4 MFG      SB07  EMEA             MFG           SB07    EMEA
5 MFG      SB04  EMEA             MFG           SB05    EMEA
6 SCM      SB08  EMEA             SCM           SB08    EMEA

I want to create a 3-dimensional OLAP cube in which the columns Function, SB and Country Region are in the rows and +1 Function, +1 SB and +1 Country Region are in the columns. The output should be of
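
The expected output is cut off, so this is only a sketch under an assumption. It takes the "cube" to mean row counts for each combination of the three current columns (rows) against the three `+1` columns (columns). Base R's `interaction()` builds the combined keys and `table()` cross-tabulates them.

```r
# Rebuild the example data (names with spaces and "+" kept via check.names = FALSE).
df <- data.frame(
  Function = c("ENG", "IT", "QLT", "MFG", "MFG", "SCM"),
  SB = c("SB10", "SB07", "SB05", "SB07", "SB04", "SB08"),
  `Country Region` = c("AMER", "EMEA", "EMEA", "EMEA", "EMEA", "EMEA"),
  `+1 Function` = c("ENG", "IT", "QLT", "MFG", "MFG", "SCM"),
  `+1 SB` = c("SB10", "SB07", "SB05", "SB07", "SB05", "SB08"),
  `+1 Country Region` = c("AMER", "EMEA", "EMEA", "EMEA", "EMEA", "EMEA"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Combine the three "current" columns into one row key and the three "+1"
# columns into one column key (only combinations that occur are kept),
# then count the rows for each pair of keys.
from <- interaction(df$Function, df$SB, df$`Country Region`, sep = "/", drop = TRUE)
to <- interaction(df$`+1 Function`, df$`+1 SB`, df$`+1 Country Region`, sep = "/", drop = TRUE)
cube <- table(from, to)
print(cube)
```

`ftable(table(df), row.vars = 1:3, col.vars = 4:6)` gives a similar row/column layout directly from the six-dimensional table. However, it keeps every level combination, including the empty ones.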

R: merge two data frames when either of two criteria matches

Question: Say I have two data frames like the following:

n = c(2, 3, 5, 5, 6, 7)
s = c("aa", "bb", "cc", "dd", "ee", "ff")
b = c(2, 4, 5, 4, 3, 2)
df = data.frame(n, s, b)
#  n  s b
#1 2 aa 2
#2 3 bb 4
#3 5 cc 5
#4 5 dd 4
#5 6 ee 3
#6 7 ff 2
n2 = c(5, 6, 7, 6)
s2 = c("aa", "bb", "cc", "ll")
b2 = c("hh", "nn", "ff", "dd")
df2 = data.frame(n2, s2, b2)
# n2 s2 b2
#1 5 aa hh
#2 6 bb nn
#3 7 cc ff
#4 6 ll dd

I want to merge them to achieve the following result:

#n s b n2 s2 b2
#2 aa 2 5 aa hh
#3 bb 4 6 bb nn
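
The exact two criteria are cut off in this excerpt. The sketch below takes the criteria as a predicate function, so any pair of conditions can be plugged in. For illustration only, it assumes a row pair should match when `s` equals `s2` or `s` equals `b2`. `merge_either()` is a hypothetical name. It relies on base R's `merge(x, y, by = NULL)`, which returns the Cartesian product, and then filters it.

```r
# Example data from the question (stringsAsFactors = FALSE keeps comparisons simple).
df <- data.frame(n = c(2, 3, 5, 5, 6, 7),
                 s = c("aa", "bb", "cc", "dd", "ee", "ff"),
                 b = c(2, 4, 5, 4, 3, 2), stringsAsFactors = FALSE)
df2 <- data.frame(n2 = c(5, 6, 7, 6),
                  s2 = c("aa", "bb", "cc", "ll"),
                  b2 = c("hh", "nn", "ff", "dd"), stringsAsFactors = FALSE)

# Keep every pair of rows (one from x, one from y) for which match_fun is TRUE.
# Builds the full cross join first, so memory grows as nrow(x) * nrow(y).
merge_either <- function(x, y, match_fun) {
  cj <- merge(x, y, by = NULL)               # Cartesian product of x and y
  out <- cj[match_fun(cj), , drop = FALSE]   # keep rows meeting either criterion
  rownames(out) <- NULL
  out
}

# Hypothetical criteria: s equals s2, OR s equals b2.
res <- merge_either(df, df2, function(d) d$s == d$s2 | d$s == d$b2)
print(res)
```

For large tables, a cheaper route is to run one `merge()` per criterion, rename the key columns so both results have the same names, and combine them with `unique(rbind(...))`.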

How to separate one column into multiple columns (complex column)

Question: I am trying to separate the column "Grade" into multiple columns according to subject and grade:

grade <- read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv", sep = ";")
# Rename the columns
names(grade) <- c("Student_ID", "Name", "Venue", "Grade")
head(grade)
# Separate `Grade` into `subject` variables and corresponding `Grade` columns
library(tidyverse)
df <- grade %>% separate(Grade, paste("V", 1:7, sep = "_"), sep = ":")
head(df)
# It still is not separating
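
The actual layout of the `Grade` column is not visible in this excerpt. The following base-R sketch therefore rests on a hypothetical format: each cell holds "Subject:score" pairs joined by a separator, e.g. "Math:8,Physics:7". Adjust `pair_sep` and `kv_sep` to match the real file. `split_grades()` is a hypothetical name. It returns one column per subject, which could then be `cbind()`-ed onto the other columns of `grade`.

```r
# Hypothetical format: "Subject:score" pairs joined by `pair_sep`,
# e.g. "Math:8,Physics:7". Both separators are assumptions.
split_grades <- function(grade, pair_sep = ",", kv_sep = ":") {
  rows <- lapply(strsplit(as.character(grade), pair_sep, fixed = TRUE), function(pairs) {
    kv <- strsplit(trimws(pairs), kv_sep, fixed = TRUE)
    setNames(as.numeric(vapply(kv, `[`, character(1), 2)),   # scores
             vapply(kv, `[`, character(1), 1))                # subject names
  })
  subjects <- unique(unlist(lapply(rows, names)))
  # One column per subject; NA where a student has no grade for that subject.
  mat <- do.call(rbind, lapply(rows, function(r) unname(r[subjects])))
  colnames(mat) <- subjects
  as.data.frame(mat)
}

grades <- split_grades(c("Math:8,Physics:7", "Physics:6,Chemistry:9"))
print(grades)
```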

Tcl/Tk write in a specific line

I want to write to a specific line in a text document, but there's a problem with my code and I don't know where the bug is.

set fp [open C:/Users/user/Desktop/tst/settings.txt w]
set count 0
while {[gets $fp line] != -1} {
    incr count
    if {$count == 28} {
        break
    }
}
puts $fp "TEST"
close $fp

The file only contains TEST. Does anybody have an idea?

Answer: You are using 'w' as the access argument, which truncates the file, so you lose all of the file's data when you open it. Read more about the open command. You can use 'r+' or 'a+'. Also, to write after a particular line you can move the pointer to the desired location.

set fp

R - Calculate difference (similarity measure) between similar datasets

I have seen many questions that touch on this topic but haven't yet found an answer. If I have missed a question that does answer this, please mark it and point us to it.

Scenario: We have a benchmark dataset and two imputation methods. We systematically delete values from the benchmark and fill them in with each imputation method. Thus we have a benchmark, imputedData1 and imputedData2.

Question: Is there a function that can produce a single number representing the difference between the benchmark and imputedData1, and/or between the benchmark and imputedData2?
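
One common choice, though not the only one, is a per-cell error such as the root-mean-square error (RMSE). It is best computed only over the cells that were deleted before imputation, since the untouched cells are identical by construction. The sketch below assumes all columns are numeric and that the benchmark and the imputed data have the same shape. `imputation_rmse()` and the toy data are hypothetical. Replacing the squared difference with `abs()` and dropping `sqrt()` would give the mean absolute error instead.

```r
# Root-mean-square error between a benchmark and an imputed data set.
# `mask` (optional, same shape) selects the cells to compare, e.g. only the
# cells that were deleted before imputation; by default all cells are used.
imputation_rmse <- function(benchmark, imputed, mask = NULL) {
  b <- as.matrix(benchmark)
  i <- as.matrix(imputed)
  stopifnot(identical(dim(b), dim(i)), is.numeric(b), is.numeric(i))
  if (is.null(mask)) mask <- matrix(TRUE, nrow(b), ncol(b))
  sqrt(mean((b[mask] - i[mask])^2))
}

benchmark    <- data.frame(a = c(1, 2), b = c(3, 4))
imputedData1 <- data.frame(a = c(1, 4), b = c(3, 4))
deleted <- matrix(c(FALSE, TRUE, FALSE, FALSE), nrow = 2)   # only cell [2, "a"] was removed
imputation_rmse(benchmark, imputedData1)                    # all cells: returns 1
imputation_rmse(benchmark, imputedData1, mask = deleted)    # deleted cells only: returns 2
```

Computing the same number for imputedData2 with the same mask gives two directly comparable scores, where lower means closer to the benchmark.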