Assigning values to a df$column based on another column in the same df

情书的邮戳 2021-01-29 04:33
df2 <- data.frame(Mean = c(.5, 4, 2.3, 1.2, 3.7, 3.3, .8), Number = NA)

# Fill Number row by row based on the value in Mean
for (i in seq_along(df2$Mean)) {
    if (df2$Mean[i] <= .5) {
        df2$Number[i] <- 0
    }
}
2 Answers
  • 2021-01-29 04:46
    df$Number <- findInterval( df$Mean, c( seq(0.5, 3.5, by=1) , Inf) )
    

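    Your definition did not cover the edge case where df$Mean = 3.5. My method gives it a 4, because findInterval treats each interval as closed on the left, so a value sitting exactly on a break point is counted in the higher interval.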

    The findInterval function does something very similar to the cut function, except that it returns a numeric value rather than a factor. It sets up a series of intervals and tells you which interval each item falls into.
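
    As a rough illustration of that comparison, the sketch below runs both functions on the Mean values from the question. The variable names means, breaks, bins and f are made up for this example, and right = FALSE is passed to cut only so that its intervals line up with findInterval's left-closed ones.

    ```r
    # Mean values taken from the question's df2
    means <- c(.5, 4, 2.3, 1.2, 3.7, 3.3, .8)
    breaks <- seq(0.5, 3.5, by = 1)   # 0.5, 1.5, 2.5, 3.5

    # findInterval returns, for each value, how many break points are <= it
    bins <- findInterval(means, breaks)

    # cut builds the same intervals but returns a factor with interval labels
    f <- cut(means, breaks = c(-Inf, breaks, Inf), right = FALSE)

    # The factor's integer codes start at 1, so subtract 1 to compare with bins
    codes <- as.integer(f) - 1L
    ```

    Here bins and codes hold the same numbers; the difference is only that cut hands back a factor that has to be converted first.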

  • 2021-01-29 04:57

    genotype changes the copy of AllSamples that exists inside the function. When this function ends, that internal copy is destroyed (along with your changes to it); the original version of it (in your global workspace, most likely) is unchanged. If you make your function return AllSamples and then overwrite the original with the return value, that would work.

    genotype <- function (AllSamples){
        for(i in 1:length(AllSamples$Mean.Regression)){
            ...
        }
        AllSamples
    }
    

    Then it would be called like

    AllSamples <- genotype(AllSamples)
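
    To see the copy behaviour in isolation, here is a minimal sketch. The helper names add_column_silently and add_column and the sample values are hypothetical; only AllSamples and Mean.Regression come from the post.

    ```r
    # Hypothetical data for illustration
    AllSamples <- data.frame(Mean.Regression = c(0.2, 1.1, 2.9))

    # Changes only the function's local copy; nothing is returned to the caller
    add_column_silently <- function(d) {
        d$CopyNumber <- 0
        invisible(NULL)
    }

    # Returns the modified copy so the caller can store it
    add_column <- function(d) {
        d$CopyNumber <- 0
        d
    }

    add_column_silently(AllSamples)
    has_col_before <- "CopyNumber" %in% names(AllSamples)   # original is untouched

    AllSamples <- add_column(AllSamples)
    has_col_after <- "CopyNumber" %in% names(AllSamples)    # overwritten with the returned copy
    ```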
    

    A more idiomatic approach would be to not change the data.frame in genotype, but to just create the new column (as a vector), return that, and assign that to the column of AllSamples.

    genotype <- function (AllSamples){
        # Start with all zeros, then fill in one copy number per row
        CopyNumber <- rep(0, length(AllSamples$Mean.Regression))
        for(i in seq_along(AllSamples$Mean.Regression)){
            x <- AllSamples$Mean.Regression[i]
            # Upper bounds are inclusive, so a value exactly on a boundary
            # (e.g. 0.5) is not sent to the final else branch by mistake
            if(x <= .5) {
                CopyNumber[i] <- 0
            } else if(x <= 1.5) {
                CopyNumber[i] <- 1
            } else if(x <= 2.5) {
                CopyNumber[i] <- 2
            } else if(x <= 3.5) {
                CopyNumber[i] <- 3
            } else {
                CopyNumber[i] <- 4
            }
        }
        CopyNumber
    }
    

    which would be called as

    AllSamples$CopyNumber <- genotype(AllSamples)
    

    The real, real way to do this is to use vectorized functions rather than explicit loops.

    genotype <- function(AllSamples) {
        cut(AllSamples$Mean.Regression,
            breaks = c(-Inf, 0.5, 1.5, 2.5, 3.5, Inf),
            labels = FALSE) - 1
    }
    

    which you call as

    AllSamples$CopyNumber <- genotype(AllSamples)
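
    One detail worth checking with cut is which bin a value exactly on a break point lands in. The sketch below, using made-up values in x, compares cut's default right-closed intervals with right = FALSE, which follows the left-closed convention that findInterval uses.

    ```r
    # Values sitting exactly on break points, plus one interior value
    x <- c(0.5, 1.5, 2.0, 3.5)
    breaks <- c(-Inf, 0.5, 1.5, 2.5, 3.5, Inf)

    # Default (right = TRUE): intervals are (a, b], so 0.5 falls in the lowest bin
    right_closed <- cut(x, breaks = breaks, labels = FALSE) - 1

    # right = FALSE: intervals are [a, b), so 0.5 falls in the next bin up
    left_closed <- cut(x, breaks = breaks, labels = FALSE, right = FALSE) - 1
    ```

    With these inputs, right_closed is 0, 1, 2, 3 and left_closed is 1, 2, 2, 4, so the choice only matters for values that coincide with a break point.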
    