How to retrieve the most repeated value in a column present in a data frame

礼貌的吻别 2020-12-16 00:08

I am trying to retrieve the most repeated value in a particular column of a data frame. Here is my sample data and code.

data("Forbes2000", p


        
6 Answers
  • 2020-12-16 00:46

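    You can create a function:

    get_mode <- function(x){
      # Count each distinct value, sort the counts in descending order, and return the name of the first one
      return(names(sort(table(x), decreasing = TRUE, na.last = TRUE)[1]))
    }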
    

    and then do

    get_mode(Forbes2000$category)
    

    The reason I created a function is that I have to do this kind of thing very often.
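
    As a quick check of what such a helper returns, here is a minimal sketch on toy vectors (the sample values are made up for illustration). Because the result is taken from the names of the table, it is always a character string, so a numeric column has to be converted back.

    ```r
    # Same idea as the helper above: tabulate, sort descending, take the first name
    get_mode <- function(x) {
      names(sort(table(x), decreasing = TRUE))[1]
    }

    # Toy data, not taken from Forbes2000
    categories <- c("Banking", "Insurance", "Banking", "Retailing")
    most_common_category <- get_mode(categories)

    # For numeric input the result is still a character string,
    # so convert it explicitly if a number is needed
    most_common_number <- as.numeric(get_mode(c(3, 1, 3, 2)))
    ```

    This sketch says nothing about which value is returned when several values are tied for the highest count.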

  • 2020-12-16 00:48
    tail(names(sort(table(Forbes2000$category))), 1)
    
  • 2020-12-16 00:57

    Another way with the data.table package, which is faster for large data sets:

    set.seed(1)
    x=sample(seq(1,100), 5000000, replace = TRUE)
    

    Method 1 (the table-based solution proposed in another answer):

    start.time <- Sys.time()
    tt <- table(x)
    names(tt[tt==max(tt)])
    end.time <- Sys.time()
    time.taken <- end.time - start.time
    time.taken
    

    Time difference of 4.883488 secs

    Method 2 (data.table):

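    library(data.table)  # provides data.table(), setkey() and .N

    start.time <- Sys.time()
    ds <- data.table( x )
    setkey(ds, x)                    # sort the table by x
    sorted <- ds[,.N,by=list(x)]     # one row per distinct value, N = its count
    
    # order by count, descending, and take the first value
    most_repeated_value <- sorted[order(-N)]$x[1]
    most_repeated_value
    
    end.time <- Sys.time()
    time.taken <- end.time - start.time
    time.taken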
    

    Time difference of 0.328033 secs
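
    Timings like these depend on the machine and the R version. As a more compact way to time a block in base R, system.time() can replace the Sys.time() bookkeeping. The sketch below is only an illustration with a smaller sample so that it runs quickly; it is not a rerun of the benchmark above, and the variable names are assumptions.

    ```r
    set.seed(1)
    x_small <- sample(seq(1, 100), 100000, replace = TRUE)

    # system.time() evaluates the expression and reports user, system and elapsed time
    timing <- system.time({
      tt <- table(x_small)
      most_repeated <- names(tt[tt == max(tt)])
    })

    timing[["elapsed"]]   # elapsed wall-clock seconds
    most_repeated         # value(s) with the highest count, as character
    ```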

  • 2020-12-16 01:04

    I know my answer is coming a little late, but I built the following function that does the job in less than a second for my dataframe that contains more than 50,000 rows:

    print_count_of_unique_values <- function(df, column_name, remove_items_with_freq_equal_or_lower_than = 0, return_df = F, 
                                             sort_desc = T, return_most_frequent_value = F)
    {
      temp <- df[column_name]
      output <- as.data.frame(table(temp))
      names(output) <- c("Item","Frequency")
      output_df <- output[  output[[2]] > remove_items_with_freq_equal_or_lower_than,  ]
    
      if (sort_desc){
        output_df <- output_df[order(output_df[[2]], decreasing = T), ]
      }
    
      cat("\nThis is the (head) count of the unique values in dataframe column '", column_name,"':\n")
      print(head(output_df))
    
      if (return_df){
        return(output_df)
      }
    
      if (return_most_frequent_value){
          output_df$Item <- as.character(output_df$Item)
          output_df$Frequency <- as.numeric(output_df$Frequency)
          most_freq_item <- output_df[1, "Item"]
          cat("\nReturning most frequent item: ", most_freq_item)
          return(most_freq_item)
      }
    }
    

    So if you have a dataframe called "df" with a column called "name" and you want to know the most common value in the "name" column, you could run:

    most_common_name <- print_count_of_unique_values(df=df, column_name = "name", return_most_frequent_value = T)    
    
  • 2020-12-16 01:06

    You can use table(Forbes2000$category, useNA="ifany"). This gives you every distinct value in the chosen column, together with the number of times each value occurs in the data frame; with useNA="ifany", missing values are counted as well.
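
    To go from that table to the single most frequent value, one option is which.max(), which returns the position of the first maximum. The following is a minimal sketch on a made-up vector (not the Forbes2000 data); it also shows that with useNA="ifany" the most frequent "value" can turn out to be NA.

    ```r
    # Toy vector in which NA is the most frequent entry
    x <- c("a", "b", "b", NA, NA, NA)

    # Counting NAs: NA gets its own cell in the table
    counts_with_na <- table(x, useNA = "ifany")
    most_frequent <- names(counts_with_na)[which.max(counts_with_na)]

    # Default behaviour: NAs are dropped before counting
    counts_no_na <- table(x)
    most_frequent_non_missing <- names(counts_no_na)[which.max(counts_no_na)]
    ```

    Here most_frequent is NA, while most_frequent_non_missing is "b", so the choice of useNA decides whether missing values can win.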

  • In case two or more categories may be tied for most frequent, use something like this:

    x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
    tt <- table(x)
    names(tt[tt==max(tt)])
    # [1] "Food markets" "Insurance"
    