I am trying to retrieve the most repeated value in a particular column of a data frame. Here is my sample data and code below.
data("Forbes2000", p
You can create a function:
get_mode <- function(x) {
  # Tabulate the values, sort the counts in descending order, return the top label
  return(names(sort(table(x), decreasing = TRUE, na.last = TRUE)[1]))
}
and then do
get_mode(Forbes2000$category)
The reason I created a function is that I have to do this kind of thing very often.
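To see what the function returns, here is a minimal sketch on a toy vector. The `industries` vector is made up for illustration and is not part of Forbes2000. Two things are worth noticing: `table()` drops `NA` by default, and the result is always a character string, even for numeric input.

```r
# Same idea as the function above, repeated here so the snippet runs on its own
get_mode <- function(x) {
  names(sort(table(x), decreasing = TRUE))[1]
}

# Hypothetical toy data, not taken from Forbes2000
industries <- c("Banking", "Insurance", "Banking", NA, "Retailing")

get_mode(industries)      # "Banking" (the NA is ignored by table())
get_mode(c(3, 1, 3, 2))   # "3", a character string rather than the number 3
```

If you need the value in its original type, you would have to convert it back yourself, for example with `as.numeric()`.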
tail(names(sort(table(Forbes2000$category))), 1)
Another way with the data.table package, which is faster for large data sets:
set.seed(1)
x=sample(seq(1,100), 5000000, replace = TRUE)
method 1 (solution proposed above)
start.time <- Sys.time()
tt <- table(x)
names(tt[tt==max(tt)])
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 4.883488 secs
method 2 (DATA TABLE)
library(data.table)  # load the package before starting the timer
start.time <- Sys.time()
ds <- data.table( x )
setkey(ds, x)
sorted <- ds[,.N,by=list(x)]
most_repeated_value <- sorted[order(-N)]$x[1]
most_repeated_value
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.328033 secs
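As a side note, the start/stop bookkeeping with `Sys.time()` can also be written with base R's `system.time()`. The sketch below times only the `table()` approach and uses a smaller sample (an assumption made so it runs quickly), so the numbers will not match the ones reported above.

```r
set.seed(1)
# Smaller than the 5,000,000 values used above, purely to keep the example fast
x <- sample(1:100, 1e5, replace = TRUE)

# system.time() evaluates the expression and reports user, system and elapsed time
timing <- system.time({
  tt <- table(x)
  modes <- names(tt[tt == max(tt)])
})

timing["elapsed"]  # wall-clock seconds; varies by machine
modes              # most frequent value(s), as character
```

Keep in mind that the two methods compared above are not strictly equivalent: the `table()` version returns every tied value as a character vector, while the data.table version returns only the first value after ordering, in its original type.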
I know my answer is coming a little late, but I built the following function that does the job in less than a second for my dataframe that contains more than 50,000 rows:
print_count_of_unique_values <- function(df, column_name, remove_items_with_freq_equal_or_lower_than = 0, return_df = FALSE,
                                         sort_desc = TRUE, return_most_frequent_value = FALSE)
{
  temp <- df[column_name]
  output <- as.data.frame(table(temp))
  names(output) <- c("Item", "Frequency")
  output_df <- output[output[[2]] > remove_items_with_freq_equal_or_lower_than, ]
  if (sort_desc) {
    output_df <- output_df[order(output_df[[2]], decreasing = TRUE), ]
  }
  cat("\nThis is the (head) count of the unique values in dataframe column '", column_name, "':\n")
  print(head(output_df))
  if (return_df) {
    return(output_df)
  }
  if (return_most_frequent_value) {
    output_df$Item <- as.character(output_df$Item)
    output_df$Frequency <- as.numeric(output_df$Frequency)
    most_freq_item <- output_df[1, "Item"]
    cat("\nReturning most frequent item: ", most_freq_item)
    return(most_freq_item)
  }
}
So if you have a dataframe called "df" and a column called "name" and you want to know the most common value in the "name" column, you could run:
most_common_name <- print_count_of_unique_values(df=df, column_name = "name", return_most_frequent_value = T)
You can use table(Forbes2000$category, useNA="ifany"). This will give you the list of all values present in the chosen column and the number of times each value occurs in that data frame, including a count for NA if there are any missing values.
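Here is a small sketch of what `useNA = "ifany"` changes. The `category` vector is invented for illustration; the point is that missing values get their own count and can even turn out to be the most frequent "value".

```r
# Hypothetical toy data with missing values
category <- c("Banking", NA, "Insurance", "Banking", NA, NA)

table(category)                              # NA is dropped: Banking 2, Insurance 1

counts <- table(category, useNA = "ifany")   # NA gets its own cell
counts

# Label of the largest count; here it is NA, because NA occurs three times
names(counts)[which.max(counts)]
```

Whether `NA` should be allowed to win depends on what you are after, so it is worth checking the full table before picking the top entry.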
In case two or more categories may be tied for most frequent, use something like this:
x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
tt <- table(x)
names(tt[tt==max(tt)])
[1] "Food markets" "Insurance"
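If you need this often, the tie-aware version can be wrapped in a helper and applied to a data frame column. This is only a sketch: `get_modes` and the `companies` data frame are names made up for the example.

```r
# Hypothetical helper: returns every value that reaches the maximum count
get_modes <- function(x) {
  tt <- table(x)
  names(tt)[tt == max(tt)]
}

# Toy data frame standing in for a real one such as Forbes2000
companies <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  category = c("Insurance", "Insurance", "Capital Goods",
               "Food markets", "Food markets"),
  stringsAsFactors = FALSE
)

get_modes(companies$category)   # "Food markets" "Insurance"
```

The result is a character vector whose length equals the number of tied values, so a column with a single clear winner gives back a vector of length one.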