I often find myself trying to create a categorical variable from a numerical variable + a user-provided set of ranges.
For instance, say that I have a data.frame wi
Use cut()
, already:
df$VCAT2 <- cut(df$V, c(0, 9.999, 10, 20, Inf), labels = FALSE)
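As a small illustration of what that call returns, here is a sketch on a made-up vector. The sample values and the label strings are assumptions for illustration only; just the breaks come from the line above. With labels set to FALSE, cut() returns the integer index of the bin; passing a character vector instead names the bins.

```r
# Hypothetical sample values; only the breaks are taken from the call above.
v <- c(5, 10, 15, 25)
breaks <- c(0, 9.999, 10, 20, Inf)

# labels = FALSE returns the integer bin index.
# Intervals are right-closed by default:
# (0,9.999], (9.999,10], (10,20], (20,Inf]
codes <- cut(v, breaks, labels = FALSE)
codes

# Supplying labels (names invented here) returns a factor instead.
named <- cut(v, breaks, labels = c("under 10", "about 10", "10 to 20", "over 20"))
as.character(named)
```

Note that a value of exactly 0 falls outside the first interval and becomes NA unless include.lowest = TRUE is passed.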
Notice the trick I pull to define a very small bin at 10:
10 - 10*.Machine$double.eps
)cut(..., labels)
argument.
A way I bin numbers is to remove the remainder using the modulus operator, %%. E.g. to bin into groups of 20:
#load dplyr, which provides mutate()
library(dplyr)
#create raw data
unbinned <- c(1.1, 1.53, 5, 8.3, 33.5, 49.22, 55, 57.9, 79.6, 81, 95, 201, 213)
rawdata <- as.data.frame(unbinned)
#bin the data into groups of 20
binneddata <- mutate(rawdata, binned = unbinned - unbinned %% 20)
#print the data
binneddata
This produces the output:
   unbinned binned
1      1.10      0
2      1.53      0
3      5.00      0
4      8.30      0
5     33.50     20
6     49.22     40
7     55.00     40
8     57.90     40
9     79.60     60
10    81.00     80
11    95.00     80
12   201.00    200
13   213.00    200
So 0 represents 0-<20, 20 represents 20-<40, 40 represents 40-<60, etc. (Of course, divide the binned
value by 20 to get sequential groups like in the original question.)
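The division mentioned above can also be done in one step with R's integer-division operator, %/%. A minimal sketch in base R, reusing the sample values from the example (the names width, bin_index and bin_start are made up here):

```r
# Same sample values as in the example above.
unbinned <- c(1.1, 1.53, 5, 8.3, 33.5, 49.22, 55, 57.9, 79.6, 81, 95, 201, 213)
width <- 20

# Integer division gives a 0-based sequential bin index.
bin_index <- unbinned %/% width

# Multiplying back recovers the lower bound of each bin,
# i.e. the same values as unbinned - unbinned %% width.
bin_start <- bin_index * width

data.frame(unbinned, bin_index, bin_start)
```

Add 1 to bin_index if the groups should start at 1 rather than 0.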
Bonus
If you want to use the binned values as categorical variables in ggplot etc. by converting them into strings, they will order strangely: e.g. 200 will come before 40, because '2' sorts before '4'. To get around this, use the sprintf function to create leading zeros (the 3 in %03d should be the number of digits in the longest number you expect):
#convert the data into strings with leading zeros
binnedstring<-mutate(binneddata,bin_as_character=sprintf('%03d',binned))
#print the data
binnedstring
giving the output:
  unbinned binned bin_as_character
1     1.10      0              000
2     1.53      0              000
3     5.00      0              000
4     8.30      0              000
5    33.50     20              020
etc.
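To see the ordering problem and the fix in isolation, here is a small base-R sketch (the vector bins is a made-up sample of bin values, not the full data set):

```r
# Hypothetical bin values, like those produced by the modulus approach.
bins <- c(0, 20, 40, 200)

# Plain character conversion: "200" sorts before "40".
plain_sorted <- sort(as.character(bins))
plain_sorted

# Zero-padded strings sort in the same order as the numbers.
padded_sorted <- sort(sprintf('%03d', bins))
padded_sorted
```

The padding width has to cover the longest value; with a four-digit bin such as 1000, '%03d' would no longer be enough and '%04d' would be needed.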
If you want human-readable labels such as 000-<020, create the upper bound using arithmetic and concatenate using the paste function:
#make human readable bin value
binnedstringband<-mutate(
  binnedstring,
  nextband=binned+20,
  human_readable=paste(bin_as_character,'-<',sprintf('%03d',nextband),sep='')
)
#print the data
binnedstringband
Giving:
  unbinned binned bin_as_character nextband human_readable
1     1.10      0              000       20       000-<020
2     1.53      0              000       20       000-<020
3     5.00      0              000       20       000-<020
4     8.30      0              000       20       000-<020
5    33.50     20              020       40       020-<040
etc.
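As an alternative sketch (a different technique from the modulus approach above, not part of it), base R's cut() can build interval labels directly. It returns a factor whose levels are already in numeric order, so no zero-padding should be needed for plotting. The names width, upper, breaks and band are made up for this example, and it assumes non-negative data:

```r
# Same sample values as in the example above.
unbinned <- c(1.1, 1.53, 5, 8.3, 33.5, 49.22, 55, 57.9, 79.6, 81, 95, 201, 213)
width <- 20

# Breaks at every multiple of width, up to the first multiple above the maximum.
upper <- (max(unbinned) %/% width + 1) * width
breaks <- seq(0, upper, by = width)

# right = FALSE makes the intervals left-closed: [0,20), [20,40), ...
band <- cut(unbinned, breaks = breaks, right = FALSE)

data.frame(unbinned, band)
```

The labels use interval notation such as [0,20) rather than 000-<020; a custom character vector can be passed through the labels argument if a different format is wanted.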