r factor - Replace value in a column based on a Frequency Count using R


I have a dataset with multiple columns. Many of these columns contain more than 32 factor levels, so in order to run a random forest (for example), I want to replace values in a column based on their frequency count.

One of the columns reads like this:

$ country : Factor w/ 92 levels "china","india","usa",..: 30 39 39 20 89 30 16 21 30 30 ...

What I would like to do is retain the top n countries (where n is a value between 5 and 20) and replace the remaining values with "other". I know how to calculate the frequency of the values using the table function, but I can't seem to find a solution for replacing values on the basis of such a rule. How can this be done?

Some example data:

set.seed(1)
x <- factor(sample(1:5, 100, prob = c(1, 3, 4, 2, 5), replace = TRUE))
table(x)
# 1  2  3  4  5
# 4 26 30 13 27

Replace all levels other than the top 3 (levels 2/3/5) with "other":

# rank() gives 1 to the rarest level; with 5 levels, ranks below 3 are the two rarest
levels(x)[rank(table(x)) < 3] <- "other"
table(x)
# other     2     3     5
#    17    26    30    27
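
The threshold above is tied to this example: it drops the two rarest of five levels. For the "top n" rule in the question, the same idea can be wrapped in a small helper. This is only a sketch: the name keep_top_n and its arguments are made up for illustration, and breaking ties by level order is an assumption, not something the answer specifies.

```r
# Hypothetical helper (not part of the original answer): keep the n most
# frequent levels of a factor and merge all remaining levels into `other`.
keep_top_n <- function(f, n, other = "other") {
  f <- droplevels(as.factor(f))
  counts <- table(f)
  # Rank levels from most to least frequent. ties.method = "first" breaks
  # ties by level order (an assumption), so exactly n levels are kept.
  r <- rank(-counts, ties.method = "first")
  # Assigning the same name to several levels merges them into one.
  levels(f)[r > n] <- other
  f
}

# Usage on a small hand-made factor
y <- factor(c("a", "a", "a", "b", "b", "c", "d", "d", "d", "d"))
table(keep_top_n(y, 2))
# a: 3, other: 3, d: 4
```

For a data frame column like the one in the question, this would presumably be applied as df$country <- keep_top_n(df$country, 10), where df stands in for the actual dataset.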
