r factor - Replace value in a column based on a Frequency Count using R -


I have a dataset with multiple columns. Many of these columns are factors with more than 32 levels, so before running a random forest (for example), I want to replace values in a column based on their frequency count.

One of the columns looks like this:

    $ country : Factor w/ 92 levels "china","india","usa",..: 30 39 39 20 89 30 16 21 30 30 ...

What I would like to do is retain the top n countries (where n is a value between 5 and 20) and replace the remaining values with "other". I know how to calculate the frequency of the values using the table function, but I can't seem to find a solution for replacing values on the basis of such a rule. How can this be done?

Some example data:

    set.seed(1)
    # 100 draws from the values 1..5 with unequal probabilities
    x <- factor(sample(1:5, 100, prob = c(1, 3, 4, 2, 5), replace = TRUE))
    table(x)
    #  1  2  3  4  5
    #  4 26 30 13 27

Replace all levels other than the top 3 (levels 2, 3 and 5) with "other":

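    # rank() ranks the counts in ascending order, so with five levels
    # ranks 1 and 2 belong to the two least frequent ones (levels 1 and 4).
    # Assigning the same name to both merges them into a single level.
    levels(x)[rank(table(x)) < 3] <- "other"
    table(x)
    # other     2     3     5
    #    17    26    30    27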
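The one-liner above hard-codes the cut-off (rank < 3 only fits five levels and a top 3). As a rough sketch of how the same idea might be generalised to any n, the helper below keeps the n most frequent levels and merges the rest. The function name keep_top_n and its arguments are made up for illustration and are not part of the original answer. It also picks the top levels with sort() rather than rank(), which avoids the fractional ranks that rank() assigns to ties.

```r
# Hypothetical helper (not from the original answer): keep the n most
# frequent levels of a factor and merge all remaining levels into one.
keep_top_n <- function(f, n, other = "other") {
  f <- as.factor(f)
  counts <- table(f)
  # Names of the n most frequent levels (or all of them if there are fewer).
  top <- names(sort(counts, decreasing = TRUE))[seq_len(min(n, length(counts)))]
  # Assigning the same name to several levels merges them.
  levels(f)[!(levels(f) %in% top)] <- other
  f
}

# Built directly from the counts in the example table, so the result
# does not depend on the random number generator.
x <- factor(rep(1:5, times = c(4, 26, 30, 13, 27)))
y <- keep_top_n(x, 3)
table(y)
```

For a data frame column this would be used along the lines of df$country <- keep_top_n(df$country, 10), where df stands in for the asker's data frame. Levels tied at the cut-off are kept or dropped according to the order sort() returns, so it may be worth checking table() on the result when counts are close.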
