Combining all elements in a vector of lists based on the common first element of each list in the vector in R


I have a large vector of lists (about 300,000 rows). As an example, let's consider the following:

vec = c(
  list(c("a",10,11,12)),
  list(c("b",10,11,15)),
  list(c("a",10,12,12,16)),
  list(c("a",11,12,16,17))
)

Now, I want the following:

For each unique first element of the lists in the vector, I need the unique elements occurring in the corresponding lists, along with their respective frequencies.

The output would be like:

For a, I have elements 10, 11, 12, 16 & 17 with frequencies 2, 2, 4, 2 & 1 respectively. For b, 10, 11, 15 with frequencies 1, 1, 1.

Many thanks in advance, Ankur.

Here's one way to do it.

First, a simpler way to create your list is:

l <- list(c("a", 10, 11, 12),
          c("b", 10, 11, 15),
          c("a", 10, 12, 12, 16),
          c("a", 11, 12, 16, 17))

Now you can split by the first character, and then tabulate everything except the first character.

tapply(l, sapply(l, '[[', 1), function(x)
  table(unlist(lapply(x, function(x) x[-1]))))

## $a
## 
## 10 11 12 16 17 
##  2  2  4  2  1 
## 
## $b
## 
## 10 11 15 
##  1  1  1 
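For readers who find the nested call hard to follow, the same grouping and tabulation can be sketched step by step with split and lapply. This is only an illustrative breakdown, not a different method; the names keys, groups and freqs are made up for this example.

```r
l <- list(c("a", 10, 11, 12),
          c("b", 10, 11, 15),
          c("a", 10, 12, 12, 16),
          c("a", 11, 12, 16, 17))

# Grouping key: the first element of each vector
keys <- sapply(l, `[[`, 1)

# One sub-list of vectors per distinct key
groups <- split(l, keys)

# Within each group, drop the key from every vector and count what remains
freqs <- lapply(groups, function(g)
  table(unlist(lapply(g, function(x) x[-1]))))

freqs$a
```

Because c() coerces each mixed vector to character, the counted values are strings ("10", "11", ...), which is why they appear as the names of the resulting tables.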

Scaling up to a list comprising 300,000 elements of similar size:

l <- replicate(300000, c(sample(letters, 1), sample(100, sample(3:4, 1))))

system.time(
  freqs <- tapply(l, sapply(l, '[[', 1), function(x)
    table(unlist(lapply(x, function(x) x[-1]))))
)

##    user  system elapsed 
##    0.68    0.00    0.69 

If you want to sort the vectors of the resulting list, as per the OP's comment below, you can modify the function applied to the groups of l:

tapply(l, sapply(l, '[[', 1), function(x)
  sort(table(unlist(lapply(x, function(x) x[-1]))), decreasing=TRUE))

## $a
## 
## 12 10 11 16 17 
##  4  2  2  2  1 
## 
## $b
## 
## 10 11 15 
##  1  1  1 

If you want to tabulate the values for a particular group, e.g. group a (the vectors that begin with a), you can either subset the above result:

l2 <- tapply(l, sapply(l, '[[', 1), function(x)
  sort(table(unlist(lapply(x, function(x) x[-1]))), decreasing=TRUE),
  simplify=FALSE)

l2$a

(Note that I've added simplify=FALSE so that this will work even if the number of unique elements is the same across groups.)

It's more efficient to perform the operation only on the group of interest, though, in which case maybe the following is better:

sort(table(unlist(
  lapply(split(l, sapply(l, '[[', 1))$a, function(x) x[-1])
)), decreasing=TRUE)

where split first splits l into groups according to the vectors' first element, and then we subset to just group a with $a.
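If only one group is ever needed, a possible variation (a sketch, not part of the approach above) is to filter l with a logical index instead of calling split, so that the other groups are never built. The names a_only and a_freq are hypothetical, chosen for this example.

```r
l <- list(c("a", 10, 11, 12),
          c("b", 10, 11, 15),
          c("a", 10, 12, 12, 16),
          c("a", 11, 12, 16, 17))

# Keep only the vectors whose first element is "a"
a_only <- l[sapply(l, `[[`, 1) == "a"]

# Drop the key from each vector, tabulate the rest, most frequent first
a_freq <- sort(table(unlist(lapply(a_only, function(x) x[-1]))),
               decreasing = TRUE)

a_freq
```

Whether this is measurably faster than the split version on 300,000 vectors has not been timed here; it mainly avoids materialising the groups that are not needed.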

