Combining all elements in a vector of lists based on the common first element of each list in the vector in R
I have a large vector of lists (about 300,000 rows). For example, let's consider the following:
vec = c( list(c("a",10,11,12)), list(c("b",10,11,15)), list(c("a",10,12,12,16)), list(c("a",11,12,16,17)) )
Now, I want the following:
For each unique first element of the lists in the vector, I need the unique elements occurring in the corresponding lists, along with their respective frequencies.
The output would be like:
For a, I have the elements 10, 11, 12, 16 & 17 with frequencies 2, 2, 4, 2 & 1 respectively. For b, 10, 11, 15 with frequencies 1, 1, 1.
Many thanks in advance, Ankur.
Here's one way to do it.
First, a simpler way to create the list is:
l <- list(c("a", 10, 11, 12), c("b", 10, 11, 15), c("a", 10, 12, 12, 16), c("a", 11, 12, 16, 17))
Now you can split by the first character, and then tabulate everything but the first character.
tapply(l, sapply(l, '[[', 1), function(x) table(unlist(lapply(x, function(x) x[-1]))))
## $a
##
## 10 11 12 16 17
##  2  2  4  2  1
##
## $b
##
## 10 11 15
##  1  1  1
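To see what the one-liner is doing, the same computation can be unpacked into separate steps. This is only a sketch for illustration; the intermediate names keys, groups and counts are made up here and are not part of the answer above.

```r
l <- list(c("a", 10, 11, 12), c("b", 10, 11, 15),
          c("a", 10, 12, 12, 16), c("a", 11, 12, 16, 17))

# First element of each vector: this is the grouping key
keys <- vapply(l, `[[`, character(1), 1)

# Split the list into one sub-list per key
groups <- split(l, keys)

# For each group: drop the key from every vector, pool the rest, and count
counts <- lapply(groups, function(g) table(unlist(lapply(g, `[`, -1))))

counts$a
```

Note that c("a", 10, 11, 12) coerces everything to character, so the names of each table are strings; as.numeric(names(counts$a)) recovers the numbers if they are needed.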
Scaling up to a list comprising 300,000 elements of similar size:
l <- replicate(300000, c(sample(letters, 1), sample(100, sample(3:4, 1))))
system.time(
  freqs <- tapply(l, sapply(l, '[[', 1), function(x) table(unlist(lapply(x, function(x) x[-1]))))
)
##    user  system elapsed
##    0.68    0.00    0.69
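One detail worth knowing about the benchmark: replicate returns a list there only because the generated vectors differ in length. If they all happened to have the same length, it would simplify to a matrix. A hedged variant that makes the list explicit, at a smaller size so it is quick to check; the seed, the size of 1000 and the name l_big are arbitrary choices for this sketch:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# simplify = FALSE guarantees a list regardless of the vector lengths
l_big <- replicate(1000,
                   c(sample(letters, 1), sample(100, sample(3:4, 1))),
                   simplify = FALSE)

# Same grouping and tabulation as in the answer
freqs <- tapply(l_big, sapply(l_big, `[[`, 1),
                function(x) table(unlist(lapply(x, function(v) v[-1]))))

# Sanity check: every non-key value is counted exactly once
sum(sapply(freqs, sum)) == sum(lengths(l_big) - 1)
```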
If you want to sort the vectors of the resulting list, per the OP's comment below, you can modify the function applied to the groups of l:
tapply(l, sapply(l, '[[', 1), function(x)
  sort(table(unlist(lapply(x, function(x) x[-1]))), decreasing=TRUE))
## $a
##
## 12 10 11 16 17
##  4  2  2  2  1
##
## $b
##
## 10 11 15
##  1  1  1
If you want to tabulate the values for a particular group, e.g. group a (the vectors that begin with a), you can either subset the above result:
l2 <- tapply(l, sapply(l, '[[', 1), function(x)
  sort(table(unlist(lapply(x, function(x) x[-1]))), decreasing=TRUE), simplify=FALSE)
l2$a
(Note that I've added simplify=FALSE so that this works even if the number of unique elements is the same across groups.)
It's more efficient to perform the operation on only the group of interest, though, in which case maybe the following is better:
sort(table(unlist( lapply(split(l, sapply(l, '[[', 1))$a, function(x) x[-1]) )), decreasing=TRUE)
where split first splits l into groups according to the vectors' first element, and then we subset to group a with $a.
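If only one group is ever needed, the split step could arguably be skipped altogether by picking out the matching vectors with a logical index. A minimal sketch, assuming the small example list from earlier; is_a and freq_a are names made up for this illustration, and I have not benchmarked it against the split version:

```r
l <- list(c("a", 10, 11, 12), c("b", 10, 11, 15),
          c("a", 10, 12, 12, 16), c("a", 11, 12, 16, 17))

# TRUE for the vectors whose first element is "a"
is_a <- vapply(l, function(x) x[1] == "a", logical(1))

# Drop the key from the selected vectors, pool the rest, count, and sort
freq_a <- sort(table(unlist(lapply(l[is_a], `[`, -1))), decreasing = TRUE)
freq_a
```

This touches only the vectors of the group of interest after a single pass to find them, rather than building every group first.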