R - How can I eliminate a loop over a data.table?
I have two data.tables, shown below:

```r
library(data.table)

n <- 10
a.dt <- data.table(a1 = rnorm(n, 0, 1), a2 = NA)
b.dt <- data.table(b1 = rnorm(n, 0, 1), b2 = 1:n)
setkey(a.dt, a1)
setkey(b.dt, b1)
```
I tried to change my previous data.frame implementation into a data.table implementation by changing the for-loop shown below:
```r
for (i in 1:nrow(b.dt)) {
  for (j in nrow(a.dt):1) {
    if (b.dt[i, b2] <= n/2 && b.dt[i, b1] < a.dt[j, a1]) {
      a.dt[j,]$a2 <- b.dt[i,]$b1
      break
    }
  }
}
```
I get the following error message:
```
Error in `[<-.data.table`(`*tmp*`, j, a2, value = -0.391987468746123) : object "a2" not found
```
I think the way I access the data.table is not quite right; I'm new to it. I'd guess there is also a quicker way of cycling down the two data.tables.

I'd like to know if the loop shown above can be simplified/vectorised.
EDIT: the data.table data, for copy/paste:
```
# a.dt
           a1 a2
1  -1.4917779 NA
2  -1.0731161 NA
3  -0.7533091 NA
4  -0.3673273 NA
5  -0.159569  NA
6  -0.1551948 NA
7  -0.0430574 NA
8   0.1783496 NA
9   0.4276034 NA
10  1.0697412 NA

# b.dt
            b1 b2
1   0.64229018  1
2   1.00527902  2
3   0.24746294  3
4  -0.50288835  4
5   0.34447791  5
6  -0.22205129  6
7   0.60099079  7
8  -0.70242284  8
9   0.6298599   9
10  0.08917988 10
```
The output I expect:
```
# output
           a1          a2
1  -1.4917779          NA
2  -1.0731161          NA
3  -0.7533091          NA
4  -0.3673273          NA
5  -0.159569           NA
6  -0.1551948          NA
7  -0.0430574          NA
8   0.1783496 -0.50288835
9   0.4276034  0.24746294
10  1.0697412  0.64229018
```
The algorithm goes down one table and, for each row, goes through the other table, checks the conditions and modifies values accordingly. More specifically, it goes down b.dt and, for each row in b.dt, goes through a.dt (starting from the last row) and assigns b1 to a2 in the first row where b1 is smaller than a1. An additional condition is checked before the assignment (b2 being equal to or smaller than 5 in the example).
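A minimal sketch of the procedure described above, working on plain numeric vectors pulled out of the two tables instead of indexing the data.table one cell at a time inside the loop. It assumes, as the walk-through and the expected output suggest, that each row of a.dt receives at most one value; the original loop does not check this. The function name `assign_first_larger` is hypothetical.

```r
# Hypothetical helper: for each eligible b value (in order), scan a from the
# last row upwards and store it in the first still-empty row whose a1 is larger.
assign_first_larger <- function(a1, b1, eligible) {
  a2 <- rep(NA_real_, length(a1))   # numeric NA, so doubles can be stored
  for (i in seq_along(b1)) {
    if (!eligible[i]) next          # e.g. b2 <= n/2 in the question
    for (j in rev(seq_along(a1))) {
      # Assumption: rows that already hold a value are skipped
      if (is.na(a2[j]) && b1[i] < a1[j]) {
        a2[j] <- b1[i]
        break
      }
    }
  }
  a2
}
```

With data.table loaded, the result could then be written back in a single step, for example `a.dt[, a2 := assign_first_larger(a1, b.dt$b1, b.dt$b2 <= n/2)]`.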
0.64229018 is the first value in b.dt, and it is assigned to the last unit of a.dt. 1.00527902 is the second value in b.dt, and it is left unassigned because it is bigger than the remaining values in a.dt. 0.24746294 is the third value in b.dt, and it is assigned to the second-last unit in a.dt. -0.50288835 is the fourth value in b.dt, and it is assigned to unit #8 in a.dt. 0.34447791 is the fifth value in b.dt, and it is left unassigned because it is too big.
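Because `setkey(a.dt, a1)` keeps a.dt sorted by a1, the walk-through above always fills the last still-empty row, so the inner loop may not be needed at all: a single index moving up a.dt could give the same result in one pass over b.dt. This is a sketch under that assumption (a1 sorted ascending, each row of a.dt filled at most once); `assign_one_pass` is a hypothetical name, and the data are copied from the question.

```r
# Hypothetical one-pass version. Assumes a1 is sorted ascending (as after
# setkey(a.dt, a1)), so the filled rows always form a block at the end of a.
assign_one_pass <- function(a1, b1, eligible) {
  a2 <- rep(NA_real_, length(a1))
  k <- length(a1)                   # last row of a that is still empty
  for (i in seq_along(b1)) {
    if (k < 1) break                # every row of a has been filled
    # Every empty row has a1 <= a1[k], so if b1[i] is not smaller than
    # a1[k], no empty row can take it and it stays unassigned.
    if (eligible[i] && b1[i] < a1[k]) {
      a2[k] <- b1[i]
      k <- k - 1
    }
  }
  a2
}

# Data copied from the question
a1 <- c(-1.4917779, -1.0731161, -0.7533091, -0.3673273, -0.159569,
        -0.1551948, -0.0430574, 0.1783496, 0.4276034, 1.0697412)
b1 <- c(0.64229018, 1.00527902, 0.24746294, -0.50288835, 0.34447791,
        -0.22205129, 0.60099079, -0.70242284, 0.6298599, 0.08917988)
b2 <- 1:10
print(assign_one_pass(a1, b1, b2 <= 10 / 2))
```

The design choice here is to trade the nested scan for one comparison per b row, which only holds because a.dt is kept sorted; if that ordering is not guaranteed, the nested version is the safer choice.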
This is, of course, a simplified problem (and therefore may not make much sense). Thanks for your time and input.
Your code runs if you change:

```r
a.dt[j,]$a2 <- b.dt[i,]$b1
```

to

```r
a.dt$a2[j] <- b.dt[i,]$b1
```

As for more efficient use of data.table, I'll leave that to more expert i...
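Note that even once it runs, the loop as written has no check for rows of a.dt that already hold a value. With the example data as pasted, every eligible b1 is smaller than the largest a1, so each one is written to the last row and overwrites the previous one. A base-R sketch that mimics the loop's logic on plain vectors (standing in for the data.table columns) shows this:

```r
# Mimics the question's loop on plain vectors (no "already filled" check)
a1 <- c(-1.4917779, -1.0731161, -0.7533091, -0.3673273, -0.159569,
        -0.1551948, -0.0430574, 0.1783496, 0.4276034, 1.0697412)
b1 <- c(0.64229018, 1.00527902, 0.24746294, -0.50288835, 0.34447791,
        -0.22205129, 0.60099079, -0.70242284, 0.6298599, 0.08917988)
b2 <- 1:10
n <- 10
a2 <- rep(NA_real_, n)
for (i in seq_along(b1)) {
  for (j in n:1) {
    if (b2[i] <= n/2 && b1[i] < a1[j]) {
      a2[j] <- b1[i]   # always hits row 10 first, overwriting earlier values
      break
    }
  }
}
print(a2)
```

Adding an `is.na(a2[j]) &&` check to the `if` condition makes the same loop produce the expected output shown in the question.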