for loop - R: Create a data.frame conditional on colnames and row entries of an existing df
I have a follow-up question.
I am creating a data.frame conditional on the column names and specific row entries of an existing data.frame. Below is how I resolved it using a for loop (thanks to @Roland's suggestion; my real data violated the requirements of @eddi's answer), but it has now been running on my actual data set (200 x 500,000+ rows x cols) for more than 2 hours...
(The following generated data.frames are similar to my actual data.)
    set.seed(1)
    a <- data.frame(year = c(1986:1990), events = round(runif(5, 0, 5), digits = 2))
    b <- data.frame(year = c(rep(1986:1990, each = 2, length.out = 40), 1986:1990),
                    region = c(rep(c("x", "y"), 10), rep(c("y", "z"), 10), rep("y", 5)),
                    state = c(rep(c("ny", "pa", "nc", "fl"), each = 10), rep("al", 5)),
                    events = round(runif(45, 0, 5), digits = 2))
    d <- matrix(rbinom(200, 1, 0.5), 10, 20,
                dimnames = list(c(1:10), rep(1986:1990, each = 4)))
    e <- data.frame(id = sprintf("%02d", 1:10), as.data.frame(d),
                    region = c("x", "y", "x", "z", "z", "y", "y", "z", "y", "y"),
                    state = c("pa", "al", "ny", "nc", "nc", "nc", "fl", "fl", "al", "al"))

    for (i in seq_len(nrow(d))) {
      for (j in seq_len(ncol(d))) {
        d[i, j] <- ifelse(d[i, j] == 0,
                          a$events[a$year == colnames(d)[j]],
                          b$events[b$year == colnames(d)[j] & b$state == e$state[i] & b$region == e$region[i]])
      }
    }
Is there a better/faster way to do this?
A simpler way (I think; it does not involve melting, dcasting, and merging) is as follows.
First, a and b should be arrays, indexed by year (for a) and by year/state/region (for b):
    at = a$events; names(at) = a$year
    bt = tapply(b$events, list(b$year, b$state, b$region), function(x) min(x))
    # note: I used min(x) in tapply to be on the safe side, so that the function always returns a scalar

    # create the result of the more complex case (lookup in b)
    ids = cbind(colnames(d)[col(d)], as.character(e$state[row(d)]), as.character(e$region[row(d)]))
    vals = bt[ids]; dim(vals) = dim(d)

    # and compute the desired result with ifelse
    result = ifelse(d == 0, at[colnames(d)[col(d)]], vals)
    # and that's it!
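The two lookups above rest on two base-R indexing rules: a named vector can be indexed by a character vector of names, and an array with dimnames can be indexed by a character matrix with one column per dimension, where each row picks out one cell. A minimal sketch on a made-up toy table (the names `toy`, `arr`, `keys` and `yr` are hypothetical, not from the question):

```r
# Hypothetical toy data, only to illustrate the indexing rules
toy <- data.frame(
  year   = c(1986, 1986, 1987),
  state  = c("ny", "pa", "ny"),
  region = c("x", "y", "x"),
  events = c(1.5, 2.0, 3.25)
)

# tapply builds a year x state x region array; combinations with no rows become NA
arr <- tapply(toy$events, list(toy$year, toy$state, toy$region), min)

# Each row of a character matrix names one cell through the dimnames
keys <- cbind(c("1986", "1987", "1986"),   # year
              c("pa",   "ny",   "ny"),     # state
              c("y",    "x",    "y"))      # region
arr[keys]   # one value per row of keys; (1986, ny, y) has no data, so NA

# A named vector is looked up by name in the same spirit
yr <- c(`1986` = 10, `1987` = 20)
yr[c("1987", "1986", "1987")]
```

The first lookup should give 2, 3.25 and NA; the second repeats 20, 10, 20 with the names kept.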
This should be faster (it avoids the nested loops), but I haven't profiled it. Let me know how it works on the full data.
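Before running on the full data, one way to check that the vectorized version reproduces the loop is to run both on the question's sample data and compare the results. This is a sketch with several assumptions: the wrapper names `loop_version` and `vectorized_version` are hypothetical, `e` keeps only the `id`, `region` and `state` columns (the lookups never use the copied `d` columns), `stringsAsFactors = FALSE` is set explicitly, and the loop writes into a copy so `d` stays 0/1 for the second call.

```r
set.seed(1)
a <- data.frame(year = 1986:1990, events = round(runif(5, 0, 5), digits = 2))
b <- data.frame(year = c(rep(1986:1990, each = 2, length.out = 40), 1986:1990),
                region = c(rep(c("x", "y"), 10), rep(c("y", "z"), 10), rep("y", 5)),
                state = c(rep(c("ny", "pa", "nc", "fl"), each = 10), rep("al", 5)),
                events = round(runif(45, 0, 5), digits = 2),
                stringsAsFactors = FALSE)
d <- matrix(rbinom(200, 1, 0.5), 10, 20,
            dimnames = list(1:10, rep(1986:1990, each = 4)))
# Assumption: only the columns used by the lookups are kept
e <- data.frame(id = sprintf("%02d", 1:10),
                region = c("x", "y", "x", "z", "z", "y", "y", "z", "y", "y"),
                state = c("pa", "al", "ny", "nc", "nc", "nc", "fl", "fl", "al", "al"),
                stringsAsFactors = FALSE)

# The question's nested loop, writing into a double copy of d
loop_version <- function(d) {
  out <- d
  storage.mode(out) <- "double"
  for (i in seq_len(nrow(d))) {
    for (j in seq_len(ncol(d))) {
      yr <- colnames(d)[j]
      out[i, j] <- ifelse(d[i, j] == 0,
                          a$events[a$year == yr],
                          b$events[b$year == yr & b$state == e$state[i] &
                                     b$region == e$region[i]])
    }
  }
  out
}

# The answer's array lookups wrapped in a function
vectorized_version <- function(d) {
  at <- a$events; names(at) <- a$year
  bt <- tapply(b$events, list(b$year, b$state, b$region), function(x) min(x))
  ids <- cbind(colnames(d)[col(d)], e$state[row(d)], e$region[row(d)])
  vals <- bt[ids]; dim(vals) <- dim(d)
  ifelse(d == 0, at[colnames(d)[col(d)]], vals)
}

all.equal(loop_version(d), vectorized_version(d))
```

On the real data, wrapping each call in `system.time()` would give the timing comparison. Note that `ids` has one row per cell of `d`, so at 200 x 500,000+ it becomes a very large character matrix.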