r - How can I simplify a lattice xyplot with millions of data points? -

- August 15, 2014

I have multiple sets of time history data, collected at approximately 500 Hz for 12 hours at a time.

I've plotted this data using xyplot with type="l" on a log time scale, since the phenomenon is largely a logarithmic decay.

The resulting plots are enormous PDF files that take a long time to render and inflate the file size of my Sweaved document. I assume this is because each individual data point is being plotted, which is total overkill: the plots could be reasonably reproduced with orders of magnitude fewer points.

Switching to type="smooth" fixes the rendering and file size issue, but the loess smoothing drastically alters the shape of the lines. Even after toying with the loess smoothing parameters, I've given up on loess smoothing as an option here.

Is there a simple way to either post-process the plot to simplify it, or sub-sample the data before plotting?

If subsampling the data, I think it would be beneficial to do it in some sort of inverse-log way, where data near 0 has a high time frequency (use all of the 500 Hz source data), but as time goes on the frequency of the data decreases (even 0.01 Hz would be more than sufficient near t = 12 hours). This would give more-or-less equal plot resolution across the log time scale.
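To put rough numbers on that idea, here is a back-of-the-envelope sketch in R. The 500 Hz rate and 12 hour duration come from the question; the target of 100 points per decade is a made-up value for illustration only.

```r
# Figures from the question: 500 Hz sampling for 12 hours
rate.hz  <- 500
duration <- 12 * 3600                    # seconds
n.raw    <- rate.hz * duration           # 21,600,000 points per series

# Decades spanned on a log time axis, from the first sample to the last
t.first   <- 1 / rate.hz                 # 0.002 s
n.decades <- log10(duration / t.first)   # about 7.33 decades

# Hypothetical target density (an assumption, not from the question)
n.per.decade <- 100
n.kept <- ceiling(n.decades * n.per.decade)

# Log-spaced points are separated by a constant ratio r, so the gap
# between neighbouring points near time t is about t * (r - 1)
r <- 10^(1 / n.per.decade)
rate.needed <- function(t) 1 / (t * (r - 1))

n.raw
n.kept
rate.needed(duration)   # sampling rate needed near t = 12 hours, in Hz
```

Under these assumptions a few hundred points per series would cover the whole log axis, and the rate needed near the end of the record falls well below the 0.01 Hz mentioned above.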

After trying type="spline" and again being unhappy with the extent to which it changes the shape of the data, I decided to go with the subsampling approach and reduce the data density before plotting.

The function I wrote subsamples along a log scale, so that the "plot resolution" is more or less constant.

    ## log.subsample(data, time, n.per.decade)
    ##
    ## Subsamples a time-sampled data.frame so that there are no more than
    ## n.per.decade samples in each decade of time.
    ##
    ## Usage
    ##   data:         data.frame; must contain a column of times
    ##   time:         character; name of the data frame column holding
    ##                 the time values
    ##   n.per.decade: maximum number of rows per decade of time
    ##
    ## Value
    ##   Returns a data.frame with the same columns as data, subsampled
    ##   such that there are no more than n.per.decade rows in each decade
    ##   of time. Rows in data with time <= 0 are dropped.

    log.subsample <- function(data, time, n.per.decade) {
        ## Match the column name exactly rather than as a regex pattern
        time.col <- which(colnames(data) == time)
        stopifnot(length(time.col) == 1)

        ## Non-positive times cannot be placed on a log scale
        if (min(data[, time.col]) <= 0) {
            data <- data[data[, time.col] > 0, , drop = FALSE]
        }
        stopifnot(nrow(data) > 0)

        min.decade <- floor(log10(min(data[, time.col])))
        max.decade <- ceiling(log10(max(data[, time.col])))

        ## Target times, evenly spaced in log10(time)
        time.seq <- 10^seq(from = min.decade, to = max.decade,
                           by = 1 / n.per.decade)

        ## For each target time, keep the first row at or after it
        indices.to.keep <- integer(0)
        for (i in seq_along(time.seq)) {
            tmp <- which(data[, time.col] >= time.seq[i])[1]
            if (!is.na(tmp)) {
                indices.to.keep <- c(indices.to.keep, tmp)
            }
        }
        indices.to.keep <- unique(indices.to.keep)

        result <- droplevels(data[indices.to.keep, , drop = FALSE])
        return(result)
    }

The issue here is that if there are "groups" in the data to plot, the subsampling function needs to be run on each group individually, and then a data frame needs to be built to pass to xyplot().
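A minimal sketch of that split, subsample, and recombine step, using base R only. The helper `subsample.log`, the column names `run`, `time`, and `value`, and the synthetic data are all made up for illustration; the helper is a compact stand-in for a log-scale subsampler, not the function from this post.

```r
# Hypothetical stand-in helper: keeps the first row at or after each
# log-spaced target time. Rows with time <= 0 are dropped.
subsample.log <- function(d, time, n.per.decade) {
    d <- d[d[[time]] > 0, , drop = FALSE]
    d <- d[order(d[[time]]), , drop = FALSE]
    t <- d[[time]]
    targets <- 10^seq(floor(log10(min(t))), ceiling(log10(max(t))),
                      by = 1 / n.per.decade)
    # With left.open = TRUE, findInterval() counts the rows with
    # t < target, so adding 1 gives the first row with t >= target
    idx <- findInterval(targets, t, left.open = TRUE) + 1
    idx <- unique(idx[idx <= length(t)])
    d[idx, , drop = FALSE]
}

# Synthetic data for illustration: two decaying signals at 500 Hz
t <- seq(0.002, 100, by = 0.002)
raw <- rbind(
    data.frame(run = "A", time = t, value = exp(-t / 5)),
    data.frame(run = "B", time = t, value = exp(-t / 20))
)

# Split by group, subsample each piece, then bind the pieces together
pieces  <- lapply(split(raw, raw$run), subsample.log,
                  time = "time", n.per.decade = 50)
thinned <- do.call(rbind, pieces)

nrow(raw)
nrow(thinned)
```

The thinned data frame can then be plotted as usual, for example with `xyplot(value ~ time, data = thinned, groups = run, type = "l", scales = list(x = list(log = 10)))`.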

It would be great if someone could tell me whether it's possible to "inject" this subsampling routine into the xyplot() call somehow, such that it would be called on each individual group of data in turn, eliminating the need to break the data up, run the subsampling routine, and put the data back together before calling xyplot().
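One possible way to do this, offered as a sketch rather than a confirmed answer, is a custom `panel.groups` function passed through lattice's `panel.superpose`, which calls it once per group within each panel. The names `thin.index`, `panel.thin.lines`, and `n.per.unit`, and the synthetic data, are hypothetical. Note that with `scales = list(x = list(log = 10))` lattice hands the panel function x values that are already log10-transformed, so thinning evenly in x is thinning evenly per decade.

```r
library(lattice)

# Hypothetical helper: for sorted, finite x, return the index of the
# first point at or after each evenly spaced target value
thin.index <- function(x, n.per.unit) {
    targets <- seq(floor(min(x)), ceiling(max(x)), by = 1 / n.per.unit)
    idx <- findInterval(targets, x, left.open = TRUE) + 1
    unique(idx[idx <= length(x)])
}

# Hypothetical panel function: thins one group's points, then draws
# them with the default panel.xyplot
panel.thin.lines <- function(x, y, ..., n.per.unit = 50) {
    keep <- is.finite(x)              # log10 of times <= 0 is not finite
    x <- x[keep]; y <- y[keep]
    if (length(x) == 0) return(invisible(NULL))
    ord <- order(x)
    x <- x[ord]; y <- y[ord]
    idx <- thin.index(x, n.per.unit)
    panel.xyplot(x[idx], y[idx], ...)
}

# Synthetic data for illustration: two decaying signals at 500 Hz
t <- seq(0.002, 100, by = 0.002)
raw <- data.frame(
    run   = rep(c("A", "B"), each = length(t)),
    time  = c(t, t),
    value = c(exp(-t / 5), exp(-t / 20))
)

# The full data frame goes to xyplot(); thinning happens per group
p <- xyplot(value ~ time, data = raw, groups = run, type = "l",
            scales = list(x = list(log = 10)),
            panel = panel.superpose,
            panel.groups = panel.thin.lines,
            n.per.unit = 50)
```

Printing `p` draws the plot. The trellis object still stores the full data, but only the thinned points should be sent to the graphics device, so the PDF output is what shrinks.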
