To get a better understanding of how PReMiuM handles real data a small dataset is extracted from the G2F data.

Load the libraries to manipulate data, run PReMiuM, and tictoc, which is a simple package to replace the use of proc.time for checking runtime on procedures..

library(tidyverse)
library(PReMiuM)
library(tictoc)

The PReMiuM profRegr function returns a lot of output. Creating a separate directory for the output can help when recovering files later.

setwd("~/GitHub/EnviroTyping/sandbox/interim_datasets/2015/")
dir.create("output")
setwd("./output")

Next, import a dataset. This one in particular has 5 different hybrids (2 replicates of each) found in 5 locations for a total of 50 observations. The columns are the variables which comprise of the hybrid label, the yield in bushels per acre collected at the end of the season, and weather variables temperature and dew point for months May through October.

df <- read_rds("../toy_data.rds")
head(df)

As mentioned in the PReMiuM manual it is a good idea to set the seed in the profRegr function and outside the PReMiuM workflow. Also depending on the size of the dataset (particularly the number of columns), the runtime on profRegr can increase dramatically. Here is process time is captured with the tic and toc functions. As this is model is mixed the continuous covariates need to be identified.

cont_vars <- names(df[,3:10])
cont_vars
set.seed(1234)
tic()
runInfoObj <- profRegr(covNames, outcome = 'Yield',
                       yModel = 'Normal', xModel = "Mixed",
                       discreteCovs = "Pedi",
                       continuousCovs = cont_vars,
                       data = df, nSweeps = 100, nBurn = 900,
                       nProgress = 25, seed = 1234)
toc()

The object created by profRegr can be useful for analysis later so keeping a copy of it is a good idea. It also can be quite large depending on the size of the dataset and number of iterations during the MCMC run, so compressing it to move across the servers is also wise.

write_rds(runInfoObj, path = "../runInfoObj.rds", compress = "xz")

Next, we complete the final portions of the PReMiuM workflow. The object riskProfObj is also saved for future work. A summary of the analysis can be seen in the plot created.

calcDists <- calcDissimilarityMatrix(runInfoObj)
clusObj <- calcOptimalClustering(calcDists)
riskProfObj <- calcAvgRiskAndProfile(clusObj)
write_rds(riskProfObj, "../riskProfObj.rds", compress = "xz")

clusterOrderObj<-plotRiskProfile(riskProfObj,"summary.png")