Fit the Bayesian Multi-Output Regression Stacking model using the maize dataset and apply a random partition cross-validation

2019-02-11

This example shows how to evaluate prediction accuracy with multiple environments and multiple traits using a random partition as cross validation. The WheatToy data set available in the BMTME package and next is loaded using the data() function as shown below:

library(BMTME)
data('WheatToy')

With this command, you load the following 2 objects into the R environment:

Since the data set includes multiple environments and multiple traits, the BMORS model could be used, also, for its implementation we will see three different designs for the linear predictor, Using only the (1) lines effect, (2) Environment and lines effect and (3) Environment, lines effect and the interaction of the environment with the lines effect. Remember that the first step that we need do, is order the dataset as follows:

phenoWheatToy <- phenoWheatToy[order(phenoWheatToy$Env, phenoWheatToy$Gid),]

Linear predictor: Lines effect

To use only the effect of lines in the linear predictor, we will first occupy the matrix design of the lines, we can construct this with the following code,

LG <- cholesky(genoWheatToy)
ZG <- model.matrix(~0 + as.factor(phenoWheatToy$Gid))
Z.G <- ZG %*% LG

Once the matrix design has been generated, the linear predictor is constructed, for this purpose, a new object will be created that we will call ETA, that consist of a series of nested lists, where each sub list will contain one of the matrix designs, as well as the model with which this matrix will be modeled. In this case only contain the matrix design of the lines.

ETA <- list(Gen = list(X = Z.G, model = 'BRR'))

Next, we provide some commands as pre-analysis for building the cross validation:

pheno <- phenoWheatToy[, c(1:3)] #Use only the first trait to do a cv
colnames(pheno) <- c('Line', 'Env', 'Response')

This is done because the cross validation that will be implemented assumes that for those lines that are selected as missing all traits are missed. The cross validation is implemented with random partitions, with the function: CV.RandomPart(), as shown below,

CrossValidation <- CV.RandomPart(pheno, NPartitions = 4, PTesting = 0.2, set_seed = 123)

Before making the adjustment of the model, the phenotype matrix is obtained, for this purpose we use the following code,

Y <- as.matrix(phenoWheatToy[, c(3,4)])

Finally, the model is adjusted and for demonstration purposes only 5000 iterations are used to implement the model. In addition, we will show the information of the predictive capacity obtained in each trait for each evaluated environment using the summary() function.

pm <- BMORS(Y, ETA = ETA, nIter = 5000, burnIn = 2000, thin = 2, progressBar = TRUE,
              testingSet = CrossValidation,  digits = 4)
summary(pm)
#>   Environment Trait Pearson SE_Pearson  MAAPE SE_MAAPE
#> 1      Bed2IR  DTHD  0.9674     0.0196 0.2832   0.0762
#> 2      Bed2IR  PTHT -0.0226     0.2762 0.6689   0.0748
#> 3      Bed5IR  DTHD  0.8014     0.1034 0.5040   0.0644
#> 4      Bed5IR  PTHT -0.2934     0.2174 0.6196   0.0501
#> 5        Drip  DTHD  0.9354     0.0376 0.8576   0.0945
#> 6        Drip  PTHT  0.2458     0.1156 0.8944   0.0875

Linear predictor: Environment + Lines

To use only the effect of the environments and lines in the linear predictor, we will first occupy the matrix design of the environment, we can construct this with the following code,

Z.E <- model.matrix(~0 + as.factor(phenoWheatToy$Env))

Once the matrix design has been generated, the linear predictor is constructed, for this purpose, a new object will be created that we will call ETA2, in this case contain the matrix design of the lines and the matrix design of the environment.

ETA2 <- list(Env = list(X = Z.E, model = "BRR"),
             Gen = list(X = Z.G, model = 'BRR'))

Finally, the model is adjusted and for demonstration purposes only 2000 iterations are used to implement the model. Remember that in the past section we construct the dataset object and the crossvalidation object, and we reuse that objects in this model.

In addition, we will show the information of the predictive capacity obtained in each trait for each evaluated environment using the summary() function.

pm <- BMORS(Y, ETA = ETA, nIter = 2000, burnIn = 1000, thin = 2, progressBar = TRUE,
              testingSet = CrossValidation,  digits = 4)
summary(pm)
#>   Environment Trait Pearson SE_Pearson  MAAPE SE_MAAPE
#> 1      Bed2IR  DTHD  0.9658     0.0208 0.2790   0.0765
#> 2      Bed2IR  PTHT -0.0307     0.2904 0.6714   0.0794
#> 3      Bed5IR  DTHD  0.8027     0.1008 0.5082   0.0610
#> 4      Bed5IR  PTHT -0.2738     0.2171 0.6118   0.0544
#> 5        Drip  DTHD  0.9330     0.0389 0.8569   0.0959
#> 6        Drip  PTHT  0.1805     0.1096 0.8945   0.0857

Linear predictor: Env + Lines + EnvXLines

And finally, to use only the effect of the environments and lines with the interaction effect of the environment with the lines in the linear predictor, we will first occupy the matrix design of the interaction, with the following code the construction of all the matrix designs are presented for easy understanding,

# Line effect (section 3.1.1)
LG <- cholesky(genoWheatToy)
ZG <- model.matrix(~0 + as.factor(phenoWheatToy$Gid))
Z.G <- ZG %*% LG

# Environment effect (section 3.1.2)
Z.E <- model.matrix(~0 + as.factor(phenoWheatToy$Env))

#Interaction effect
ZEG <- model.matrix(~0 + as.factor(phenoWheatToy$Gid):as.factor(phenoWheatToy$Env))
G2 <- kronecker(diag(length(unique(phenoWheatToy$Env))), data.matrix(genoWheatToy))
LG2 <- cholesky(G2)
Z.EG <- ZEG %*% LG2

Once the matrix design has been generated, the linear predictor is constructed, for this purpose, a new object will be created that we will call ETA3, in this case contain the effect of environment, lines and the interaction of both.

ETA3 <- list(Env = list(X = Z.E, model = "BRR"),
             Gen = list(X = Z.G, model = 'BRR'),
             EnvGen = list(X = Z.EG, model = "BRR"))

Finally, the model is adjusted and for demonstration purposes only 2000 iterations are used to implement the model. Remember that in the first section we construct the dataset object and the crossvalidation object, and we reuse that objects in this model.

In addition, we will show the information of the predictive capacity obtained in each trait for each evaluated environment using the summary() function.

pm <- BMORS(Y, ETA = ETA3, nIter = 2000, burnIn = 1000, thin = 2, progressBar = TRUE,
              testingSet = CrossValidation,  digits = 4)
summary(pm)
#>   Environment Trait Pearson SE_Pearson  MAAPE SE_MAAPE
#> 1      Bed2IR  DTHD  0.9734     0.0160 0.2663   0.0403
#> 2      Bed2IR  PTHT  0.2258     0.2801 0.7856   0.0956
#> 3      Bed5IR  DTHD  0.8206     0.0876 0.3452   0.0389
#> 4      Bed5IR  PTHT -0.0519     0.2314 0.4870   0.1315
#> 5        Drip  DTHD  0.9329     0.0365 0.4848   0.0944
#> 6        Drip  PTHT  0.2726     0.2585 0.6763   0.0431

It is important to point out that it is possible to obtain a boxplot with the results using the following commands.

par(mar = c(7, 4, 2, 1)) #Bottom, Left, Top, Right
boxplot(pm, las = 2)

In this figure we can observe that the best average predictive accuracy is obtained with the trait-environment combination DTHD_Drip, while the lowest average predictive accuracy obtained was in the trait-environment PTHT_Bed5IR.