Molecular markers increase the efficiency of selection in
plant breeding programs. Genomic selection (GS) is one of the latest
applications of molecular markers for accelerating plant breeding and
increasing genetic gain. GS predicts unobserved phenotypes from genotype
data. In GS, a ‘training population’ for which the
genotype (marker) and phenotype data are known is used to train a model that
predicts the unobserved phenotype, expressed as a ‘Genomic Estimated Breeding
Value’ (GEBV), for a ‘test population’ for which only the genotype data are known.
Training the model amounts to estimating the vector of marker effects from the
training population. The accuracy of the GEBVs is paramount for the utility of
GS and depends in turn on the choice of markers and the accuracy of their
effect estimates. Marker set optimization is therefore critical to achieving higher
prediction accuracies. In my poster, I present a method for ‘marker-set
optimization’ that selectively removes from the genotype data a subset of
markers whose effect sizes do not appear to differ significantly from zero
(hereafter referred to as trivial markers), thereby increasing prediction accuracy.
This method was demonstrated in the poster using a dataset containing 392 breeding
lines and 14,432 genome-wide markers. The 392 breeding lines were randomly divided
into two equal subsets, #1 and #2, that were used reciprocally to predict each
other. Marker effect sizes were estimated by bootstrapped ridge regression
BLUP. Genetic markers were identified as trivial and removed from the dataset
if the 95% rank-based confidence interval of the estimated marker effect
included zero. By applying this methodology, 69.6% and 67.3% of the markers
were identified as trivial for ‘days to heading’ and ‘grain yield’,
respectively. These trivial markers were subsequently removed from the GS
pipeline. This removal increased prediction accuracy, measured as the
Pearson correlation of GEBVs with observed phenotypes. For example, we observed an
improvement from 0.315 to 0.688 (a 2.18-fold increase, i.e. up 118%) for ‘days to
heading’. We extended this method to Fusarium resistance traits using cross-validation
with 100 iterations of bootstrap resampling (300 training : 92 test individuals). The mean prediction
accuracies across 100 iterations before marker set optimization were 0.304,
0.235, 0.404, 0.261, and 0.416 for INC, SEV, FDK, FHBdx, and DON, respectively,
which increased to 0.557, 0.491, 0.610, 0.508, and 0.598, respectively, after
marker set optimization.
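
The filtering step described above can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not the pipeline used
for the poster: the function names are hypothetical, the ridge penalty `lam`
is a fixed value supplied by the caller (RR-BLUP would normally derive it from
estimated variance components), the number of bootstrap replicates is
arbitrary, and the "rank-based" interval is read here as a percentile
interval over the bootstrap estimates.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Ridge estimate of marker effects, solved in the n x n dual form.

    X is an (n lines x p markers) genotype matrix, y the phenotype vector.
    The dual form avoids a p x p solve when markers far outnumber lines.
    """
    Xc = X - X.mean(axis=0)          # center each marker column
    yc = y - y.mean()                # center the phenotype
    n = X.shape[0]
    # beta = X' (X X' + lam I)^-1 y, identical to (X'X + lam I)^-1 X'y
    return Xc.T @ np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)


def select_nontrivial(X, y, lam=1.0, n_boot=200, level=0.95, seed=0):
    """Boolean mask of markers whose bootstrap interval excludes zero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    effects = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample lines with replacement
        effects[b] = ridge_marker_effects(X[idx], y[idx], lam)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(effects, [alpha, 1.0 - alpha], axis=0)
    return (lo > 0) | (hi < 0)             # interval does not contain zero


def prediction_accuracy(X_train, y_train, X_test, y_test, lam, keep=None):
    """Pearson correlation between GEBVs and observed test phenotypes."""
    if keep is None:
        keep = np.ones(X_train.shape[1], dtype=bool)
    if not keep.any():
        raise ValueError("no markers retained")
    Xtr, Xte = X_train[:, keep], X_test[:, keep]
    beta = ridge_marker_effects(Xtr, y_train, lam)
    # GEBV: training mean plus marker effects applied to centered genotypes
    gebv = y_train.mean() + (Xte - Xtr.mean(axis=0)) @ beta
    return np.corrcoef(gebv, y_test)[0, 1]
```

For the reciprocal design, one would call `select_nontrivial` on subset #1,
pass the resulting mask as `keep` when predicting subset #2, and then swap the
roles of the two subsets. Computing the mask from the training subset only is
an assumption of this sketch, made to keep test phenotypes out of marker
selection.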