USWBSI Abstract Viewer

2022 National Fusarium Head Blight Forum


Variety Development and Host Resistance (VDHR)

Poster # 166

Genomic Selection: Marker Set Optimization Improves Prediction Accuracy

Authors & Affiliations:

Niels Steigenga1 and Mohsen Mohammadi1
1. Department of Agronomy, Purdue University, West Lafayette IN
Corresponding Author: Mohsen Mohammadi, mohamm20@purdue.edu

Abstract:

Molecular markers increase the efficiency of selection in plant breeding programs. Genomic selection (GS) is one of the latest applications of molecular markers for accelerating plant breeding and genetic gain. GS predicts unobserved phenotypes from genotypes. A ‘training population’, for which both genotype (marker) and phenotype data are known, is used to train a model that predicts the unobserved phenotype, expressed as a ‘genomic estimated breeding value’ (GEBV), for a ‘test population’ for which only genotype data are known. Training the model amounts to estimating the vector of marker effects. The utility of GS depends on the accuracy of the GEBVs, which in turn depends on which markers are used and how accurately their effects are estimated. Marker set optimization is therefore critical for achieving higher prediction accuracy.
In this poster, we present a method for ‘marker set optimization’ that removes from the genotype data the subset of markers whose estimated effects are not significantly different from zero (hereafter, trivial markers), thereby increasing prediction accuracy. We demonstrate the method on a dataset of 392 breeding lines and 14,432 genome-wide markers. The 392 lines were randomly divided into two equal subsets, #1 and #2 (196 lines each), which were used reciprocally to predict each other. Marker effect sizes were estimated by bootstrapped ridge regression BLUP (RR-BLUP). A marker was classified as trivial and removed from the dataset if the 95% rank-based confidence interval of its estimated effect included zero. With this approach, 69.6% and 67.3% of the markers were classified as trivial for ‘days to heading’ and ‘grain yield’, respectively. These trivial markers were then removed from the GS pipeline.
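The marker filtering step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ridge_marker_effects` and `trivial_marker_mask`, the fixed ridge penalty `lam` (RR-BLUP would normally derive the shrinkage from estimated variance components), the number of bootstrap replicates, and the percentile form of the confidence interval are all hypothetical choices made for the example.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Ridge estimates of marker effects for a fixed penalty lam (assumed, not estimated)."""
    Xc = X - X.mean(axis=0)          # center each marker column
    yc = y - y.mean()                # center the phenotype
    n = Xc.shape[0]
    # Dual form of ridge regression: solve an n x n system instead of p x p,
    # which is cheaper when there are many more markers than lines.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    return Xc.T @ alpha


def trivial_marker_mask(X, y, lam=1.0, n_boot=200, level=0.95, seed=0):
    """Return a boolean mask that is True for markers whose bootstrap
    percentile interval of the estimated effect includes zero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    effects = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample lines with replacement
        effects[b] = ridge_marker_effects(X[idx], y[idx], lam)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(effects, tail, axis=0)
    upper = np.quantile(effects, 1.0 - tail, axis=0)
    return (lower <= 0.0) & (upper >= 0.0)
```

Given a genotype matrix `X` (lines by markers) and a phenotype vector `y`, `X[:, ~trivial_marker_mask(X, y)]` would keep only the markers not classified as trivial in this sketch.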
Removing the trivial markers increased prediction accuracy, measured as the Pearson correlation between GEBVs and observed phenotypes. For example, accuracy for ‘days to heading’ improved from 0.315 to 0.688, a 2.18-fold (118%) increase. We extended the method to Fusarium head blight (FHB) resistance traits using 100 iterations of bootstrap-resampling cross-validation (300 training and 92 test individuals per iteration). Mean prediction accuracies across the 100 iterations before marker set optimization were 0.304, 0.235, 0.404, 0.261, and 0.416 for incidence (INC), severity (SEV), Fusarium-damaged kernels (FDK), FHB index (FHBdx), and deoxynivalenol (DON), respectively; after marker set optimization they increased to 0.557, 0.491, 0.610, 0.508, and 0.598.
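The accuracy measure used above, the Pearson correlation between predicted GEBVs and observed phenotypes averaged over repeated training/test splits, can be sketched as follows. This is an illustration only: the abstract does not specify how the 300:92 splits were drawn, so the sketch assumes random splits without replacement, and the names `ridge_fit` and `cv_accuracy` and the fixed penalty `lam` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Fit ridge regression with a fixed penalty; returns centering terms and effects."""
    mu_x = X.mean(axis=0)
    mu_y = y.mean()
    Xc = X - mu_x
    n = Xc.shape[0]
    # Dual form: an n x n solve, convenient when markers outnumber lines.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y - mu_y)
    return mu_x, mu_y, Xc.T @ alpha


def cv_accuracy(X, y, n_train, n_iter=100, lam=1.0, seed=0):
    """Pearson correlation of predicted vs. observed phenotypes in the test set,
    one value per random training/test split (assumed split scheme)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    accs = np.empty(n_iter)
    for i in range(n_iter):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        mu_x, mu_y, beta = ridge_fit(X[train], y[train], lam)
        gebv = mu_y + (X[test] - mu_x) @ beta      # predictions for unseen lines
        accs[i] = np.corrcoef(gebv, y[test])[0, 1]
    return accs
```

For a dataset of 392 lines, `cv_accuracy(X, y, n_train=300).mean()` would give a mean accuracy over 100 splits with 92 test lines each, comparable in form to the values reported above.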


©Copyright 2022 by individual authors. All rights reserved. No part of this abstract or paper publication may be reproduced without prior permission from the applicable author(s).