Wheat genomics and breeding: bridging the gap

Babar Hussain, Bala A. Akpınar, Michael Alaux, Ahmed M. Algharib, Deepmala Sehgal, Zulfiqar Ali, Rudi Appels, Gudbjorg I. Aradottir, Jacqueline Batley, Arnaud Bellec, Alison R. Bentley, Halise B. Cagirici, Luigi Cattivelli, Fred Choulet, James Cockram, Francesca Desiderio, Pierre Devaux, Munevver Dogramaci, Gabriel Dorado, Susanne Dreisigacker, David Edwards, Khaoula ElHassouni, Kellye Eversole, Tzion Fahima, Melania Figueroa, Sergio Gálvez, Kulvinder S. Gill, Liubov Govta, Alvina Gul, Goetz Hensel, Pilar Hernandez, Leonardo C. Herrera, Amir Ibrahim, Benjamin Kilian, Viktor Korzun, Tamar Krugman, Yinghui Li, Shuyu Liu, Amer F. Mahmoud, Alexey Morgounov, Tugdem Muslu, Faiza Naseer, Frank Ordon, Etienne Paux, Dragan Perovic, Gadi V.P. Reddy, Jochen Christoph Reif, Matthew Reynolds, Rajib Roychowdhury, Jackie Rudd, Taner Z. Sen, Sivakumar Sukumaran, Vijay Kumar Tiwari, Naimat Ullah, Turgay Unver, Selami Yazar, Hikmet Budak


The 17 Gbp wheat genome: challenges and opportunities
Wheat (Triticum aestivum L.) is a key crop for feeding the Earth's growing population and remains a staple food in many regions of the world. It is cultivated on more than 220 million hectares worldwide, and global production exceeds 749 million tons annually (http://faostat.fao.org/). Bread wheat is a hexaploid species (2n = 6x = 42, genome AABBDD) that evolved via natural hybridisation between tetraploid domesticated wheat T. turgidum ssp. dicoccum (which contributed the AA and BB sub-genomes) and the wild grass species Aegilops tauschii (DD sub-genome), followed by the domestication of the resulting hexaploid spelt wheat (T. spelta) [1]. The wheat genome is ~17 Gbp in size and highly complex, particularly in terms of chromosomal duplications and rearrangements and its very high percentage of repetitive sequences [2,3]. As a result, the reference sequence of the wheat genome was completed later than those of many other plant genomes.
Wheat breeding targets are numerous and varied given the wide geographic area across which wheat is grown. However, the principal common targets are grain yield, quality determinants, and tolerance to biotic and abiotic stresses. The complexity of the wheat genome makes improving qualitative and quantitative traits through molecular approaches challenging. An example of this is drought tolerance, which is conferred by various signalling molecules, micro-RNAs (miRNAs), transcription factors, quantitative trait loci (QTL), transcripts, proteomes, ionomes, and metabolites, resulting in a complex signalling cascade for the control of drought tolerance [4]. Furthermore, multiple genes are involved in the production and regulation of each of these molecules, which adds further complexity to the cascades conferring abiotic and biotic stress tolerance. Hence, knowledge of the sequence, as well as the precise location, annotation, and causal polymorphisms of the genes involved, is vital for utilizing genomic data in breeding programs aimed at achieving specific desired traits or phenotypes.
Due to its large genome size relative to other major crops, efforts to sequence and annotate the wheat genome have been extremely time-consuming and often involved sequencing of individual chromosomes [5,6]. The International Wheat Genome Sequencing Consortium (IWGSC) reported a draft sequence of bread wheat (cv. Chinese Spring) in 2014, derived from sequencing flow-sorted chromosomes/chromosome arms. The draft assembly comprised 124,201 gene loci distributed across the A, B, and D sub-genomes [2]. However, it totalled only 12.7 Gbp, approximately three-quarters of the whole genome. Furthermore, the sequences of the chromosomes/chromosome arms were fragmented, with many gaps as well as many incomplete, absent, or incorrectly assigned genes, making it hard for scientists to find and elucidate specific genes [2,7,8]. Despite its incompleteness, this version was highly useful for breeders as it provided valuable information at the chromosome/chromosome arm level. A draft whole-genome sequence of wheat was subsequently obtained by combining long Pacific Biosciences (PacBio) reads (>10,000 bases long) with short (150-bp) Illumina reads, yielding an assembly of 15.34 Gbp with an average contig size of 0.23 Mbp [9]. Low-coverage sequence data for 16 varieties were released in 2012 and used as the basis for the first draft wheat pangenome [10]. This study highlighted the fact that, due to gene presence/absence variation, a single reference does not represent the gene content of the species [11][12][13].
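The coverage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch that expresses each draft assembly as a fraction of the ~17 Gbp genome size used in this section; the function name and dictionary labels are illustrative only, and the genome size itself is an estimate rather than a fixed constant.

```python
# Assembly sizes (Gbp) as quoted in the text. The ~17 Gbp genome size is the
# estimate used in this section, not a measured constant.
GENOME_SIZE_GBP = 17.0

assemblies_gbp = {
    "IWGSC 2014 chromosome-arm draft": 12.7,
    "PacBio + Illumina draft": 15.34,
}


def fraction_covered(assembly_gbp, genome_gbp=GENOME_SIZE_GBP):
    """Return the assembly size as a fraction of the estimated genome size."""
    return assembly_gbp / genome_gbp


for name, size in assemblies_gbp.items():
    share = fraction_covered(size)
    print(f"{name}: {size} Gbp = {share:.0%} of ~{GENOME_SIZE_GBP:g} Gbp")
```

Under this assumption the 2014 draft corresponds to roughly three-quarters of the genome. A different genome size estimate would shift these percentages accordingly.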
The reference sequence of wheat was achieved by the International Wheat Genome Sequencing Consortium in several steps. A whole-genome draft assembly (IWGSC WGS v0.4), comprising Illumina short sequence reads assembled with NRGene's DeNovoMAGIC™, was released in 2016 (https://wheaturgi.versailles.inra.fr/Seq-Repository/Assemblies) [14]. This was then combined with physical maps of the chromosomes/chromosome arms and other genomic resources that had been developed over thirteen years by numerous laboratories around the world to develop IWGSC RefSeq v1.0. In 2018, the fully annotated reference genome assembly was released (IWGSC RefSeq v1.1) [14] with the precise location and annotation of 107,891 high-confidence genes and more than 4 million molecular markers along the 21 chromosomes. The chromosome-scale assembly covered approximately 94% of the bread wheat genome (cv. Chinese Spring) with a total assembly size of 14.5 Gbp. A key feature of this new genome is its long scaffolds, of which 90% were larger than 4.1 Mbp; the longest super scaffold was 166 Mbp (i.e., larger than the 135 Mbp Arabidopsis thaliana genome and half the size of the rice genome) [14]. Accordingly, RefSeq v1.0, with the highest sequence contiguity, has become a tool for wheat genomics and breeding activities worldwide. In 2020, the release of genome assemblies for 15 additional wheat accessions with diverse origins across the globe [15] further consolidated the position of wheat in the genomics era and provided additional resources to underpin breeding strategies. The availability of multiple, high-quality genome assemblies for wheat has highlighted the genomic diversity present in global breeding programs, as seen in introgressions from wild relatives, structural rearrangements, and variation in gene content originating from breeding efforts aimed at diverse and multiple traits [15].
An advantage of having multiple assemblies is that they enable the discovery of new sequences and genes that were not present in previous versions of the wheat genome, thus creating new opportunities to identify, characterise, and exploit beneficial alleles/haplotypes for wheat improvement.
In summary, these genome assemblies represent an essential, highly efficient resource for wheat researchers and breeders to identify and clone major genes and QTL, to elucidate regulatory regions, including miRNAs and transcription factors, and gene networks involved in yield, as well as biotic and abiotic stress tolerance in wheat, thereby facilitating their use in wheat improvement programs.
In this paper, we consider the potential of current whole genome assemblies to improve the accuracy and resolution of genetic mapping, QTL mapping, genome-wide association studies (GWAS) and the use of sequence-based markers for efficient marker-assisted selection (MAS) and genomic selection (GS). We present the advances in the identification and cloning of major wheat candidate genes for wheat breeding programs. We also discuss the progress made in CRISPR/Cas9-mediated genome editing and bioinformatics tools for wheat improvement, at the start of what might be considered a golden era of genomics-assisted breeding. Ultimately, this information will help breeders achieve greater genetic gains in wheat, and so promote the aim of sustainably intensifying global food production.

Advances in genomic tools for molecular breeding in wheat
Since the start of industrial crop breeding practices more than 100 years ago, breeding activities have resulted in consistent incremental genetic gains in wheat agronomic performance [16]. Over most of this period, conventional breeding techniques have relied on phenotypic selection. Phenotypes, however, are commonly influenced by the environment, and even in ideal conditions, it can take more than six years to obtain the desired levels of homozygosity required for lines generated by inbreeding. More recently, technological advances such as MAS and GS have helped to accelerate wheat breeding by assisting the identification of desirable parents, progenies and/or traits in a reliable and efficient manner [4,17]. Until relatively recently, MAS had not been widely adopted in wheat breeding programmes due to a lack of reliable molecular markers that were either diagnostic for the allelic state or sufficiently tightly linked to the underlying causative polymorphism(s) to be of use for breeding purposes. Next-generation sequencing (NGS) technologies have greatly increased our capability to identify DNA variants, resulting in the availability of large numbers of genetic markers across the wheat genome [18]. Such resources allow, for example, accelerated gene cloning and the development of functional markers. Single nucleotide polymorphism (SNP) markers are particularly prevalent, with more than 68,000 gene-linked SNPs identified between chromosome 5D of the diploid wheat progenitor Aegilops tauschii and chromosome 5D of bread wheat [19]. The use of SNPs for gene mapping, germplasm characterization and breeding has allowed fast progress due to their sequence-tagged qualities and their often co-dominant nature. This also makes their use rapid and cost-effective [19,20], particularly for tracking specific haplotypes within the genome in order to better monitor genetic changes in relation to phenotype [21].
Examples of successful haplotype-based analysis of wheat genomes include the identification of genome regions associated with high-altitude adaptation and response to harsh environmental constraints in Tibetan wheat lines and landraces by Guo et al. [22], and the improved prediction of resistance to leaf rust (Puccinia triticina) in European wheat hybrids [23].
NGS has facilitated the development of various high-throughput genotyping platforms in wheat, including high-density SNP arrays such as a 9K SNP iSelect array [24], 15K SNP array (TraitGenetics) [25], the Axiom Wheat Breeders' 35K SNP Array [20], the 90K SNP iSelect array [26], a 660K SNP array [26,27], and the Axiom® 820K array (CerealsDB). New designs for SNP arrays continue to be developed [18] based on the extensive genome sequence data for wheat in order to focus on SNPs that are relevant to haplotype mapping (Keeble-Gagnere et al., 2021; submitted). These resources have been used to genotype a wide range of wheat populations for the construction of high-density linkage maps and QTL mapping, including bi-parental populations (composed of recombinant inbred lines (RILs) or doubled haploid (DH) lines), near isogenic lines (NILs), association mapping panels and multi-parent populations [4,28,29]. The availability of fully annotated wheat genomes means that wheat germplasm can be selected for different traits with more confidence and ease, allowing hidden genetic variation to be used in crop improvement. For example, 504 SSRs, 6,689 expressed sequence tags (ESTs), 3,025 diversity array technologies (DArTs), 4,512,979 insertion site-based polymorphism (ISBP) and 205,807 SNP markers were identified in cv. Chinese Spring [14]. Such extensive databases of molecular markers provide the power needed to apply MAS and GS (discussed in Section 4) in wheat. Here, we describe the use of recent sequencing-based SNP genotyping platforms and genotyping-by-sequencing (GBS) for QTL mapping and GWAS, with a particular focus on the identification and utilization of candidate genes in wheat molecular breeding.

Advances in molecular breeding for grain yield and related traits
Wheat grain yield (GY) is controlled by numerous genetic components, most of which are quantitative in nature. Due to this underlying complexity, QTL mapping is commonly used for the dissection of grain yield and yield components, in order to identify markers for MAS. Prior to the 2014 draft sequence, several studies relied on redundant SSR markers for QTL mapping of GY and related traits, as reviewed recently [30], but most of the identified regions were not incorporated into wheat cultivars in breeding programs. The IWGSC draft sequence published in 2014 [2] enabled the use of genotyping arrays and GBS with deep coverage to construct high-density linkage maps and identify several candidate genes [31][32][33][34][35][36][37]. Major and stable QTL for plant height, anthesis date, flag-leaf length and width, as well as spike length, spike density and spikelet number per spike, were mapped on chromosomes 2D and 4B, each explaining 10.10-30.68% of the phenotypic variation (PV). Other QTL were mapped on chromosomes 4A and 6D. The markers were associated with candidate genes coding for TGTCTC auxin response elements, F-box protein TIR1, Flowering Locus T-like protein, MADS-box transcription factor 8 and twelve genes encoding SAUR-like auxin-responsive family proteins [31]. Three independent studies identified haplotype SNP markers and major stable QTL for seed number per pod (SNPP), thousand-grain weight (TGW), grain length, and flag leaf length, width and area on chromosomes 7A [32,33,35] and 5A [35]. These QTL were associated with candidate genes such as WHEAT ORTHOLOG OF ABERRANT PANICLE ORGANIZATION 1 (WAPO1) and TaGASR7. Among these, a well-studied and reproducible yield QTL on the long arm of chromosome 7A has been located to an 87-kb region (674,019,191-674,106,327 bp, IWGSC RefSeq v1.0) containing two full and two partial genes. One of these genes (TraesCS7A01G481600) is the ortholog of APO1, which is known to significantly affect panicle attributes [32].
This APO1 ortholog was the best candidate for the spikelets per spike phenotype and was associated with two amino acid changes (C47F and D384N) in the coding region. In the genomic region carrying the chromosome 7A APO1 gene, three major haplotypes were associated with the spikelets per spike phenotype and two of these show enrichment in modern germplasm [32,38]. More recently, genetic analysis using a wheat multi-founder population genotyped with a 20K SNP array found that allelic variation at the homoeologous location on chromosome 7B was associated with haplotype variation at the WAPO-B1 gene [39]. In another recent example of the use of high-density SNP arrays for the genetic mapping of yield components, a 660K SNP array led to the identification of a major stable QTL for grain number per spike on chromosome 4A that corresponded to 65 putative genes [34] and contributed 8.0-21.2% to PV.
GWAS [42,43] based on GBS and the 15K SNP array genotyping identified 118 and 74 significant MTAs, respectively, for yield and related traits that were associated with candidate genes such as Gibberellin 2-oxidase 2 (GA2ox2), Pre-rRNA-processing protein TSR2, Glutathione peroxidase (GPx), ATP-dependent zinc metalloprotease (FtsH 1), mother of flowering time and terminal flowering 1 (MFT1), WAPO1, B2 heat stress response protein, and gibberellin oxidase protein (GA2ox-A1), as detailed in Table 1. Furthermore, a total of 27 QTLs were combined to develop a consensus map consisting of 140,315 markers and 376 QTL, including 221 for grain yield-related traits. The projection of this map and the relevant QTL onto the wheat syntenome (having 99,386 genes) identified 32 metaQTL (mQTL), including 18 grain yield mQTL associated with 15,772 genes (28,630 SNPs), 37 of which were major candidate genes [47], including ATPase, GIF1, Ppd-D1, Prog1, Gn1-a, NYC1, emp4, DEP1, GW2, GS2 and Rc3. Taken together, these studies illustrate the information provided through GWAS, which can be used for improving wheat through MAS. However, a recent study involving 2,000 wheat accessions suggests that validation of MTAs in wheat remains a challenge [48], and further research is needed before they can be widely used for wheat breeding.
Despite the large number of genetic mapping studies undertaken, map-based positional cloning has been limited in wheat, largely because of the lack of a high-quality, fully annotated reference genome sequence. One of the earliest examples was the cloning of the vernalisation gene, VRN1 [49]. Despite advances in cloning methods (Section 3), only a single QTL for an increase in grain number per spikelet, GNI1, has been cloned in wheat [50]. However, sequences of several genes such as TaGW2 [51], cell wall invertase, TaCwi-A1 [52], TaGASR7-A1 [53], TaGS-D1 [54], IAA-glucose hydrolase gene, TaTGW6 [55] and TaTGW-7A [56] have also been cloned for characterizing their roles in the improvement of yield and related traits. Their chromosomal location and targeted traits are described in Table 1. The availability of the annotated reference sequence, multiple sequenced genomes, other genomic resources, and the subsequently improved candidate gene selection, will allow gene cloning in wheat to advance in a manner comparable to that of other cereals with high quality reference sequences.

Advances in molecular breeding for drought tolerance
Drought is the most devastating abiotic stress curtailing crop productivity and, in wheat, results in significant yield losses of up to 50% [4]. Therefore, the development of drought-tolerant varieties has been a prime objective in global wheat breeding programs. Using traditional bi-parental mapping approaches, QTL for various agronomic and physiological traits responsive to drought stress have been genetically mapped (Table 2), and have been reviewed extensively [4,57]. The most common physiological traits that have been targeted for QTL mapping of wheat drought tolerance include, but are not limited to, canopy temperature, carbon isotope discrimination, chlorophyll content, water-soluble carbohydrates, ABA production, relative water content, stay-green trait, photosynthetic capacity/rate, cell membrane thermostability and, importantly, various root architectural traits such as root elongation rate, primary root length, lateral root length, root angle, deep root ratio, root-shoot ratio, root biomass and deep root length [58][59][60][61][62][63][64][65][66][67]. For root-related traits alone, more than 634 QTL have been reported and were projected onto a consensus map [58]. This study led to the identification of 94 consensus root mQTL, of which 35 were related to drought response, and these mQTL were linked to 68 candidate genes (Table 2). Pleiotropic QTL linked to both physiological traits and yield-related traits have also been reported. For example, QTL for chlorophyll content, water use efficiency, photosynthetic rate, and internal CO2 concentration were co-located with QTL for GY and/or yield components [59].
Increased abscisic acid (ABA) levels have been suggested to impart drought tolerance in wheat by accelerating the accumulation of osmolytes [4]. A major QTL for ABA responsiveness was mapped on chromosome 6D in an F2 population derived from a cross between synthetic wheat lines contrasting for ABA-responsiveness [60]. This 6D QTL regulated the expression of late embryogenesis abundant (LEA) genes under drought conditions. A few studies have investigated the genetics of stomatal traits under drought stress by QTL mapping [61,62]. Most importantly, two QTL, one on chromosome 5A [61] and another on chromosome 7A [62], were identified and both QTL were co-located with QTL of yield components or harvest index.
The stay-green trait (delayed leaf senescence) has been found to be correlated with adaptation to drought stress in wheat. Normalized difference vegetation index (NDVI) is used as an indirect selection criterion for stay-green and higher yield under drought in wheat [57]. Shi et al. [63] reported a major QTL for NDVI on chromosome 5A and several pleiotropic QTL for NDVI and agronomic traits on chromosomes 1B, 3D, 4D and 7A. Such pleiotropic regions shared by NDVI, biomass and yield components will help breeders to use the trait as an indirect selection criterion for GY improvement. Consistent QTL (2B, 4A, 7B) for drought and heat stress-related agronomic and physiological traits (stay-green, canopy temperature) were identified in a phenology-controlled population of Seri/Babax [68,69]. The genetic map was updated using 90K SNPs and DArTseq markers, and QTL were identified for GY, TGW, GN, NDVI, and CT [70]. Another study identified QTL for heat and drought stress in a synthetic-derived population with a flowering time range of 3 days [71].
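For readers unfamiliar with NDVI, the index is conventionally computed from canopy reflectance in the near-infrared (NIR) and red bands as (NIR - Red) / (NIR + Red). The sketch below illustrates that standard formula only; the reflectance values and line labels are hypothetical and are not taken from any of the studies cited above.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


# Hypothetical canopy reflectance readings (fractions between 0 and 1):
# (near-infrared, red)
readings = {
    "stay-green line": (0.50, 0.08),
    "early-senescing line": (0.35, 0.15),
}

for line, (nir, red) in readings.items():
    print(f"{line}: NDVI = {ndvi(nir, red):.2f}")
```

A greener canopy reflects more NIR and absorbs more red light, so a line that delays senescence is expected to retain a higher NDVI late in the season, which is why the index can serve as a proxy for stay-green.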
The drought sensitivity index (DSI) of agronomic traits has been used in QTL mapping studies in wheat as an indirect measure of drought tolerance [72][73][74]. Gahlaut et al. [73], for example, mapped DSI of nine drought-responsive agronomic traits in a DH population, which was evaluated in 22 environments in India under both irrigated and rain-fed conditions. Most importantly, the authors reported four stable QTL on chromosomes 5A and 7A. Additionally, two more studies identified stable QTL for agronomic traits under drought stress on chromosomes 3B [72], 2B, 5A, 6A and 7D [74].
Due to the availability of high-density SNPs in the genomics era, GWAS became a leading approach for the dissection of complex traits such as drought tolerance in wheat. The vast majority of GWAS research in wheat has been done by mapping solely GY and yield components under drought stress [21,66,67,75]; others have explored a wide range of physiological traits or a combination of physiological and yield-related traits [65,73,76,77]. Recently, root architectural traits including root/shoot dry weight ratio, root length and root biomass under drought stress have been extensively investigated by GWAS [58,64]. An analysis of GWAS publications in wheat reveals that drought stress tolerance-linked MTAs have repeatedly been reported on chromosome 4A [65][66][67]. Ballesta et al. [67], for example, mapped four types of indices (i.e., stress susceptibility index, stress tolerance index, tolerance index and yield stability index) based on GY and yield components on chromosome 4A. Edae et al. [65] identified MTAs for DSI, leaf senescence, green leaf area and flag leaf traits on chromosome 4A. Sehgal et al. [66] reported two stable QTL on chromosome 4A associated with GY under drought and heat stress environments.
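The four index types named above can be illustrated with their commonly used textbook definitions, where Yp and Ys are a genotype's yield under non-stress and stress conditions and the trial means are taken across all genotypes. This is a minimal sketch under that assumption: the cited studies may have used different variants of these formulas, and the genotype names and yields are hypothetical.

```python
from statistics import mean


def drought_indices(yp, ys, mean_yp, mean_ys):
    """Indices for one genotype.

    yp, ys: yield without and with stress.
    mean_yp, mean_ys: trial means across all genotypes.
    """
    stress_intensity = 1 - mean_ys / mean_yp
    return {
        "SSI": (1 - ys / yp) / stress_intensity,  # stress susceptibility index
        "STI": (yp * ys) / mean_yp ** 2,          # stress tolerance index
        "TOL": yp - ys,                           # tolerance index
        "YSI": ys / yp,                           # yield stability index
    }


# Hypothetical grain yields in t/ha: (irrigated, drought)
yields = {"G1": (6.0, 4.5), "G2": (5.0, 2.5), "G3": (7.0, 3.5)}

mean_yp = mean(v[0] for v in yields.values())
mean_ys = mean(v[1] for v in yields.values())

table = {
    genotype: drought_indices(yp, ys, mean_yp, mean_ys)
    for genotype, (yp, ys) in yields.items()
}

for genotype, indices in table.items():
    formatted = ", ".join(f"{k}={v:.2f}" for k, v in indices.items())
    print(f"{genotype}: {formatted}")
```

Lower SSI and TOL values and higher STI and YSI values indicate a smaller yield penalty under stress. In an association study, such index values would be used as the phenotype in place of raw yield.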
With the availability of dense genome-wide markers from SNP arrays and other high-density wheat genotyping platforms (9K, 15K, 90K, 660K and 820K SNP arrays, genotyping-by-sequencing and DArTseq), the most recent investigations have also explored haplotype-based GWAS approaches for identifying stable QTL for drought stress tolerance [21,75,77,78]. Sehgal et al. [21], for example, used a combination of haplotype-based GWAS with epistatic interactions to untangle the genetic architecture of GY under multiple stress environments (including mild and severe drought stress) using a large panel of 6,333 advanced lines from the International Maize and Wheat Improvement Centre (CIMMYT). They reported four and ten stable haplotype associations with grain yield under mild and severe drought stress environments, respectively. Most importantly, the authors identified a significant association of a haplotype block close to the Vrn-B1 flowering time gene on chromosome 5B with GY in more than 70% of the trials under severe drought stress. Vrn-B1 is significantly correlated with adaptation to low temperature, which suggests a shared tolerance mechanism for both abiotic stresses.
In candidate gene-based association mapping approaches, re-sequencing of genes with known or predicted biochemical function is performed and SNP variation identified within a candidate gene (CG) is used to investigate associations with traits [60]. Using this approach, allelic variation in four drought-related genes has been investigated in wheat. Edae et al. [77] reported associations of SNPs in three genes, DREB1A, ERA1 and 1-FEH, with multiple agronomic and physiological traits. These are known to be drought stress-induced genes in ABA-dependent (ERA1) and ABA-independent (DREB1A, 1-FEH) pathways. In another CG-based association mapping study, allelic variation in TaSnRK2.8 (an SNF-1 type serine-threonine protein kinase) showed association with plant height, flag leaf width and water-soluble carbohydrates under drought conditions [76]. Other CGs involved in wheat ABA-dependent and ABA-independent signalling pathways have been described, including DREB1, WRKY1, DREB1A, HKT-1, DREB2, DREB3, ERA1-B, ERA1-D, 1-FEH-A, and 1-FEH-B [3]. Finally, a drought tolerance QTL on chromosome 6D was shown to regulate the expression of late embryogenesis abundant (LEA) genes such as TaABA8′OH1, CYCB2, and CDKA1 [60], which suggests these genes play a role in drought tolerance. Furthermore, several clusters of drought-responsive genes (DReG) were found on the long arm of the group 5 chromosomes with orthology to a known QTL of O. sativa. In particular, this region contains the genes PSY3, NCED, VRN1, UGDH and the dehydrin DHN38, which increase their expression under field drought stress conditions [79]. Similarly, transcriptomics analysis of the high-yielding and drought-tolerant US wheat cultivars TAM 111 and TAM 112 identified several DReGs: aquaporin, dehydrogenase, kinase, phosphatase synthase, phosphorylase and sugar transporter genes were down-regulated, whereas dehydrin, ABA-inducible protein kinase, LEA protein, heat shock protein, caleosin, lyase, amylase and oxidoreductase genes were up-regulated under drought stress [80]. These studies provide useful insights for future breeding for drought tolerance.

Advances in molecular breeding for heat tolerance
Similar to drought stress, heat stress has been projected to become a major threat to wheat production in a changing climate. A 4-6% reduction in average global yields of wheat is predicted for each 1 °C increase in global mean air temperature [81]. Heat stress at the reproductive stage resulted in a 66% reduction in grain yield in wheat [82]. In the past decade, multiple QTL mapping studies in wheat using bi-parental populations and different traits as indicators of heat tolerance have been reported (Table 3). Many of these studies consistently reported QTL hotspots on chromosome 3B [83][84][85]. Bennett et al. [83], for example, identified two QTL for canopy temperature and GY on chromosome 3B. Sharma et al. [85] identified a significant genomic region on chromosome 3B for the Fv/Fm trait (maximum quantum efficiency of photosystem II) in three mapping populations.
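To give a sense of scale for the 4-6% per 1 °C figure, the short sketch below converts a warming scenario into a projected range of yield change. It assumes that losses add up linearly with temperature, which is a simplification introduced here for illustration and not a claim made by the cited projection; the function name and warming scenarios are likewise hypothetical.

```python
def projected_yield_change(warming_c, loss_per_degree=(0.04, 0.06)):
    """Return (mildest, steepest) fractional yield change for a warming in deg C.

    Assumes losses scale linearly with warming (illustrative simplification).
    """
    low, high = loss_per_degree
    return -low * warming_c, -high * warming_c


# Hypothetical warming scenarios in degrees Celsius
for warming in (1.0, 1.5, 2.0):
    mild, steep = projected_yield_change(warming)
    print(f"+{warming:.1f} C: {mild:.0%} to {steep:.0%} change in average yield")
```

Under this linear assumption, 2 °C of warming would correspond to an 8-12% reduction in average global wheat yield.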
The GWAS approach to dissecting heat tolerance in wheat has gained importance in the past two years [21,66,93]. Tadesse et al. [93] explored 197 spring wheat genotypes from the International Center for Agricultural Research in the Dry Areas (ICARDA) under heat-stressed environments in Sudan and Egypt. Through GWAS, the authors identified stable genomic regions on chromosomes 4A and 5A associated with yield at both geographical locations. Furthermore, they delineated a suitable marker combination, with one marker each from chromosomes 4A and 5A, which together resulted in a yield advantage of 15%. Sehgal et al. [21] identified 15 stable haplotypes associated with grain yield under heat stress environments by haplotype-based GWAS in a large panel of advanced lines from CIMMYT and reported a haplotype block hotspot region for heat tolerance on chromosome 7A. Similarly, MTAs for grain yield, heat and drought susceptibility indices and yield stability coefficient were also reported [66] and were associated with the flowering time genes Vrn-B1, Ppd-D1 and Vrn-D3. In addition, another study reported MTAs for spike ethylene content under heat stress conditions in the WAMI population [94].

Advances in molecular breeding for salinity tolerance
Although a large area of the world's land is considered saline, research on salinity stress responses in crops is limited by the complexity of these responses and their interactions with other stresses [95]. Compared to drought tolerance, QTL mapping for salt tolerance (ST) is scarce and has mainly focussed on sodium exclusion (NAX), K⁺ concentration (KC), and grain yield under salinity (Table 4). Before the release of the wheat draft sequence in 2014, QTL studies focussed on grain yield, NAX, KC and shoot weight [96]. QTL studies for ions other than Na⁺ and K⁺ under salinity are rare. QTL for Cl⁻ in wheat were found to differ between hydroponic and field conditions, and a major chromosome 5A QTL for Cl⁻ contributed 27.0-32.0% to PV in the field. Additionally, 19 QTL for Mg²⁺ and Ca²⁺ were mapped at the same location as the Cl⁻ QTL [97], with chloride channel (CLC) and cation chloride co-transporter (CCC) as potential candidate genes.
In recent years, SNP-based genotyping platforms, including the 9K [98], 35K [20,99], 90K [100] and 660K [101,102] arrays, have been used to identify novel and major QTL, MTAs and CGs that can be used in MAS and genomic selection for future wheat breeding (  [98]. Importantly, both of these studies [20,98] identified major QTL for KC and NAX on chromosomes 6A and 7A, respectively.

Advances in molecular breeding for frost tolerance
An important limiting factor for wheat production in North America, Northern and Eastern Europe and Russia is low temperature [103]. As polar regions become more unstable due to climate change, the risk of extreme weather events, including freezing temperatures, increases [104]. Therefore, resilience to frost is an important crop trait to consider. Frost tolerance is a complex biological process involving pathways encompassing a large number of genes. The main pathway is the frost response, while the requirement for a prolonged period of low temperature (vernalization) is regarded as an avoidance mechanism that prevents frost damage to sensitive reproductive organs. Two major frost tolerance loci, Frost Resistance 1 (FR1) and FR2, were identified on the long arm of chromosome 5A [105]. Zhao et al. [106] described an additional frost tolerance QTL on chromosome 5B in wheat germplasm from central Europe. Although several QTL associated with frost tolerance were identified on different wheat chromosomes during the last decade (i.e. 1A, 1D, 2A, 2B, 3A, 5A, 5B, 6A, 6B, 6D and 7B) [107,108], the majority of genes assumed to be involved in frost tolerance have been identified on chromosome group 5 [103,105,106,109]. Until recently, only a few studies reported the identification of QTL regions associated with frost tolerance by GWAS [103,110]. Babben et al. [103] demonstrated the use of IWGSC RefSeq v1.0 for the development of specific primers for highly conserved gene families in wheat. Their work showed that a candidate gene association genetics approach is a useful tool for identifying new alleles of genes important for the response to flowering time. Sequence analysis concluded that C-repeat binding factors (CBF)-A3, 5, 10, 13, 14, 15 and 18, vernalisation response genes (VRN-A1, VRN-B3) and photoperiod response genes (PPD-B1 and PPD-D1) were associated with frost tolerance in wheat.
During the last decade, several components involved in cold-stress signalling pathways, encompassing messenger molecules, protein kinases and phosphatases, and transcription factors, have been reported by studies using wheat sequence information [103,[111][112][113]. The CBFs, Inducers of CBF Expression (ICEs) and cold-responsive (COR) genes together form the ICE-CBF-COR cascade, the main cold signalling pathway, which plays a major role in controlling frost tolerance in crop species [111,112]. ICE genes encode MYC-type transcription factors belonging to the MYC subfamily of the bHLH (basic helix-loop-helix) family [111]. ICE factors are known as positive regulators of CBF expression that act upstream in the low-temperature signalling pathway. Two ICE homologs, TaICE41 and TaICE87, have been identified in wheat [111]. TaICE41, TaICE87 and five MYC-like bHLHs were reported to act as positive regulators upstream of the CBF (C-repeat binding factor)-mediated transcriptional cascade controlling cold tolerance in wheat [113].
The C-repeat binding factor/dehydration responsive element binding factor (CBF/DREB) genes are members of the AP2/ERF multi-gene family and represent critical regulators of the freezing tolerance mechanisms in plants. These genes are regulated by inducers of CBF expression (ICEs), which bind to MYC recognition cis-elements (CANNTG) in the promoter [112]; together they constitute the signal regulatory pathway ICE-CBF-COR. This cascade mediates the response to cold stress. COR (cold-responsive) proteins are those encoded by cold-responsive or cold-regulated genes. These genes are induced by CBF or DREB2 proteins through binding to cis-acting dehydration responsive elements (DRE) or the C-repeat (CRT) motif (5′-CCGAC-3′) [111]. In addition to the role of CBF genes in cold response, vernalization (VRN) genes are responsible for natural differences in frost tolerance in wheat. Changes in the regulatory regions of the vernalization genes VRN1 and VRN3, or in the coding regions of VRN2, cause a delay in flowering time in plants. In summary, this body of research has provided targets to address frost tolerance breeding in wheat and genes/loci that could be incorporated in MAS and GS.
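The cis-element recognition described above can be illustrated with a simple sequence scan. The following Python sketch is purely illustrative and is not taken from any of the cited studies: it searches a promoter sequence for the MYC recognition element (CANNTG) and for the CRT/DRE core motif (CCGAC) on the forward strand, and for the reverse complement of the core motif (GTCGG). The function name, dictionary keys and toy sequence are hypothetical.

```python
import re

# Motifs quoted in the text; N in CANNTG stands for any base.
# Lookahead patterns are used so that overlapping sites are all reported.
# CANNTG reads the same on both strands, so a forward scan is sufficient.
MYC_ELEMENT = re.compile(r"(?=(CA[ACGT]{2}TG))")
CRT_CORE = re.compile(r"(?=(CCGAC))")
CRT_CORE_REVERSE = re.compile(r"(?=(GTCGG))")  # reverse complement of CCGAC


def scan_promoter(sequence):
    """Return (0-based start, matched text) pairs for each motif."""
    seq = sequence.upper()
    patterns = {
        "MYC_CANNTG": MYC_ELEMENT,
        "CRT_forward": CRT_CORE,
        "CRT_reverse": CRT_CORE_REVERSE,
    }
    return {
        name: [(m.start(1), m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in patterns.items()
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real wheat promoter.
    toy_promoter = "TTCACGTGAACCGACTTGTCGGA"
    for name, hits in scan_promoter(toy_promoter).items():
        print(name, hits)
```

A motif hit of this kind only indicates a potential binding site; whether a site is functional would need experimental support.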

Advances in molecular breeding for disease resistance
Food crops, such as wheat, have always been adversely affected by diseases and pests [104]. Genomic technologies now available can be incorporated into wheat breeding programs to complement traditional resistance breeding and meet these pressing challenges [114]. Modern breeding strategies deal with diverse forms of resistance, including both qualitative and quantitative types. Qualitative resistance tends to be complete or near-complete and is conferred by single resistance genes (R-genes), also known as major genes, which encode immunoreceptors [nucleotide-binding leucine-rich repeat (NLR) proteins] [114]. Within a single host there may be multiple R-genes recognizing different races of a pathogen, and pyramiding these genes within elite varieties is vital for maintaining sustainable resistance [17].
In contrast, quantitative disease resistance (QDR) shows an incomplete or partial phenotype and is regulated by multiple minor genes of small effect that map to quantitative trait loci (QTL) [114]. Generating polygenic resistance by incorporating multiple small-effect loci into one cultivar using classical breeding strategies can be even more challenging than transferring monogenic resistance [114]. Transgenic or cis-genic approaches may provide a more practical route to breeding wheat with durable resistance, as demonstrated by the construction of a five-transgene cassette that confers broad-spectrum resistance to wheat stem rust [115].

Leaf and stem rust
In recent years, high-throughput genotyping, QTL mapping and GWAS have become commonly used approaches for the identification of resistance loci, R-genes and candidate genes due to their ability to identify significant SNPs controlling specific traits. For example, linkage maps constructed with genotyping data from a 90K SNP array for five doubled haploid populations mapped QTL for leaf rust resistance on chromosomes 1A, 2A, 2B, 3B, 2D, 4B, 5A, 6A, 6B, 7A, 7B and 7D that were associated with the Lr3, Lr16, Lr17a, Lr23, Lr34/Yr18 and Lr72 resistance genes [116]. Similarly, genotyping with the 90K array identified two dominant alleles that conditioned resistance to the Ug99 race, also known as TTKSK. Furthermore, two major QTL were mapped on 2BL and 6DS for seedling and adult plant resistance to stem rust [104,106]. The loci for recognition of two stem rust races, BCCBC and TTKSK (Ug99), were found at the same position (KASP_IWB1208 marker) on chromosome 2BL and were associated with Sr28 [117].
Recently, the 9K and 90K SNP genotyping arrays and GBS have been utilized by multiple groups for GWAS to dissect the genetic architecture of stem rust resistance [118][119][120][121][122]. For example, genotyping with the 9K array and GWAS identified 12 significant MTAs for resistance to the notorious wheat stem rust pathogen (Puccinia graminis f. sp. tritici (Pgt)) [118]. Among them, SNPs found on chromosomes 4A and 4B co-localized with SrND643, Sr37, and Ug99 resistance genes, whereas SNPs on 7DL coincided with Sr25 and other stem rust resistance genes. These SNPs were located within genes annotated as regulatory factors, plant disease resistance genes, or metabolic enzymes, six of which were validated as Ug99 resistance genes by Kompetitive Allele-Specific PCR (KASP) assay. In addition, the wheat 90K array was used to identify 22,310 high-quality SNP markers, and several major significant MTAs for resistance against four races of stem rust (Puccinia graminis) were detected on chromosomes 1A (1AL), 2B, 3B, 2D and 4A, some of which co-localized with the Sr6, Sr7a, and Sr9b genes. Most of the genotypes possessed three or more Sr genes (Sr57, Sr12, Sr11, Sr9b, Sr8a, Sr7a, and Sr6) in various combinations [119]. Another GWAS [120], based on 1,411 wheat accessions and 5,390 SNPs, identified significant MTAs for resistance to four races of stem rust (TTTTF, BCCBC, TRTTF and TTKSK, also known as Ug99), including three novel loci; ten of the MTAs were associated with Sr8a, Sr9h, Sr28, Sr31, Sr36, Sr39, Sr40 and Sr47. Furthermore, GBS data have been used with GWAS to identify markers for disease resistance; for example, GBS data for 270 wheat accessions yielded ∼35,000 high-quality SNPs, of which 32 were significant MTAs for stem rust resistance, located mainly in close proximity to the Sr6 gene on chromosome 2D [121]. Similarly, the wheat 90K array and GWAS have been used to characterise resistance to stem and stripe rust in Ethiopian bread wheat lines [122].
Other candidate genes associated with MTAs were members of the NLR (nucleotide-binding domain leucine-rich repeat) gene family, nuclear monodehydroascorbate reductase 6 (MDAR6), solanesyldiphosphate synthase 1 (DSDS1), enhancer of AG-4 protein 2 (AG4), phosphatase 2C (PP2C), and importin-9 (IPT9) and are listed in Table 5.

Stripe (yellow) rust
Stripe rust (or yellow rust), caused by the fungus Puccinia striiformis f. sp. tritici (Pst), is another devastating wheat disease responsible for significant yield losses worldwide. Severe yield losses sometimes occur when the pathogen attacks at the early seedling stage and the disease continues to develop during the season [123]. The use of major race-specific R-genes in wheat varieties is an effective and environmentally safe way to manage wheat diseases.
Wheat genotyping assays and GBS have been utilized in recent years to map genomic regions associated with stripe rust resistance [124,125]. Advances in wheat genomics have facilitated the cloning of nine stripe rust resistance genes (Yr5, Yr7, YrSP, Yr15, Yr18/Lr34, Yr36, Yr46, YrAS2388 and YrU1) out of the >80 genes that have been identified and mapped so far in different genetic backgrounds of wheat [126]. The cloning of the broad-spectrum stripe rust R-gene Yr15, derived from wild emmer wheat, has led to the discovery of a novel protein family, the tandem kinase-pseudokinases (TKPs), which has emerged as a new class of disease resistance proteins providing plant innate immunity and is present not only in wheat but also across the whole plant kingdom [127]. Five plant disease resistance genes identified so far contain a structure with tandem kinase domains, including three wheat genes: the stripe rust R-gene Yr15 (WTK1) [127,128], the stem rust R-gene Sr60 (WTK2) [129], and the powdery mildew (Blumeria graminis f. sp. tritici (Bgt)) R-gene Pm24 (WTK3) [130]. More than 20 WTK copies have been found scattered across the three wheat sub-genomes (AA, BB and DD), including the orthologous group on chromosome 1 and the paralogous groups on chromosome 6 [131]. WTK1 orthologs, paralogs, and homologs were also found in the diploid wheat relatives Triticum urartu (AA), Aegilops speltoides (SS) and Aegilops tauschii (DD), representing the ancestral A, B, and D genomes, respectively, as well as in rye (Secale cereale), barley (Hordeum vulgare) and other cereal species. The protein sequences of TKPs were obtained from the genome assemblies of wild and cultivated wheat species and were used for phylogenetic analyses [127].
Furthermore, for the successful deployment of R-genes in wheat breeding programs, it is important to determine whether a cloned gene differs from other genes located in the same chromosome region or represents a different allele of the same gene. For instance, it was found that Yr15-, YrG303- and YrH52-mediated resistances to yellow rust are encoded by a single locus, Wtk1 [128]. In the future, we expect that many more such cases will be revealed and will narrow down the list of temporarily designated R-genes in wheat.
Various studies have used GWAS to identify powdery mildew resistance loci in wheat [134][135][136]. Ten Pm genes have been cloned so far, in parallel with advances in wheat genomics resources (Table 5). Pm3/Pm8, Pm2, Pm21, Pm60, Pm5e, Pm41 and Pm1a encode NLR immune receptors from different wheat relatives [137][138][139][140][141][142][143], while Pm24 encodes a tandem kinase protein [130]. Furthermore, two non-NLR genes, Pm38 and Pm46, showed broad-spectrum adult plant resistance to multiple diseases, including powdery mildew and the rusts. The former (the Lr34/Yr18/Sr57/Pm38 multi-resistance gene) encodes an ABC transporter [144] and the latter (the Lr67/Yr46/Sr55/Pm46 multi-resistance gene) encodes a hexose transporter [145]. The cloning of these Pm genes enables the development of high-throughput diagnostic functional markers that can be used in MAS resistance breeding programs (Table 5). Some of these Pm genes have been widely used for the protection of wheat cultivars for many years. For example, the Triticeae grass Dasypyrum villosum (2n = 2x = 14, VV) harbors Pm21, which confers broad-spectrum resistance and has been deployed in wheat cultivars in China since 1995 through the T6AL.6VS wheat-D. villosum translocation line [146]. Some of these NLR proteins can be overcome by the fast evolution of virulent Bgt isolates, especially when the gene is widely deployed in wheat fields. For example, the wheat-rye (Secale cereale L.) T1BL·1RS translocation carrying Pm8 has lost its resistance function in wheat production [147]. Different alleles of the cloned Pm genes that might be resistant to different Bgt isolates have been identified and could also be used for MAS. For example, 17 alleles of the Pm3 gene have been identified that mediate resistance to distinct race spectra of Bgt. Pm3a has a range of resistance that fully encompasses that of Pm3f but also extends to additional races [148]. Therefore, enriching the Pm gene pools is very important for resistance breeding.
Pm24 is a rare natural allele of a tandem kinase protein (TKP) with putative kinase-pseudokinase domains, conferring broad-spectrum resistance to wheat powdery mildew disease. However, some other Pm genes have not yet been cloned, such as Pm30, which is found in ~80% of Chinese cultivars, as detected by closely linked markers [149]. The absence of functional molecular markers limits the diagnosis of potential Pm alleles and their deployment in wheat breeding via MAS and genome editing.

Blotches and Fusarium head blight
Furthermore, GWAS has also been applied to dissect the genetic bases of resistance to various other diseases. For example, GBS of 273 wheat accessions identified 19,992 SNPs, and ten significant MTAs for Fusarium head blight (Fhb) were mapped on chromosomes 1D, 3B, 4A, 4D, 6A, 7A, and 7D [150]. The combination of these favorable alleles in genotypes reduced deoxynivalenol concentration and the incidence and severity of the disease, while several favorable SNPs were linked to the previously map-based cloned Fhb1 gene on chromosome 3B [150]. Similarly, genomic regions for black point reaction [151] were associated with candidate genes encoding an F-box repeat protein, polyphenol oxidase (PPO-A1), an RPP8-like protein, a serine/threonine-protein kinase and peroxisomal biogenesis factor 2 (PEX2). In addition, genotyping with a wheat 9K SNP array followed by GWAS identified significant MTAs for resistance to bacterial leaf streak (caused by Xanthomonas translucens), leaf spot blotch (caused by Cochliobolus sativus), Stagonospora nodorum blotch and stripe rust [24,124,152,153], including both novel and known resistance genes, as further detailed in Table 5.

Viral diseases
Soil-borne yellow mosaic-inducing virus diseases (Fig. 1) seriously threaten the global production of autumn-sown wheat. The diseases arise under natural field conditions through infection by the soil-borne plasmodiophorid vector Polymyxa graminis L. The yellow mosaic-inducing virus diseases are caused by bipartite viruses belonging to two genera, Bymovirus of the Potyviridae and Furovirus of the Virgaviridae, which usually cause mixed infections [154]. Because the viruses are transmitted by P. graminis, which has been detected down to a soil depth of 60 cm, chemical measures are neither effective nor acceptable for economic and ecological reasons; therefore, the only possibility of controlling these viruses is through breeding resistant or tolerant cultivars. Soil-borne wheat mosaic virus (SBWMV) and Soil-borne cereal mosaic virus (SBCMV), as well as Wheat spindle streak mosaic virus (WSSMV), belonging to the soil-borne Furoviruses or Bymoviruses, respectively, are serious constraints to winter wheat cultivation in Europe and North America [154,155], while in Asia the Bymovirus Wheat yellow mosaic virus (WYMV) is a major constraint to winter wheat cultivation. In bread wheat, genetic analysis of Furovirus resistance has found one major locus, Sbm1, on chromosome 5D, which was shown to be effective against SBCMV in Europe [156]. The same major QTL, QSbm.uga-5DL, was identified in all environments with very significant LOD values, explaining up to 62 and 65% of the total variation. This locus coincided with the previously reported SBCMV resistance genes Sbm1, SbmClaire and SbmTrémie on the long arm of chromosome 5D [157]. Additionally, two major WYMV resistance QTL, Qym1 and Qym2, were mapped on wheat chromosome 2D [158].
Two independent GWAS analyses utilizing the iSelect 9K and 90K Illumina arrays have reported SNPs and genes for SBWMV resistance [159,160]. Liu et al. [160] completed a GWAS analysis of SBWMV resistance using the 90K Illumina array. Thirty-five SNPs in 12 wheat genes and one intergenic SNP in the Sbwm1 region on chromosome 5D were significantly associated with SBWMV resistance. Resistance to SBWMV was strongly associated with a putative kinase family protein [159]. Furthermore, GWAS analysis identified major resistance SNPs for WSSMV on chromosome 2D in addition to regions on 5B and 7D. The 2D genomic region was linked with 18 candidate genes, including eleven NBS-LRR genes [161], which are listed in Table 5. Finally, fine-mapping and leveraging of available wheat pan-genome datasets together with TILLING resources have recently allowed the identification of the gene likely underlying Sm1-mediated resistance to the wheat insect pest orange wheat blossom midge (OWBM, Sitodiplosis mosellana Géhin). The gene Sm1 likely encodes a canonical NLR with kinase and major sperm protein integrated domains [7].

Cloning of multiple diseases resistance genes
Examples of the positional or map-based cloning of disease resistance-related genes in wheat are arguably more common than for non-disease resistance traits, presumably due to the gene-for-gene interaction of such major resistance genes with specific avirulence factors in the pathogen or to other major effects. Genes such as Lr21 [162], Yr36 [163], Yr15 [127], YrU1 [126], Fhb1 [164], Fhb7 [165], SuSr-D1 [166], Yr7, and Yr5/YrSP [167] have been cloned in wheat to improve resistance against leaf and yellow rusts as well as Fusarium head blight (Table 5). Access to a high-quality genome reference of wheat (IWGSC RefSeq v1.0) has also enabled researchers to explore susceptibility factors that may contribute to the onset of disease. For instance, Henningsen et al. [176] explored the upregulation, in response to the stem rust fungus, of wheat genes with orthology to known susceptibility factors in other plant species in order to hypothesise genes that may play a conserved role in susceptibility. Similarly, a recent study by Corredor-Moreno et al. [177], also guided by gene expression analysis and the genome reference of wheat (IWGSC RefSeq v1.0), showed that the branched-chain amino acid aminotransferase gene TaBCAT1 contributes to susceptibility to both stripe and stem rust. The authors of both studies suggest that manipulation of susceptibility genes can lead to novel strategies for disease control.

Advances in molecular breeding for insect resistance
A number of arthropod pests can cause major yield losses in wheat. Efforts to identify resistance genes to insects lag behind those for diseases, but with increased awareness of and demand for sustainable wheat production methods, this field has been accelerating in recent years. The insects discussed here include the Russian wheat aphid (Diuraphis noxia), the greenbug (Schizaphis graminum) and the Hessian fly (Mayetiola destructor), for which single dominant resistance genes have been found, as well as the bird cherry-oat aphid (Rhopalosiphum padi) and the English grain aphid (Sitobion avenae), for which good resistance has been harder to identify and seems to be under the control of multiple genes. Methods for phenotyping insect resistance vary depending on the type of insect: pests that cause clear visual symptoms, such as D. noxia, S. graminum and M. destructor, are associated with a larger number of identified genes, whereas those that damage yield and quality without leaving visual symptoms and are more difficult to phenotype, such as R. padi and S. avenae, have fewer resistance sources identified [178].
Hessian fly (Mayetiola destructor) is a major pest of wheat in the USA that has been successfully controlled using singly deployed H genes for over 50 years; in resistant germplasm, 100% of M. destructor larvae die before causing damage, although this resistance is biotype dependent. Over 35 H genes have been identified, but none of them has been cloned [179].
For the Russian wheat aphid (D. noxia), Dn resistance protects yield by maintaining chlorophyll functionality, whereas susceptible plants react to infestation with chlorotic streaks, leaf rolling and stunting. A number of Dn genes have been reported; Dn1, Dn2, Dn4, Dn5, Dn6, Dn8 and Dn9 are thought to be dominant and derive from T. aestivum, whereas the recessive dn3 derives from Ae. tauschii, and Dn7 came from a rye translocation [180]. A recent review suggests, however, that considering the location of the majority of the Dn genes near the centromere of 7D and the difficulty of identifying diagnostic markers for the genes, the resistance may instead be controlled by closely linked genes or QTL influenced by the genetic background in which they occur [180]. The deployment of resistance genes needs to take biotype development in D. noxia into account: five biotypes are known from the USA and four from South Africa [181].
The greenbug (S. graminum) is a serious aphid pest that occurs in all major wheat-growing regions apart from Australia. It causes necrotic lesions on plants, is a vector for the barley yellow dwarf virus and causes indirect damage by acting as a winged transport mechanism for the wheat curl mite (Aceria tosichella), which vectors the wheat streak mosaic virus. The development of the first resistant wheat cultivars started in the 1950s, and >15 resistance genes have now been reported. A number of greenbug biotypes have been identified, and their distribution needs to inform the deployment of the resistance genes [181,182]. Both the genomic regions and the candidate genes identified for resistance against different classes of wheat pests are detailed in Table 6.
Wheat resistance to the barley yellow dwarf virus-vectoring bird cherry-oat aphid (R. padi) and the English grain aphid (S. avenae) has been elusive, as effective high-throughput phenotyping methods have been lacking for these species. However, resistance has been identified in wheat wild relatives from the primary and secondary gene pools, more commonly in species with low ploidy levels such as Triticum boeoticum, Aegilops tauschii, Triticum araraticum and Triticum dicoccoides.
Many other insect pests are of importance in wheat production. The resistance gene Sm1, which acts against the orange wheat blossom midge (Sitodiplosis mosellana), has been deployed in Canada and Europe, and four resistance genes (Cmc1-4) have been identified against the aforementioned wheat curl mite (A. tosichella) [183]. Other pests of concern to wheat production include the Sunn pest (Eurygaster integriceps), the wheat stem sawfly (Cephus cinctus) (WSS) and the yellow wheat blossom midge (Contarinia tritici). Hence there is no shortage of targets for insect resistance breeding programs in the future, which will need to focus both on the discovery of new resistance traits and on the maintenance of previously developed insect-resistant genotypes [184]. For instance, the use of solid-stemmed cultivars has been the primary strategy against WSS until recently. Advances in the availability of genomics tools and resources first permitted a comparative analysis of the QTL underlying the solid-stem trait in wheat and related grasses, and additionally provided a global view of the plant response to WSS infestation at the transcriptome, proteome and metabolome levels [185]. Another study explored the WSS transcriptome and its interaction with the regulatory elements microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). Interestingly, this study found that WSS miRNAs may target wheat transcripts and vice versa, thereby potentially modulating the plant responses against WSS [186]. Finally, the solid-stemness trait was linked to copy number variation of a putative Dof transcription factor (TdDof) within the 3BL QTL through the use of high-throughput sequencing in different genetic backgrounds. Transgenic lines over-expressing TdDof firmly established that increased expression of TdDof was responsible for solid-stemness, likely through regulation of programmed cell death in pith parenchyma cells [187].
Similarly, genome sequencing of resistant and susceptible cultivars revealed a candidate gene in the Sm1 locus, which is known to confer resistance to OWBM. In this case, knock-out mutant lines demonstrated that mutations within this gene resulted in susceptibility to OWBM. The candidate gene contains NB-ARC and LRR motifs, in addition to a serine/threonine (S/T) kinase domain that is similar to those found in rust resistance proteins, and a major sperm protein (MSP) domain [15].

Advances in molecular breeding for end-use quality traits
Wheat grain markets and food industries demand not only high-yielding and resistant varieties but also those with specific end-use qualities. End-use quality is, therefore, an important focus in breeding programs. Methods for testing quality, however, require large amounts of grain and are time-consuming and costly. Significant efforts have been made to identify QTL linked to various end-use quality traits such as grain protein content (GPC), dough rheological properties and baking quality (Table 7). Several comprehensive mapping analyses of quality traits related to protein and starch have been conducted [195][196][197][198]. Sun et al. [195] analysed GPC, flour protein content (FPC), grain glutenin macropolymer content, wet gluten content (WGC), dry gluten content (DGC), Zeleny sedimentation volume, flour-water absorption (FWA), dough development time (DDT), mixing tolerance index and flour paste viscosity (Table 7). They identified 30 QTL for starch traits and 15 QTL for protein traits, with QTL clusters for starch and protein traits located on chromosomes 3D, 6B and 7B and 1D and 3B, respectively. Raman et al. [196] analysed GPC, milling yield, FPC, flour color, FWA, DDT, dough strength (DS) and dough extensibility (DE) and found several QTL associated with DS, DE, DDT, and FWA close to the glutenin Glu-B1 locus on chromosome 1B. Simons et al. [197] analyzed 20 end-use quality traits including six grain, seven milling and flour, four dough mixing strength and three bread-making traits. They found that the 1DL QTL cluster containing Glu-D1 had a large genetic influence on dough mixing strength and bread-making performance. Furthermore, two QTL clusters located on chromosomes 3B and 4D were associated with several milling and baking quality traits [198] and with the Wx-B1, Glu-B1 and Glu-D1 genes.
Specific attributes of starch contribute unique properties to certain wheat breeding lines, and the genome-level characterization of one such property, udon noodle quality, is detailed in [5]. The gene Granule Bound Starch Synthase (GBSS; TraesCS4A01G418200) on chromosome 4A is absent from some lines, and it is the null allele for GBSS-4AL (Wx-B1b) that is associated with udon noodle quality. An assessment of a set of 644 hexaploid wheat varieties and landraces using 10 SNPs identified from snapshot exome sequence data indicated that significant sections of TraesCS4A01G418200 were absent in 3.9% of these lines. The specific deletions within the GBSS-4AL gene mean that the respective lines provide new sources of germplasm for wheat breeding. These lines would not be expected to show detrimental effects due to the deletion of adjoining gene models and should thus perform successfully at the agronomic level to satisfy the high-value commercial udon noodle market. Other attributes of starch, such as a high amylose content for an improved source of fibre in the diet, can now be introduced into commercial wheat lines.
Importantly, a metaQTL analysis [47] identified stable QTL by combining 27 quantitative genetic studies with four genetic maps and located 73 and 82 QTL for baking quality and GPC-related traits, respectively, on the consensus map. The authors reported 8 metaQTL for baking quality and 6 for GPC. The most precise metaQTL, those with the smallest confidence intervals, were located on chromosome 3D (3.78 cM) for baking quality and on chromosome 2B (5.83 cM) for GPC. The candidate genes identified are listed in Table 7.
Recently, high-density SNP arrays and GBS have also been utilized to identify QTL for bread-making quality using bi-parental populations [27,199,200]. These high-resolution genetic maps helped to precisely identify the major QTL and candidate genes, which are a valuable resource for MAS and genomic selection in wheat. Guo et al. [199] [27]. Using the power of GBS, Boehm Jr. et al. [200] identified co-localizing QTL for multiple end-use quality traits (GPC, FWA and flour yield) on chromosomes 1B, 2D, 7A and 7B showing allelic variation for the glutenin genes Glu-A1, Glu-B1, Glu-A3, Glu-B3 and Glu-D3.
In contrast to agronomic traits, dissection of quality traits in wheat by GWAS has received less attention. However, in the past two years, a few studies have reported the dissection of quality traits by association mapping approaches. These studies used high-density SNP arrays and in some cases identified candidate genes (Table 7) [134,201,202,203]. Chen et al. [202], for example, used the 90K genotyping assay to identify MTAs for grain hardness, GPC, WGC and flour color and reported 103 multi-environment-significant SNPs detected in more than four environments. Further, they reported that a disease resistance RPP13-like protein 1 gene, TaRPP13L1, was associated with flour color. Wheat lines with TaRPP13L1-B1a showed significantly higher flour redness than those with TaRPP13L1-B1b in Chinese wheat. Other candidate genes included TaRPP13L1-7B, TaRPP13L1-7D, MCM3, Pinb, SBE1, Psy, RPL1, SPY and STK. Similarly, Bhatta et al. [134] explored synthetic and bread wheat accessions from Western Siberia for GPC and agronomic traits by GWAS using 192,876 GBS-SNPs and, based on MTAs for GPC on chromosome 2D, identified two candidate genes (TraesCS2D01G582500.1 and TraesCS2D01G264700.1). Gene model TraesCS2D01G582500.1 has a putative kinase function, while TraesCS2D01G264700.1 was annotated as a member of the NBS-LRR disease resistance protein (NLR) family. Yang et al. [201] conducted a high-resolution multilocus GWAS combined with gene network analysis of grain quality and dough rheological traits based on 19,254 SNPs genotyped in 267 bread wheat accessions. In that study, sixty-seven core candidate genes involved in protein/sugar synthesis, histone modification and the regulation of transcription factors were reported to be associated with grain quality.
Furthermore, another GWAS study [203] identified MTAs for grain hardness, GPC and flour sedimentation on chromosomes 3A, 3D, 4A, 4B, 4D, 5B, 5D, 6B and 7B that were associated with the candidate genes NAM-B1 and Pinb-D1. In addition, eight SNP markers for early identification of high molecular weight glutenin subunits (HMW-GSs) were reported [204] that can be utilized in applied breeding. It is important to note that although immense efforts have been invested in mapping and targeting genes for grain quality, only one gene, NAM-B1 (Gpc-B1) [205], has been map-based cloned.

Current and potential methods to identify and clone genes in wheat
Dissecting the genetic and molecular mechanisms regulating grain yield and growth is key for wheat breeding and improvement. Positional cloning, or map-based cloning, is a traditional forward genetics tool that has been widely used to clone genes regulating traits of interest in wheat, e.g., VRN1 [49], Gpc-B1 [205] and Lr21 [162]. Map-based gene cloning, however, usually involves multiple steps, such as generating mapping populations, fine mapping to narrow the target region and identify genetic markers co-segregating with the phenotype, screening candidate gene(s) and identifying the gene(s) by sequencing. This process is time-consuming and labour intensive, especially in wheat. Hence only a limited number of positional cloning studies have been successfully undertaken.
Bulked segregant analysis (BSA) was recommended as a shortcut to identify the linkage of molecular markers with the phenotype and has been used extensively to map major-effect loci [206]. In this analysis, DNA samples of individuals showing extreme phenotypes in a segregating population (e.g. F2) are bulked and genotyped, together with their parents, using molecular markers [207]. A marker is considered to be linked with the trait if the bulk and the parent of a similar phenotype show the same allele. Recently, with the great advances in NGS technologies, several BSA-based modifications have been developed to identify major-effect QTL regulating quantitative traits. These modifications are based on whole-genome resequencing of bulks from a large population, which reduces the cost and time of genotyping and increases the statistical power of the analysis [207,208].
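The marker-scoring rule of classical BSA described above can be expressed in a few lines. The Python sketch below is a simplified illustration under stated assumptions: each marker has a single allele call for each parent and each bulk, and a bulk that still carries both parental alleles is written as "A/B". The function name, dictionary keys and marker names are hypothetical and do not correspond to any published tool.

```python
def linked_markers(genotypes):
    """Return markers whose bulk alleles match the parent of similar phenotype.

    genotypes maps a marker name to a dict with four allele calls:
    parent_high, parent_low (the two contrasting parents) and
    bulk_high, bulk_low (the two phenotypic extreme bulks).
    """
    linked = []
    for marker, calls in genotypes.items():
        # A marker is only informative if the parents differ.
        polymorphic = calls["parent_high"] != calls["parent_low"]
        matches_high = calls["bulk_high"] == calls["parent_high"]
        matches_low = calls["bulk_low"] == calls["parent_low"]
        if polymorphic and matches_high and matches_low:
            linked.append(marker)
    return linked


if __name__ == "__main__":
    # Hypothetical toy data: "A/B" means both parental alleles are present.
    toy = {
        "marker_1": {"parent_high": "A", "parent_low": "B",
                     "bulk_high": "A", "bulk_low": "B"},
        "marker_2": {"parent_high": "A", "parent_low": "B",
                     "bulk_high": "A/B", "bulk_low": "A/B"},
        "marker_3": {"parent_high": "A", "parent_low": "A",
                     "bulk_high": "A", "bulk_low": "A"},
    }
    print(linked_markers(toy))
```

In this toy data only marker_1 is retained: marker_2 segregates freely in both bulks, and marker_3 is monomorphic between the parents and therefore uninformative.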
In crops with large genomes such as wheat, complexity reduction is very important to identify and clone target genes more quickly and efficiently. QTL-seq is one such approach; it combines the potential of BSA with the power of high-throughput whole-genome resequencing to identify genomic regions that show contrasting SNP-index values in two bulk populations (each with 20-50 individuals) showing extreme phenotypes [209]. Recently, QTL-seq was used in bread wheat to identify the candidate genomic region tightly linked to the awning inhibitor loci, and diagnostic markers were designed to understand the role of the QTL in the formation of the awnless trait [210]. Moreover, this approach was applied to identify loci involved in tiller angle in bread wheat, an important factor influencing yield. In this case as well, functional markers for MAS were developed and validated [211]. Multiple QTL-seq (mQTL-seq) is derived from QTL-seq and applies QTL-seq to several mapping populations from crosses with at least one common parent [212]. The use of multiple mapping populations with broad genetic diversity was critical for validating QTL and for narrowing down the detected QTL. To date, however, this technique has not been used in wheat.
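The SNP-index comparison between the two bulks can be sketched as follows. In this simplified Python illustration, the SNP index of a bulk is taken as the fraction of reads at a position that carry the non-reference allele, and the difference between the two bulks (often called the delta SNP index) is used to flag candidate positions. The class and function names, the depth filter and the 0.5 threshold are assumptions made for illustration; published QTL-seq analyses typically average the index over sliding windows and derive confidence intervals by simulation, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    chrom: str
    pos: int
    alt_high: int    # non-reference reads in the high-phenotype bulk
    depth_high: int  # total reads in the high-phenotype bulk
    alt_low: int     # non-reference reads in the low-phenotype bulk
    depth_low: int   # total reads in the low-phenotype bulk


def snp_index(alt_reads, depth):
    """Fraction of reads carrying the non-reference allele."""
    return alt_reads / depth


def delta_snp_index(snp):
    """SNP index of the high bulk minus SNP index of the low bulk."""
    return (snp_index(snp.alt_high, snp.depth_high)
            - snp_index(snp.alt_low, snp.depth_low))


def candidate_snps(snps, min_depth=10, threshold=0.5):
    """Keep well-covered SNPs whose absolute delta SNP index is large."""
    selected = []
    for snp in snps:
        if min(snp.depth_high, snp.depth_low) < min_depth:
            continue  # too few reads for a reliable estimate
        delta = delta_snp_index(snp)
        if abs(delta) >= threshold:
            selected.append((snp.chrom, snp.pos, round(delta, 3)))
    return selected


if __name__ == "__main__":
    # Hypothetical read counts on a placeholder chromosome.
    toy = [
        SnpCounts("chrX", 100, 45, 50, 5, 50),
        SnpCounts("chrX", 200, 26, 50, 24, 50),
        SnpCounts("chrX", 300, 4, 5, 0, 5),
    ]
    print(candidate_snps(toy))
```

A delta SNP index close to zero is expected for regions unlinked to the trait, whereas values approaching 1 or -1 indicate that the two bulks carry mostly different parental alleles at that position.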
The cost of whole-genome resequencing increases with genome size and with the proportion of repetitive DNA, both of which are high in wheat. In that case, bulked segregant RNA sequencing (BSR-seq) can be an alternative strategy that identifies eQTL (expression QTL) regions and generates gene expression data for all of the genomic loci [213]. The differential expression of genes between the two bulks can be used to identify candidate genes responsible for the favourable phenotype. BSR-seq has been used successfully in segregating wheat biparental populations to map the stripe rust resistance loci YrMM58 and YrHY1 on chromosome 2AS [214] and Yr15 on chromosome 1BS [215], and the leaf senescence gene els1 on chromosome 2BS [216]. Similarly, BSR-seq enabled fine-mapping of a locus controlling grain protein content (GPC) in wheat (GPC-B1) to 0.4 cM from the previously reported interval of 30 cM [213]. This study pinpointed 13-18 candidate genes for GPC in wheat.
NGS platforms have also accelerated the identification and cloning of genes in mutant collections. The TEnSeq pipelines are examples of the advances that have allowed rapid gene identification and cloning, as reviewed recently [217]. The recently developed tool MutChromSeq (Mutagenesis Chromosome flow sorting and short-read Sequencing) [166,218] is based on mutagenesis followed by flow sorting of chromosomes and their subsequent sequencing to identify the induced mutations. This rapid cloning approach was used successfully to clone the powdery mildew resistance locus Pm2 in wheat [218]. MutChromSeq has the advantage that it does not rely on the assumption that the resistance gene belongs to the NLR class (as other approaches do, see below). Hence, it would be appropriate for the identification of non-immune-mediated resistance genes. The most recent application is the cloning of Med15, encoded by SuSr-D1, a suppressor gene of stem rust resistance, from the wheat cultivar 'Canthatch' [166].
A similar approach to MutChromSeq, which does not require mutagenesis, is the targeted chromosome-based cloning via long-range assembly (TACCA) method. TACCA uses flow-sorted chromosomes, next-generation sequencing, and cultivar-specific de novo assembly. Using this approach, Lr22a, a broad-spectrum leaf-rust resistance locus, was cloned in wheat. Two SSR markers flanking Lr22a and covering a 0.48 cM interval on chromosome 2D had been mapped previously. Sorting of chromosome 2D, followed by sequencing and gene identification, was completed within four months [219].
Other cloning strategies such as MutMap (mutational mapping) involve mutagenesis, sequencing, and mapping to identify SNPs between wild-type and homozygous mutants and then zero in on the region containing the gene of interest. This approach was considered to be applicable only in crops with small genomes. It was, however, used successfully to map and clone Ms1 from bread wheat using F2 plants derived from heterozygous ms1e mutants [220]. MutMap will be less efficient at identifying the causal mutation, however, if the wild-type reference genome has gaps at the position of the causal mutation. For this reason, MutMap-Gap uses de novo assembly of the wild-type genome [221] and could be applied in wheat in the future.
Finally, MutRenSeq is a fast gene cloning tool for the isolation of nucleotide-binding and leucine-rich repeat (NLR) genes [168]. It requires chemical mutagenesis, exome capture, and sequencing. Most resistance genes encode NLR proteins. Hence, capture with an NLR-specific bait library is used to enrich these sequences, and the resistant wild-type parent and susceptible loss-of-function mutants are then sequenced as the last step. The mutant reads are aligned with reads from the parent. This method was used to clone two stem rust resistance genes (Sr22 and Sr45) and three yellow rust genes (Yr7, Yr5/YrSP) from bread wheat [167,168]. Although this method does not require positional fine mapping and can be applied to isolate NLR-type resistance genes from most crops and their wild relatives, two major limitations must be highlighted. Firstly, the design of oligonucleotide baits is based on a reference genome sequence; considering the large-scale presence/absence variation among different accessions, the recently released wheat pan-genome is the ideal reference on which to design baits [15]. Secondly, this approach is limited to isolating resistance genes that encode NLR proteins, meaning that genes that do not belong to the NLR family are missed [222]. However, it is possible to add capture baits targeting other classes of genes thought a priori to be involved in disease resistance, such as wall-associated kinases.
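The comparison of mutant and wild-type reads can be reduced, for illustration, to a simple ranking step. The sketch below assumes that variant calling against the wild-type NLR assembly has already produced, for each independent susceptible mutant, the set of captured contigs carrying a mutation; a contig mutated in all (or nearly all) independent mutants is then the candidate. The function name, contig names and threshold are hypothetical.

```python
from collections import Counter


def rank_candidates(mutated_contigs_by_mutant, min_mutants=None):
    """Return contigs mutated in at least `min_mutants` independent mutants.

    `mutated_contigs_by_mutant` maps a mutant line name to the contigs in which
    that line differs from the wild-type parent. By default a contig must be
    hit in every mutant to be reported.
    """
    n_mutants = len(mutated_contigs_by_mutant)
    if min_mutants is None:
        min_mutants = n_mutants
    counts = Counter()
    for contigs in mutated_contigs_by_mutant.values():
        counts.update(set(contigs))  # count each contig once per mutant
    hits = [contig for contig, k in counts.items() if k >= min_mutants]
    # Most frequently mutated contigs first, ties broken alphabetically.
    return sorted(hits, key=lambda contig: (-counts[contig], contig))


if __name__ == "__main__":
    # Hypothetical mutant lines and contig names, for illustration only.
    calls = {
        "mutant_1": {"NLR_contig_17", "NLR_contig_203"},
        "mutant_2": {"NLR_contig_17"},
        "mutant_3": {"NLR_contig_17", "NLR_contig_88"},
    }
    print(rank_candidates(calls))
    print(rank_candidates(calls, min_mutants=1))
```

Background mutations from chemical mutagenesis are scattered at random, so they rarely hit the same contig in several independent lines. This is why requiring recurrence across mutants is informative.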
Unlike map-based cloning and MutRenSeq, the AgRenSeq (Association genetics with R-gene enrichment sequencing) method was developed to combine GWAS-style association genetics, which utilizes genome-wide natural variation, with RenSeq, thereby eliminating the need for bi-parental mapping populations or mutagenesis. This approach was demonstrated successfully by cloning the R gene Sr46 and identifying the candidate gene sequence for SrTA1662 using a diverse panel of Aegilops tauschii ssp. strangulata [171]. It explores a pool of diverse wild relatives carrying many resistance genes and, as a result, enables the cloning of multiple genes at the same time [222]. The strategies described here have many advantages over traditional marker-based mapping. In addition to taking much less time, these approaches find the genes or functional nucleotides/haplotypes responsible for a given agronomic trait. In several cases, genes were rapidly cloned and diagnostic markers were developed. All these approaches will certainly have great power to speed up crop breeding. The benefits of cloning genes in wheat, particularly those with a role in disease management, have been shown recently [115].
As demonstrated in that study, the construction of transgene cassettes can ease breeding bottlenecks associated with the deployment of multiple genes, because the genes are inherited as a single unit. In this case study, a gene cassette containing five previously cloned R genes (Sr45, Sr50, Sr55, Sr22, Sr35) provided high levels of resistance to stem rust, suggesting this could be a viable solution to confer durable multi-pathogen resistance. The emergence of novel fungi-based cloning and other advanced cloning methods for wheat, together with pan-genome availability, means that we expect more cloning accomplishments in the near future.

Genomic selection (GS) in wheat to improve complex traits
The relatively recent availability of large numbers of genome-wide molecular markers in wheat genetic resources has led to the application of an alternative marker-assisted approach for wheat genetic improvement: genomic selection (GS) [223,224]. For a long time, the lack of high-density markers was a major hindrance to in-depth genetic and genomic analysis. GS is an advanced form of MAS wherein genome-wide markers are used to calculate genomic estimated breeding values (GEBVs) [223]. Rather than explicitly identifying and tracking markers associated with genetic loci controlling a given trait, GS aims to use large numbers of genome-wide markers, in conjunction with phenotypic data collected on a collection of lines/varieties (termed the 'training set'), to establish parameters that allow forward selection of the progenies derived from the training set over multiple generations in the absence of additional phenotyping [223,224]. This potentially allows selection to be applied faster and at higher intensities, as more lines can be incorporated and advanced to subsequent generations without the need for time-consuming phenotyping steps. Additionally, advances in statistical methods for handling high-density marker data have been equally important to the development of GS in wheat.
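To make the idea of a GEBV concrete, the following minimal sketch estimates marker effects in a training set by ridge regression and sums them for an unphenotyped candidate. Ridge regression is used here only as one simple and widely used model class for genomic prediction; the cited studies use a range of models. The 0/1/2 genotype coding, the penalty `lam`, the function names and the toy data are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_marker_effects(genotypes, phenotypes, lam=1.0):
    """Ridge estimates of marker effects: solve (Z'Z + lam*I) u = Z'y on centred data."""
    n, p = len(genotypes), len(genotypes[0])
    mean_y = sum(phenotypes) / n
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    yc = [y - mean_y for y in phenotypes]
    ztz = [[sum(z[i][a] * z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    zty = [sum(z[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return mean_y, col_means, solve(ztz, zty)


def gebv(genotype, model):
    """Predicted value of a genotyped line: mean plus the sum of its marker effects."""
    mean_y, col_means, effects = model
    return mean_y + sum((g - m) * e for g, m, e in zip(genotype, col_means, effects))


if __name__ == "__main__":
    # Toy training set: six lines, two markers coded 0/1/2 (illustration only).
    train_geno = [[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]]
    train_pheno = [10.0, 12.0, 13.0, 8.0, 11.0, 12.0]
    model = fit_marker_effects(train_geno, train_pheno, lam=0.1)
    for candidate in ([2, 0], [0, 2]):
        print(candidate, round(gebv(candidate, model), 2))
```

In real applications the number of markers far exceeds the number of lines, which is precisely the situation in which the penalty term keeps the system solvable. Dedicated mixed-model software would normally be used instead of a hand-written solver.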
Since its first use in 2006 [225], GS has been used extensively in wheat for a plethora of traits with different architectures, including grain yield [75,226-229], resistance to diseases such as rusts, Fusarium head blight, Stagonospora nodorum blotch, Septoria tritici blotch and tan spot [230-232], macro- and micronutrients [233] and end-use quality traits [234,235]. The application of GS for hybrid prediction has also been investigated in wheat [236,237], pointing to the challenges of predicting hybrids derived from untested parents [238]. The use of more refined GS models based on Genotype x Environment (GxE) interaction promises to partly reduce this shortcoming [239]. The optimisation of the underlying factors on which genomic selection relies, such as marker density, predictive models, training population size, and the relationship between training and validation population sets, is ongoing [240].
Recent investigations have focussed on the optimization of GS in genetic resources in order to harness new diversity from wheat gene banks. Crossa et al. [228] investigated GS models to predict days to heading and days to maturity on a large set of wheat landrace accessions (8,416 Mexican landrace accessions and 2,403 Iranian landrace accessions) from the CIMMYT gene bank using two strategies. The first strategy involved random cross-validation with the data partitioned into 20% training (TRN) and 80% testing (TST) sets (TRN20-TST80). In the second strategy, two types of core sets called "diversity" and "prediction", including 10% and 20%, respectively, of the total collections were used. Prediction accuracy of the 20% diversity core set was close to the accuracies obtained for the 20% training and 80% testing sets (0.412 to 0.654 and 0.182 to 0.647 for Mexican landraces and Iranian landraces, respectively). For traits controlled by a mix of a few major and many minor genes, it can be beneficial to include pre-existing knowledge of known candidate genes to increase the accuracy of genome-wide prediction [241]. The potential of such an approach has been demonstrated for the prediction of flowering time and plant height in wheat [242]. These results suggested a way forward for parental selection in pre-breeding: predicting the value of all genotyped accessions in a gene bank, followed by pre-breeding programs based on those genotypes that have the highest predicted value or harbor promising novel candidate genes or alleles. Once promising parents are identified, efficient pre-breeding programs need to be designed. This is a non-trivial task and depends mainly on the performance of the plant genetic resources. Some genetic resources have a performance that makes them directly useful as parents for breeding new varieties. However, this tends to be the exception, and genetic resources usually have poor overall performance.
As a result, a breeding program based directly on crosses between genetic resources has little chance of producing offspring of sufficient quality to become varieties. Genetic resources must therefore be improved through appropriate pre-breeding programs to the point where they can be used as parents in breeding programs. Two complementary approaches can be used: pre-breeding of populations created from genetic resources, or pre-breeding of populations created from crosses between plant genetic resources and elite materials. In both cases, genomic prediction enables rapid selection gain, but requires extensive training populations related to the base population. Crossing parents selected from agronomic and physiological screening of the genetic resources and advancing through generations using high-throughput phenotyping of physiological parameters is another approach [243]. This approach is complementary to genomic selection of progeny, since many of the complex physiological traits that may have been used in strategic crossing do not lend themselves to high-throughput progeny screening [243].
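The random TRN20-TST80 cross-validation scheme described above for gene bank accessions can be sketched as follows. The sketch assumes that prediction accuracy is measured as the Pearson correlation between predicted and observed values in the testing set, which is a common convention but is not stated explicitly in the text. The function names and the seed are hypothetical.

```python
import random
from math import sqrt


def split_trn_tst(accessions, train_fraction=0.2, seed=0):
    """Randomly partition accessions into training (TRN) and testing (TST) sets."""
    rng = random.Random(seed)
    shuffled = list(accessions)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def pearson(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Accession identifiers are placeholders; 8,416 matches the Mexican set size.
    trn, tst = split_trn_tst(range(8416), train_fraction=0.2)
    print(len(trn), len(tst))
    # A model would be fitted on `trn` and used to predict `tst`; the accuracy
    # is then pearson(predicted_values, observed_values) over the TST lines.
```

Repeating the partition with different seeds and averaging the resulting correlations gives a more stable estimate than a single split.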
Substantial efforts have shifted lately to the development of high-throughput phenotyping platforms in wheat, which are being used to measure different traits including plant height [244], disease resistance [245], growth rate [244], and nitrogen deficiency [245]. These significant advancements in high-throughput phenotyping have brought a paradigm shift in breeding strategies. Wheat scientists have incorporated high-throughput phenotyping data in GS models to explore their potential for improving prediction accuracies for complex traits [227,246]. Rutkoski et al. [227] investigated the role of canopy temperature and green and red NDVI as secondary traits in GS models for improving prediction accuracy for grain yield. The authors observed a 67% improvement in prediction accuracy without correcting for days to heading (DTH) and a 37% improvement upon correction for DTH. Crain et al. [246] used over 1.1 million phenotypic data points generated by high-throughput phenotyping on 1,170 advanced CIMMYT lines in drought and heat stress environments and observed increases in prediction accuracy of 7 to 33% compared to the standard univariate model.
Genomic selection has revolutionised animal breeding and will likely be a major source of genetic improvement of crops including wheat over the coming decade [247]. We envisage that the incorporation of additional data-types and technologies into GS pipelines will open opportunities for further gains to be made. For example, increasing the precision of phenotypic characterisation in the training set via high-throughput phenotyping platforms [227,244-246], as well as the incorporation of environmental covariates [248], may lead to improved prediction accuracies. Similarly, incorporating additional molecular or 'omics' data may further refine prediction equations. These include molecular data tagging functionally validated alleles, whether they be natural variants or novel alleles generated via technologies such as gene editing or TILLING [249,250], or the integration of transcriptomic and metabolomic data with molecular markers, as has been reported in maize [251]. Ultimately, combining these approaches with methods for the shortening of wheat generation times will further increase selection intensity. Currently, 'speed breeding' methodology, whereby plants are grown under extended photoperiods via the use of supplementary lighting, allows spring wheat generation time to be reduced from 4 months to around 2 months [252]. Any further dramatic shortening of cycling times would require the development of new approaches, such as the generation of recombinant individuals by in vitro production of gametes and their subsequent fusion [253]. Combining such in vitro generation cycling with genomic selection methodologies may well represent an achievable medium-term step-change in genomics-informed breeding.

CRISPR/Cas9 mediated genome editing for wheat improvement
The availability of complete genome assemblies of diverse wheat genotypes originating from different parts of the world is essential to identify and characterize the functions of wheat genes at different growth stages and under different environmental conditions at the whole-genome level. Furthermore, transcriptomic analyses help to identify the genes and gene networks that regulate traits under different conditions. Therefore, knock-out, knock-in, or activation of such genes through the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) gene-editing system provides unique opportunities for wheat genetic improvement [250,254,255]. Because of the very large genome size (~17 Gb), the homoeologous gene copies present in the polyploid genome, and the presence of many repetitive sequences, genetic manipulation through CRISPR/Cas9-mediated gene editing is challenging in bread and durum wheat. An early attempt to use CRISPR/Cas9 for specific gene modification in wheat targeted the inositol oxygenase (inox) and phytoene desaturase (pds) genes in cell suspension cultures and produced InDel mutations [256]. The first successful applications of CRISPR/Cas9 to generate wheat knockout lines, targeting the three homoeoalleles of the powdery mildew resistance locus O gene (TaMLO) via a transient protoplast expression system, were reported independently by Shan et al. [257] and Wang et al. [258]. The latter successfully applied the CRISPR/Cas9 system in bread wheat to generate plants mutated in a single TaMLO-A1 allele with increased resistance to powdery mildew. A similar strategy has been used to knock out drought-responsive transcription factors in wheat, such as dehydration-responsive element-binding protein 2 (TaDREB2) and ethylene-responsive factor 3 (TaERF3), for improved drought signalling [250,259].
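One practical consequence of the homoeologous copies is that a single guide RNA can only edit all three sub-genome copies if its target site is conserved among them. The sketch below lists 20-nt protospacers that are followed by an NGG protospacer-adjacent motif (the motif recognised by the commonly used Streptococcus pyogenes Cas9) and keeps those shared by every homoeolog. Only the forward strand is scanned, off-target screening is ignored, and the sequences and function names are invented for illustration.

```python
import re

# A 20-nt protospacer followed by an NGG PAM. The lookahead allows overlapping sites.
PAM_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def protospacers(sequence):
    """Set of 20-nt protospacers with a 3' NGG PAM on the forward strand."""
    return {match.group(1) for match in PAM_SITE.finditer(sequence.upper())}


def shared_guides(homoeologs):
    """Protospacers present in every homoeologous copy (dict: copy name -> sequence)."""
    sets = [protospacers(seq) for seq in homoeologs.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Invented sequences standing in for the A, B and D copies of a target gene.
    core = "ACGT" * 5
    copies = {
        "A": "GATTACA" + core + "AGG" + "TTT",
        "B": "CCCTT" + core + "TGG" + "A",
        "D": "AAAA" + core + "CGG",
    }
    print(shared_guides(copies))
    # A single SNP inside the protospacer of one copy removes the shared site.
    copies["D"] = "AAAA" + "ACGT" * 4 + "ACCT" + "CGG"
    print(shared_guides(copies))
```

Conversely, guides that are unique to one copy would be the starting point for editing a single homoeoallele, as in the TaMLO-A1 example above.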
Furthermore, knockout of all three homoeoalleles of TaGW2 through the CRISPR/Cas9 system increased thousand kernel weight (TKW) and seed size [260], implying the utility of the system for crop improvement. Moreover, recent advances in editing allow simultaneous targeting of multiple genes, or genome multiplexing [261], opening new horizons for the use of CRISPR/Cas9 in polyploid wheat, which carries many homoeologous and paralogous copies of the same gene, such as the α-gliadins. In one study, CRISPR/Cas9-mediated mutation of 35 out of 45 α-gliadin genes, which control the gluten content in wheat, generated transgene-free, low-gluten wheat without any off-target mutations [262]. Using genome multiplexing by CRISPR/Cas9, three genes, viz., TaGW2 (a negative regulator of grain traits), TaMLO (resistance to powdery mildew), and TaLpx-1 (lipoxygenase; offers resistance to Fusarium graminearum), were targeted [260]. The first application of zinc finger nuclease (ZFN)-mediated, non-homologous end joining (NHEJ)-directed editing of acetohydroxyacid synthase (AHAS) in allohexaploid bread wheat, using a supplied DNA repair template, resulted in resistance to imidazolinone herbicides due to an amino acid change in the target gene coding sequence [263]. Efficient and novel ribonucleoprotein-based (RNP) CRISPR/Cas9 genome editing procedures that require only 7-9 weeks have been developed, with no off-target mutations and no transgene integration, implying the efficiency of the system [264].
This is also important for pre-breeding, both to reduce the time needed to transfer beneficial alleles and to increase the success rate. The idea is to directly induce or modify alleles to beneficial ones in elite wheat germplasm efficiently and quickly. Permanent integration of CRISPR/Cas9 into the genome results in its stable expression, whereas RNP-based biolistic delivery offers transient expression of CRISPR/Cas9 followed by its rapid degradation, which limits off-target mutations [264]; RNP-based gene editing has thus been applied successfully in bread wheat [265]. Similarly, knockout of the three homoeologs of wheat enhanced disease resistance 1 (TaEDR1), a negative regulator of the defense response against powdery mildew, conferred resistance against powdery mildew without any off-target mutations [266]. DNA-virus-based amplicons [e.g. from the geminivirus wheat dwarf virus (WDV)] were later identified as an efficient construct delivery method for gene editing, with enhanced CRISPR/Cas9 expression compared to the ubiquitin reference gene, and were proposed as a potential tool for CRISPR-mediated genome editing in wheat [267]. Moreover, CRISPR/Cas9 has been applied successfully to generate heritable targeted mutations in the wheat male sterility 1 gene (Ms1), resulting in complete male sterility in commercial wheat cultivars such as Gladius and Fielder [268], thus speeding up hybrid wheat production. These studies demonstrate the utility of the CRISPR/Cas9 system for the rapid generation of male sterility in commercial wheat cultivars for breeding programs. Although CRISPR-mediated genome or gene editing has been demonstrated to be successful, its widespread implementation still encounters difficulties arising from the low regeneration efficiency of crops such as wheat.
Recently, the GRF-GIF (Growth-Regulating Factor-GRF-Interacting Factor) wheat transformation system has become a game-changer: it uses a GRF-GIF chimeric protein construct that improves the regeneration efficiency of transgenic wheat lines up to 100% [269].
Such CRISPR-generated lines can either be released as varieties or be used as germplasm stocks. Although the utility of this revolutionary technology for crop improvement has been demonstrated, the regulatory approval of gene-edited plants still varies among countries [270]. CRISPR can also be used to test the effect of a mutated allele on the resulting phenotype. Where regulation is too strict, as in certain countries in Europe, this knowledge could later be used to search for such alleles in natural populations of wheat progenitors and in germplasm stored in gene banks (so-called 'natural variation'). The recent initiatives to sequence thousands of gene bank accessions [271] can help to facilitate this approach.
An additional benefit of CRISPR/Cas9 genome editing is that, for the first time, it allows direct transfer of favorable alleles into elite breeding material without the linkage drag typically associated with cross-breeding. Wheat is crossed with maize to induce haploids, and colchicine is applied to obtain doubled haploid plants that serve as breeding material or could be introduced as a variety, thus speeding up wheat breeding [28]. Taking advantage of the well-established wheat × maize crossing system, maize pollen carrying gRNA for the plant height genes BRI1 and SD1 was used to pollinate wheat to induce site-directed mutagenesis in wheat [272] without the need to segregate out the transgene. This helped to reduce the genotype dependence of site-directed mutagenesis. The system can also be used to introduce mutations in multiple genes with one cas9/gRNA-transgenic (pollinator) plant, thus providing an opportunity for multiplex gene editing in wheat.

Wheat reference genome and in silico bioinformatics
NGS technologies, as well as high-resolution optical mapping, generate huge amounts of data [273]. Fortunately, bioinformatics tools have improved significantly in the past years to cope with the challenge of analyzing such data. Advancements made possible by a dual-strategy approach focused on hardware and software have included clock-frequency increases, node reduction, multicores and integration through System on a Chip (SoC) with unified memory between the Central Processing Unit (CPU) and Graphics Processing Units (GPU), as well as machine learning, artificial intelligence, Artificial Neural Network (ANN) analysis, and massive parallelism allowed by multi-core and many-core architectures [274,275]. All this has contributed to our ability to sequence de novo, assemble, and annotate extremely large and complex genomes.

Characterization of gene families using the reference genome
As indicated above, the availability of reference genome sequences in many crop species, including wheat, has sparked the publication of many works about genomics and breeding of such species, using bioinformatics tools, with special emphasis on previously unknown areas.
For instance, besides the examples described above, in silico analyses of the published wheat reference genome, IWGSC RefSeq v1.0 [14], have allowed the identification and characterization of the following gene families:
i) Domain of Unknown Function (DUF) 966 (TaDUF966) gene family, with some members involved in salinity stress tolerance [276].
iii) Gretchen Hagen3 (TaGH3) gene family, the members of which play different roles in various biological processes, including phytohormone responses, growth, development, metabolism, defense, and tolerance to abiotic stresses such as salinity and osmotic stress, with polyploidization contributing to their high number [278].
iv) Superoxide dismutase (SOD) gene (TaSOD) family, which encodes antioxidant enzymes that scavenge Reactive Oxygen Species (ROS) and are also involved in plant growth, development, and abiotic stress tolerance, including drought and salinity [279].
v) Non-specific lipid transfer protein (nsLTP/LTP) gene (TansLTP/TaLTP) family, whose members are involved in transporting phospholipids across membranes, in growth and development, and in responses to abiotic stresses such as drought and salinity, with their high number due to gene duplications [280].
vii) S-phase Kinase-associated Protein 1 (SKP1) gene (TaSKP1) family, which encodes core subunits of the Ubiquitin Proteasome 26S (UPS), expanded through duplications, and is involved in development and stress signaling [282].
viii) Subtilase or subtilisin-like protease (SBT) genes (TaSBT), which are involved in many biological functions, such as defense and tolerance to biotic stresses caused by pathogens, including Puccinia striiformis f. sp. tritici, the fungus that causes wheat stripe rust [283].
ix) The so-called "Soluble N-ethylmaleimide Sensitive Factor (NSF; SNF) Attachment Protein (SNAP) REceptor" (SNARE) and Novel Plant SNare (NPSN) gene families (TaSNARE and TaNPSN, respectively), which are involved in growth and development, regulating vesicle trafficking, fusion, targeting to vacuoles and exocytosis. Interestingly, they are highly conserved structurally, suggesting important biological roles [284].
x) The "DNA binding with one finger" (Dof) gene (TaDof) family, encoding zinc-finger transcription factors, which is involved in phytohormone response, growth, development, metabolism, defense and stress responses, both abiotic (such as salinity and drought) and biotic. Their high number is due to polyploidization, with many segmental duplications, and both miRNAs and cis-regulators are involved in modulating their gene expression profiles [285].
xi) The basic leucine ZIPper (bZIP) gene (TabZIP) family, which encodes transcription factors and is involved in plant growth, development, metabolism, chlorophyll content, photosynthesis, membrane stability, and tolerance to stresses, including abiotic ones such as drought, salinity, and heat, as well as oxidative stress [286].

Miscellaneous applications involving reference genome
Bioinformatics analyses of the published wheat reference genome, IWGSC RefSeq v1.0, have also enabled the following developments:
i) Comparisons of cDNA, identification, and annotation of genes. These comparisons revealed different metabolic pathways, including starch and sucrose metabolism, as well as genes related to abiotic and biotic stress tolerance, signaling and transport [287].
ii) Production of microchips such as the Axiom Wheat high-density Genotyping Array, Axiom Wheat Breeder's Genotyping Array, and GeneChip Wheat Genome Array from Affymetrix-Applied Biosystems-Thermo Fisher Scientific <https://www.thermofisher.com/order/catalog/product/900560TS#/900560TS>. The quality of the SNPs in the first two arrays has been evaluated using a classification system that sorts SNPs into three quality groups [288].
iii) Arbor Biosciences IWGSC exome array. Working with the IWGSC, Arbor Biosciences released in 2019 the myBaits(R) Expert Wheat Exome capture panel based on the complete high-confidence exon-annotated genome. It included 2 million probes targeting more than 200 megabases of high-confidence exons (https://arborbiosci.com/genomics/targetedsequencing/mybaits/mybaits-expert/mybaits-expert-wheat-exome/).
iv) New microRNAs (miRNAs), including polycistronic miRNAs, in cultivated and wild species. Their target genes were predicted. Some are monomorphic, whereas others are polymorphic, suggesting selective pressures for essential or diverse roles, respectively. They are spatially and temporally regulated and are involved in the post-transcriptional regulation of genes, including those encoding transcription factors. This allows an extra level of fine-tuning of gene regulation in different biological processes, including metabolism, growth, development, transport, cell signaling, structural proteins, and abiotic and biotic stress tolerance. Their abundance is due to duplication events [289].
v) A genome-wide transcriptomics approach has been used to dissect the dynamics and underlying regulation of wheat spike development. The genes involved are related to meristem maintenance, initiation and transition, flower development, and flowering response to stress [290].
vi) Complex metabolic engineering can be exploited to improve cereals like wheat so that they produce essential Polyunsaturated Fatty Acids (PUFA), including the healthy omega 3 (ω-3) and omega 6 (ω-6). Genetic engineering and synthetic biology tools can be used to reach such a goal [291]. Similarly, these technologies can be used to generate Marker-Free and Transgene Insertion site-Defined (MFTID) transgenic plants. For example, lipoxygenase (LOX) gene expression was repressed using an RNA interference (RNAi) cassette. Such technology can be used to reduce lipid peroxidation (improving storability) and to increase the nutrient quality of wheat seeds, such as the level of healthy fatty acids (e.g. linoleic and linolenic) [292].
vii) Cereals can also be improved so that their agricultural waste (plant cell wall, made mainly of cellulose, hemicellulose and lignin) contains more cellulose and less lignin, thereby allowing its use as feedstock for biofuel or bioproduct production. Currently, high costs and low yields are associated with this use because of the molecular structure of natural lignocellulosic biomass. As expected from an evolutionary point of view, it is hard to hydrolyse enzymatically into glucose, as it is resistant to degradation by microorganisms [293].
In short, these developments hold interesting potential applications for wheat improvement, within molecular breeding programs, such as enhancing yield traits, as well as biotic and abiotic-stress tolerance which are particularly relevant in the present scenario of global warming and climate change.

Recommendations for bridging the gap between wheat genomics and breeding
Wheat genomics lagged behind other crops for a long time due to its large and intractable genome. Thanks to the advances in sequencing technologies and the availability of the high-quality reference genome sequence, the marker tool kit for wheat breeding has expanded significantly, as millions of SNPs are now available from different genotyping platforms. As a result, there has been an exponential increase in the number of QTL and GWAS studies for a plethora of complex traits. Despite the advances made in complex trait dissection, we have not reached the level of knowledge required to identify the quantitative trait nucleotide (QTN) from the QTL identified in these studies, as was perhaps anticipated from high-resolution GWAS. Candidate genes have only been predicted, and 'causal' genes remain unidentified. This is partly due to the complex nature of quantitative traits but largely because efforts have not been made to go beyond GWAS results. Abundant transcriptome sequence data have been generated in wheat and are freely available on open-access servers such as Wheat Expression (https://wheat.pw.usda.gov/WheatExp/#), which can be used to narrow down the candidate genes identified in GWAS analyses for future validation projects. The availability of sequenced mutant populations has opened doors to conduct validation studies. Predesigned SNP-based primers are available in the Ensembl Plants database (http://plants.ensembl.org/Triticum_aestivum/Info/Index) for validating the mutations, which can be combined to develop double or triple null mutants for research and breeding applications. Efforts in this direction will be required if 'causal' genes are to be identified. Transcriptomics and expression analysis have been combined with QTL and meta-QTL analysis by some groups to narrow down and validate candidate genes [79,100,294], and we expect this to be applied more frequently in marker-assisted and genomic selection.
The role of synthetic wheat in imparting stress tolerance is well known, and wheat gene banks harbour hundreds of synthetic wheats. In the post-reference-genome era, extensive efforts have been undertaken to genotype entire gene bank collections and generate so-called 'digital gene banks'. For example, CIMMYT has generated GBS data on ~100K accessions stored in its gene bank in order to bridge the gap between genetic resources and breeding pipelines. Although success has been achieved in quantifying the genome contributions of wild germplasm (synthetics, landraces) to current elite germplasm, it is still unknown how to devise a genome-based strategy to deploy favourable introgressions from synthetic wheat to enhance breeding value. New genomic selection and machine learning models and tools will be required to predict the best exotics from gene banks without having to invest in laborious and costly multi-environment field testing.
Furthermore, a major challenge that scientists have faced since the wheat reference genome (IWGSC RefSeq v1.0) became available is its integration with previously published genetic maps harbouring QTL for various traits. Large datasets including physical maps, sequence variations, gene expression, markers and phenomic data have already been integrated on the IWGSC RefSeq v1.0 using the Wheat@URGI portal [295]. A user-friendly practical haplotype graph that integrates different marker types and is aligned with the reference genome will therefore soon be required to capitalize on existing information on trait-linked SNPs, DArTseq and/or GBS markers.
Despite tremendous progress in gene discovery and the application of new genomic tools, the practical use of MAS or GAB is still limited in public programs. Large public breeding programs in North America use only selected markers or genomic selection for specific purposes, without fully integrating the opportunities that current knowledge offers. Breeding programs in large wheat-producing countries such as China, India, Pakistan and Russia are still based on conventional, lengthy crossing and selection procedures. Automation technology, fifth-generation mobile networks, cloud-based technologies, and artificial intelligence (AI) applications such as deep learning (DL) on images for phenotyping have shown great power to promote fundamental crop research and crop breeding. Not all programs, however, especially in the less developed economies, can engage geneticists or bioinformatics specialists to assist in converting their programs to genomics-based approaches. Thus, this remains an important challenge for the full deployment of MAS in wheat.
An important part of genetic analyses is the identification of the candidate genes and/or diagnostic marker(s) in linkage disequilibrium with the trait(s) under selection. Many studies of abiotic stresses do not identify candidate genes, and this has slowed down MAS in wheat. Typically, candidate genes are identified by locating the QTL region on the genome assembly, and the region usually contains several genes. If the researcher selects only a few of these genes and validates their expression under a certain condition, human bias [296] can come into play and slow down the progress of GA. For example, looking only for K+/Na+ transporters in salinity studies could lead to other important genes in tolerance mechanisms being ignored. Therefore, we suggest re-thinking the selection strategy so that it uses all of the capabilities that are now feasible. In this way, it will be possible to complement genomic selection with in-silico transcriptomics analysis of all potential genes, thereby casting a wider net to capture a more complete complement of genes that are relevant to the phenotype under study or selection.
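The unbiased screen suggested above can be illustrated with a minimal sketch in Python. It is not a method taken from the studies cited here: the gene identifiers, coordinates, expression values and function names are all hypothetical, and a simple log2 fold change with a pseudocount stands in for a proper differential expression analysis. The point is only that every annotated gene overlapping the QTL interval is retained and ranked by its expression response, rather than being pre-selected by presumed function.

```python
from dataclasses import dataclass
from math import log2


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_in_interval(genes, chrom, start, end):
    """Return every annotated gene overlapping the QTL interval.

    No filtering by annotation or presumed function is applied.
    """
    return [g for g in genes
            if g.chrom == chrom and g.start <= end and g.end >= start]


def rank_by_expression_change(candidates, control, stress, pseudocount=1.0):
    """Rank candidates by absolute log2 fold change (stress vs control).

    The pseudocount avoids division by zero for unexpressed genes.
    """
    ranked = []
    for g in candidates:
        c = control.get(g.gene_id, 0.0)
        s = stress.get(g.gene_id, 0.0)
        ranked.append((g.gene_id, log2((s + pseudocount) / (c + pseudocount))))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Toy annotation and expression values (hypothetical, for illustration only).
genes = [
    Gene("geneA", "2A", 100, 200),
    Gene("geneB", "2A", 250, 400),
    Gene("geneC", "2A", 900, 1000),
    Gene("geneD", "2B", 150, 300),
]
control_tpm = {"geneA": 7.0, "geneB": 3.0, "geneC": 5.0, "geneD": 1.0}
stress_tpm = {"geneA": 7.0, "geneB": 15.0, "geneC": 50.0, "geneD": 9.0}

# Hypothetical QTL interval on chromosome 2A.
candidates = genes_in_interval(genes, "2A", 150, 500)
ranking = rank_by_expression_change(candidates, control_tpm, stress_tpm)

for gene_id, lfc in ranking:
    print(f"{gene_id}\tlog2FC={lfc:.2f}")
```

In practice the interval would come from anchoring the flanking markers on the reference assembly, and the expression values from replicated RNA-seq experiments analysed with an appropriate statistical model. The ranking only prioritises genes for validation and does not establish causality.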
Given the breadth of genomic tools and resources now available to breeders and scientists, the next few years should bring significant advances in wheat improvement and enable breeders to develop varieties that can withstand the coming challenges associated with climate change.