Comparative analysis of the WRKY gene family reveals the gene family expansion and evolution in diverse plant species

The WRKY gene family plays diverse roles in plant growth and development. WRKY genes encode an evolutionarily conserved WRKY DNA-binding domain, and the family shows functional diversity and extensive expansion. In this study, we conducted a genome-wide comparative analysis to investigate the evolution of the WRKY gene family across diverse plant species and revealed significant expansion and diversification from aquatic green algae to terrestrial plants. Phylogeny reconstruction of WRKY genes was performed using the Maximum Likelihood (ML) method; the genes were grouped into seven clades and further classified into algae, bryophyte, pteridophyte, dicotyledon, and monocotyledon subgroups. Furthermore, duplication analysis showed that the increase in the number of WRKY genes in higher plant species was primarily due to tandem and segmental duplication under purifying selection. In addition, the selection pressures acting on different WRKY subfamilies were investigated using classical and Bayesian maximum likelihood methods (Datamonkey and PAML). The average dN/dS of each group is less than one, indicating purifying selection. Our comparative genomic analysis provides a basis for future functional analyses of WRKY genes, for understanding the role of gene duplication in gene family expansion, and for selection pressure analysis.


Introduction
Plants rely on a variety of complex mechanisms to respond to diverse external factors and stresses while maintaining normal growth and development. Because plants cannot move away from adverse environmental conditions, they must adapt through biotic and abiotic stress responses. Thus, plants possess multiple regulatory mechanisms that allow them to amplify regulatory signals and stress responses at the cellular, molecular, and physiological levels [1]. WRKY transcription factors (TFs) can regulate various stress responses through a complex network of genes. At the molecular level, coordinated regulation of WRKY genes may underlie synchronized stress responses in plants. WRKY TFs activate or repress transcription by binding to W-box or W-box-like sequences, and WRKY activity is itself regulated at the transcriptional and translational levels. Because WRKY proteins specifically recognize their binding sites in promoter sequences and thereby exert strong regulatory control, they are promising candidates for crop improvement [2]. Transcription factors are a specialized class of proteins that participate in gene regulation [3]; they activate or suppress the expression of target genes by binding to specific DNA segments in promoter regions. WRKY is one of the most important families of regulatory transcription factors and is found primarily in plants [4]. As an important class of stress-responsive TFs, WRKY TFs are actively involved in regulating plant growth and development as well as biotic and abiotic stress responses [5]. A common characteristic of all WRKY proteins is a highly conserved WRKY domain of approximately 60 amino acids, which carries the conserved WRKYGQK sequence at its N-terminus and a zinc-finger motif at its C-terminus [6]. Based on the number of WRKY domains and the pattern of zinc-finger motifs, WRKY proteins are mainly classified into three groups (I, II, and III) [7]. WRKY transcription factors are generally considered to be plant-specific and have been studied in several plant species, such as rice (Oryza sativa), Arabidopsis thaliana, cucumber (Cucumis sativus), grape (Vitis vinifera), poplar (Populus trichocarpa), and pigeon pea (Cajanus cajan). Members of group I contain two WRKY domains, whereas members of groups II and III contain a single WRKY domain. Group II is further divided into subgroups IIa-IIe according to the presence of conserved short motifs [7]. Group I and II WRKY members contain C2H2-type zinc-finger-like motifs, whereas group III members contain C2HC-type zinc-finger-like motifs [8].
The WRKY TF gene family has been studied in many plant species for several years, but little is known about how the family expanded and evolved in higher plants. Therefore, given the importance of WRKY TFs in plant defense, growth, and development, we set out to understand the evolution, regulation, and distribution of the WRKY gene family in higher plants. The main purpose of the current analysis is to gain more insight into the genomic distribution, organization, and evolution of the WRKY gene family across diverse plant lineages. In this study, we conducted a genome-wide comparative analysis of the WRKY TF gene family in 40 plant species belonging to diverse groups, including green algae, bryophytes, pteridophytes, monocotyledons, and dicotyledons. Our analysis provides useful information on the WRKY gene family that can support future functional and ecological studies of this essential gene family in higher plants.

Multiple sequence alignment and phylogenetic analysis
Multiple sequence alignments of all conserved WRKY protein sequences from 14 plant species representing major plant groups (monocots, dicots, and lower plants) were performed using ClustalX version 2.0 [34] with default parameters. The resulting alignment files were converted into MEGA format, and MEGA6 [35] was used to build the phylogenetic tree. Phylogenetic trees were generated using the neighbor-joining (NJ) method under the Jones-Taylor-Thornton (JTT) model with the following settings: 1000 bootstrap replicates, amino acid substitution type, Poisson model, 95% site coverage, and partial deletion of gaps/missing data. WRKY genes were assigned to groups and subgroups based on the multiple sequence alignment, the phylogenetic analyses, and the previously reported classification of AtWRKY and OsWRKY genes.
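The tree-building step above was carried out in MEGA6. As a rough illustration of what the neighbor-joining settings do, the following pure-Python sketch applies partial deletion with a 95% site-coverage cutoff, computes Poisson-corrected amino acid distances, d = -ln(1 - p), and joins taxa with the Saitou-Nei neighbor-joining criterion. It is not MEGA's implementation: it omits bootstrapping and the JTT model, and the function names and the choice of `-`, `?` and `X` as missing-data symbols are assumptions made for illustration.

```python
import math
from itertools import combinations

GAP = set("-?X")  # assumed symbols for gaps, missing data and ambiguous residues


def partial_deletion(alignment, coverage=0.95):
    """Drop columns in which fewer than `coverage` of the sequences carry a residue."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [i for i in range(length)
            if sum(alignment[n][i] not in GAP for n in names) / len(names) >= coverage]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def poisson_distance(s1, s2):
    """Poisson-corrected distance d = -ln(1 - p), p = proportion of differing residues."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a not in GAP and b not in GAP]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    return float("inf") if p >= 1.0 else -math.log(1.0 - p)


def distance_matrix(alignment):
    """Symmetric dict-of-dicts of pairwise Poisson distances."""
    D = {n: {} for n in alignment}
    for a, b in combinations(alignment, 2):
        D[a][b] = D[b][a] = poisson_distance(alignment[a], alignment[b])
    return D


def neighbor_joining(D):
    """Saitou-Nei neighbor joining; returns an unrooted tree as a Newick string."""
    D = {a: dict(row) for a, row in D.items()}  # work on a copy
    while len(D) > 2:
        n = len(D)
        r = {a: sum(D[a].values()) for a in D}
        # Join the pair minimising the Q-criterion (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(D, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        dij = D[i][j]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = dij - li                                    # branch length to j
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        D[u] = {}
        for k in list(D):
            if k not in (i, j, u):
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - dij)
        for k in D:  # remove the joined taxa from every row
            D[k].pop(i, None)
            D[k].pop(j, None)
        del D[i], D[j]
    a, b = D
    return f"({a}:{D[a][b]:.4f},{b}:0.0000);"
```

A typical call chain would be `neighbor_joining(distance_matrix(partial_deletion(aln)))`, where `aln` maps sequence names to aligned protein strings; support values would additionally require resampling alignment columns and rebuilding the tree many times (1000 replicates in the analysis above).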

Gene duplication and evolutionary analysis
Gene duplication analysis was performed using the MCScanX software package [36]. To identify homologs/paralogs among all predicted WRKY protein sequences, all-vs-all BLASTP searches were run with the parameters V = 10, B = 100, filter = seg, e-value < 1 × 10⁻¹⁰, and -m 8 (tabular output). The all-vs-all BLASTP outputs, together with the predicted chromosomal coordinates of all protein-coding genes, were input into MCScanX. Based on copy number and genomic distribution, MCScanX classifies the origin of duplicate genes within a gene family into proximal, dispersed, segmental/WGD, and tandem duplicates. These duplication events were used to compare the contribution of each duplication mode to the expansion of the WRKY gene family across the selected plant species. Orthologous gene locations and segmental duplications were visualized using Circos v0.68 (http://circos.ca/). Synonymous (Ks; substitutions that do not alter the amino acid) and non-synonymous (Ka; substitutions that alter the amino acid) substitution rates were used to estimate the selection pressure on all orthologous WRKY gene pairs of the selected species [37]. To this end, the PAL2NAL program was used to convert protein alignments and their corresponding DNA (or mRNA) sequences into codon alignments [38]; PAL2NAL computes the Ka and Ks values using the CODEML program of the PAML package. Pairs with Ks values above 5.0 were excluded from the analysis because of substitution saturation at synonymous sites [39].
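To make the duplication-classification step more concrete, the sketch below filters BLASTP tabular (-m 8) hits by the e-value cutoff and then assigns each gene a single duplication class using a priority order of segmental/WGD > tandem > proximal > dispersed > singleton, which is our understanding of how MCScanX's `duplicate_gene_classifier` resolves genes that fit several categories. This is a simplified, hypothetical Python re-implementation, not MCScanX code: it assumes the anchor genes of collinear blocks have already been detected, represents gene positions as a rank along each chromosome, and uses an assumed 20-gene window for proximal duplicates.

```python
CLASSES = ("singleton", "dispersed", "proximal", "tandem", "segmental/WGD")


def read_blast_m8(lines, max_evalue=1e-10):
    """Yield (query, subject) pairs from BLAST -m 8 tabular lines passing the e-value cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])  # column 11 = e-value
        if query != subject and evalue < max_evalue:  # skip self-hits
            yield query, subject


def classify_duplicates(hits, positions, collinear_genes, proximal_window=20):
    """Assign each gene its highest-priority duplication class (hypothetical sketch).

    positions: gene -> (chromosome, rank of the gene along that chromosome).
    collinear_genes: anchor genes of collinear blocks, assumed to be precomputed.
    proximal_window: assumed maximum gene distance for proximal duplicates.
    """
    level = {g: (4 if g in collinear_genes else 0) for g in positions}
    for q, s in hits:
        if q not in positions or s not in positions:
            continue
        (chr_q, rank_q), (chr_s, rank_s) = positions[q], positions[s]
        gap = abs(rank_q - rank_s) if chr_q == chr_s else None
        if gap == 1:
            cls = 3  # tandem: adjacent genes on one chromosome
        elif gap is not None and gap <= proximal_window:
            cls = 2  # proximal: nearby but not adjacent
        else:
            cls = 1  # dispersed: neither adjacent nor nearby
        for g in (q, s):
            level[g] = max(level[g], cls)  # keep the highest-priority class
    return {g: CLASSES[lvl] for g, lvl in level.items()}
```

Genes with no qualifying BLASTP hit remain singletons; in MCScanX itself, collinear blocks are inferred from the same BLASTP hits together with the gene coordinates, a step this sketch deliberately leaves out.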

Identification of WRKY genes across 40 diverse plant species
In this study, we analyzed the WRKY gene family in 40 plant species, ranging from aquatic algae to flowering plants (Figure 1). A total of 3234 WRKY homologs were identified in the selected plant genomes (Figure 2). The presence of WRKY genes in green algae indicates their ancient origin and functional conservation, and the larger numbers of WRKY genes in higher plants indicate a greater degree of expansion than in algae. Only two homologs were identified in each of the three algae V. carteri, O. lucimarinus, and C. reinhardtii, far fewer than in land plants. In P. patens, an early-diverging land plant [40], and Amborella trichopoda, the sole living species of the most basal angiosperm lineage [41], 32 and 29 WRKY genes were identified, respectively, whereas S. moellendorffii, a seedless vascular plant [42], contains 35 WRKY genes. In seven monocotyledonous species, 45 (H. vulgare) to 172 (T. aestivum) WRKY genes were identified, and in the other 26 eudicots, 44 (S. melongena) to 188 (G. max) WRKY genes were found, indicating widespread gene expansion and duplication events. The largest numbers of WRKY genes were identified in the legume G. max (188) and in T. aestivum (172). The expansion of the WRKY gene family in wheat is primarily due to gene duplication events, and segmental duplication might play a more critical role than tandem duplication [43]. Although copy-number differences among plant species appear complex, our analysis showed that the number of WRKY genes in each species is positively correlated with the total number of genes in that species. In addition, WRKY genes were unevenly distributed both within and among species. For example, no WRKY genes were present on 10 of the 27 P. patens chromosomes, whereas about 51% of all Oryza nivara WRKY genes were located on only three (chromosomes 1, 5, and 11) of its 12 chromosomes. Similarly, G. max and G. raimondii differed in both WRKY copy number and chromosomal distribution.
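The correlation and chromosome-distribution statements above reduce to simple calculations. The sketch below shows one way to compute them, assuming Pearson's coefficient (the text does not state which correlation measure was used) and assuming the inputs are per-species WRKY counts and total gene counts, plus, for a single genome, the chromosome of each WRKY gene. The function names are hypothetical and no data from the study are included.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chromosome_share(gene_chroms, selected):
    """Fraction of WRKY genes (one chromosome label per gene) on the `selected` chromosomes."""
    counts = Counter(gene_chroms)
    return sum(counts[c] for c in selected) / len(gene_chroms)


def empty_chromosomes(gene_chroms, all_chroms):
    """Chromosomes that carry no WRKY gene."""
    present = set(gene_chroms)
    return [c for c in all_chroms if c not in present]
```

For example, `pearson_r(wrky_counts, total_gene_counts)` over the 40 species would quantify the positive correlation, `chromosome_share(nivara_chroms, {"1", "5", "11"})` would give the roughly 51% figure for O. nivara, and `len(empty_chromosomes(patens_chroms, all_27_chromosomes))` would count WRKY-free chromosomes in P. patens; all of these variable names are placeholders.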

Comparative phylogenetic analysis
Comparative analysis of the WRKY genes was performed in 40 plant genomes, including monocotyledons, dicotyledons, and lower plants, with a particular focus on the pigeonpea WRKY gene family. WRKY genes play an important role in regulating gene networks associated with several important developmental processes and defense responses in plants. Genome-wide analysis showed that the green algae have the fewest WRKY genes (1-4), followed by the lower plants P. patens (a nonvascular plant) and S. moellendorffii (a vascular plant), with 32 and 35 WRKY genes, respectively.
According to the phylogenetic analysis, plant WRKYs can be divided into seven clades, named I, II, III, IV, V, VI, and VII (Figure 3), consistent with earlier reports in Arabidopsis, rice, and other plant species. Monocots (light green) and dicots (light blue) were distributed across all clades. The algal sequences (orange) occur in clades I and VI; the bryophyte P. patens (dark blue) occurs predominantly in clades V, VI, and VII; and the pteridophyte S. moellendorffii (red) occurs in all clades except clade IV. Clade IV contains only monocot and dicot sequences.

Expansion of WRKY gene family among plant genomes
Gene duplication, arising from polyploidy or from tandem and segmental duplication, is a major driver of gene family expansion. Several types of gene duplication have been identified, including segmental duplication (SD) or whole-genome duplication (WGD) and single-gene duplication (proximal, tandem, and dispersed duplication). Of these, tandem and segmental duplication are the two major causes of gene family expansion in plants. We therefore assessed the duplication types of WRKY genes in the selected plant genomes using the MCScanX program, including the algae O. lucimarinus, V. carteri, and C. reinhardtii, the basal land plants S. moellendorffii and P. patens, and angiosperms (monocots and dicots). The results suggested that WRKY gene numbers are maintained by different types of duplication events. Dispersed duplication was detected in all selected plant species, whereas WGD/segmental duplication of WRKY genes was detected in most higher plant species but not in algae and mosses, which may reflect the fact that all flowering plants have undergone one or more whole-genome duplication events. For example, 16 GmWRKY gene pairs (Chr03&Chr19: Glyma03G176600-Glyma19G177400, Glyma03G220100-Glyma19G217000, Glyma03G220800-Glyma19G217800, Glyma03G224700-Glyma19G221700, Glyma03G256700-Glyma19G254800; Chr09&Chr15: Glyma09G005700-Glyma15G110300, Glyma09G029800-Glyma15G135600, Glyma09G034300-Glyma15G139000, Glyma09G061900-Glyma15G168200, Glyma09G080000-Glyma15G186300; and Chr09&Chr18: Glyma09G240000-Glyma18G256500, Glyma09G250500-Glyma18G242000, Glyma09G254400-Glyma18G238600, Glyma09G254800-Glyma18G238200, Glyma09G274000-Glyma18G213200, Glyma09G280200-Glyma18G208800) were identified as segmentally duplicated genes (Figure 4), suggesting that the expansion of GmWRKY genes likely occurred through segmental duplication. Overall, 37.1%, 32.4%, 17.4%, and 4.5% of WRKY genes were retained from WGD/segmental, dispersed, tandem, and proximal duplication, respectively. Owing to segmental and tandem duplication, the number of WRKY genes increased markedly in higher plants compared with basal land plants. Duplication patterns were species-specific, and the proportions of segmental and tandem duplication differed among species. For example, the selected monocots showed similar trends of tandem and WGD/segmental duplication, except T. aestivum and H. vulgare, in which WGD/segmental duplication was absent. In dicots, by contrast, WGD/segmental duplication mainly drove the expansion of WRKY genes in G. hirsutum, B. oleracea, G. max, and P. trichocarpa, whereas tandem duplication contributed mainly to WRKY expansion in M. truncatula and S. tuberosum. In G. raimondii, more than 74% of WRKY genes were derived from WGD/segmental duplication, and 32.0% of genes in B. oleracea and Z. mays were WGD/segmental duplicates, while 33.0% of genes in S. tuberosum, 29% in M. truncatula, and 24% in Z. mays were derived from tandem duplication, proportions much higher than in other plant species (Table 1).
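The percentages above are shares of WRKY genes per duplication mode. The sketch below shows how such shares could be tabulated from per-gene labels (for example, the output of an MCScanX-style classifier), either for one species or pooled across species. Whether the reported figures were pooled over all genes or averaged per species is not stated, so both the pooling choice and the function names are assumptions. Note also that the four reported percentages sum to 91.4%; the remainder is not itemized in the text.

```python
from collections import Counter

MODES = ("segmental/WGD", "dispersed", "tandem", "proximal", "singleton")


def mode_percentages(labels):
    """Percentage of genes in each duplication mode, rounded to one decimal."""
    counts = Counter(labels)
    total = len(labels)
    return {m: round(100.0 * counts[m] / total, 1) for m in MODES}


def pooled_percentages(per_species):
    """Pool labels over species (species -> list of labels) before computing percentages."""
    return mode_percentages([lab for labels in per_species.values() for lab in labels])


def dominant_mode(labels, candidates=("segmental/WGD", "tandem")):
    """Which candidate mode accounts for more WRKY genes in one species."""
    pct = mode_percentages(labels)
    return max(candidates, key=pct.get)
```

Pooling weights species with large WRKY families (such as G. max or T. aestivum) more heavily than averaging per-species percentages would, so the two approaches can give noticeably different summaries.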

Identification of orthologous gene pairs and evolutionary analysis
For the selection pressure analysis, we identified 42 orthologous gene pairs between C. cajan and G. max, 60 between G. max and adzuki bean, and 36 between C. cajan and adzuki bean, based on phylogenetic relationships and sequence homology (Table 2). Between C. cajan and G. max, the highest and lowest protein sequence identities were observed for CcWRKY86-Glyma02G297400 (93.42%) and CcWRKY71-Glyma02G141000 (81.14%), respectively, with an average identity of 85.24%. Between G. max and adzuki bean, the highest and lowest amino acid identities were observed for Glyma05G127600-Vang0333s00130 (96.63%) and Glyma09G254400-Vang04g03920 (81.14%), with an average of 85.40%. Between C. cajan and adzuki bean, the highest and lowest identities were observed for CcWRKY75-Vang0333s00130 (96.21%) and CcWRKY13-Vang01g02180 (81.14%), with an average identity of 85.48%.
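Percent identity can be defined in several ways, and the definition used for the values above is not stated. The sketch below assumes one common convention (identical residues divided by alignment columns that are not gaps in both sequences) and shows how the highest, lowest, and mean identity for a set of aligned orthologous pairs could be summarized; the function names and input layout are illustrative assumptions.

```python
def percent_identity(a, b, gap="-"):
    """Identity (%) of two aligned sequences: identical residues / columns not gapped in both."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    identical = sum(1 for x, y in cols if x == y and x != gap)
    return 100.0 * identical / len(cols)


def identity_summary(pairs):
    """pairs: pair name -> (aligned seq 1, aligned seq 2); returns highest, lowest, mean."""
    scores = {name: percent_identity(a, b) for name, (a, b) in pairs.items()}
    highest = max(scores, key=scores.get)
    lowest = min(scores, key=scores.get)
    mean = sum(scores.values()) / len(scores)
    return highest, lowest, mean
```

Counting single-sided gap columns in the denominator, as here, yields lower identities than ignoring them, so values computed with a different convention are not directly comparable.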
The chromosomal distribution and syntenic relationships of the C. cajan-G. max, G. max-adzuki bean, and C. cajan-adzuki bean orthologous gene pairs are shown in Figure 5. Physical mapping revealed that 56%-64% of WRKY genes in most leguminous species, such as pigeonpea, adzuki bean, common bean, and mung bean, were not located on corresponding chromosomes, suggesting substantial chromosomal rearrangement in legume genomes. In addition, the relationship between synonymous (dS) and non-synonymous (dN) substitution rates was used to infer the direction and magnitude of natural selection acting on WRKY orthologous gene pairs in pigeonpea, soybean, and adzuki bean. The dN/dS distribution showed that the WRKY orthologous gene pairs in these legumes were subject to strong purifying selection during evolution (Table 2).
Table 2. Identification of synonymous (dS) and non-synonymous (dN) substitution rates for WRKY orthologous gene pairs between pigeonpea-soybean, soybean-adzuki bean, and pigeonpea-adzuki bean.
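The dN and dS values in this analysis were estimated by PAML's codeml through PAL2NAL, which uses a maximum-likelihood codon model. For readers who want to see how synonymous and non-synonymous differences are counted, the sketch below swaps in the simpler Nei-Gojobori (1986) counting method with a Jukes-Cantor correction; its estimates will generally differ from codeml's. Assumptions: the standard genetic code, gap-free in-frame codon alignments (gapped, ambiguous, and stop codons are skipped), mutations to stop codons counted as non-synonymous when counting sites, and mutational pathways through stop codons excluded. The Ks > 5.0 saturation cutoff and the reading of dN/dS < 1 as purifying selection follow the text; the function names are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # standard code
CODE = {a + b + c: AA[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites of one codon; changes to stop codons count as non-synonymous here."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Mean (synonymous, non-synonymous) differences over pathways that avoid stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn = non = 0.0
    paths = 0
    for order in permutations(diff):
        cur, sd, nd, ok = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                ok = False  # pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if ok:
            syn, non, paths = syn + sd, non + nd, paths + 1
    if paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return syn / paths, non / paths


def jukes_cantor(p):
    """Jukes-Cantor correction; returns inf when p is saturated."""
    return float("inf") if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def ng86(seq1, seq2):
    """Return (dN, dS) for two codon-aligned coding sequences of equal length."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gapped/ambiguous codons and stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


def selection_summary(pairs, max_ks=5.0):
    """pairs: name -> (dN, dS). Drop saturated (dS > max_ks) or dS == 0 pairs; label omega."""
    summary = {}
    for name, (dn, ds) in pairs.items():
        if ds == 0 or ds > max_ks:
            continue
        omega = dn / ds
        label = "purifying" if omega < 1 else "positive" if omega > 1 else "neutral"
        summary[name] = (omega, label)
    return summary
```

The labels in `selection_summary` are a plain threshold on the point estimate; formally distinguishing purifying from neutral evolution would require a statistical test, such as the likelihood-ratio tests available in PAML.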

Discussion
Plants are constantly challenged by invaders that affect their growth and development. To cope with these invaders, plants deploy their own surveillance and defense systems, in which WRKY TFs play an important role. Therefore, in this analysis, we attempted to identify the distribution of WRKY genes throughout plant lineages using the publicly available, completely sequenced genomes of 40 plant species. Interestingly, some plant genomes, such as S. italica (105), Z. mays (135), S. bicolor (133), O. sativa (107), T. aestivum (172), R. sativus (126), B. rapa (145), B. oleracea (148), P. trichocarpa (102), M. domestica (127), G. raimondii (120), and G. hirsutum (197), contained more than 100 WRKY genes (Figure 2). By investigating WRKY homologs in 40 plant species ranging from algae to higher plants, the current study highlights their functional importance and ancient origin. Only one or two WRKY homologs are predicted for the three aquatic algae, compared with 29 to 197 homologs in terrestrial plants, indicating rapid expansion of the WRKY gene family in higher plants (especially angiosperms).
Phylogenetic analysis of the selected plant species revealed that WRKY TFs are present in monocotyledons, dicotyledons, and lower eukaryotes, indicating that the origin of most plant WRKY TFs predates the divergence of these species. No species-specific groups, subgroups, or clades were observed in the phylogenetic tree, indicating that the WRKY gene family has been largely conserved during evolution.
In addition, WRKY domains derived from similar ancestors would be expected to cluster together in the phylogenetic tree, but this was not observed in this study, suggesting that the duplications took place after divergence. WRKY genes that clustered together are likely orthologs and are evolutionarily closer to each other than to other genes. The phylogenetic relationships found in this study suggest that WRKY TFs may have evolved conservatively. Only a few WRKY genes were identified in lower eukaryotes, including O. lucimarinus, V. carteri, and C. reinhardtii, whereas higher plants had many more WRKY genes. This suggests that WRKY-containing genes originated in aquatic algae and, hence, that WRKY proteins evolved before plants transitioned from aquatic to terrestrial habitats. As plants continued to evolve, terrestrial species developed a series of highly sophisticated signaling processes that help them adjust to ever-changing environmental conditions, and the number of WRKY TFs increased accordingly in different plant species.
Gene family expansion is generally driven by gene duplication events, which are common in plants and are considered an important evolutionary force in the expansion and evolution of gene families and a source of new biological functions. We estimated the gene duplication events (segmental, tandem, proximal, and dispersed) that led to the expansion of the WRKY gene family in flowering plants. Dispersed duplication mainly contributes to genetic novelty and adaptation to new environments. Distinct duplication patterns were observed, implying a range of functional divergence among plant species or taxa. Tandem duplication was observed in most plant species except Malus domestica and the aquatic algae. Furthermore, only 12 segmentally duplicated FvWRKY genes have been identified in the Fragaria vesca genome, suggesting that tandem and segmental duplication events occur at low levels in the FvWRKY gene family. Therefore, the number of WRKY genes may be smaller in several plant genomes because of the absence of segmental and tandem duplication events. Based on our results, we suggest that the ancestral core WRKY genes were probably expanded mainly through dispersed duplication, whereas segmental and tandem duplications subsequently contributed most to the expansion of the WRKY gene family in angiosperms. Gene duplication and expansion, followed by functional diversification, can provide novel genes for adaptation to new environments. The expansion of the WRKY gene family and the distinct roles of its members in various processes point to their functional variation and evolutionary history.
Orthologous WRKY genes in different plants are usually expected to retain similar properties and functions. Therefore, comparative analysis of WRKY orthologs among legumes may help predict their evolutionary relationships and the possible functions of WRKY proteins in pigeon pea, soybean, and adzuki bean. Our study identified 42 orthologous gene pairs between pigeon pea and soybean, 60 between soybean and adzuki bean, and 36 between pigeon pea and adzuki bean, which provide a basis for further functional prediction of WRKY genes in these three legumes.

Conclusion
Overall, our comparative genomic analysis of the WRKY gene family provides a comprehensive framework for understanding the evolutionary dynamics, duplication patterns, and selection pressures shaping this important gene family across diverse plant lineages. This study not only enhances our knowledge of plant evolution but also paves the way for targeted approaches to harness the potential of WRKY genes in crop improvement and environmental resilience.

Figure 1. Schematic representation of the methodology used to identify the WRKY gene family across the selected plant genomes.

Figure 2. NCBI taxonomy tree of the 40 plant species showing the number of predicted WRKY genes in each species. Species from different taxonomic groups are marked with distinct colors.

Figure 3. Phylogenetic relationships of WRKY proteins from 14 plant species. The tree was built with MEGA6.0 using the neighbor-joining method. Colors of the inner circle represent WRKY groups, and colors of the outer circle represent the plant category (orange, algae; blue, bryophytes; red, pteridophytes; light blue, dicots; green, monocots).

Figure 4. Segmental duplications of GmWRKY genes in the Glycine max genome. Note: Red lines denote segmentally duplicated GmWRKY gene pairs. Each chromosome in the circle is shown in a different color.

Figure 5. Orthologous WRKY genes among Cajanus cajan, Glycine max, and Vigna angularis. Cc, Gm, and Va indicate the chromosomes of Cajanus cajan, Glycine max, and Vigna angularis, respectively, and each chromosome is shown in a different color. (A) Green lines indicate homologous gene pairs between pigeonpea and soybean chromosomes; (B) red lines indicate homologous gene pairs between soybean and adzuki bean chromosomes; and (C) blue lines indicate homologous gene pairs between pigeonpea and adzuki bean chromosomes.

Table 1. Distribution of duplication types in all 40 selected plant species.