Complex traits are traits whose expression is determined by both genetic and environmental factors. They generally include classical quantitative traits with continuous distributions, binary or categorical traits with discrete distributions that are controlled by polygenes, and other traits that cannot be measured exactly, such as behavioral and psychological traits. Most human complex diseases and most economically important traits in plants and animals belong to this category. Understanding the molecular basis of complex traits plays a vital role in the genetic improvement of plants and animals. In this article, the concept and research background of complex traits are summarized, and the strategies, methods, and progress made in dissecting the genetic basis of complex traits are reviewed. The challenges and possible developments in future research are also discussed.
Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), k-nearest-neighbor (KNN), binary support vector machines (SVMs), and multiclass support vector machines (MC-SVMs), were used to classify computer-simulated data and two real microarray expression datasets. The numbers of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN), together with clustering accuracy and Matthews' correlation coefficient (MCC), were compared among these methods. The results are as follows. (1) When classifying thousands of gene expression data, the two GMM methods achieve the highest clustering accuracy and the fewest overall errors (FP+FN), under the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples is very small, the GMM-Ⅱ method is more accurate than the GMM-Ⅰ method. (2) In general, the MC-SVMs are more robust and more practical: they are less sensitive to the curse of dimensionality, rank second only to the GMM methods in clustering accuracy on thousands of gene expression data, and are more robust than the other techniques when only a small number of high-dimensional gene expression samples is available. (3) Among the MC-SVMs, OVO and DAGSVM perform better with large sample sizes, all five MC-SVM methods perform very similarly with moderate sample sizes, and OVR, WW, and CS yield better results with small sample sizes. It is therefore recommended that at least two candidate methods, chosen according to the features of the real data and the experimental conditions, be run and compared to obtain a better clustering result.
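The comparison criteria named above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch in Python using the standard definitions of accuracy and MCC; the function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when a marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, for illustration only (not results from the paper).
example = dict(tp=90, tn=80, fp=20, fn=10)
example_accuracy = accuracy(**example)
example_mcc = mcc(**example)
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, so it stays informative when the two classes are of very different sizes.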
Based on the major gene plus polygene mixed inheritance model for multiple correlated quantitative traits, the authors propose a new joint segregation analysis method for a major gene controlling multiple correlated quantitative traits, which covers detection of the major gene and estimation of its effects and variation. The effects and variation of the major gene are estimated by the maximum likelihood method, implemented via the expectation-maximization (EM) algorithm. The presence of the major gene is tested with the likelihood ratio (LR) test statistic. Extensive simulation studies showed that joint analysis not only increases the statistical power of major gene detection but also improves the precision and accuracy of the estimates of major gene effects. Plant height and number of tillers in the F2 population of the rice cross Duonieai x Zhonghua 11 were used as an illustration. The results indicated that the genetic difference in these two traits in this cross is attributable to a single pleiotropic major gene. The additive and dominance effects of the major gene are estimated as -21.3 and 40.6 cm for plant height, and 22.7 and -25.3 for number of tillers, respectively. The major gene shows overdominance for plant height and close to complete dominance for number of tillers.
XIAO Jing WANG Xue-feng HU Zhi-qiu TANG Zai-xiang SUI Jiong-ming LI Xin XU Chen-wu
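The EM estimation and LR test described in the segregation analysis abstract above can be illustrated with a much simplified sketch. It assumes a single trait rather than the joint multi-trait model, an F2 population with fixed 1:2:1 genotype proportions, a common residual variance, and no separate polygenic component; all function names are hypothetical and this is not the authors' implementation.

```python
import math
import random

def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def simulate_f2(n, means, sd, seed=0):
    """Simulate n F2 phenotypes; means = (AA, Aa, aa) genotype means."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choices(means, weights=[1, 2, 1])[0], sd)
            for _ in range(n)]

def fit_major_gene_f2(y, n_iter=500, tol=1e-8):
    """EM for a 3-component normal mixture with fixed 1:2:1 proportions."""
    n = len(y)
    props = (0.25, 0.5, 0.25)            # AA, Aa, aa
    ys, q = sorted(y), n // 4
    # Crude start: top quarter -> AA, middle half -> Aa, bottom quarter -> aa.
    mus = [sum(ys[-q:]) / q, sum(ys[q:n - q]) / (n - 2 * q), sum(ys[:q]) / q]
    mean0 = sum(y) / n
    var0 = sum((v - mean0) ** 2 for v in y) / n
    var, loglik = var0, -math.inf
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities for each individual.
        post, new_loglik = [], 0.0
        for v in y:
            logs = [math.log(p) + normal_logpdf(v, m, var)
                    for p, m in zip(props, mus)]
            mx = max(logs)
            tot = mx + math.log(sum(math.exp(l - mx) for l in logs))
            new_loglik += tot
            post.append([math.exp(l - tot) for l in logs])
        # M-step: weighted means and a pooled variance.
        for k in range(3):
            wk = sum(p[k] for p in post)
            mus[k] = sum(p[k] * v for p, v in zip(post, y)) / wk
        var = sum(p[k] * (v - mus[k]) ** 2
                  for p, v in zip(post, y) for k in range(3)) / n
        converged = new_loglik - loglik < tol
        loglik = new_loglik  # evaluated at the parameters before this M-step
        if converged:
            break
    # Null model: one normal distribution, i.e. no major gene.
    loglik_null = sum(normal_logpdf(v, mean0, var0) for v in y)
    return {
        "a": (mus[0] - mus[2]) / 2,           # additive effect
        "d": mus[1] - (mus[0] + mus[2]) / 2,  # dominance effect
        "var": var,
        "loglik_alt": loglik,
        "loglik_null": loglik_null,
        "lr": 2 * (loglik - loglik_null),
    }
```

A call such as `fit_major_gene_f2(simulate_f2(600, (110.0, 103.0, 90.0), 2.0))` returns the estimated effects and the LR statistic. Because mixture models do not satisfy the usual regularity conditions, the null distribution of this statistic is not a plain chi-square in general, so the critical value needs care. The ratio d/a gives the degree of dominance; for the reported rice estimates its absolute value is about 1.9 for plant height (40.6/21.3) and about 1.1 for number of tillers (25.3/22.7), which matches the overdominance and near-complete dominance stated in the abstract.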
It is well known that incorporating existing populations derived from multiple parents may improve QTL mapping and QTL-based breeding programs. However, no general maximum likelihood method has been available for this strategy. Building on QTL mapping in multiple related populations derived from two parents, a maximum likelihood estimation method is proposed that can incorporate several populations derived from three or more parents and can handle different mating designs. Taking a circle design as an example, we conducted simulation studies to examine the effects of QTL heritability and sample size on the proposed method. The results showed that, at the same heritability, joint analysis of three F2 populations gave higher power of QTL detection and more precise and accurate parameter estimates than joint analysis of any two F2 populations. Higher heritability, especially combined with larger sample sizes, increased the power of QTL detection and improved parameter estimation. The method has two potential advantages. First, existing QTL mapping results from single populations can be compared and integrated with one another, which improves the power of QTL detection and the precision of QTL mapping. Second, because multiple parents carry multiple alleles, the method can exploit genetic resources more fully, which lays important genetic groundwork for plant improvement.
AO Yan HU Zhi-qiu TANG Zai-xiang WANG Xue-feng XU Chen-wu
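The idea of pooling several related F2 populations in one likelihood, as described in the QTL mapping abstract above, can be sketched as follows. This is a deliberately simplified illustration and not the proposed method: it assumes that the circle design crosses each parent with the next one in a cycle, that QTL genotypes are observed directly (no interval mapping), that allelic effects are purely additive, and that all populations share one residual variance. All names and numbers are hypothetical.

```python
import math

def normal_logpdf(y, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)

def circle_design(n_parents):
    """Crosses P0xP1, P1xP2, ..., P(n-1)xP0 (assumed reading of 'circle design')."""
    return [(k, (k + 1) % n_parents) for k in range(n_parents)]

def population_loglik(pop, allele_effects, var):
    """Log-likelihood of one F2 population.

    pop["parents"] = (i, j); pop["mean"] = population mean;
    pop["records"] = (n_i, y) pairs, where n_i in {0, 1, 2} is the number
    of QTL alleles inherited from parent i and y is the phenotype.
    """
    i, j = pop["parents"]
    total = 0.0
    for n_i, y in pop["records"]:
        mu = pop["mean"] + n_i * allele_effects[i] + (2 - n_i) * allele_effects[j]
        total += normal_logpdf(y, mu, var)
    return total

def joint_loglik(pops, allele_effects, var):
    """Populations are independent, so their log-likelihoods add up.

    The parental allele effects are shared across populations, which is
    what lets one population inform the estimates for another.
    """
    return sum(population_loglik(p, allele_effects, var) for p in pops)

# Toy data for three parents, for illustration only.
crosses = circle_design(3)                     # [(0, 1), (1, 2), (2, 0)]
toy_pops = [
    {"parents": crosses[0], "mean": 10.0,
     "records": [(2, 12.0), (1, 10.0), (0, 8.0)]},
    {"parents": crosses[1], "mean": 20.0,
     "records": [(2, 18.0), (1, 19.0), (0, 20.0)]},
    {"parents": crosses[2], "mean": 30.0,
     "records": [(2, 30.0), (1, 31.0), (0, 32.0)]},
]
toy_effects = [1.0, -1.0, 0.0]                 # one additive effect per parent
```

Evaluating `joint_loglik(toy_pops, toy_effects, 1.0)` against `joint_loglik(toy_pops, [0.0, 0.0, 0.0], 1.0)` compares a QTL model with a no-QTL model on the pooled data. Each parental allele appears in two of the three crosses, so every effect is informed by more than one population.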