Quantitative Trait Loci Mapping for Trait in Categorical Scale
QUANTITATIVE TRAIT LOCI MAPPING FOR TRAIT
IN BINARY AND ORDINAL SCALE
FARIT MOCHAMAD AFENDI
GRADUATE SCHOOL
BOGOR AGRICULTURAL UNIVERSITY
BOGOR
2006
ABSTRACT
FARIT MOCHAMAD AFENDI. Quantitative Trait Loci Mapping for Trait in
Categorical Scale. Under the direction of Asep Saefuddin, Muhammad Jusuf, and
Totong Martono.
Genes or regions on a chromosome underlying a quantitative trait are called
Quantitative Trait Loci (QTL). Characterizing the genes that control a quantitative
trait, in terms of their position on the chromosome and their effect on the trait, is
done through a process called QTL mapping. To estimate the position of a QTL and
its effect, QTL mapping basically utilizes the association between the QTL and DNA
markers. However, many important traits are recorded on a categorical scale, such as
resistance to a certain disease. From a theoretical point of view, QTL mapping
methods that assume a continuous trait cannot be applied to a categorical trait.
This research focused on assessing, by means of a simulation study, the
performance of the Maximum Likelihood (ML) and Regression (REG) approaches
employed in QTL mapping, as well as the performance of the Lander and Botstein
(LB) and Piepho methods in determining the critical value for testing the existence
of a QTL for binary and ordinal traits. The simulation study evaluating the ML and
REG approaches took into account several factors that may affect the performance
of both approaches: (1) marker density; (2) QTL effect; (3) sample size; (4) shape of
the phenotypic distribution; (5) number of categories; and (6) number of QTL. The
simulation study evaluating the LB and Piepho methods was conducted by
generating the distribution of the test statistic under the null hypothesis.
The simulation study showed that the LB and Piepho methods perform
similarly in determining the critical value for testing the existence of a QTL for
binary and ordinal traits. It also indicated that both methods can be used to determine
the critical value in QTL mapping analysis for binary as well as ordinal traits if the
REG approach is used, but not if the ML approach is used, because of their poor
performance in that case. In QTL mapping analysis for binary traits, the ML and
REG approaches showed comparable performance, whereas for ordinal traits the
REG approach performed poorly compared with the ML approach in estimating
thresholds. As a result, either the ML or the REG approach can be used in QTL
mapping analysis for a binary trait, whereas the ML approach is suggested for an
ordinal trait.
Keywords: QTL mapping, binary, ordinal, maximum likelihood, regression, critical
value
ABSTRAK
FARIT MOCHAMAD AFENDI. Quantitative Trait Loci Mapping for Trait in
Categorical Scale.
Dibimbing oleh Asep Saefuddin, Muhammad Jusuf, dan
Totong Martono.
Gen atau suatu segmen di kromosom yang mendasari sifat kuantitatif
dinamakan dengan Lokus Sifat Kuantitatif (Quantitative Trait Loci/QTL).
Penelusuran gen yang mengatur sifat kuantitatif dalam hal posisinya di kromosom
serta besar pengaruhnya dilakukan melalui proses yang dinamakan pemetaan QTL.
Di dalam menduga posisi QTL dan besar pengaruhnya, pemetaan QTL pada
dasarnya memanfaatkan hubungan antara QTL dengan penanda DNA. Di sisi lain,
banyak sifat penting lain yang diamati dengan skala kategorik seperti ketahanan
terhadap suatu penyakit. Secara teori, metode pemetaan QTL dengan anggapan sifat
kontinu tidak dapat diterapkan pada sifat kategorik.
Penelitian ini bertujuan untuk menilai performa metode kemungkinan
maksimum (ML) dan regresi (REG) yang diterapkan pada pemetaan QTL dan
performa metode Lander dan Botstein (LB) dan Piepho di dalam penentuan titik
kritis untuk pengujian keberadaan QTL untuk sifat dengan skala biner dan ordinal
dengan menggunakan simulasi.
Kajian simulasi untuk mengevaluasi performa
metode ML dan REG dilakukan dengan memperhatikan beberapa faktor yang
mungkin mempengaruhi performa kedua sifat ini. Faktor-faktor tersebut adalah: (1)
kepadatan penanda; (2) besar pengaruh QTL; (3) ukuran contoh; (4) bentuk sebaran
fenotipe; (5) banyaknya kategori; dan (6) banyaknya QTL.
Selanjutnya, kajian
simulasi untuk menilai performa metode LB dan Piepho di dalam penentuan titik
kritis dilakukan dengan membangkitkan sebaran statistik uji di bawah hipotesis nol.
Dari kajian simulasi diperoleh hasil bahwa metode LB dan Piepho
menunjukkan performa yang serupa untuk sifat biner dan ordinal. Kajian simulasi
juga menunjukkan bahwa kedua metode dapat diterapkan untuk sifat biner maupun
untuk sifat ordinal bila metode REG yang digunakan namun tidak bila metode ML
yang digunakan. Di dalam evaluasi performa metode ML dan REG untuk sifat biner,
kedua metode menunjukkan performa yang serupa; sedangkan untuk sifat ordinal
metode REG menunjukkan performa yang kurang dibandingkan metode ML
terutama di dalam menduga titik ambang (threshold). Dengan demikian, metode ML
dan REG dapat digunakan untuk sifat biner, sedangkan untuk sifat ordinal metode
ML yang disarankan.
Kata kunci: Pemetaan QTL, biner, ordinal, kemungkinan maksimum, regresi, titik
kritis
LETTER OF PRONOUNCEMENT
I hereby state that my thesis entitled:
Quantitative Trait Loci Mapping for Trait in Binary and Ordinal Scale
is based on my own research and has never been published before. All data sources
and information have been stated clearly and can be verified.
Bogor, August 2006
Farit Mochamad Afendi
NRP. G151030091
QUANTITATIVE TRAIT LOCI MAPPING FOR TRAIT
IN BINARY AND ORDINAL SCALE
FARIT MOCHAMAD AFENDI
A Thesis submitted to the Graduate School of
Bogor Agricultural University
in partial fulfillment of the
requirements for the degree of
Master of Science
GRADUATE SCHOOL
BOGOR AGRICULTURAL UNIVERSITY
BOGOR
2006
Copyright © 2006 by Bogor Agricultural University
All rights reserved
No part of this thesis may be reproduced, stored in a retrieval system, or transcribed,
in any form or by any means – electronic, mechanical, photocopying, recording, or
otherwise – without the prior written permission of Bogor Agricultural University
Title : Quantitative Trait Loci Mapping for Trait in Binary and Ordinal Scale
Name : Farit Mochamad Afendi
NRP : G151030091
Study Program : Statistics
Approved by,
1. Advisory Committee
Dr. Ir. Asep Saefuddin, Chair
Dr. Ir. Muhammad Jusuf, Co-chair
Dr. Ir. Totong Martono, Co-chair
Acknowledged by,
2. Chair of Study Program of Statistics
Dr. Ir. Aji Hamim Wigena
Passed examination date: 21 July 2006
3. Dean of Graduate School
Dr. Ir. Khairil Anwar Notodiputro, M.S.
Graduation date:
To my parents and my lovely wife
BIOGRAPHY
The youngest of six children in my family, I was born to my father, the late
H. Abdus Syakur Suyitno, and my mother, Hj. Siti Aisyah, in 1979 in Jepara, a city
well known for its wood carving and as the birthplace of Kartini.
I started my formal education in 1986. In July of that year, I entered
elementary school at SD Negeri Panggang IV. After six years in elementary school
and another three years at SMP N 2 Jepara, I began my studies at SMU N 1 Jepara in
1994. It was then that I was introduced to Mendelian concepts in genetics. However,
after graduating from high school three years later, I went on to study the wonderful
field of statistics at Bogor Agricultural University. After four years of undergraduate
study, in 2002 I was given the honor of becoming a lecturer in my department. It is a
nice place to work, with many nice people to work with. In 2003, I entered graduate
school, again in the statistics program. This time I took genetics as my minor.
ACKNOWLEDGEMENTS
Alhamdulillah. That is my first word of thanks to Allah, God the Almighty.
Without Him, I would be unable to do anything.
I would like to thank my first advisor, Dr. Asep Saefuddin, who has
encouraged me to be brave in writing papers in English since my undergraduate
studies. I thank my second advisor, Dr. Muhammad Jusuf, for giving me my first
look at the application of statistics in genetics and for continually enriching the
biological concepts in my research. I also thank my third advisor, Dr. Totong
Martono, who inspired me to keep examining my paper for any inconsistency in its
mathematical background.
While being introduced to the application of statistics in genetics, I also had
discussions with many experts. Ahmed Rebai at the Centre of Biotechnology of Sfax,
Tunisia: thank you for your warm and helpful discussion, and for providing me with
your article, which helped enrich my research. Jazakallah khairan katsira. I thank
Steve Kachman in the Department of Statistics, University of Nebraska Lincoln, for
his helpful comments during the writing of the computer program. I also thank
Shizong Xu in the Department of Botany and Plant Sciences, University of
California at Riverside, and Lauren McIntyre in the Department of Agronomy,
Purdue University, for helping me contact their colleagues during the search for real
QTL mapping data sets. I also thank the members of the Quantitative Genetics and
Animal Genome mailing lists who responded to my emails.
I thank my beloved wife, Euis Siti Aisyah, for her love, understanding, and
support. I would not have been able to come this far without her. I thank my parents,
the late H. Abdus Syakur Suyitno and Hj. Siti Aisyah, for bringing me into this
world, for teaching me to be a good man, and for supporting me throughout my
education. I also thank my parents-in-law, Ase Ruhiyat and Siti Hafsah, for
supporting us through their advice and prayers.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
  Background
  Problems
    The design of the population
    The statistical method employed
    The critical value in testing the existence of QTL
    The estimate of the QTL effect and location
    Missing value in QTL data
  Objectives
THEORY AND METHODS
  Backcross population
  Trait in Binary Scale
    Threshold model and liability
    Maximum likelihood (ML) approach
    Regression (REG) approach
  Trait in Ordinal Scale
    Threshold model and liability
    ML approach
    REG approach
  Critical Value
  Heritability
MATERIAL AND METHOD
  Design of simulation experiments in evaluation of the performance of ML and REG approach
    Design of simulation experiments for binary trait
      Evaluation of QTL effect
      Evaluation of the shape of the phenotypic distribution
      Evaluation of sample size
      Evaluation of marker density
      Evaluation of number of QTL
    Design of simulation experiments for ordinal trait
      Evaluation of QTL effect
      Evaluation of the shape of the phenotypic distribution
      Evaluation of sample size
      Evaluation of marker density
      Evaluation of number of categories
      Evaluation of number of QTL
  Design of simulation experiments for evaluating critical value
RESULT AND DISCUSSION
  Result
    Evaluation of methods in determining critical value
      Result for binary trait
      Result for ordinal trait
    Evaluation of statistical approach in QTL mapping
      Result for binary trait
      Result for ordinal trait
  Discussion
CONCLUSION AND SUGGESTION
  Conclusion
  Suggestion
REFERENCES
LIST OF TABLES
1. Co-segregation pattern for backcross design in interval mapping
2. Percentile of empirical distribution of test statistic under null hypothesis for ML approach and REG approach for binary trait
3. Critical value and percentage of the rejection of null hypothesis obtained using LB and Piepho method for binary trait
4. Percentile of empirical distribution of test statistic under null hypothesis for ML approach and REG approach for ordinal trait
5. Critical value and percentage of the rejection of null hypothesis obtained using LB and Piepho method for ordinal trait
6. Comparison of the performance of ML and REG approach for various marker densities (d) for binary trait
7. Comparison of the performance of ML and REG approach for various shapes of phenotypic distribution for binary trait
8. Comparison of the performance of ML and REG approach for various sample sizes (n) for binary trait
9. Comparison of the performance of ML and REG approach under various levels of QTL effect for binary trait
10. ML approach performance for 3 QTLs model with equal QTL effect for binary trait
11. REG approach performance for 3 QTLs model with equal QTL effect for binary trait
12. ML approach performance for 3 QTLs model with unequal QTL effect for binary trait
13. REG approach performance for 3 QTLs model with unequal QTL effect for binary trait
14. ML approach performance for 3 QTLs model using alternative categorize method for binary trait
15. REG approach performance for 3 QTLs model using alternative categorize method for binary trait
16. Comparison of the performance of ML and REG approach for various marker densities (d) for ordinal trait
17. Comparison of the performance of ML and REG approach for various shapes of phenotypic distribution for ordinal trait
18. Comparison of the performance of ML and REG approach for various sample sizes (n) for ordinal trait
19. Comparison of the performance of ML and REG approach under various levels of QTL effect for ordinal trait
20. Comparison of the performance of ML and REG approach for various numbers of categories (C) for ordinal trait
21. ML approach performance for 3 QTLs model with equal QTL effect for ordinal trait
22. REG approach performance for 3 QTLs model with equal QTL effect for ordinal trait
23. ML approach performance for 3 QTLs model with unequal QTL effect for ordinal trait
24. REG approach performance for 3 QTLs model with unequal QTL effect for ordinal trait
25. ML approach performance for 3 QTLs model using alternative categorize method for ordinal trait
26. REG approach performance for 3 QTLs model using alternative categorize method for ordinal trait
27. Comparison of penetrance estimation using ML and REG approach under various level of marker distance for binary trait
28. Comparison of penetrance estimation using ML and REG approach under various level of QTL effect for binary trait
29. Comparison of penetrance estimation using ML and REG approach for various shapes of phenotypic distribution for binary trait
30. Comparison of penetrance estimation using ML and REG approach under various sample size for binary trait
31. Comparison of penetrance estimation using ML and REG approach under various level of marker density for ordinal trait
32. Comparison of penetrance estimation using ML and REG approach under various level of QTL effect for ordinal trait
33. Comparison of penetrance estimation using ML and REG approach under various sample size for ordinal trait
34. Comparison of penetrance estimation using ML and REG approach for various shapes of phenotypic distribution for ordinal trait
35. Comparison of penetrance estimation using ML and REG approach under various numbers of categories for ordinal trait
LIST OF FIGURES
1. Conventionally defined backcross progeny for a QTL and two flanking markers
2. Linkage relationship of a QTL and two flanking markers
3. Liability and threshold model for binary trait
4. Liability and threshold model for ordinal trait
5. Empirical distribution of test statistic under null hypothesis for ML approach and REG approach for binary trait
6. Empirical distribution of test statistic under null hypothesis for ML approach and REG approach for ordinal trait
INTRODUCTION
Background
Genes or loci on a chromosome underlying a quantitative trait are called
Quantitative Trait Loci (QTL). Many such traits, such as milk, meat, or crop
production, are important both economically and biologically. Hence, there is a need
to characterize the genes controlling a quantitative trait, in terms of their position on
the chromosome and their effect on the trait, through a process called QTL mapping.
The QTL genotypes are unobserved. In addition, the environment also affects the
trait, which makes the characterization of QTL complex.
The idea in locating a QTL is that if there is an association between the trait
and DNA markers, then the QTL should be located near those markers. The
statistical use of this association can be traced back to 1923, when Sax evaluated the
association between seed weight as the trait of interest and seed coat color as the
marker, and concluded that the two were associated. The method proposed by Sax
(1923) was later called single marker analysis because it uses the association
between the trait and a single marker at a time. Various statistical methods are
employed in single marker analysis, such as the t-test, ANOVA, linear regression,
and the likelihood ratio test, where the hypothesis tested is the equality of the trait
mean among marker genotypes. Single marker analysis is relatively easy to
implement; however, the position and the effect of the QTL are confounded.
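As a minimal sketch of single marker analysis (not taken from the thesis), the
Python below simulates a backcross in which a hypothetical QTL of effect `effect` is
linked to one marker with recombination fraction `recomb`, then computes a
pooled-variance t statistic comparing the trait mean between the two marker
genotypes. All function names, parameter values, and the normal-error simulation
model are illustrative assumptions. Under these assumptions the expected difference
between the marker-class means is (1 - 2 * recomb) * effect, so a small QTL close to
the marker and a large QTL far from it can give the same difference, which is the
confounding of position and effect described above.

```python
import math
import random
import statistics


def two_sample_t(group_a, group_b):
    """Pooled-variance t statistic for the difference in trait means."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))


def simulate_backcross(n, effect, recomb, rng):
    """Simulate one marker and a continuous trait (assumed model).

    The QTL genotype (0 or 1) is unobserved; the marker equals the QTL
    genotype unless a recombination occurs between them.
    """
    markers, traits = [], []
    for _ in range(n):
        qtl = rng.randint(0, 1)
        marker = qtl if rng.random() >= recomb else 1 - qtl
        traits.append(effect * qtl + rng.gauss(0.0, 1.0))
        markers.append(marker)
    return markers, traits


rng = random.Random(1)
markers, traits = simulate_backcross(n=400, effect=1.0, recomb=0.1, rng=rng)
group1 = [y for m, y in zip(markers, traits) if m == 1]
group0 = [y for m, y in zip(markers, traits) if m == 0]
t_stat = two_sample_t(group1, group0)
print(f"t statistic for marker genotype 1 vs 0: {t_stat:.2f}")
```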
To overcome this drawback of single marker analysis, Lander and Botstein
(1989) proposed a new method called interval mapping. The name comes from the
way the method locates the QTL: it evaluates the existence of a QTL within an
interval on the chromosome flanked by two adjacent markers, rather than near a
single marker as in single marker analysis. However, the number of QTL cannot be
determined by this method, and the estimated position and effect of a QTL can be
affected by another QTL located in a different interval. Hence, Jansen and Stam
(1994) and Zeng (1994) proposed composite interval mapping as an extension of
interval mapping that incorporates other markers as cofactors.
All the methods mentioned above assume that the trait of interest is on a
continuous scale. On the other hand, many important traits are recorded on a
categorical scale, such as resistance to a certain disease. If the resistance is recorded
as susceptible or resistant, the trait is on a binary scale, whereas if it is scored on an
ordered scale ranging from unaffected to dead, the trait is on an ordinal scale. Other
traits may be recorded on a nominal scale, such as the shapes and colors of flowers,
fruits, and seeds in plants, as well as coat colors. From a theoretical point of view,
QTL mapping methods that assume a continuous trait cannot be applied to a
categorical trait.
To deal with binary traits, Xu and Atchley (1996) proposed a likelihood-based
method that assumes, by means of a threshold model, a continuous distribution
called liability underlying the binary trait. A similar approach was proposed by
Hackett and Weller (1995) for ordinal traits. In addition, Hayashi and Awata (2006)
proposed a likelihood-based approach for analyzing traits on a nominal scale.
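The threshold idea can be illustrated with a short simulation. The sketch below is an
assumption-laden illustration rather than the model of the cited authors: it takes the
liability to be normally distributed around a mean shifted by a hypothetical QTL
effect, and assigns the observed category by counting how many thresholds the
liability exceeds. One threshold gives a binary trait; several ordered thresholds give
an ordinal trait. The function names, threshold values, and effect size are invented
for illustration.

```python
import bisect
import random


def categorize(liability, thresholds):
    """Category index (0 .. len(thresholds)) for a liability value.

    `thresholds` must be sorted in increasing order; a liability equal to a
    threshold is placed in the upper category (a convention chosen here).
    """
    return bisect.bisect_right(thresholds, liability)


def simulate_categorical_trait(n, effect, thresholds, rng):
    """Return (QTL genotype, observed category) pairs under the assumed model."""
    records = []
    for _ in range(n):
        qtl = rng.randint(0, 1)                         # unobserved in practice
        liability = effect * qtl + rng.gauss(0.0, 1.0)  # latent continuous scale
        records.append((qtl, categorize(liability, thresholds)))
    return records


def mean_score(records, genotype):
    """Average category score among individuals with the given QTL genotype."""
    scores = [score for qtl, score in records if qtl == genotype]
    return sum(scores) / len(scores)


rng = random.Random(7)
binary = simulate_categorical_trait(2000, 1.0, [0.5], rng)
ordinal = simulate_categorical_trait(2000, 1.0, [-0.5, 0.5, 1.5], rng)

# For the binary trait the mean score is the proportion in category 1.
print("binary incidence by genotype:", mean_score(binary, 0), mean_score(binary, 1))
print("ordinal mean score by genotype:", mean_score(ordinal, 0), mean_score(ordinal, 1))
```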
Problems
In QTL mapping for categorical traits, several issues arise, such as the design
of the population, the statistical method employed, the critical value for testing the
existence of a QTL, the estimation of the QTL effect and location, and missing
values in QTL data. Some of these issues are briefly discussed below.
The design of the population
There are two population types in QTL mapping: designed and non-designed
populations. The population type determines the statistical method used in the QTL
mapping. Recently, the development of statistical methods has mainly focused on
designed populations. Hence, statistical methods for non-designed populations
still need to be developed.
The statistical method employed
During the development of statistical methods for QTL mapping, the likelihood
approach has become the main approach for analyzing data. However, this approach
is computationally intensive. To simplify the computation for continuous traits,
Haley and Knott (1992) proposed a regression approach to interval mapping. The
idea of their approach is that the component of the independent variable
representing the QTL effect is replaced by its expected value conditional on the two
markers flanking the interval. However, a regression approach for categorical
traits has not yet been developed.
The critical value in testing the existence of QTL
In characterizing QTL, the analysis is performed by scanning the genome and
conducting a test at every point (a genome scan). In such simultaneous multiple
testing, two types of error rate are considered: the comparison-wise error rate and
the family-wise error rate. For determining a critical value that controls the
family-wise error rate in a genome scan, Lander and Botstein (1989) (denoted as the
LB method) and Piepho (2001) (denoted as the Piepho method) proposed approaches
that are fast to compute, but both were proposed in the context of QTL mapping for
continuous traits. However, their approaches operate on the test statistic rather than
on the trait data itself. Hence, they can potentially be used for categorical traits. For
that reason, the performance of these two approaches for categorical traits is worth
exploring.
The estimate of the QTL effect and location
Characterizing a QTL is basically a process of obtaining the QTL location and
its effect. All of the methods proposed for QTL mapping provide point estimates of
these two parameters. Besides the point estimate, the regression approach also
provides an interval estimate. On the other hand, the likelihood approach using the
Expectation-Maximization (EM) algorithm cannot provide an interval estimate,
because it does not provide the standard error of the estimate. To address this
issue, Kao and Zeng (1997) proposed a general formula for obtaining the standard
error of the estimate when the EM algorithm is used in the likelihood approach. Also
on interval estimation, Mangin, Goffinet and Rebai (1994) proposed a confidence
interval based on the likelihood ratio test, whereas Visscher, Thompson and Haley
(1996) and Bennewitz, Reinsch, and Kalm (2002) proposed resampling-based
confidence intervals for QTL mapping. However, all of these methods were proposed
in the context of quantitative traits. Hence, the development and evaluation of
interval estimation for categorical traits is needed.
Missing value in QTL data
Missing data are common in QTL analysis. A common treatment of missing
values is to delete the observations that contain them. However, ignoring such
missing data may result in biased estimates of the QTL effect. To deal with missing
values, Niu et al. (2005) proposed an EM-based Likelihood Ratio Test. This approach
was implemented for quantitative traits, and its extension to categorical traits is
needed.
Objectives
Regarding the issues mentioned above, this research focuses on the second
and third issues. The objectives of this research are:
1. To evaluate the performance of the likelihood and regression approaches in QTL
mapping for binary and ordinal traits.
2. To evaluate the performance of the LB and Piepho methods in constructing the
critical value for testing the existence of a QTL for binary and ordinal traits.
In addition, this research focuses on controlled populations, especially the backcross
population.
THEORY AND METHODS
Backcross population
In a classical backcross design, the population is generated by backcrossing a
heterozygous F1 to one of the homozygous parents (for example, a cross of AaQqBb x
AAQQBB) (see Figure 1). The rationale behind interval mapping can be explained
using the co-segregation pattern listed in Table 1 (Liu, 1998). Here, r is the
recombination rate between markers A and B, whereas r1 and r2 are the recombination
rates between marker A and QTL Q and between QTL Q and marker B, respectively
(Figure 2). The recombination rate is defined as the ratio of recombinant progeny
(progeny that are not of parental type, created by crossover between homologs during
meiosis) to total progeny. In addition, it is assumed that there is no double
crossover between the markers.
As mentioned above, the QTL genotypes are unobservable, but the
probabilities of the QTL genotypes can be obtained from the genotypes of the
flanking markers, as listed in Table 1.
Parent 1 (AAQQBB) x Parent 2 (aaqqbb)  ->  F1 (AaQqBb)
F1 (AaQqBb) x Parent 1 (AAQQBB)        ->  backcross progeny:
Progeny genotypes     Expected frequency
AAQQBB, AaQqBb        0.5(1-r)
AAQqBb, AaQQBB        0.5r1
AAQQBb, AaQqBB        0.5r2
AAQqBB, AaQQBb        0
Figure 1. Conventionally defined backcross progeny for a QTL and two flanking
markers.
A (marker) ------ r1 ------ Q (putative QTL) ------ r2 ------ B (marker)
|<--------------------------- r --------------------------->|
Figure 2. Linkage relationship of a QTL and two flanking markers
Table 1. Co-segregation pattern for backcross design in interval mapping
Marker     Observed   Frequency   Joint frequency         Conditional frequency         Expected value
genotype   count                  QQ          Qq          QQ             Qq             (gi)
AABB       n1         0.5(1-r)    0.5(1-r)    0           1              0              μ1
AABb       n2         0.5r        0.5r2       0.5r1       r2/r = 1-ρ     r1/r = ρ       (1-ρ)μ1 + ρμ2
AaBB       n3         0.5r        0.5r1       0.5r2       r1/r = ρ       r2/r = 1-ρ     ρμ1 + (1-ρ)μ2
AaBb       n4         0.5(1-r)    0           0.5(1-r)    0              1              μ2
Mean                  0.25                                μ1             μ2             0.5(μ1+μ2)
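The conditional frequencies in Table 1 can be illustrated with a short sketch. The function names `qtl_genotype_probs` and `expected_trait_value` and the 1/0 coding of the marker genotypes are assumptions made for this illustration only; the relation r = r1 + r2 follows from the no-double-crossover assumption stated above.

```python
def qtl_genotype_probs(marker_a, marker_b, r1, r2):
    """Return (P(QQ | markers), P(Qq | markers)) for one backcross individual.

    marker_a, marker_b: 1 if the marker is homozygous (AA or BB),
                        0 if it is heterozygous (Aa or Bb).
    r1, r2: recombination rates A-Q and Q-B. With no double crossover,
            r = r1 + r2, so rho = r1 / r as in Table 1.
    """
    rho = r1 / (r1 + r2)
    if marker_a == 1 and marker_b == 1:      # AABB
        p_qq = 1.0
    elif marker_a == 1 and marker_b == 0:    # AABb
        p_qq = 1.0 - rho
    elif marker_a == 0 and marker_b == 1:    # AaBB
        p_qq = rho
    else:                                    # AaBb
        p_qq = 0.0
    return p_qq, 1.0 - p_qq


def expected_trait_value(p_qq, mu1, mu2):
    """Expected value g_i: genotype means weighted by the conditional frequencies."""
    return p_qq * mu1 + (1.0 - p_qq) * mu2
```

For example, with r1 = 0.05 and r2 = 0.15 an AABb individual has ρ = 0.25, so the QTL genotype is QQ with conditional probability 0.75.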
Trait in Binary Scale
Threshold model and liability
In dealing with a binary trait, say Y, it is assumed that there is an underlying
continuous variable, say U, referred to as liability (Xu and Atchley, 1996).
To relate the liability to the binary trait (such as resistance to a certain disease),
it is assumed that there is a threshold (γ) on the liability scale, below which the
individual has the unaffected phenotype and at or above which it is affected (see
Figure 3).
Figure 3. Liability and threshold model for binary trait
The relation can be summarized by
\[ y_i = \begin{cases} 1, & \text{if } u_i \ge \gamma \\ 0, & \text{if } u_i < \gamma \end{cases} \tag{1} \]
Maximum likelihood (ML) approach
Using the liability model, the one-QTL ML mapping model for a backcross
population can be written as
\[ u_i = \mu + b x_i^* + \varepsilon_i, \qquad i = 1, 2, \ldots, n \tag{2} \]
where ui is the liability value for individual i, μ is the mean, b is the effect of QTL Q,
xi* denotes the genotype of Q, taking the value 1 for the homozygote QQ and 0 for the
heterozygote Qq, and εi is the environmental deviation, assumed to follow N(0, σ2).
Since the liability is unobserved, the mean μ and the variance of ε can be set to
arbitrary values (for simplicity, μ = 0 and σ2 = 1).
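As an illustration of how the threshold relation (1) and model (2) work together, the following minimal sketch generates liabilities and binary phenotypes for known QTL genotypes, with μ = 0 and σ2 = 1 as set above. The function name `simulate_binary_trait` and the seed handling are hypothetical and are not taken from the simulation program used in this research.

```python
import random


def simulate_binary_trait(genotypes, b, gamma, seed=0):
    """Simulate liabilities u_i = b * x_i + e_i with e_i ~ N(0, 1) and
    binary phenotypes y_i = 1 if u_i >= gamma, else 0.

    genotypes: list of x_i values, 1 for QQ and 0 for Qq.
    """
    rng = random.Random(seed)
    liabilities = [b * x + rng.gauss(0.0, 1.0) for x in genotypes]
    phenotypes = [1 if u >= gamma else 0 for u in liabilities]
    return liabilities, phenotypes


u, y = simulate_binary_trait([1, 0, 1, 0, 1], b=1.0, gamma=0.5, seed=1)
```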
Based on the conditional distribution of ui given xi*, the conditional
probability of yi = 1 given xi* is obtained by
\[ P(y_i = 1 \mid x_i^*) = \int_{\gamma}^{\infty} f(u_i \mid x_i^*)\, du_i
   = 1 - \int_{-\infty}^{\gamma} f(u_i \mid x_i^*)\, du_i
   = 1 - \Phi(\gamma - b x_i^*) = \Phi(b x_i^* - \gamma) \tag{3} \]
where Φ(ξ) stands for the standard normal cumulative distribution function and ξ
is its argument. Analysis involving Φ(ξ) is referred to as probit analysis. Although
its parameters are easy to interpret, the probit model is difficult to manipulate
because numerical integration is required. So, a logistic model is employed to
approximate Φ(ξ) for estimation purposes and is expressed by
\[ \psi(\xi) = \frac{\exp(\xi)}{1 + \exp(\xi)} \tag{4} \]
The relationship between the probit model and the logistic model is Φ(ξ) ≈ ψ(dξ),
where d = π/√3. Therefore,
\[ P(y_i = 1 \mid x_i^*) \approx \frac{\exp\{d(b x_i^* - \gamma)\}}{1 + \exp\{d(b x_i^* - \gamma)\}} \tag{5} \]
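The quality of the approximation Φ(ξ) ≈ ψ(dξ) can be checked numerically. The sketch below is only an illustration: the grid from -4 to 4 and the names `probit_cdf`, `logistic_cdf` and `max_gap` are arbitrary choices, not part of the method.

```python
import math


def probit_cdf(xi):
    """Standard normal cumulative distribution function, Phi(xi)."""
    return 0.5 * (1.0 + math.erf(xi / math.sqrt(2.0)))


def logistic_cdf(xi):
    """Logistic function psi(xi) of equation (4), written in a stable form."""
    return 1.0 / (1.0 + math.exp(-xi))


D = math.pi / math.sqrt(3.0)  # scale factor d from the text

# Largest absolute gap between Phi(xi) and psi(d * xi) on an illustrative grid.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(probit_cdf(x) - logistic_cdf(D * x)) for x in grid)
```

On this grid the two curves stay within a few hundredths of each other, which is the sense in which (5) approximates (3).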
Since the QTL genotype xi* can be homozygote (1) or heterozygote (0) for
an individual, the likelihood is a mixture distribution with mixing proportions
equal to the conditional probabilities of the QTL genotypes given the two flanking
markers, qi1 and qi2 for the QTL genotypes QQ and Qq, respectively (see Table 1).
For n individuals in the sample, the likelihood function is
\[ L = \prod_{i=1}^{n} \left[ \sum_{j=1}^{2} q_{ij}\, p_{ij}^{y_i} (1 - p_{ij})^{1 - y_i} \right] \]
where pi1 and pi2 denote the conditional probabilities of yi = 1 given the QTL
genotypes xi* = 1 and xi* = 0, respectively. The log likelihood function is
\[ l = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{2} q_{ij}\, p_{ij}^{y_i} (1 - p_{ij})^{1 - y_i} \right). \tag{6} \]
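A minimal sketch of the mixture log likelihood (6), using the logistic form (5) for the conditional probabilities, is given below. The names `p_affected` and `mixture_loglik` are illustrative; the input `q1` is assumed to hold qi1, so that qi2 = 1 - qi1.

```python
import math

D = math.pi / math.sqrt(3.0)  # scale factor d from the text


def p_affected(x, b, gamma):
    """P(y = 1 | x) under the logistic approximation, equation (5)."""
    return 1.0 / (1.0 + math.exp(-D * (b * x - gamma)))


def mixture_loglik(y, q1, b, gamma):
    """Log likelihood (6) for binary phenotypes y and mixing proportions q1."""
    total = 0.0
    for yi, qi1 in zip(y, q1):
        lik_i = 0.0
        # Sum over the two QTL genotypes: QQ (x = 1) and Qq (x = 0).
        for x, q in ((1, qi1), (0, 1.0 - qi1)):
            p = p_affected(x, b, gamma)
            lik_i += q * (p if yi == 1 else 1.0 - p)
        total += math.log(lik_i)
    return total
```

When b = 0 and γ = 0, every individual contributes log(0.5) regardless of its marker genotype, which gives a simple sanity check.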
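The first partial derivatives are
\[ \frac{\partial l}{\partial b} = \sum_{i=1}^{n} \omega_i (y_i - p_{i1}) \tag{7} \]
and
\[ \frac{\partial l}{\partial \gamma} = \sum_{i=1}^{n} \left[ \omega_i (y_i - p_{i1}) + (1 - \omega_i)(y_i - p_{i2}) \right] \tag{8} \]
where
\[ \omega_i = \frac{q_{i1}\, p_{i1}^{y_i} (1 - p_{i1})^{1 - y_i}}{\sum_{j=1}^{2} q_{ij}\, p_{ij}^{y_i} (1 - p_{ij})^{1 - y_i}} \tag{9} \]
is the posterior probability of xi* = 1.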
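By treating ωi as constants, the second partial derivatives are
\[ \frac{\partial^2 l}{\partial b^2} = -\sum_{i=1}^{n} \omega_i\, p_{i1} (1 - p_{i1}) \tag{10} \]
\[ \frac{\partial^2 l}{\partial b\, \partial \gamma} = -\sum_{i=1}^{n} \omega_i\, p_{i1} (1 - p_{i1}) \tag{11} \]
and
\[ \frac{\partial^2 l}{\partial \gamma^2} = -\sum_{i=1}^{n} \left[ \omega_i\, p_{i1} (1 - p_{i1}) + (1 - \omega_i)\, p_{i2} (1 - p_{i2}) \right]. \tag{12} \]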
To obtain the parameter estimates, the EM algorithm can be applied.
The idea of the EM algorithm is that the likelihood solution for complete data is
relatively simple compared with that for incomplete data (Pawitan, 2001). In QTL
mapping, the unobserved QTL genotype xi* is treated as missing data. The EM steps
are as follows:
1. Set the initial values of b and γ. Usually b is set to 0, whereas γ is set to Σiyi/n.
2. Calculate ωi (E-step).
3. Given ωi, solve for b and γ using Newton-Raphson iteration (M-step) as
follows. Let g denote the vector of first partial derivatives and H the matrix
of second partial derivatives. If θ(t) is the vector of solutions at step t, the
solution at step t+1 is
\[ \theta^{(t+1)} = \theta^{(t)} - H^{-1} g \]
4. Update the current values and go to step 2.
5. Repeat steps 2-4 until convergence. Given two convergence criteria εk > 0 and
εp > 0, the iteration is considered to have converged if one of the following
criteria is satisfied:
a. \( \left| l(\gamma^{(t+1)}, b^{(t+1)}) - l(\gamma^{(t)}, b^{(t)}) \right| < \varepsilon_k \)
b. \( \max\left( \left| \gamma^{(t+1)} - \gamma^{(t)} \right|, \left| b^{(t+1)} - b^{(t)} \right| \right) < \varepsilon_p \)
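The EM steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the program used in this research: it differentiates the logistic form (5) directly, so the scale factor d and the sign of the γ terms appear explicitly in the gradient and Hessian; it takes a single Newton-Raphson step θ - H⁻¹g per EM iteration, with H the matrix of second derivatives; and it stops on criterion (b) only. The name `em_binary_qtl` and its arguments are hypothetical.

```python
import math

D = math.pi / math.sqrt(3.0)  # scale factor d from the text


def _p(x, b, gamma):
    """P(y = 1 | x) from equation (5)."""
    return 1.0 / (1.0 + math.exp(-D * (b * x - gamma)))


def em_binary_qtl(y, q1, tol=1e-8, max_iter=500):
    """y: 0/1 phenotypes; q1: P(QQ | flanking markers) for each individual."""
    b, gamma = 0.0, sum(y) / len(y)          # step 1: initial values
    for _ in range(max_iter):
        p1, p2 = _p(1, b, gamma), _p(0, b, gamma)
        # E-step: posterior probability of QQ for each individual, equation (9).
        w = []
        for yi, qi in zip(y, q1):
            l1 = qi * (p1 if yi else 1.0 - p1)
            l2 = (1.0 - qi) * (p2 if yi else 1.0 - p2)
            w.append(l1 / (l1 + l2))
        # M-step: one Newton-Raphson step with the weights held fixed.
        s1 = sum(wi * (yi - p1) for wi, yi in zip(w, y))
        s2 = sum((1.0 - wi) * (yi - p2) for wi, yi in zip(w, y))
        v1 = sum(w) * p1 * (1.0 - p1)
        v2 = (len(y) - sum(w)) * p2 * (1.0 - p2)
        g_b, g_g = D * s1, -D * (s1 + s2)                    # gradient
        h_bb, h_bg, h_gg = -D * D * v1, D * D * v1, -D * D * (v1 + v2)
        det = h_bb * h_gg - h_bg * h_bg
        # theta_new = theta - H^{-1} g, written out for the 2 x 2 case.
        b_new = b - (h_gg * g_b - h_bg * g_g) / det
        gamma_new = gamma - (h_bb * g_g - h_bg * g_b) / det
        converged = max(abs(b_new - b), abs(gamma_new - gamma)) < tol
        b, gamma = b_new, gamma_new
        if converged:
            break
    return b, gamma
```

When the markers are fully informative (every qi1 is 0 or 1), the posterior weights equal the mixing proportions and the estimates reduce to the logits of the observed proportions in each genotype group, which gives a convenient check of the sketch.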
Regression (REG) approach
Using the liability model, the one-QTL REG mapping model for a backcross
population can be written as
\[ u_i = \mu + b \pi_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n \tag{13} \]
where ui, μ, b, and εi have the same definitions as in model (2), and πi is the
conditional expectation of the QTL genotype given the two flanking markers. The
likelihood function is
\[ L = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i} \]
where pi denotes the conditional probability of yi = 1 given πi. The log likelihood
function is
\[ l = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]. \tag{14} \]
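The first partial derivatives are
\[ \frac{\partial l}{\partial b} = \sum_{i=1}^{n} \pi_i (y_i - p_i) \tag{15} \]
and
\[ \frac{\partial l}{\partial \gamma} = \sum_{i=1}^{n} (y_i - p_i). \tag{16} \]
The second partial derivatives are
\[ \frac{\partial^2 l}{\partial b^2} = -\sum_{i=1}^{n} \pi_i^2\, p_i (1 - p_i) \tag{17} \]
\[ \frac{\partial^2 l}{\partial b\, \partial \gamma} = -\sum_{i=1}^{n} \pi_i\, p_i (1 - p_i) \tag{18} \]
and
\[ \frac{\partial^2 l}{\partial \gamma^2} = -\sum_{i=1}^{n} p_i (1 - p_i) \tag{19} \]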
The procedure for obtaining the parameter estimates is as follows. Let g denote
the vector of first partial derivatives and H the matrix of second partial derivatives.
If θ(t) is the vector of solutions at step t, the solution at step t+1 is
\[ \theta^{(t+1)} = \theta^{(t)} - H^{-1} g \]
The procedure for choosing the initial values of γ and b is the same as in the ML
approach described above.
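The REG procedure amounts to a logistic regression of yi on πi solved by Newton-Raphson. The sketch below is an illustration under assumptions: it uses the logistic form p = ψ(d(bπ - γ)) and differentiates it directly, so d and the sign of the γ terms appear explicitly, and it updates with θ - H⁻¹g where H holds the second derivatives. The names `fitted_probs` and `reg_binary_qtl` are hypothetical.

```python
import math

D = math.pi / math.sqrt(3.0)  # scale factor d from the text


def fitted_probs(pi_vals, b, gamma):
    """p_i = psi(d * (b * pi_i - gamma)) for each individual."""
    return [1.0 / (1.0 + math.exp(-D * (b * x - gamma))) for x in pi_vals]


def reg_binary_qtl(y, pi_vals, tol=1e-8, max_iter=100):
    """Newton-Raphson estimates of (b, gamma) for the REG approach."""
    b, gamma = 0.0, sum(y) / len(y)          # same initial values as for ML
    for _ in range(max_iter):
        p = fitted_probs(pi_vals, b, gamma)
        var = [pk * (1.0 - pk) for pk in p]
        res = [yi - pk for yi, pk in zip(y, p)]
        g_b = D * sum(x * r for x, r in zip(pi_vals, res))
        g_g = -D * sum(res)
        h_bb = -D * D * sum(x * x * v for x, v in zip(pi_vals, var))
        h_bg = D * D * sum(x * v for x, v in zip(pi_vals, var))
        h_gg = -D * D * sum(var)
        det = h_bb * h_gg - h_bg * h_bg
        # theta_new = theta - H^{-1} g, written out for the 2 x 2 case.
        b_new = b - (h_gg * g_b - h_bg * g_g) / det
        gamma_new = gamma - (h_bb * g_g - h_bg * g_b) / det
        converged = max(abs(b_new - b), abs(gamma_new - gamma)) < tol
        b, gamma = b_new, gamma_new
        if converged:
            break
    return b, gamma
```

At convergence the score for γ is zero, so the fitted probabilities sum to the number of affected individuals; this property can be used to check an implementation.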
Trait in Ordinal Scale
Threshold model and liability
Let U denote the liability underlying an ordinal trait Y with c categories. A set
of fixed thresholds, γ1, γ2, …, γc-1, on the underlying scale defines the observed
categories 1, 2, …, c on the ordinal scale. We thus have the model
γj-1 < ui ≤ γj ⇔ Yi = j; γ0 = -∞; γc = ∞
where i = 1, 2, …, n is the index of the individual. Figure 4 illustrates the
threshold model for an ordinal trait.
Figure 4. Liability and threshold model for ordinal trait
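The mapping from liability to ordinal category can be written in a few lines. The sketch below assumes the thresholds are supplied in increasing order; the function name `liability_to_category` is illustrative.

```python
from bisect import bisect_left


def liability_to_category(u, thresholds):
    """Return j in 1..c such that gamma_{j-1} < u <= gamma_j.

    thresholds: increasing list [gamma_1, ..., gamma_{c-1}].
    bisect_left counts the thresholds strictly below u, which is j - 1.
    """
    return bisect_left(thresholds, u) + 1
```

With thresholds -1 and 0.5 (c = 3), a liability of exactly -1 falls in category 1 and a liability of 0 falls in category 2, in line with the "≤" in the model.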
ML approach
Let the c ordered categories have cumulative probabilities φ1(x*), φ2(x*), …,
φc(x*), where φc(x*) = 1. Then
\[ \phi_j = P(Y \le j) = P(U \le \gamma_j) = \frac{\exp(\gamma_j - b x^*)}{1 + \exp(\gamma_j - b x^*)} \tag{20} \]
We can rewrite this as a generalized linear model with logit link function
\[ \eta_{ij} = \log\left( \frac{\phi_j}{1 - \phi_j} \right) = \gamma_j - b x_i^* \tag{21} \]
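To make the cumulative logit model concrete, the following sketch computes the cumulative probabilities (20) and the implied category probabilities for one QTL genotype. The function name `category_probs` is an assumption for illustration, and the thresholds are taken to be in increasing order.

```python
import math


def category_probs(thresholds, b, x):
    """Return (cumulative, per-category) probabilities for genotype value x.

    cumulative[j-1] = P(Y <= j) from equation (20), with P(Y <= c) = 1;
    the probability of category j is the difference of adjacent
    cumulative probabilities.
    """
    cum = [1.0 / (1.0 + math.exp(-(g - b * x))) for g in thresholds] + [1.0]
    probs = [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]
    return cum, probs
```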
As for the binary trait, since the QTL genotype xi* can be homozygote (1)
or heterozygote (0) for an individual, the likelihood is a mixture distribution
with mixing proportions equal to the conditional probabilities of the QTL genotypes
given the two flanking markers, qi1 and qi2 for the QTL genotypes QQ and Qq,
respectively (see Table 1). Let φj1 and φj2 denote the conditional cumulative
probabilities of yi ≤ j given the QTL genotypes xi* = 1 and xi* = 0, respectively. In
addition, let Rij = zi1 + zi2 + … + zij, where zim = 1 if yi = m and 0 otherwise, for
j = 1, 2, …, c, and
\[ \varphi_{ijk} = \log\left( \frac{\phi_{ijk}}{\phi_{i(j+1)k} - \phi_{ijk}} \right) \]
For n individuals in the sample, the likelihood function is
\[ L = \prod_{i=1}^{n} \left[ \sum_{k=1}^{2} q_{ik} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k} - \phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{i(j+1)} - R_{ij}} \right\} \right] \]
The log likelihood function is
\[ l = \sum_{i=1}^{n} \log\left[ \sum_{k=1}^{2} q_{ik} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k} - \phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{i(j+1)} - R_{ij}} \right\} \right]. \tag{22} \]
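For an individual observed in category m, the product over j in (22) telescopes to the probability of that category, φm - φm-1 (with φ0 = 0). Under that reading, the log likelihood can be sketched as below; the names `category_prob` and `ordinal_mixture_loglik` are hypothetical, `q1` is assumed to hold qi1, and the thresholds are assumed to be increasing.

```python
import math


def category_prob(yi, thresholds, b, x):
    """P(Y = yi | x) from the cumulative logit model (20); yi is in 1..c."""
    cum = [0.0] + [1.0 / (1.0 + math.exp(-(g - b * x))) for g in thresholds] + [1.0]
    return cum[yi] - cum[yi - 1]


def ordinal_mixture_loglik(y, q1, thresholds, b):
    """Mixture log likelihood over the QTL genotypes QQ (x = 1) and Qq (x = 0)."""
    total = 0.0
    for yi, qi1 in zip(y, q1):
        lik_i = (qi1 * category_prob(yi, thresholds, b, 1)
                 + (1.0 - qi1) * category_prob(yi, thresholds, b, 0))
        total += math.log(lik_i)
    return total
```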
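The first partial derivatives are
\[ \frac{\partial l}{\partial \theta_t} = \sum_{i=1}^{n} \sum_{k=1}^{2} \omega_{ik} \sum_{j=1}^{c-1} \frac{\partial l_i}{\partial \varphi_{ijk}}\, U_{ijk}\, Q_{ijkt}, \tag{23} \]
where θt is the t-th parameter of the model (21), and t = 1, 2, …, c. In addition,
\[ \frac{\partial l_i}{\partial \varphi_{ijk}} = R_{ij} - R_{i(j+1)} \frac{\phi_{ijk}}{\phi_{i(j+1)k}}, \tag{24} \]
\[ U_{ijk} = \frac{\phi_{i(j+1)k}}{\phi_{ijk} \left( \phi_{i(j+1)k} - \phi_{ijk} \right)}, \tag{25} \]
and
\[ Q_{ijkt} = P_{ijt} \frac{\partial \phi_{ijk}}{\partial \eta_{ijk}} - P_{i(j+1)t} \frac{\phi_{ijk}}{\phi_{i(j+1)k}} \frac{\partial \phi_{i(j+1)k}}{\partial \eta_{i(j+1)k}}, \tag{26} \]
in which
\[ P_{ijt} = \frac{\partial \eta_{ij}}{\partial \theta_t} = \begin{cases} \delta_{jt}, & \text{if } 1 \le t \le c - 1 \\ -x^*_{i[t-(c-1)]}, & \text{if } t = c \end{cases} \]
δjt = 1 if j = t, 0 otherwise, and Pict = 0. For i = 1, 2, …, n; j = 1, 2, …, c-1,
\[ \frac{\partial \phi_{ijk}}{\partial \eta_{ijk}} = \phi_{ijk} (1 - \phi_{ijk}) \]
and ∂φick/∂ηick = 0. In addition,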
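\[ \omega_{ik} = \frac{ q_{ik} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k} - \phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{i(j+1)} - R_{ij}} \right\} }{ \sum_{k'=1}^{2} q_{ik'} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk'}}{\phi_{(j+1)k'}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k'} - \phi_{jk'}}{\phi_{(j+1)k'}} \right)^{R_{i(j+1)} - R_{ij}} \right\} } \tag{27} \]
is the posterior probability of xi* = 1 and xi* = 0 for k = 1 and k = 2, respectively. The
second partial derivatives are
\[ \frac{\partial^2 l}{\partial \theta_t\, \partial \theta_s} = -\sum_{i=1}^{n} \sum_{k=1}^{2} q_{ik} \sum_{j=1}^{c-1} U_{ijk}\, Q_{ijkt}\, Q_{ijks} \tag{28} \]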
The EM steps to obtain the parameter estimates are similar to those described for
the binary trait above. The initial value of b is usually set to 0, whereas the initial
values of the thresholds are
\[ \gamma_j^{(0)} = \log\left( \frac{A_j}{1 - A_j} \right) \]
where
\[ A_j = \frac{1}{n} \sum_{i=1}^{n} \sum_{m=1}^{j} z_{im} \]
and j = 1, 2, …, c-1. The two convergence criteria are as follows:
(a) \( \left| l(\gamma_1^{(t+1)}, \gamma_2^{(t+1)}, \ldots, \gamma_{c-1}^{(t+1)}, b^{(t+1)}) - l(\gamma_1^{(t)}, \gamma_2^{(t)}, \ldots, \gamma_{c-1}^{(t)}, b^{(t)}) \right| < \varepsilon_k \)
(b) \( \max\left( \left| \gamma_1^{(t+1)} - \gamma_1^{(t)} \right|, \left| \gamma_2^{(t+1)} - \gamma_2^{(t)} \right|, \ldots, \left| \gamma_{c-1}^{(t+1)} - \gamma_{c-1}^{(t)} \right|, \left| b^{(t+1)} - b^{(t)} \right| \right) < \varepsilon_p \)
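The initial thresholds are the logits of the observed cumulative proportions Aj. A minimal sketch follows; the function name `initial_thresholds` is illustrative, and the sketch assumes that every Aj lies strictly between 0 and 1, which requires at least one observation in the first category and one in the last.

```python
import math


def initial_thresholds(y, c):
    """Starting values gamma_j = log(A_j / (1 - A_j)) for j = 1..c-1.

    y: observed categories coded 1..c.
    A_j is the proportion of individuals with y_i <= j.
    """
    n = len(y)
    start = []
    for j in range(1, c):
        a_j = sum(1 for yi in y if yi <= j) / n
        start.append(math.log(a_j / (1.0 - a_j)))
    return start
```

For example, with observations 1, 1, 2, 3 and c = 3, the cumulative proportions are 0.5 and 0.75, giving starting thresholds 0 and log 3.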
REG approach
The generalized linear model with logit link function for the one-QTL model is
\[ \eta_{ij} = \log\left( \frac{\phi_{ij}}{1 - \phi_{ij}} \right) = \gamma_j - b \pi_i \tag{29} \]
Let φij denote the conditional cumulative probability of yi ≤ j given πi. In
addition, let Rij = zi1 + zi2 + … + zij, where zim = 1 if yi = m and 0 otherwise, for
j = 1, 2, …, c. For n individuals in the sample, the likelihood function is
\[ L = \prod_{i=1}^{n} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{ij}}{\phi_{i(j+1)}} \right)^{R_{ij}} \left( \frac{\phi_{i(j+1)} - \phi_{ij}}{\phi_{i(j+1)}} \right)^{R_{i(j+1)} - R_{ij}} \right\} \]
The log likelihood function is
\[ l = \sum_{i=1}^{n} \sum_{j=1}^{c-1} \left\{ R_{ij}\, \varphi_{ij} - R_{i(j+1)}\, g(\varphi_{ij}) \right\} \tag{30} \]
where
\[ \varphi_{ij} = \log\left( \frac{\phi_{ij}}{\phi_{i(j+1)} - \phi_{ij}} \right) \]
and
\[ g(\varphi_{ij}) = \log(1 + \exp(\varphi_{ij})) = \log\left( \frac{\phi_{i(j+1)}}{\phi_{i(j+1)} - \phi_{ij}} \right) \]
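The log likelihood written in terms of Rij, ϕij and g(ϕij) equals the sum of the log category probabilities, because for an individual in category m the terms telescope to log(φim - φi(m-1)). The sketch below evaluates both forms so that they can be compared numerically; the function names are hypothetical, Rij is taken as the cumulative indicator of yi ≤ j, and the thresholds are assumed to be increasing.

```python
import math


def cumulative_probs(thresholds, b, pi_i):
    """[P(Y <= 1), ..., P(Y <= c)] under the cumulative logit model, P(Y <= c) = 1."""
    return [1.0 / (1.0 + math.exp(-(g - b * pi_i))) for g in thresholds] + [1.0]


def loglik_reg_ordinal(y, pi_vals, thresholds, b):
    """Sum over i and j of R_ij * varphi_ij - R_i(j+1) * g(varphi_ij)."""
    c = len(thresholds) + 1
    total = 0.0
    for yi, pi_i in zip(y, pi_vals):
        phi = cumulative_probs(thresholds, b, pi_i)      # phi[j-1] = P(Y <= j)
        r = [1 if yi <= j else 0 for j in range(1, c + 1)]  # r[j-1] = R_ij
        for j in range(1, c):
            gap = phi[j] - phi[j - 1]
            varphi = math.log(phi[j - 1] / gap)
            g_val = math.log(phi[j] / gap)
            total += r[j - 1] * varphi - r[j] * g_val
    return total


def loglik_direct(y, pi_vals, thresholds, b):
    """Same quantity computed as the sum of log category probabilities."""
    total = 0.0
    for yi, pi_i in zip(y, pi_vals):
        phi = [0.0] + cumulative_probs(thresholds, b, pi_i)
        total += math.log(phi[yi] - phi[yi - 1])
    return total
```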
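The first partial derivatives are
\[ \frac{\partial l}{\partial \theta_t} = \sum_{i=1}^{n} \sum_{j=1}^{c-1} \frac{\partial l_i}{\partial \varphi_{ij}}\, U_{ij}\, Q_{ijt}, \tag{31} \]
where θt is the t-th parameter of the model (29), and t = 1, 2, …, c. In addition,
\[ \frac{\partial l_i}{\partial \varphi_{ij}} = R_{ij} - R_{i(j+1)} \frac{\phi_{ij}}{\phi_{i(j+1)}}, \tag{32} \]
\[ U_{ij} = \frac{\phi_{i(j+1)}}{\phi_{ij} \left( \phi_{i(j+1)} - \phi_{ij} \right)}, \tag{33} \]
and
\[ Q_{ijt} = P_{ijt} \frac{\partial \phi_{ij}}{\partial \eta_{ij}} - P_{i(j+1)t} \frac{\phi_{ij}}{\phi_{i(j+1)}} \frac{\partial \phi_{i(j+1)}}{\partial \eta_{i(j+1)}}, \tag{34} \]
in which
\[ P_{ijt} = \frac{\partial \eta_{ij}}{\partial \theta_t} = \begin{cases} \delta_{jt}, & \text{if } 1 \le t \le c - 1 \\ -x^*_{i[t-(c-1)]}, & \text{if } t = c \end{cases} \]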
δjt = 1 if j=t, 0 otherwise, and Pict
BIOGRAPHY
As the youngest of six siblings in my family, I was born to my father, the late
H. Abdus Syakur Suyitno, and my mother, Hj. Siti Aisyah, in 1979 in Jepara, a
city well known for its wood carving and as the birthplace of Kartini.
I started my formal education in 1986. In July of that year, I entered
elementary school at SD Negeri Panggang IV. After six years in elementary
school and another three years at SMP N 2 Jepara, I began to study at SMU N 1
Jepara in 1994. It was at that time that I was introduced to the concept of
Mendelian genetics. However, after graduating from high school three years later, I
went on to study the wonderful subject of statistics at Bogor Agricultural University.
After four years of undergraduate study, in 2002 I was given the honor of becoming a
lecturer in my department. It is a nice place to work, with many nice people to work
with. In 2003, I entered graduate school, still in the statistics program. This time I
took genetics as my minor.
ACKNOWLEDGEMENTS
Alhamdulillah. That is my first word of thanks to Allah, God The
Almighty. Without Him, I would be unable to do anything.
I would like to thank my first advisor, Dr. Asep Saefuddin, who has encouraged
me to be brave in writing papers in English since my undergraduate years. I thank my
second advisor, Dr. Muhammad Jusuf, for giving me my first view of the application of
statistics in genetics and for helping me enrich the biological concepts in my
research. I also thank my third advisor, Dr. Totong Martono, who inspired me to keep
examining my paper and searching for any inconsistency in the mathematical
background.
During my introduction to the application of statistics in genetics, I also had
discussions with many experts. Ahmed Rebai of the Centre of Biotechnology of Sfax,
Tunisia, thank you for your warm and helpful discussion. Thanks also for providing
me with your article, which helped me enrich my research. Jazakallah khairan katsira.
I thank Steve Kachman of the Department of Statistics, University of Nebraska
Lincoln, for his helpful comments during the writing of the computer program. I also
thank Shizong Xu of the Department of Botany and Plant Sciences, University of
California at Riverside, and Lauren McIntyre of the Department of Agronomy, Purdue
University, for helping me contact their colleagues during the search for a real data
set for QTL mapping. I also thank the members of the Quantitative Genetics and
Animal Genome mailing lists who responded to my email.
I thank my beloved wife, Euis Siti Aisyah, for her love and for her
understanding and support. I would not have been able to come this far without her.
I thank my parents, the late H. Abdus Syakur Suyitno and Hj. Siti Aisyah, for bringing
me into this world, for teaching me to be a good man, and for supporting me
throughout my education. I also thank my parents-in-law, Ase Ruhiyat and Siti Hafsah,
for supporting us through their advice and prayers.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
  Background
  Problems
    The design of the population
    The statistical method employed
    The critical value in testing the existence of QTL
    The estimate of the QTL effect and location
    Missing value in QTL data
  Objectives
THEORY AND METHODS
  Backcross population
  Trait in Binary Scale
    Threshold model and liability
    Maximum likelihood (ML) approach
    Regression (REG) approach
  Trait in Ordinal Scale
    Threshold model and liability
    ML approach
    REG approach
  Critical Value
  Heritability
MATERIAL AND METHOD
  Design of simulation experiments in evaluation of the performance of ML and REG approach
    Design of simulation experiments for binary trait
      Evaluation of QTL effect
      Evaluation of the shape of the phenotypic distribution
      Evaluation of sample size
      Evaluation of marker density
      Evaluation of number of QTL
    Design of simulation experiments for ordinal trait
      Evaluation of QTL effect
      Evaluation of the shape of the phenotypic distribution
      Evaluation of sample size
      Evaluation of marker density
      Evaluation of number of categories
      Evaluation of number of QTL
  Design of simulation experiments for evaluating critical value
RESULT AND DISCUSSION
  Result
    Evaluation of methods in determining critical value
      Result for binary trait
      Result for ordinal trait
    Evaluation of statistical approach in QTL mapping
      Result for binary trait
      Result for ordinal trait
  Discussion
CONCLUSION AND SUGGESTION
  Conclusion
  Suggestion
REFERENCES
LIST OF TABLES
1. Co-segregation pattern for backcross design in interval mapping
2. Percentile of empirical distribution of test statistic under null hypothesis for ML approach and REG approach for binary trait
3. Critical value and percentage of the rejection of null hypothesis obtained using LB and Piepho method for binary trait
4. Percentile of empirical distribution of test statistic under null hypothesis for ML approach and REG approach for ordinal trait
5. Critical value and percentage of the rejection of null hypothesis obtained using LB and Piepho method for ordinal trait
6. Comparison of the performance of ML and REG approach for various marker densities (d) for binary trait
7. Comparison of the performance of ML and REG approach for various shapes of phenotypic distribution for binary trait
8. Comparison of the performance of ML and REG approach for various sample sizes (n) for binary trait
9. Comparison of the performance of ML and REG approach under various levels of QTL effect for binary trait
10. ML approach performance for 3 QTLs model with equal QTL effect for binary trait
11. REG approach performance for 3 QTLs model with equal QTL effect for binary trait
12. ML approach performance for 3 QTLs model with unequal QTL effect for binary trait
13. REG approach performance for 3 QTLs model with unequal QTL effect for binary trait
14. ML approach performance for 3 QTLs model using alternative categorize method for binary trait
15. REG approach performance for 3 QTLs model using alternative categorize method for binary trait
16. Comparison of the performance of ML and REG approach for various marker densities (d) for ordinal trait
17. Comparison of the performance of ML and REG approach for various shapes of phenotypic distribution for ordinal trait
18. Comparison of the performance of ML and REG approach for various sample sizes (n) for ordinal trait
19. Comparison of the performance of ML and REG approach under various levels of QTL effect for ordinal trait
20. Comparison of the performance of ML and REG approach for various numbers of categories (C) for ordinal trait
21. ML approach performance for 3 QTLs model with equal QTL effect for ordinal trait
22. REG approach performance for 3 QTLs model with equal QTL effect for ordinal trait
23. ML approach performance for 3 QTLs model with unequal QTL effect for ordinal trait
24. REG approach performance for 3 QTLs model with unequal QTL effect for ordinal trait
25. ML approach performance for 3 QTLs model using alternative categorize method for ordinal trait
26. REG approach performance for 3 QTLs model using alternative categorize method for ordinal trait
27. Comparison of penetrance estimation using ML and REG approach under various level of marker distance for binary trait
28. Comparison of penetrance estimation using ML and REG approach under various level of QTL effect for binary trait
29. Comparison of penetrance estimation using ML and REG approach for various shapes of phenotypic distribution for binary trait
30. Comparison of penetrance estimation using ML and REG approach under various sample size for binary trait
31. Comparison of penetrance estimation using ML and REG approach under various level of marker density for ordinal trait
32. Comparison of penetrance estimation using ML and REG approach under various level of QTL effect for ordinal trait
33. Comparison of penetrance estimation using ML and REG approach under various sample size for ordinal trait
34. Comparison of penetrance estimation using ML and REG approach for various shapes of phenotypic distribution for ordinal trait
35. Comparison of penetrance estimation using ML and REG approach under various numbers of categories for ordinal trait
LIST OF FIGURES
1. Conventionally defined backcross progeny for a QTL and two flanking markers
2. Linkage relationship of a QTL and two flanking markers
3. Liability and threshold model for binary trait
4. Liability and threshold model for ordinal trait
5. Empirical distribution of test statistic under null hypothesis for ML approach and REG approach for binary trait
6. Empirical distribution of test statistic under null hypothesis for ML approach and REG approach for ordinal trait
INTRODUCTION
Background
Genes or loci on a chromosome underlying a quantitative trait are called
Quantitative Trait Loci (QTL). Many such traits, such as milk, meat, or crop
production, are important both economically and biologically. Hence, there is a
need to characterize the genes controlling a quantitative trait, in terms of their
position on the chromosome and their effect on the trait, through a process called
QTL mapping. The QTL genotypes are unobserved. In addition, the environment also
affects the trait, which makes the characterization of QTL complex.
The idea in locating a QTL is that if there is an association between the
trait and a DNA marker, then the QTL should be located near that marker. The
statistical method for utilizing this association can be traced back to 1923, when
Sax evaluated the association between seed weight as the trait of interest and seed
coat color as the marker and concluded that the two were associated. The method
proposed by Sax (1923) was later called single marker analysis because it utilizes
the association between the trait and a single marker at a time. Various
statistical methods are employed in single marker analysis, such as the t-test,
ANOVA, linear regression, and the likelihood ratio test, where the hypothesis to
be tested is the equality of trait means among marker genotypes. Single marker
analysis is relatively easy to implement; however, the position and effect of the
QTL are confounded.
To overcome the drawback of single marker analysis, Lander and Botstein
(1989) proposed a new method called interval mapping. Its name comes from the way
the method locates the QTL position: it evaluates the existence of a QTL within an
interval on the chromosome flanked by two adjacent markers, rather than near a
single marker as in single marker analysis. However, the number of QTL cannot be
determined by this method, and the estimated position and effect of a QTL can be
affected by another QTL located in another interval. Hence, Jansen and Stam
(1994) and Zeng (1994) proposed composite interval mapping as an extension of
interval mapping that incorporates other markers as cofactors.
All the methods mentioned above assume that the trait of interest is on a
continuous scale. On the other hand, many important traits are observed on a
categorical scale, such as resistance to a certain disease. If the resistance is
recorded as susceptible or resistant, then the trait is on a binary scale, whereas
if the resistance is scored on an ordered scale varying from unaffected to dead,
then the trait is on an ordinal scale. Other traits may be observed on a nominal
scale, such as the shapes and colors of flowers, fruits, and seeds in plants, as
well as coat colors. From a theoretical point of view, QTL mapping methods that
assume a continuous trait cannot be applied to a categorical trait.
In dealing with a binary trait, Xu and Atchley (1996) proposed a
likelihood-based method that assumes, by means of a threshold model, that a
continuous variable called liability underlies the binary trait. A similar approach
was proposed by Hackett and Weller (1995) for ordinal traits. In addition, Hayashi
and Awata (2006) proposed a likelihood-based approach for analyzing traits on a
nominal scale.
Problems
In QTL mapping for categorical traits, several issues arise, such as the
design of the population, the statistical method employed, the critical value for
testing the existence of a QTL, the estimation of the QTL effect and location, and
missing values in QTL data. These issues are briefly discussed below.
The design of the population
There are two population types in QTL mapping: designed and non-designed
populations. The population type determines the statistical method used in QTL
mapping. The recent development of statistical methods has focused mainly on
designed populations. Hence, the development of statistical methods for
non-designed populations is needed.
The statistical method employed
During the development of statistical methods for QTL mapping, the likelihood
approach has become the main approach for analyzing data. However, this approach
is computationally intensive. To simplify the computation in the case of a
continuous trait, Haley and Knott (1992) proposed a regression approach to interval
mapping. The idea of their approach is that the independent variable representing
the QTL genotype is replaced by its expected value conditional on the two markers
flanking the interval. However, the regression approach has not yet been developed
for traits on a categorical scale.
The critical value in testing the existence of QTL
In characterizing QTL, the analysis is performed by scanning the genome and
conducting a test at every point (a genome scan), so that many tests are carried
out simultaneously. In simultaneous multiple testing such as a genome scan, two
types of error rate are considered: the comparison-wise error rate and the
family-wise error rate. To determine a critical value that controls the
family-wise error rate in a genome scan, Lander and Botstein (1989) (denoted as the
LB method) and Piepho (2001) (denoted as the Piepho method) proposed approaches
that are computationally fast but were developed in the context of QTL mapping for
continuous traits. However, their approaches are based on the test statistic rather
than on the trait data themselves. Hence, their approaches have the potential to be
used for categorical traits, and for that reason the performance of these two
approaches for categorical traits is worth exploring.
The estimate of the QTL effect and location
Characterizing a QTL is basically a process of obtaining the QTL location and
its effect. All of the methods proposed in QTL mapping provide point estimates of
these two parameters. Besides the point estimate, the regression approach also
provides an interval estimate. On the other hand, the likelihood approach using the
Expectation-Maximization (EM) algorithm does not provide an interval estimate
because it does not yield the standard errors of the estimates. Regarding this
issue, Kao and Zeng (1997) proposed a general formula for obtaining the standard
errors of the estimates when using the EM algorithm in the likelihood approach.
Also on the issue of interval estimation, Mangin, Goffinet and Rebai (1994)
proposed a confidence interval based on the likelihood ratio test, whereas
Visscher, Thompson and Haley (1996) and Bennewitz, Reinsch, and Kalm (2002)
proposed resampling-based confidence intervals in QTL mapping. However, all of
these methods were proposed in the context of quantitative traits. Hence, the
development and evaluation of interval estimation for categorical traits is needed.
Missing value in QTL data
Missing data are common in QTL analysis. A common treatment of missing
values is to delete the observations that contain them. However, ignoring such
missing data may result in biased estimates of the QTL effect. To deal with
missing values, Niu et al. (2005) proposed an EM-based likelihood ratio test. This
approach was implemented for quantitative traits, and its extension to categorical
traits is needed.
Objectives
Of the issues mentioned above, this research focuses on the second and
third. The objectives of this research are to:
1. Evaluate the performance of the likelihood and regression approaches in QTL
mapping for binary and ordinal traits.
2. Evaluate the performance of the LB and Piepho methods in constructing the
critical value for testing the existence of a QTL for binary and ordinal traits.
In addition, this research focuses on controlled populations, specifically the
backcross population.
THEORY AND METHODS
Backcross population
In a classical backcross design, the population is generated by backcrossing
a heterozygous F1 to one of the homozygous parents (for example, a cross of
AaQqBb x AAQQBB) (see Figure 1). The rationale behind interval mapping can be
explained using the co-segregation pattern listed in Table 1 (Liu, 1998). Here, r
is the recombination rate between markers A and B, whereas r1 and r2 are the
recombination rates between marker A and QTL Q and between QTL Q and marker B,
respectively (Figure 2). The recombination rate is defined as the ratio of the
number of recombinant progeny (progeny that are not of parental type, produced by
crossover between homologs during meiosis) to the total number of progeny. In
addition, it is assumed that there is no double crossover between the markers.
As mentioned above, the QTL genotypes are unobservable, but the probabilities
of the QTL genotypes can be obtained from the flanking marker genotypes, as listed
in Table 1.
Parent 1: AAQQBB   X   Parent 2: aaqqbb
F1: AaQqBb   X   Parent 1: AAQQBB

Backcross progeny     Expected frequency
AAQQBB, AaQqBb        0.5 (1-r)
AAQqBb, AaQQBB        0.5 r1
AAQQBb, AaQqBB        0.5 r2
AAQqBB, AaQQBb        0
Figure 1. Conventionally defined backcross progeny for a QTL and two flanking
markers.
A (marker) ---- r1 ---- Q (putative QTL) ---- r2 ---- B (marker)
(r is the recombination rate over the whole interval from A to B)
Figure 2. Linkage relationship of a QTL and two flanking markers
Table 1. Co-segregation pattern for backcross design in interval mapping
Marker     Observed   Frequency   Joint frequency        Conditional frequency       Expected value
genotype   count                  (QTL genotype)         (QTL genotype)              (gi)
                                  QQ          Qq         QQ            Qq
AABB       n1         0.5(1-r)    0.5(1-r)    0          1             0             μ1
AABb       n2         0.5r        0.5r2       0.5r1      r2/r = 1-ρ    r1/r = ρ      (1-ρ)μ1 + ρμ2
AaBB       n3         0.5r        0.5r1       0.5r2      r1/r = ρ      r2/r = 1-ρ    ρμ1 + (1-ρ)μ2
AaBb       n4         0.5(1-r)    0           0.5(1-r)   0             1             μ2
Mean                  0.25                               μ1            μ2            0.5(μ1+μ2)
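The conditional frequencies in Table 1 can be looked up directly from the flanking marker genotype. The following is a minimal sketch, assuming ρ = r1/r and no double crossover as in the table; the function names `qtl_genotype_probs` and `expected_genotype` are hypothetical and are introduced only for illustration.

```python
def qtl_genotype_probs(marker_genotype, rho):
    """Return (P(QQ | markers), P(Qq | markers)) for one backcross individual.

    rho = r1 / r is the relative position of the putative QTL in the
    interval; double crossovers are ignored, as in Table 1.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    table = {
        "AABB": (1.0, 0.0),
        "AABb": (1.0 - rho, rho),
        "AaBB": (rho, 1.0 - rho),
        "AaBb": (0.0, 1.0),
    }
    return table[marker_genotype]


def expected_genotype(marker_genotype, rho):
    """Conditional expectation of the genotype code (1 for QQ, 0 for Qq)."""
    q_qq, _ = qtl_genotype_probs(marker_genotype, rho)
    return q_qq
```

For example, `qtl_genotype_probs("AABb", 0.25)` gives the pair (0.75, 0.25), which corresponds to the second row of the conditional frequency columns.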
Trait in Binary Scale
Threshold model and liability
In dealing with a binary trait, it is assumed that a continuous variable, say
U, referred to as liability, underlies the binary trait, say Y (Xu and Atchley,
1996). To relate the liability to the binary trait (such as resistance to a certain
disease), it is assumed that there is a threshold (γ) on the liability scale, below
which the individual has the unaffected phenotype and above which it is affected
(see Figure 3).
Figure 3. Liability and threshold model for binary trait
The relation can be summarized by
$$y_i = \begin{cases} 1, & \text{if } u_i \ge \gamma \\ 0, & \text{if } u_i < \gamma \end{cases} \tag{1}$$
Maximum likelihood (ML) approach
Using the liability model, the one-QTL ML mapping model for a backcross
population can be written as
$$u_i = \mu + b x_i^* + \varepsilon_i, \qquad i = 1, 2, \ldots, n \tag{2}$$
where $u_i$ is the liability value for individual $i$, $\mu$ is the mean, $b$ is the
effect of QTL Q, $x_i^*$ denotes the genotype of Q and takes the value 1 for the
homozygote QQ and 0 for the heterozygote Qq, and $\varepsilon_i$ is the
environmental deviation, assumed to follow $N(0, \sigma^2)$. Since the liability is
unobserved, the mean $\mu$ and the variance of $\varepsilon$ can be set to
arbitrary values (for simplicity, $\mu = 0$ and $\sigma^2 = 1$).
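To make the threshold rule and the liability model concrete, binary phenotypes can be generated from known QTL genotypes. This is only an illustrative sketch, assuming μ = 0 and σ² = 1 as above; the function name `simulate_binary_trait` and its `seed` argument are hypothetical.

```python
import random


def simulate_binary_trait(genotypes, b, gamma, seed=0):
    """Simulate binary phenotypes under the liability threshold model.

    genotypes : list of QTL genotype codes (1 = QQ, 0 = Qq)
    b         : QTL effect on the liability scale
    gamma     : threshold on the liability scale
    """
    rng = random.Random(seed)
    phenotypes = []
    for x in genotypes:
        u = b * x + rng.gauss(0.0, 1.0)            # liability with mu = 0, sigma^2 = 1
        phenotypes.append(1 if u >= gamma else 0)  # threshold rule for the binary trait
    return phenotypes
```

A call such as `simulate_binary_trait([1, 0] * 50, b=1.0, gamma=0.5, seed=1)` returns 100 values of 0 or 1, with affected individuals expected to be more frequent among the QQ genotypes when b is positive.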
Based on the conditional distribution of $u_i$ given $x_i^*$, the conditional
probability of $y_i$ given $x_i^*$ is obtained as
$$P(y_i = 1 \mid x_i^*) = \int_{\gamma}^{\infty} f(u_i \mid x_i^*)\, du_i = 1 - \int_{-\infty}^{\gamma} f(u_i \mid x_i^*)\, du_i = 1 - \Phi(\gamma - b x_i^*) = \Phi(b x_i^* - \gamma) \tag{3}$$
where $\Phi(\xi)$ stands for the standard normal cumulative distribution function
and $\xi$ is its argument. Analysis involving $\Phi(\xi)$ is referred to as probit
analysis. Although the parameters of the probit model are easy to interpret, the
model is difficult to manipulate because numerical integration is required. So, a
logistic model is employed to approximate $\Phi(\xi)$ for estimation purposes and
is expressed by
$$\psi(\xi) = \frac{\exp(\xi)}{1 + \exp(\xi)} \tag{4}$$
The relationship between the probit model and the logistic model is
$\Phi(\xi) \approx \psi(d\xi)$, where $d = \pi/\sqrt{3}$. Therefore,
$$P(y_i = 1 \mid x_i^*) \approx \frac{\exp\{d(b x_i^* - \gamma)\}}{1 + \exp\{d(b x_i^* - \gamma)\}} \tag{5}$$
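The quality of the logistic approximation to the probit function can be inspected numerically. The sketch below uses only the Python standard library; the function names and the evaluation grid are assumptions chosen for illustration.

```python
import math

D = math.pi / math.sqrt(3.0)  # scale factor d relating the probit and logistic models


def probit(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def max_abs_gap(lo=-6.0, hi=6.0, steps=2401):
    """Largest absolute difference between probit(x) and logistic(D * x) on a grid."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(abs(probit(x) - logistic(D * x)) for x in grid)
```

On this grid the largest gap is on the order of 0.02 in probability, so the two curves agree closely but not exactly.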
Since the QTL genotype $x_i^*$ of an individual can be homozygote (1) or
heterozygote (0), the likelihood is a mixture distribution whose mixing proportions
are the conditional probabilities of the QTL genotypes given the two flanking
markers, $q_{i1}$ and $q_{i2}$ for the QTL genotypes QQ and Qq, respectively (see
Table 1). For n individuals in the sample, the likelihood function is
$$L = \prod_{i=1}^{n} \left[ \sum_{j=1}^{2} q_{ij}\, p_{ij}^{\,y_i} (1 - p_{ij})^{1 - y_i} \right]$$
where $p_{i1}$ and $p_{i2}$ denote the conditional probabilities of $y_i = 1$ given
the QTL genotypes $x_i^* = 1$ and $x_i^* = 0$, respectively ($p_{i2}$ is written as
$p_{i0}$ in the derivatives that follow). The log likelihood function is
$$l = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{2} q_{ij}\, p_{ij}^{\,y_i} (1 - p_{ij})^{1 - y_i} \right). \tag{6}$$
The first partial derivatives are
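$$\frac{\partial l}{\partial b} = \sum_{i=1}^{n} \omega_i (y_i - p_{i1}) \tag{7}$$
and
$$\frac{\partial l}{\partial \gamma} = \sum_{i=1}^{n} \left[ \omega_i (y_i - p_{i1}) + (1 - \omega_i)(y_i - p_{i0}) \right] \tag{8}$$
where
$$\omega_i = \frac{q_{i1}\, p_{i1}^{\,y_i} (1 - p_{i1})^{1 - y_i}}{\sum_{j=1}^{2} q_{ij}\, p_{ij}^{\,y_i} (1 - p_{ij})^{1 - y_i}} \tag{9}$$
is the posterior probability of $x_i^* = 1$.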
By treating ωi as constants, the second partial derivatives are
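$$\frac{\partial^2 l}{\partial b^2} = -\sum_{i=1}^{n} \omega_i\, p_{i1} (1 - p_{i1}) \tag{10}$$
$$\frac{\partial^2 l}{\partial b\, \partial \gamma} = -\sum_{i=1}^{n} \omega_i\, p_{i1} (1 - p_{i1}) \tag{11}$$
and
$$\frac{\partial^2 l}{\partial \gamma^2} = -\sum_{i=1}^{n} \left[ \omega_i\, p_{i1} (1 - p_{i1}) + (1 - \omega_i)\, p_{i0} (1 - p_{i0}) \right]. \tag{12}$$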
The parameter estimates can be obtained using the EM algorithm. The idea of
the EM algorithm is that the likelihood solution for the complete data is
relatively simple compared with that for the incomplete data (Pawitan, 2001). In
QTL mapping, the unobserved QTL genotype $x_i^*$ is treated as missing data. The
EM steps are as follows:
1. Set up initial values of $b$ and $\gamma$. Usually $b$ is set to 0, whereas
$\gamma$ is set to $\sum_i y_i / n$.
2. Calculate $\omega_i$ (E-step).
3. Given $\omega_i$, solve for $b$ and $\gamma$ using Newton-Raphson iteration
(M-step) as follows. Let $g$ denote the vector of first partial derivatives and
$H$ the matrix of second partial derivatives. If $\theta^{(t)}$ is the vector of
solutions at step $t$, the solution at step $t+1$ is
$$\theta^{(t+1)} = \theta^{(t)} - H^{-1} g$$
4. Update the current values and go to step 2.
5. Repeat steps 2-4 until convergence. Given two convergence tolerances
$\varepsilon_k > 0$ and $\varepsilon_p > 0$, the iteration is considered to have
converged if one of the following criteria is satisfied:
a. $\left| l(\gamma^{(t+1)}, b^{(t+1)}) - l(\gamma^{(t)}, b^{(t)}) \right| < \varepsilon_k$
b. $\max\left( \left| \gamma^{(t+1)} - \gamma^{(t)} \right|, \left| b^{(t+1)} - b^{(t)} \right| \right) < \varepsilon_p$
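The EM steps above can be sketched in a few lines of code for a single putative QTL position. This is an illustrative sketch, not the program used in this research. As an assumption made for convenience, the iteration works on the logit scale with intercept a = -dγ and slope β = db, so that the derivative expressions take the same form as in (7) to (12), and the results are converted back to b and γ at the end. The function name `em_binary_qtl`, the tolerance, and the iteration caps are hypothetical.

```python
import math

D = math.pi / math.sqrt(3.0)  # probit-to-logit scale factor d


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def em_binary_qtl(y, q1, tol=1e-8, max_iter=200):
    """EM for the one-QTL binary-trait mixture model.

    y  : list of 0/1 phenotypes
    q1 : list of P(QQ | flanking markers), one value per individual
    Returns (b, gamma, log likelihood). Assumes both genotype classes
    carry some weight, otherwise the 2 x 2 system below is singular.
    """
    n = len(y)
    a = -D * (sum(y) / n)   # gamma starts at mean(y), stored as a = -d * gamma
    beta = 0.0              # b starts at 0, stored as beta = d * b

    def loglik(a, beta):
        p1, p0 = logistic(a + beta), logistic(a)
        total = 0.0
        for yi, qi in zip(y, q1):
            f1 = p1 if yi else 1.0 - p1
            f0 = p0 if yi else 1.0 - p0
            total += math.log(qi * f1 + (1.0 - qi) * f0)
        return total

    old = loglik(a, beta)
    new = old
    for _ in range(max_iter):
        # E-step: posterior probability that individual i has genotype QQ
        p1, p0 = logistic(a + beta), logistic(a)
        w = []
        for yi, qi in zip(y, q1):
            f1 = p1 if yi else 1.0 - p1
            f0 = p0 if yi else 1.0 - p0
            w.append(qi * f1 / (qi * f1 + (1.0 - qi) * f0))
        # M-step: Newton-Raphson with the posterior weights held fixed
        for _ in range(50):
            p1, p0 = logistic(a + beta), logistic(a)
            g_b = sum(wi * (yi - p1) for wi, yi in zip(w, y))
            g_a = g_b + sum((1.0 - wi) * (yi - p0) for wi, yi in zip(w, y))
            s1 = sum(w) * p1 * (1.0 - p1)
            s0 = (n - sum(w)) * p0 * (1.0 - p0)
            # H = -[[s1, s1], [s1, s1 + s0]], so theta - H^{-1} g adds M^{-1} g
            det = s1 * s0
            step_b = ((s1 + s0) * g_b - s1 * g_a) / det
            step_a = (s1 * g_a - s1 * g_b) / det
            beta, a = beta + step_b, a + step_a
            if max(abs(step_b), abs(step_a)) < 1e-12:
                break
        new = loglik(a, beta)
        if abs(new - old) < tol:   # convergence criterion on the log likelihood
            break
        old = new
    return beta / D, -a / D, new
```

When the putative QTL sits exactly on a marker, every q1 value is 0 or 1 and the fit reduces to comparing the proportion of affected individuals between the two genotype classes, which gives a simple way to check the code.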
Regression (REG) approach
Using the liability model, the one-QTL REG mapping model for a backcross
population can be written as
$$u_i = \mu + b \pi_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n \tag{13}$$
where $u_i$, $\mu$, $b$, and $\varepsilon_i$ have the same definitions as in model
(2), and $\pi_i$ is the conditional expectation of the QTL genotype given the two
flanking markers. The likelihood function is
$$L = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i}$$
where $p_i$ denotes the conditional probability of $y_i = 1$ given $\pi_i$. The log
likelihood function is
$$l = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]. \tag{14}$$
The first partial derivatives are
$$\frac{\partial l}{\partial b} = \sum_{i=1}^{n} \pi_i (y_i - p_i) \tag{15}$$
and
$$\frac{\partial l}{\partial \gamma} = \sum_{i=1}^{n} (y_i - p_i). \tag{16}$$
The second partial derivatives are
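$$\frac{\partial^2 l}{\partial b^2} = -\sum_{i=1}^{n} \pi_i^2\, p_i (1 - p_i) \tag{17}$$
$$\frac{\partial^2 l}{\partial b\, \partial \gamma} = -\sum_{i=1}^{n} \pi_i\, p_i (1 - p_i) \tag{18}$$
and
$$\frac{\partial^2 l}{\partial \gamma^2} = -\sum_{i=1}^{n} p_i (1 - p_i) \tag{19}$$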
The procedure for obtaining the parameter estimates is as follows. Let $g$
denote the vector of first partial derivatives and $H$ the matrix of second
partial derivatives. If $\theta^{(t)}$ is the vector of solutions at step $t$, the
solution at step $t+1$ is
$$\theta^{(t+1)} = \theta^{(t)} - H^{-1} g$$
The procedure for choosing the initial values of $\gamma$ and $b$ is the same as
in the ML approach described above.
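Because the REG approach replaces the unobserved genotype by its conditional expectation, no E-step is needed and the fit is an ordinary Newton-Raphson iteration. The sketch below is illustrative only. As an assumption, it iterates on the logit scale with intercept a = -dγ and slope β = db, for which the derivatives have the form of (15) to (19), and converts back to b and γ at the end. The function name `fit_reg_binary` is hypothetical.

```python
import math

D = math.pi / math.sqrt(3.0)  # probit-to-logit scale factor d


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_reg_binary(y, pi, n_iter=50):
    """Newton-Raphson fit of the REG model for a binary trait.

    y  : list of 0/1 phenotypes
    pi : list of conditional expectations of the QTL genotype
    Returns (b, gamma). Assumes the pi values are not all equal,
    otherwise the 2 x 2 system below is singular.
    """
    n = len(y)
    a = -D * (sum(y) / n)   # gamma starts at mean(y), stored as a = -d * gamma
    beta = 0.0              # b starts at 0, stored as beta = d * b
    for _ in range(n_iter):
        p = [logistic(a + beta * x) for x in pi]
        # first derivatives with respect to slope and intercept
        g_b = sum(x * (yi - pj) for x, yi, pj in zip(pi, y, p))
        g_a = sum(yi - pj for yi, pj in zip(y, p))
        # entries of the negative matrix of second derivatives
        h_bb = sum(x * x * pj * (1.0 - pj) for x, pj in zip(pi, p))
        h_ba = sum(x * pj * (1.0 - pj) for x, pj in zip(pi, p))
        h_aa = sum(pj * (1.0 - pj) for pj in p)
        det = h_bb * h_aa - h_ba ** 2
        step_b = (h_aa * g_b - h_ba * g_a) / det
        step_a = (h_bb * g_a - h_ba * g_b) / det
        beta, a = beta + step_b, a + step_a
        if max(abs(step_b), abs(step_a)) < 1e-12:
            break
    return beta / D, -a / D
```

At the returned solution both first derivatives are numerically zero, which is a convenient check that the iteration has converged.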
Trait in Ordinal Scale
Threshold model and liability
Let U denote the liability underlying an ordinal trait Y with c categories. A
set of fixed thresholds, $\gamma_1, \gamma_2, \ldots, \gamma_{c-1}$, on the
underlying scale defines the observed categories 1, 2, …, c on the ordinal scale.
We thus have the model
$$\gamma_{j-1} < u_i \le \gamma_j \iff Y_i = j; \qquad \gamma_0 = -\infty, \quad \gamma_c = \infty$$
where $i = 1, 2, \ldots, n$ indexes the individuals. Figure 4 illustrates the
threshold model for an ordinal trait.
Figure 4. Liability and threshold model for ordinal trait
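The mapping from a liability value to an ordinal category can be written compactly with a binary search over the sorted thresholds. This is a minimal sketch; the function name `liability_to_category` is hypothetical.

```python
import bisect


def liability_to_category(u, thresholds):
    """Return the category j (1..c) such that gamma_{j-1} < u <= gamma_j.

    thresholds : sorted list [gamma_1, ..., gamma_{c-1}]
    """
    # bisect_left counts the thresholds strictly smaller than u,
    # so a value equal to gamma_j stays in category j
    return bisect.bisect_left(thresholds, u) + 1
```

With thresholds -1, 0, and 1 there are four categories, and a liability exactly equal to a threshold falls in the lower of the two adjacent categories.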
ML approach
Let the c ordered categories have cumulative probabilities $\phi_1(x^*),
\phi_2(x^*), \ldots, \phi_c(x^*)$, where $\phi_c(x^*) = 1$. Then
$$\phi_j = P(Y \le j) = P(U \le \gamma_j) = \frac{\exp(\gamma_j - b x^*)}{1 + \exp(\gamma_j - b x^*)} \tag{20}$$
We can rewrite this as a generalized linear model with a logit link function,
$$\eta_{ij} = \log\left( \frac{\phi_j}{1 - \phi_j} \right) = \gamma_j - b x_i^* \tag{21}$$
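The cumulative probabilities of the logit model translate into category probabilities by differencing. The following sketch illustrates this for a single QTL genotype; the function names `cumulative_probs` and `category_probs` are hypothetical.

```python
import math


def cumulative_probs(thresholds, b, x):
    """Cumulative probabilities P(Y <= j), j = 1..c, under the logit link.

    thresholds : [gamma_1, ..., gamma_{c-1}]
    b          : QTL effect
    x          : QTL genotype code (1 = QQ, 0 = Qq)
    """
    phis = [1.0 / (1.0 + math.exp(-(g - b * x))) for g in thresholds]
    return phis + [1.0]   # the last cumulative probability equals 1


def category_probs(thresholds, b, x):
    """Category probabilities P(Y = j), j = 1..c, as successive differences."""
    phis = cumulative_probs(thresholds, b, x)
    return [phis[0]] + [phis[j] - phis[j - 1] for j in range(1, len(phis))]
```

Because the linear predictor is the threshold minus b times the genotype code, a positive b moves probability mass toward the higher categories for the QQ genotype.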
Similar to the binary trait, since the QTL genotype $x_i^*$ of an individual
can be homozygote (1) or heterozygote (0), the likelihood is a mixture
distribution whose mixing proportions are the conditional probabilities of the QTL
genotypes given the two flanking markers, $q_{i1}$ and $q_{i2}$ for the QTL
genotypes QQ and Qq, respectively (see Table 1). Let $\phi_{j1}$ and $\phi_{j2}$
denote the conditional cumulative probabilities of $y_i \le j$ given the QTL
genotypes $x_i^* = 1$ and $x_i^* = 0$, respectively. In addition, let
$R_{ij} = \sum_{m=1}^{j} z_{im}$, where $z_{im} = 1$ if $y_i = m$ and 0 otherwise,
for $j = 1, 2, \ldots, c$, and
$$\varphi_{ijk} = \log\left( \frac{\phi_{ijk}}{\phi_{i(j+1)k} - \phi_{ijk}} \right)$$
For n individuals in the sample, the likelihood function is
$$L = \prod_{i=1}^{n} \left[ \sum_{k=1}^{2} q_{ik} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k} - \phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{i(j+1)} - R_{ij}} \right\} \right]$$
The log likelihood function is
$$l = \sum_{i=1}^{n} \log \left[ \sum_{k=1}^{2} q_{ik} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k} - \phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{i(j+1)} - R_{ij}} \right\} \right]. \tag{22}$$
The first partial derivatives are
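$$\frac{\partial l}{\partial \theta_t} = \sum_{i=1}^{n} \sum_{k=1}^{2} \omega_{ik} \sum_{j=1}^{c-1} \frac{\partial l_i}{\partial \varphi_{ijk}}\, U_{ijk}\, Q_{ijkt}, \tag{23}$$
where $\theta_t$ is the t-th parameter of model (21), and $t = 1, 2, \ldots, c$.
In addition,
$$\frac{\partial l_i}{\partial \varphi_{ijk}} = R_{ij} - R_{i(j+1)} \frac{\phi_{ijk}}{\phi_{i(j+1)k}}, \tag{24}$$
$$U_{ijk} = \frac{\phi_{i(j+1)k}}{\phi_{ijk} \left( \phi_{i(j+1)k} - \phi_{ijk} \right)}, \tag{25}$$
and
$$Q_{ijkt} = P_{ijt} \frac{\partial \phi_{ijk}}{\partial \eta_{ijk}} - P_{i(j+1)t} \frac{\phi_{ijk}}{\phi_{i(j+1)k}} \frac{\partial \phi_{i(j+1)k}}{\partial \eta_{i(j+1)k}}, \tag{26}$$
in which
$$P_{ijt} = \frac{\partial \eta_{ij}}{\partial \theta_t} = \begin{cases} \delta_{jt}, & \text{if } 1 \le t \le c-1 \\ -x^*_{i[t-(c-1)]}, & \text{if } t = c \end{cases}$$
$\delta_{jt} = 1$ if $j = t$ and 0 otherwise, and $P_{ict} = 0$. For
$i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, c-1$,
$$\frac{\partial \phi_{ijk}}{\partial \eta_{ijk}} = \phi_{ijk} (1 - \phi_{ijk})$$
and $\partial \phi_{ick} / \partial \eta_{ick} = 0$. In addition,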
$$\omega_{ik} = \frac{ q_{ik} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k} - \phi_{jk}}{\phi_{(j+1)k}} \right)^{R_{i(j+1)} - R_{ij}} \right\} }{ \sum_{k'=1}^{2} q_{ik'} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_{jk'}}{\phi_{(j+1)k'}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)k'} - \phi_{jk'}}{\phi_{(j+1)k'}} \right)^{R_{i(j+1)} - R_{ij}} \right\} } \tag{27}$$
is the posterior probability of $x_i^* = 1$ and $x_i^* = 0$ for $k = 1$ and
$k = 2$, respectively. The second partial derivatives are
$$\frac{\partial^2 l}{\partial \theta_t\, \partial \theta_s} = -\sum_{i=1}^{n} \sum_{k=1}^{2} q_{ik} \sum_{j=1}^{c-1} U_{ijk}\, Q_{ijkt}\, Q_{ijks} \tag{28}$$
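The posterior genotype probabilities in (27) can be computed for one individual as in the following sketch. It relies on the observation, stated here as an assumption of the sketch, that for an individual in category y the product over j in (27) equals the category probability, that is, the difference of the cumulative probabilities for categories y and y - 1. The function names are hypothetical.

```python
import math


def category_prob(y, thresholds, b, x):
    """P(Y = y | genotype code x) under the cumulative logit model."""
    def phi(j):
        # cumulative probability P(Y <= j), with the conventions for j = 0 and j = c
        if j == 0:
            return 0.0
        if j == len(thresholds) + 1:
            return 1.0
        return 1.0 / (1.0 + math.exp(-(thresholds[j - 1] - b * x)))
    return phi(y) - phi(y - 1)


def posterior_weights(y, q, thresholds, b):
    """Posterior probabilities of the genotypes QQ and Qq for one individual.

    y : observed category (1..c)
    q : pair of prior genotype probabilities (QQ, Qq) from the flanking markers
    """
    like_qq = category_prob(y, thresholds, b, 1)
    like_qh = category_prob(y, thresholds, b, 0)
    denom = q[0] * like_qq + q[1] * like_qh
    return (q[0] * like_qq / denom, q[1] * like_qh / denom)
```

When b = 0 the phenotype carries no information about the genotype, so the posterior weights equal the prior probabilities obtained from the markers.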
The EM steps for obtaining the parameter estimates are similar to those
described for the binary trait above. The initial value of $b$ is usually set to
0, whereas the initial values of the thresholds are
$$\gamma_j^{(0)} = \log\left( \frac{A_j}{1 - A_j} \right)$$
where
$$A_j = \frac{\sum_{i=1}^{n} \sum_{m=1}^{j} z_{im}}{n}$$
and $j = 1, 2, \ldots, c-1$. The two convergence criteria are as follows:
(a) $\left| l(\gamma_1^{(t+1)}, \gamma_2^{(t+1)}, \ldots, \gamma_{c-1}^{(t+1)}, b^{(t+1)}) - l(\gamma_1^{(t)}, \gamma_2^{(t)}, \ldots, \gamma_{c-1}^{(t)}, b^{(t)}) \right| < \varepsilon_k$
(b) $\max\left( \left| \gamma_1^{(t+1)} - \gamma_1^{(t)} \right|, \left| \gamma_2^{(t+1)} - \gamma_2^{(t)} \right|, \ldots, \left| \gamma_{c-1}^{(t+1)} - \gamma_{c-1}^{(t)} \right|, \left| b^{(t+1)} - b^{(t)} \right| \right) < \varepsilon_p$
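The initial threshold values described above are the logits of the observed cumulative proportions. A minimal sketch follows; the function name `initial_thresholds` is hypothetical, and it assumes every cumulative proportion lies strictly between 0 and 1.

```python
import math


def initial_thresholds(y, c):
    """Initial threshold values from observed cumulative proportions.

    y : list of observed categories coded 1..c
    c : number of categories
    """
    n = len(y)
    values = []
    for j in range(1, c):
        a_j = sum(1 for yi in y if yi <= j) / n   # proportion with category <= j
        values.append(math.log(a_j / (1.0 - a_j)))
    return values
```

For the four observations 1, 1, 2, 3 with c = 3, the cumulative proportions are 0.5 and 0.75, so the initial thresholds are log(1) = 0 and log(3).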
REG approach
The generalized linear model with logit link function for one-QTL model is
$$\eta_{ij} = \log\left( \frac{\phi_j}{1 - \phi_j} \right) = \gamma_j - b \pi_i \tag{29}$$
Let $\phi_j$ denote the conditional cumulative probability of $y_i \le j$ given
$\pi_i$. In addition, let $R_{ij} = \sum_{m=1}^{j} z_{im}$, where $z_{im} = 1$ if
$y_i = m$ and 0 otherwise, for $j = 1, 2, \ldots, c$. For n individuals in the
sample, the likelihood function is
$$L = \prod_{i=1}^{n} \prod_{j=1}^{c-1} \left\{ \left( \frac{\phi_j}{\phi_{(j+1)}} \right)^{R_{ij}} \left( \frac{\phi_{(j+1)} - \phi_j}{\phi_{(j+1)}} \right)^{R_{i(j+1)} - R_{ij}} \right\}$$
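The log likelihood function is
$$l = \sum_{i=1}^{n} \sum_{j=1}^{c-1} \left\{ R_{ij}\, \varphi_{ij} - R_{i(j+1)}\, g(\varphi_{ij}) \right\}$$
where
$$\varphi_{ij} = \log\left( \frac{\phi_{ij}}{\phi_{i(j+1)} - \phi_{ij}} \right) \tag{30}$$
and
$$g(\varphi_{ij}) = \log(1 + \exp(\varphi_{ij})) = \log\left( \frac{\phi_{i(j+1)}}{\phi_{i(j+1)} - \phi_{ij}} \right)$$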
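A small numeric check can confirm how this form of the log likelihood relates to the category probabilities. The sketch assumes that the contribution of one individual is the sum over j of the terms built from the cumulative indicators and the function g, and compares it with the logarithm of the probability of the observed category. The function names are hypothetical.

```python
import math


def loglik_via_cumulative(phis, y):
    """Contribution of one individual, written with cumulative indicators.

    phis : cumulative probabilities for categories 1..c (the last equals 1)
    y    : observed category (1..c)
    """
    c = len(phis)
    r = [1 if j >= y else 0 for j in range(1, c + 1)]   # cumulative indicators
    total = 0.0
    for j in range(c - 1):
        diff = phis[j + 1] - phis[j]
        varphi = math.log(phis[j] / diff)
        g = math.log(phis[j + 1] / diff)                 # equals log(1 + exp(varphi))
        total += r[j] * varphi - r[j + 1] * g
    return total


def loglik_direct(phis, y):
    """Logarithm of the probability of the observed category."""
    lower = phis[y - 2] if y > 1 else 0.0
    return math.log(phis[y - 1] - lower)
```

For cumulative probabilities 0.2, 0.5, 0.9, and 1, the two functions agree for every category, which shows that the sum is simply the log of the multinomial category probability.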
The first partial derivatives are
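$$\frac{\partial l}{\partial \theta_t} = \sum_{i=1}^{n} \sum_{j=1}^{c-1} \frac{\partial l_i}{\partial \varphi_{ij}}\, U_{ij}\, Q_{ijt}, \tag{31}$$
where $\theta_t$ is the t-th parameter of model (29), and $t = 1, 2, \ldots, c$.
In addition,
$$\frac{\partial l_i}{\partial \varphi_{ij}} = R_{ij} - R_{i(j+1)} \frac{\phi_{ij}}{\phi_{i(j+1)}}, \tag{32}$$
$$U_{ij} = \frac{\phi_{i(j+1)}}{\phi_{ij} \left( \phi_{i(j+1)} - \phi_{ij} \right)}, \tag{33}$$
and
$$Q_{ijt} = P_{ijt} \frac{\partial \phi_{ij}}{\partial \eta_{ij}} - P_{i(j+1)t} \frac{\phi_{ij}}{\phi_{i(j+1)}} \frac{\partial \phi_{i(j+1)}}{\partial \eta_{i(j+1)}}, \tag{34}$$
in which
$$P_{ijt} = \frac{\partial \eta_{ij}}{\partial \theta_t} = \begin{cases} \delta_{jt}, & \text{if } 1 \le t \le c-1 \\ -x^*_{i[t-(c-1)]}, & \text{if } t = c \end{cases}$$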
δjt = 1 if j=t, 0 otherwise, and Pict