Genome-Wide Association Study for Natural Variation in Enzymatic Activities and Motility Traits in Xanthomonas campestris pv. campestris.

GENOME-WIDE ASSOCIATION STUDY FOR NATURAL
VARIATION IN ENZYMATIC ACTIVITIES AND MOTILITY
TRAITS IN Xanthomonas campestris pv. campestris

NANI MARYANI

GRADUATE SCHOOL
BOGOR AGRICULTURAL UNIVERSITY
BOGOR
2012

DECLARATION
I declare that this thesis entitled “Genome-Wide Association Study for Natural
Variation in Enzymatic Activities and Motility Traits in Xanthomonas campestris
pv. campestris” is a report of research work carried out by me under the
guidance of my academic supervisors. This thesis has not been submitted in any
form to any institution except INSA (Institut National des Sciences Appliquées)
Toulouse, France, as part of the Double Degree Indonesia-France Program in
collaboration with Bogor Agricultural University. Information obtained from
published or unpublished work of others and help received during laboratory and
field work have been acknowledged.

Bogor, October 2012

Nani Maryani
G351107251


ABSTRACT

NANI MARYANI. Genome-Wide Association Study for Natural Variation in
Enzymatic Activities and Motility Traits in Xanthomonas campestris pv.
campestris. Under supervision of MATTHIEU ARLAT, LISDAR I SUDIRMAN,
GAYUH RAHAYU.

Genome-wide association studies are commonly used to study complex
traits involved in human disease and in plant development. This research suggests
that genome-wide association is also feasible for studying natural variation in a
bacterial pathogen such as Xanthomonas campestris pv. campestris (Xcc). Fifty Xcc
strains were phenotyped for four enzymatic activities and two types of motility,
and these phenotypes were tested for association with the available genomic
data, AFLP and SNP markers. The results showed that sample size affects the
number of significant AFLP markers detected. Some genes corresponding to
significant coding SNPs for amylase and endoglucanase activities were identified.
Significant markers identified in this association may point to candidate genes
for those traits. Follow-up studies are necessary to confirm this hypothesis.


Keywords: GWAS, Xanthomonas campestris pv. campestris, genetic markers,
phenotypic traits

SUMMARY

NANI MARYANI. Genome-Wide Association Study for Natural Variation in
Enzymatic Activities and Motility Traits in Xanthomonas campestris pv.
campestris. Under supervision of MATTHIEU ARLAT, LISDAR I SUDIRMAN,
GAYUH RAHAYU.
Genome-wide association studies (GWAS) have emerged as a promising tool
to reveal the genetic basis of phenotypic traits. The goal of GWAS is to examine
the joint distribution of phenotype and genotype. With GWAS, researchers seek to
identify regions of the genome where individuals that are phenotypically similar
are also unusually closely related. This is the first GWAS in bacteria.
Xanthomonas campestris pv. campestris (Xcc) was chosen as a model bacterium
since the genomic data required for such studies are available. Xcc is the causal
agent of black rot disease of cruciferous plants. This bacterium has been used as a
model to study plant-pathogen interactions. Pathogenic species and pathovars show
a high degree of host plant specificity and many also exhibit tissue specificity.
Strains of this pathogen also show different abilities to cause disease on host
plants. These characteristics motivate the study of both phenotypic and
genotypic variation in this pathogen. The objectives of this study are to study
phenotype and genotype variations among 50 strains of Xcc and to identify new
markers associated with phenotypic variation.
Two virulence traits of Xcc, extracellular enzyme activity and motility, were
first characterized, and association studies were then performed using a GWAS
approach. Two different marker sets were used: neutral AFLP (Amplified Fragment
Length Polymorphism) markers and SNPs (Single Nucleotide Polymorphisms). A radial
diffusion assay in agar plates containing substrate was used for phenotyping four
extracellular enzymes: amylase, endoglucanase, polygalacturonase, and protease.
Swarming and swimming motility were measured as the diameter of bacterial
spreading in MOKA agar medium. To verify that the strains were not swapped
during the experiments, at the end of one replicate of the protease assay, all
the strains were checked by PCR using four primer pairs specific to the xopAL2,
xopJ5, xopAC, and xopR genes. The EMMA (Efficient Mixed-Model Association)
model was used to run the association. A random-effects kinship matrix was
used to control for population structure. Multiple testing was corrected using a
false discovery rate (FDR) of 10% in order to limit false positives.
The 50 Xcc strains tested differed in their enzymatic activities. Likewise,
swimming and swarming motilities varied among the strains.
The broad-sense heritability (H²) of each trait is high, so phenotypic variance
among the 50 Xcc strains is largely attributable to genetic factors. The AFLP
marker data were used to generate a phylogenetic tree of the 50 Xcc strains used
in the study. A genome-wide association study was performed using 1513 AFLP
markers (when 50 strains were considered), 1462 AFLP markers (when 37 strains
were considered), and 247480 SNP markers. All polymorphic markers with a MAF
(Minor Allele Frequency) greater than 5% were included. The significance
threshold for EMMA P-values was estimated after FDR correction at 10%.
Manhattan plots of the EMMA results describe the association between each trait
and both marker types. The number of significant markers varied among traits.
Reducing the number of strains affected the number of significant AFLP markers
detected in the associations. Fewer significant markers were detected for
amylase and endoglucanase activities when 37 strains were used. A different trend
was observed for polygalacturonase, protease and swarming. For these three traits,
and most notably for protease activity, the number of significant markers obtained
with 37 strains (AFLP 37) was higher than with 50 strains (AFLP 50). The result
for swarming motility is not well supported.
SNP-based association seems to give more precise results than AFLP-based
association, although the difference in marker number (1513 AFLP versus 247480
SNPs) may also contribute. The Manhattan plot for swarming motility showed a
similar pattern with SNP markers as with AFLP markers. About 12 significant SNP
markers were found for amylase, 6 for endoglucanase, and 50 for polygalacturonase,
but none were detected for protease. For both motilities, a high number of
significant markers was detected, i.e. 312 SNPs associated with swimming and 279
SNPs associated with swarming.
The ORF (Open Reading Frame) sequence positions of the reference strain
Xcc8004 were used to identify significant coding SNPs. Six significant markers
obtained for endoglucanase activity lie in genes coding for hypothetical proteins.
Two of them are in the same operon. Two genes coding for hypothetical proteins
carried significant SNPs for amylase activity. Interestingly, four
significant SNPs for amylase activity are clustered in the genome. These genes
code for proteins with enzymatic activities. The role of these proteins in the
secretion of extracellular enzymes like amylase and endoglucanase is
presumably important. The significant SNPs correspond to 15 proteins, 8 of which
are classified as signal peptide proteins, which means they may be secreted by the
Type II secretion system in Xcc.

Keywords: GWAS, Xanthomonas campestris pv. campestris, genetic markers,
phenotypic traits


Copyright© 2012, by Bogor Agricultural University
All rights reserved
No part or all of this thesis may be excerpted without inclusion or mentioning the
sources
a. Excerption only for research and education use, writing for scientific
papers, reporting, critical writing or reviewing of a problem
b. Excerption does not inflict a financial loss in the proper interest of Bogor
Agricultural University

GENOME-WIDE ASSOCIATION STUDY FOR NATURAL
VARIATION IN ENZYMATIC ACTIVITIES AND MOTILITY
TRAITS IN Xanthomonas campestris pv. campestris

NANI MARYANI

A thesis submitted as partial fulfillment of the requirement for the degree of
Master of Science in Microbiology

GRADUATE SCHOOL

BOGOR AGRICULTURAL UNIVERSITY
BOGOR
2012

Thesis External Examiner in INSA de Toulouse, France:
Dr. Jean Michelle François
Dr. P. Soucaille
Prof. Claude Maranges
Thesis External Examiner in IPB, Indonesia: Dr.Ir. Giyanto M.Si.

Thesis Title : Genome-Wide Association Study for Natural Variation in
               Enzymatic Activities and Motility Traits in Xanthomonas
               campestris pv. campestris.
Name         : Nani Maryani
Student I.D. : G351107251


Approved
Co-Major Professors,

Prof. Matthieu Arlat

Dr. Ir. Lisdar I Sudirman

Dr. Ir. Gayuh Rahayu

Agreed

Coordinator of Microbiology Program : Prof. Dr. Anja Meryandini, M.S.
Dean of Graduate School             : Dr. Ir. Dahrul Syah, M.Sc.Agr.

Examination Date: 8 November 2012


Submission Date:

ACKNOWLEDGEMENT

In the name of Allâh the Most Gracious, The Most Merciful. Many
individuals were responsible for the crystallization of this work, whose
associations and encouragement have contributed to the accomplishment of the
present report, and I would like to pay tribute to all of them.
I would like to express my deepest gratitude and sincere appreciation to my
supervisor Prof. Matthieu Arlat and my lab supervisors Dr. Emmanuelle Lauber and
Dr. Laurent Noël for their unwavering patience, advice, direct guidance and
valuable support during my research at LIPM. I would also like to acknowledge
Dr. Anne Geniselle, INRA Versailles, for her assistance and guidance on GWAS
analysis in Paris. I must also thank the Directorate General of Higher Education
(DIKTI), Ministry of National Education (Republic of Indonesia) for awarding me
the Double Degree Indonesia-France (DDIP) Scholarship. I would also like to
thank Dr. Ir. Lisdar I Sudirman and Dr. Ir. Gayuh Rahayu for their advice in
finishing the program at Bogor Agricultural University.
My sincere appreciation goes to all lab members, Martine Lautier, Claudine
Zischek, Fransky Hantelys, Thomas Dugé de Bernonville, Endrick Guy, and Brice
Roux, for their assistance, willingness to share their knowledge, and friendship.
I would also like to extend my appreciation to Prof. Claude Maranges of INSA
Toulouse for his support and encouragement.
I would also like to thank Mohammad Belaffif for all the discussion time
and PPI Toulouse for their support during my time in Toulouse. I would also like
to thank the Microbiology students of IPB 2010 for their support and friendship
during my time at IPB. Finally, I would like to thank my parents and my sisters;
their support and love were by far the most important element that allowed me to
continue my studies.

Bogor, November 2012

Nani Maryani

AUTOBIOGRAPHY

The author was born on 29 July 1983 to Mr. Martawi and Mrs.
Endang Sunarti of Jakarta, Indonesia. She joined Bogor Agricultural University
(IPB) and graduated with a Bachelor of Science in Biology in 2005. In 2006, she
joined School of Universe (SOU) Parung Bogor as a high school teacher of
Biotechnology and Agriculture. She later joined University of Ibn Khaldun in
2007, where she completed the program for a teaching license in Biology
Education.
She joined the Ministry of Education as a junior lecturer in the Department
of Biology Education, University of Sultan Ageng Tirtayasa (UNTIRTA) Serang
Banten in 2009. In August 2010, the author was awarded a scholarship for the
Master Degree, Double Degree Indonesia France (DDIP) Program by the Indonesian
Government through the Directorate General of Higher Education (DIKTI),
Ministry of Education. She joined Bogor Agricultural University in the
Microbiology Major Program.
In September 2011, she began the second year of the program in
Toulouse, France, and joined the Master 2 Research (M2R) program at INSA
Toulouse. She joined Laboratoire Interaction Plantes-Microorganisme (LIPM) INRA
Toulouse and completed her final research project in the Xanthomonas research
team. She was awarded a research master's degree in microbiology by INSA
Toulouse in July 2012.

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ANNEXES
ABBREVIATIONS LIST
INTRODUCTION
LITERATURE REVIEW
Genome-Wide Association Studies (GWAS)
Xanthomonas campestris pv. campestris (Xcc) and its Virulence Factors
Genetic Markers
MATERIALS AND METHODS
Place and Duration of the Study
Bacterial Strains and Growth Conditions
Phenotypic Characters Assays
PCR (Polymerase Chain Reaction)
Statistical Analyses
RESULTS
Variation on Enzymatic Activities and Motility of Xcc
Heritability of the Traits
Correlation among Traits
Connection between the Traits and the Phylogenetic Tree
GWAS
DISCUSSION
CONCLUSION AND PERSPECTIVE
LITERATURE CITED
ANNEXES

LIST OF TABLES

1  Primers of four Xcc effectors genes
2  Pearson's product-moment correlation between traits
3  Pearson's product-moment correlation between traits and clade
4  Numbers of significant markers after FDR correction
5  Characteristics of significant coding SNPs on endoglucanase and amylase
   activities

LIST OF FIGURES

1  Examples of degradation halos observed for different enzymatic activities
2  Examples of motility halos
3  Boxplot of phenotypic data on each trait
4  Distribution of phenotypic results used for GWA studies in 50 Xcc strains
5  Manhattan plots for enzymatic activities on 50 Xcc strains using AFLP markers
6  Manhattan plots for motility on 50 Xcc strains using AFLP markers
7  Manhattan plots for motility on 37 Xcc strains using SNPs markers
8  Manhattan plots for enzymatic activities on 37 Xcc strains using SNPs markers

LIST OF ANNEXES

1  Name and Origins of the 50 Xcc Strains
2  Manhattan Plot of GWAS on 37 Xcc strains using AFLP markers on
   enzymatic activities and motility
3  Heritability of each Traits on 50 Xcc Strains
4  Mean of Phenotypic Value on all Strains
5  Profile of the Laboratory

ABBREVIATIONS LIST

ANOVA   Analysis of Variance
AFLP    Amplified Fragment Length Polymorphism
CMC     Carboxymethyl Cellulose
dNTP    Deoxyribonucleotide Triphosphates
DNA     Deoxyribonucleic Acid
EMMA    Efficient Mixed-Model Association
FDR     False Discovery Rate
GWAS    Genome-Wide Association Study
MAF     Minor Allele Frequency
MB      Megabases
OD      Optical Density
ORF     Open Reading Frame
PCR     Polymerase Chain Reaction
PEG     Polyethylene Glycol
QTL     Quantitative Trait Loci
RFLP    Restriction Fragment Length Polymorphism
SNPs    Single Nucleotide Polymorphisms
T3E     Type Three Effector
Xcc     Xanthomonas campestris pv. campestris

INTRODUCTION

A better understanding of the molecular genetic basis of phenotypic
variation is one of the main challenges in modern biology (Aranzana et al. 2005).
With the decreasing cost of genotyping, genotype data have become
more accessible, enabling genome-wide approaches in most organisms.
Genome-wide association studies (GWAS) have emerged as a promising tool to
reveal the genetic basis of phenotypic traits.
For the last decade, most GWAS have been used to understand
common human diseases. Researchers seek to identify regions of the genome
where individuals that are phenotypically similar are also unusually closely
related. In brief, a genome-wide association is a whole-genome scan that tests the
association between the genotypes at each locus and a given phenotype
(Bergelson & Roux 2010).
Recently, GWAS has been used to identify common alleles with major effects
on phenotypic variation in plants. Atwell et al. (2010) reported that studying 107
phenotypes of Arabidopsis thaliana using GWAS yielded excellent candidate genes
responsible for trait variation. Building on techniques developed for human
GWAS, they also suggested that this approach is feasible for many other
organisms. These studies, owing to advances in
genotyping and sequencing technology, have become an obvious general approach for
studying the genetics of natural variation in traits of agricultural importance
(Atwell et al. 2010). However, to date, no GWAS has been reported for studying
the association between phenotype and genotype variations in bacteria.
Following the same concept, this study is the first GWAS in bacteria.
Xanthomonas campestris pv. campestris (Xcc) was chosen as a model bacterium
since the genomic data required for such studies are available. Xcc is the causal
agent of black rot disease of cruciferous plants. This bacterium has been used as a
model to study plant-pathogen interactions. Pathogenic species and pathovars show
a high degree of host plant specificity and many also exhibit tissue specificity.
Xcc includes host-specific pathovars that infect different Brassicaceous species
(Ryan et al. 2011). Strains of this pathogen also show different abilities to cause
disease on host plants. These characteristics led us to study both
phenotypic and genotypic variations in this pathogen. This first study focused on
two virulence traits of Xcc: extracellular enzymes and motility.
Extracellular enzymes are one of the most important virulence factors in Xcc
besides other factors such as avirulence genes (avr) and hrp genes. These
enzymes are the first virulence factors that are secreted by the bacteria when they
attack plants. The role of the extracellular enzymes in pathogenicity encouraged
us to study both phenotype and genotype variation among strains of Xcc. Motility
is another important pathogenicity factor in bacteria. Various forms of surface
motility enable bacteria to establish symbiotic and pathogenic associations with
plants (Rashid & Kornberg 2000).
Phenotypic variation among 50 Xcc strains was analyzed with two different
marker sets: neutral AFLP markers (Amplified Fragment Length Polymorphism) and
SNPs (Single Nucleotide Polymorphisms). AFLP markers give an idea of the
complexity of the genetic determinism of each trait. SNP markers give more
precise information, and significant markers can point to good candidate genes
responsible for the traits. Therefore, the objectives of this study are to study
phenotype and genotype variations among 50 strains of Xcc and to identify new
markers associated with phenotypic variation.

LITERATURE REVIEW

Genome-Wide Association Studies (GWAS)
For the last two decades, GWAS has been used as a method to identify a
multitude of subtle genetic effects that increase the risk of ‘complex’ diseases
among unrelated individuals. Complex means that many genetic variants
contribute to the trait. GWAS has been used to study genetic variants associated
with human diseases such as obesity-related traits (Scuteri et al. 2007) and
autoimmune disease (Pasaniuc et al. 2011), and also genetic determinants of
complex diseases for the human population as a whole (Rosenberg et al. 2010).
GWAS in humans use dense maps of SNPs that cover the genome to look for
allele-frequency differences between cases (patients with a specific disease or
individuals with a certain trait) and controls. A significant frequency difference
is considered to indicate that the corresponding region of the genome contains
functional DNA-sequence variants that influence the disease or the trait in
question (Kruglyak 2008). The goal of population association studies is to identify
patterns of polymorphism that vary systematically between individuals with
different disease states and could therefore represent the effects of risk-enhancing
or protective alleles (Balding 2006).
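
The case/control comparison described above can be illustrated with a small worked example. The following Python sketch computes a Pearson chi-square statistic on a 2x2 table of allele counts for a single SNP. The allele counts are invented for illustration, and the function name is hypothetical; a real genome-wide scan would repeat such a test at every SNP and would need a much stricter threshold than the single-test value used here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]], e.g. rows = cases/controls and
    columns = counts of the two alleles at one SNP."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column must contain at least one count")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical allele counts at one SNP:
# cases carry 60 copies of allele A and 40 of allele G,
# controls carry 40 copies of allele A and 60 of allele G.
stat = chi_square_2x2(60, 40, 40, 60)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
differs_at_5_percent = stat > 3.841
print(stat, differs_at_5_percent)
```

With these invented counts the statistic is 8.0, so the allele frequencies would be judged different at the 5% level for this single test.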
GWAS are also used to study genetic variation in plants. These approaches
successfully identified candidate genes in the model plant Arabidopsis thaliana
(Aranzana et al. 2005, Atwell et al. 2010). Recently, Pasam et al. (2012) reported
that GWA studies based on linkage disequilibrium provide a promising tool for
the detection and fine mapping of complex agronomic traits. Genome-wide
association mapping in plants is bringing a breath of fresh air to the area of gene
discovery. However, no GWAS has so far been reported on microorganisms. It is
therefore of interest to perform GWAS to study phenotypic variation in Xcc.
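
For quantitative traits such as enzymatic activity, the case/control comparison is replaced by a test on trait values at each marker. The following is a minimal Python sketch of such a marker-by-marker scan, assuming binary (0/1) markers and using a Welch t statistic. It is a deliberate simplification: it is not a mixed model such as EMMA and it does not correct for population structure or multiple testing. All names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for the difference in mean trait value between
    two groups; returns None when the standard error is zero."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        return None
    return (mean(group_a) - mean(group_b)) / se


def scan(genotypes, phenotype):
    """Test every marker against the phenotype.

    genotypes: dict mapping marker name -> list of 0/1 alleles, one per strain
    phenotype: list of trait values in the same strain order
    Returns a dict mapping marker name -> t statistic.
    """
    results = {}
    for marker, alleles in genotypes.items():
        carriers = [p for a, p in zip(alleles, phenotype) if a == 1]
        others = [p for a, p in zip(alleles, phenotype) if a == 0]
        if len(carriers) < 2 or len(others) < 2:
            continue  # sample variance is undefined for fewer than 2 strains
        t = welch_t(carriers, others)
        if t is not None:
            results[marker] = t
    return results


# Toy data: six strains, two markers (values invented for illustration).
phenotype = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
genotypes = {
    "m1": [0, 0, 0, 1, 1, 1],  # allele pattern tracks the trait
    "m2": [0, 1, 0, 1, 0, 1],  # allele pattern unrelated to the trait
}
stats = scan(genotypes, phenotype)
print(stats)
```

In this toy data set, marker m1 gives a much larger absolute t statistic than m2, which is the kind of signal a Manhattan plot displays across the genome.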

Xanthomonas campestris pv. campestris (Xcc) and its Virulence Factors
Xanthomonas campestris pv. campestris (Xcc) is the causal agent of black
rot disease of Brassicaceae. Xcc generally invades and multiplies in cruciferous
plant vascular tissues, resulting in the characteristic “black rot” symptoms of
blackened veins and V-shaped necrotic lesions at the foliar margin (Alvarez
2000). The ability of Xcc to elicit disease depends upon the synthesis of a number
of factors, including, in addition to avirulence proteins (avr) and type III secretion
system and effectors, the extracellular polysaccharide xanthan, extracellular plant
cell wall degrading enzymes and cyclic glucan, as well as the formation of
biofilms (Vojnov & Dow 2009).
Extracellular enzymes are among the most important virulence factors in
Xcc. Many extracellular enzymes, including pectin lyases, cellulases, and
proteases, are secreted during the initial attack on the plant (Aranzana 2000). The
study of molecular determinants of pathogenicity of this bacterium has suggested a
role for extracellular enzymes in the disease process. Dow et al. (1990) reported
that mutants pleiotropically defective in the synthesis or export of cellulases,
polygalacturonate lyases, proteases, and amylases are non-pathogenic in all plant
tests used. Xcc produces protease, endoglucanase, polygalacturonate lyase, lipase,
and amylase activities, and all these enzymes have the capacity to degrade plant
cell components (Dow et al. 1987). Therefore, given the availability of genomic
data on Xcc in the laboratory, this research set out to study the natural genetic
variation of 50 Xcc strains using a GWAS approach for these phenotypes.
This research applied GWAS to four extracellular enzymes
produced by Xcc: amylases, endoglucanases, polygalacturonases, and
proteases. Protease is an essential factor at early stages of the disease process, but
once infection is well advanced, the enzyme is less significant. Endoglucanase
(“cellulase”) is the major extracellular protein produced by Xcc and has a structure
typical of other prokaryotic endoglucanases (Gough et al. 1988). The role of this
enzyme is not well understood, but it may contribute to bacterial nutrition during
the saprophytic phase of the life cycle. Polygalacturonase and amylase have not
been studied in detail, but both enzymes have an important role in the early steps
of infection of the plant.
Besides extracellular enzymes, motility is another phenotype that contributes
to virulence in bacteria. Motility also plays a role in the fitness of these bacteria.
Rashid & Kornberg (2000) reported that the potential benefits of motility include
increased efficiency of nutrient acquisition, avoidance of toxic substances, ability
to translocate to preferred hosts and access optimal colonization sites within
them, and dispersal in the environment during the course of transmission.
It was therefore of interest to include this phenotype in our study.

Genetic Markers
Genetic markers are distinctive features that differ among individuals on a
genetic map. On a geographical map, the analogous markers are recognizable
components of the landscape such as rivers, roads, and buildings. On a genetic
map, besides genes, many types of markers can be used, such as RFLPs
(restriction fragment length polymorphisms), microsatellites, SNPs (single
nucleotide polymorphisms) and AFLPs (amplified fragment length polymorphisms).
A genetic marker provides information about allelic variation at a given
locus. These molecular markers have been applied to many biological questions,
ranging from gene mapping to population genetics, phylogenetic reconstruction,
paternity testing and forensic applications (Schlötterer 2004). Genetic markers able
to distinguish between genotypes that are relevant to a trait of interest are a key
goal in genetics. Often, this distinction is not based directly on the trait of interest,
but on informative marker systems (Schlötterer 2004). Two types of markers, AFLP
and SNPs, were used in this study as genotype data for the association studies.
AFLP. AFLPs are polymerase chain reaction (PCR)-based markers for the
rapid screening of genetic variations. Because of their high replicability and ease
of use, AFLP markers have emerged as a major new type of genetic markers with
broad application in systematics, pathotyping, population genetics, DNA
fingerprinting and quantitative trait loci (QTL) mapping. AFLP markers have
proved useful for assessing genetic differences among individuals, populations
and independently evolving lineages, such as species (Mueller & Wolfenbarger
1999). AFLP methods rapidly generate hundreds of highly replicable markers
from DNA of any organism.
Therefore, AFLPs are suitable for association studies, and they can also be
used to generate the phylogenetic tree of the 50 Xcc strains. With AFLP markers,
polymorphisms in different regions of the genome can be detected and
possibly associated with the given phenotypes.

SNPs. SNPs, or Single Nucleotide Polymorphisms, are individual point
mutations in the genome. They are commonly used as markers in GWA
studies. Their abundance, and the fact that they occur in both coding DNA (genes)
and non-coding DNA, make them good markers for such studies. SNP detection is
more rapid because it is based on oligonucleotide hybridization analysis. Efficient
and cost-effective high-throughput SNP genotyping methods have been developed
(Macdonald et al. 2005).
SNPs have been successfully used in association studies for the
identification of alleles associated with complex human diseases and also in plants.
How many SNPs are needed for association studies is still debated. Most of the
recent successful GWAS used 500 000 to 1 000 000 SNPs in humans (Kruglyak
2008). Association studies in A. thaliana used 250 000 SNPs (Atwell et al. 2010).
This number was claimed to be comparable with those in humans because the
genome of A. thaliana is around 120 megabases. In this study, 247480 SNP markers
were used for the association analyses. As the genome of Xcc is around 5.2 MB
(Ryan et al. 2011), the marker density (SNPs per megabase) in our study is much
higher than in the human or plant studies.
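
The comparison above can be made concrete by dividing each marker count by the corresponding genome size. The short Python sketch below does this arithmetic using the SNP counts and genome sizes quoted in the text. The human genome size of about 3100 megabases is an assumption added here for illustration and does not come from the cited studies.

```python
# (number of SNPs, genome size in megabases)
studies = {
    "human": (1_000_000, 3100.0),    # upper SNP count from the text; genome size is an assumption
    "A. thaliana": (250_000, 120.0),  # values quoted in the text
    "Xcc": (247_480, 5.2),            # values quoted in the text
}

# Marker density in SNPs per megabase for each organism
density = {name: snps / size_mb for name, (snps, size_mb) in studies.items()}

for name, value in density.items():
    print(f"{name}: about {value:.0f} SNPs per megabase")
```

Under these figures, the Xcc data set has roughly 47 600 SNPs per megabase, compared with roughly 2 100 for A. thaliana and a few hundred for human studies.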

MATERIALS AND METHODS

Place and Duration of the Study
This research was carried out from November 2011 to May 2012 at
Laboratoire Interaction Plant-Microorganism (LIPM-UMR CNRS INRA
2594/441) Toulouse, France. GWAS analyses were carried out at INRA (Institut
National de la Recherche Agronomique) Versailles, France.

Bacterial Strains and Growth Conditions
Fifty strains of Xanthomonas campestris pv. campestris (Xcc) from the LIPM
collection were used in this study. Strain names and origins are presented in
Annex 1. All the strains were maintained in 20% glycerol and stored at -80°C.
Each week, strains were streaked out from the stock onto MOKA agar medium (Yeast
Extract 4 g/L, Casamino acids 8 g/L, K2HPO4 2 g/L, MgSO4.7H2O 0.3 g/L, Bacto
Agar 15 g/L) (Blanvillain et al. 2007) and incubated at 28°C for 3 days. These
isolated strains were used as the source of inoculum in each experiment.

Phenotypic Characters Assays
Phenotypic characters tested in this study were enzymatic activities and
motility.
Preparation of the bacterial suspension for enzymatic tests. Two mL of
overnight cultures of Xcc in MOKA liquid medium were harvested by
centrifugation for 4 min at 10 000 rpm. Pellets were resuspended in 1 mL of MOKA
broth. The concentration of bacteria was determined by measuring the optical
density of the suspension with a spectrophotometer at 600 nm. One mL of bacterial
suspension at OD600 = 0.1 (10⁸ CFU/mL) and 500 µL of bacterial suspension at
OD600 = 4 (4 × 10⁹ cells/mL) were prepared in MOKA broth.
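
Adjusting a suspension to a target optical density follows the usual dilution relation C1 × V1 = C2 × V2. The Python sketch below shows the arithmetic for diluting a measured suspension to a lower target OD600, together with a cell density estimate that assumes the proportionality stated above (OD600 = 0.1 corresponding to 10⁸ CFU/mL). The function names and the measured OD of 2.0 are hypothetical, and the sketch covers dilution only; reaching a higher OD than the measured one requires concentrating the cells instead.

```python
def dilution_volumes(od_measured, od_target, final_volume_ul):
    """Return (suspension volume, broth volume) in microliters needed to
    reach od_target in final_volume_ul, using C1 * V1 = C2 * V2."""
    if od_measured <= 0 or od_target <= 0:
        raise ValueError("optical densities must be positive")
    if od_target > od_measured:
        raise ValueError("cannot dilute to a higher optical density")
    suspension = final_volume_ul * od_target / od_measured
    return suspension, final_volume_ul - suspension


def estimated_cfu_per_ml(od600):
    """Estimate cell density, assuming OD600 = 0.1 corresponds to 1e8 CFU/mL
    and that density scales linearly with OD600."""
    return od600 * 1e9


# Hypothetical example: a resuspended pellet reads OD600 = 2.0 and
# 1 mL (1000 uL) at OD600 = 0.1 is required.
suspension_ul, broth_ul = dilution_volumes(2.0, 0.1, 1000)
print(suspension_ul, broth_ul)
print(estimated_cfu_per_ml(4))
```

In this example, 50 µL of suspension is mixed with 950 µL of MOKA broth, and the linear estimate for OD600 = 4 reproduces the 4 × 10⁹ cells/mL given in the text.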

Enzymatic Activity Assays: radial diffusion assay method in agar plates
containing substrate
Amylase Activity. 0.125% starch (Amilum potato starch, 2.5% w/v in H2O)
in MOKA medium was used as substrate for amylase activity. 2 µL of bacterial
suspension at OD600 = 4 were spotted on the plates, which were incubated at 28°C
for 24 hours. The plates were then stained with Lugol (iodine (I) 5 g/L, potassium
iodide (KI) 10 g/L). Amylase activity of each colony was measured from the
diameter of the halo (clear zone) on the plates, which otherwise stain an intense
blue color. Activity (arbitrary units) was calculated with the formula:

\[
\text{Activity (arbitrary units)} = \frac{\varphi_{\text{halo}}^{2} - \varphi_{\text{colony}}^{2}}{\varphi_{\text{colony}}^{2}}
\]

where φ denotes a diameter.
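
The activity formula above can be computed directly from the two measured diameters. The following Python sketch is one way to do so; the function name and the example diameters are hypothetical, and it assumes that the halo diameter is measured across the whole clear zone, including the colony. Because the result is a ratio, the unit of measurement cancels out as long as both diameters use the same unit.

```python
def enzyme_activity(halo_diameter, colony_diameter):
    """Arbitrary units of enzyme activity from halo and colony diameters:
    (halo^2 - colony^2) / colony^2."""
    if colony_diameter <= 0:
        raise ValueError("colony diameter must be positive")
    if halo_diameter < colony_diameter:
        raise ValueError("halo diameter is expected to include the colony")
    return (halo_diameter ** 2 - colony_diameter ** 2) / colony_diameter ** 2


# Hypothetical measurements in millimeters
print(enzyme_activity(12.0, 6.0))  # halo twice as wide as the colony
print(enzyme_activity(6.0, 6.0))   # no visible halo beyond the colony
```

A halo twice as wide as the colony gives 3.0 arbitrary units, and a colony without any halo gives 0.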

Endoglucanase Activity. 0.25% carboxymethyle cellulose (CMC) (w/v) in
MOKA medium was used as substrate for endoglucanase activity. 5 L of
bacterial suspension at OD600 4 were spotted on the plates and incubated at
28°Cfor 24 hours. The plates were then stained with Congo red 0.1% (w/v) for 1
hour and washed with NaCl 1M for 10 min. The clear zone around the colony
indicates the degradation of CMC by endoglucanase. The activity was measured
as described above.
Polygalacturonase Activity. 0.125% polygalacturonic acid (sodium
polypectate 2.5% w/v in H2O, pH 7) in MOKA medium was used as substrate for
polygalacturonase activity. 5 µL of bacterial suspension at OD600 = 4 were
spotted on the plates, which were incubated at 28°C for 24 hours. The plates
were then washed with CTAB 1% (in water) for 30 minutes, stained with ruthenium
red 0.1% (in water) for 30 minutes and washed with water. Polygalacturonase
activity was determined as described above.
Protease Activity. 5 µL of bacterial suspension at OD600 = 4 were spotted on
MOKA plates containing 1% skimmed milk (w/v) and 100 µM FeSO4. The plates
were incubated at 28°C for 48 hours. Proteolytic activity was scored by
detecting the degradation of the milk proteins, seen as a zone of clearing
around the colonies. The activity was calculated as described above.
Motility. MOKA medium with 0.2% (w/v) agar was prepared for swimming and with
0.7% agar for swarming. 2 µL of bacterial suspension at OD600 = 0.1 were
spotted in the center of the plates and incubated at 28°C. The diameter of
bacterial spreading in the agar was measured as motile activity. Swimming
activity was measured after 24 h of incubation and swarming activity after
48 h.

PCR (Polymerase Chain Reaction)
To verify that the strains were not swapped during the experiments, at the
end of one replicate of protease activity all the strains were verified by PCR
using four primer pairs specific to the xopAL2, xopJ5, xopAC, and xopR genes
(Table 1). These genes code for type three effectors, and their
presence/absence in the 50 Xcc strains had already been determined (E. Guy,
PhD). Each PCR mix was prepared with 0.5 µL dNTP 10 mM, 0.3 µL forward primer
10 µM, 0.3 µL reverse primer 10 µM, 0.2 µL Taq polymerase 5 U/µL, 4 µL green
Taq reaction buffer 5X, 10.8 µL H2O and 4 µL of bacterial suspension in PEG
(poly-ethylene-glycol)/KOH pH 13. The PCR program was 1 cycle at 95°C for 10
minutes (cell lysis and denaturation); 30 cycles of 95°C (denaturation) for 30
sec, 48°C (hybridization) for 30 sec and 72°C (elongation) for 30 sec; 1 cycle
at 72°C (post-PCR) for 10 min; and finally 1 cycle at 17°C (cooling down) for
10 min. PCR products were analyzed by separation on a 1.5% agarose gel in TAE
1X (Tris-acetate-EDTA; EDTA 1 mM, Tris 10 mM, pH 8). DNA in agarose gels was
stained with the fluorescent dye ethidium bromide and bands were visualized
with UV light.
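When many strains are checked with the same primer pair, the shared reagents
are usually pooled into a master mix. The sketch below scales the per-reaction
volumes listed above; the 10% pipetting overage and the choice to add the
bacterial suspension separately are assumptions made for illustration, not
details reported in this work.

```python
# Per-reaction volumes in µL, as listed in the text.
PER_REACTION_UL = {
    "dNTP 10 mM": 0.5,
    "forward primer 10 µM": 0.3,
    "reverse primer 10 µM": 0.3,
    "Taq polymerase 5 U/µL": 0.2,
    "green Taq reaction buffer 5X": 4.0,
    "H2O": 10.8,
    "bacterial suspension": 4.0,
}


def master_mix(n_reactions, overage=0.10, exclude=("bacterial suspension",)):
    """Scale the shared reagents for n_reactions.

    overage is a hypothetical pipetting allowance; the template
    (bacterial suspension) is excluded because it differs per strain.
    """
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name not in exclude
    }


total = round(sum(PER_REACTION_UL.values()), 1)
print(f"Volume per reaction: {total} µL")
for name, volume in master_mix(50).items():
    print(f"{name}: {volume} µL")
```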
Table 1 Primers of four Xcc effector genes
Gene     Direction  Primer       Sequence
Xop AC   Forward    AvrACFL.1F   tttggtctcAAGGTCCCTGAAAGACGAAATGGCGAAG
Xop AC   Reverse    AvrACFL.2R   tggtctcACATACctggtgaacctggttcata
Xop J5   Forward    XC_3802.1F   tttggtctcAAGGTGTGTCCTTTCCTTCTTCGTC
Xop J5   Reverse    XC_3802.2R   tttggtctcACATAcTGGCGCTCCCTTTCCATGGATC
Xop R    Forward    XC_0268.1F   tttggtctcAAGGTATGCGCCTGAGTCAGTTGTTTAAC
Xop R    Reverse    XC_0268.R    TCAGTAGTAGCCGTTGTCGATTGCCTC
Xop AL2  Forward    XC_39156.1F  tttggtctcAAGGTATGCCCGTCAATCGATCTGGCTC
Xop AL2  Reverse    XC_3915-6.R  tttggtctcACATAcGTCAGAGCAATGCCCTCCATTAATC

Statistical Analyses
Phenotypic Data
Experimental Design. Each trait was measured using three technical
replicates per experiment and three independent biological experiments.
Randomization of samples was conducted both in technical and biological
replicates. Strain Xcc8004 was always present in each experiment as a control
strain and further used as a covariate in the model to calculate the fitted values of
each trait.
Adjusted Means. First, the arithmetic mean of the technical replicates was
calculated for each strain. These means were then used to calculate adjusted
means with the following ANOVA model, fitted with the lm function in R:
$$Y_{ijk} = \mu + s_i + e_j + c_j + \varepsilon$$
where $Y_{ijk}$ is the trait (enzymatic activities or motilities) for strain i
in experiment j; $\mu$ is the overall mean of the trait; $s$ is the strain
effect; $e$ is the experiment effect; $c$ is the covariate; and $\varepsilon$
is the error. This model accounts for differences between the three
experiments. The adjusted means were used to run the association tests.
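To illustrate what an adjusted mean corrects for, the sketch below fits a
simplified additive model (strain effect plus experiment effect) by
alternating averages, which converges to the least-squares fit. It is not the
lm call used in this work: the control-strain covariate is left out for
brevity, and the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def adjusted_means(records, n_iter=200):
    """Least-squares strain means adjusted for experiment effects.

    records: iterable of (strain, experiment, value) tuples, one value per
    strain and experiment (e.g. the mean of the technical replicates).
    """
    records = list(records)
    strains = sorted({s for s, _, _ in records})
    experiments = sorted({e for _, e, _ in records})
    s_eff = dict.fromkeys(strains, 0.0)
    e_eff = dict.fromkeys(experiments, 0.0)
    for _ in range(n_iter):
        # Update strain effects given the current experiment effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for s, e, y in records:
            sums[s] += y - e_eff[e]
            counts[s] += 1
        for s in strains:
            s_eff[s] = sums[s] / counts[s]
        # Update experiment effects given the current strain effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for s, e, y in records:
            sums[e] += y - s_eff[s]
            counts[e] += 1
        for e in experiments:
            e_eff[e] = sums[e] / counts[e]
    mean_e = sum(e_eff.values()) / len(e_eff)
    # Adjusted mean: strain level evaluated at the average experiment.
    return {s: s_eff[s] + mean_e for s in strains}


# Toy data: strain C is missing from experiment 3, which had low values.
toy = [
    ("A", 1, 10.0), ("A", 2, 11.0), ("A", 3, 9.5),
    ("B", 1, 12.0), ("B", 2, 13.0), ("B", 3, 11.5),
    ("C", 1, 9.0), ("C", 2, 10.0),
]
for strain, value in adjusted_means(toy).items():
    print(strain, round(value, 3))
```

In the toy data the raw mean of strain C (9.5) is inflated because C was not
measured in the low-valued experiment 3; the adjusted mean removes that
experiment effect.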
Genotypic Data. AFLP and SNP markers were used as genotype data, provided by
the bioinformatics resources of LIPM, Toulouse, France. Fifty Xcc strains were
considered for the AFLP markers and 37 strains for the SNP markers (Annexe 1).
SNP data were obtained from whole genome sequencing of the 37 strains (Annexe
1) after alignment of each strain against the reference genome (Xcc8004). Of
1943 AFLP markers, 1513 were polymorphic and were used to run the GWAS. For
the SNPs, 247480 polymorphic markers with a minor allele frequency ≥ 5% were
used, comprising 45953 regulatory SNPs and 201527 coding SNPs. To observe
whether the sample size affects the number of significant tests, a GWAS with
AFLP markers on the 37 strains was also run, using 1462 polymorphic markers.
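The minor allele frequency filter can be sketched as follows. This is an
illustration only: it assumes each marker is coded 0/1 per strain (absence or
presence of an AFLP band, or reference versus alternative SNP allele), and the
marker names are hypothetical.

```python
def minor_allele_frequency(calls):
    """Frequency of the rarer state among 0/1 calls (None = missing)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / len(observed)
    return min(freq, 1.0 - freq)


def filter_markers(genotypes, maf_min=0.05):
    """Keep the markers whose minor allele frequency is at least maf_min."""
    return {
        marker: calls
        for marker, calls in genotypes.items()
        if minor_allele_frequency(calls) >= maf_min
    }


# Hypothetical markers scored on a panel of 37 strains.
genotypes = {
    "marker_rare": [1] + [0] * 36,        # 1/37, below the 5% cut-off
    "marker_common": [1] * 2 + [0] * 35,  # 2/37, above the 5% cut-off
    "marker_fixed": [1] * 37,             # monomorphic
}
print(sorted(filter_markers(genotypes)))  # ['marker_common']
```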
Phylogeny and Kinship Matrix. Relatedness (phylogeny) among samples is
likely to produce confounded associations, because two genetically related
individuals are more likely to share correlated phenotypes (Kang et al. 2008).
A control for population structure or genetic relatedness among strains can be
included in the association model to correct for this bias. A kinship matrix
was constructed for each of the two strain panels (37 and 50 strains) using
the AFLP neutral markers. These matrices were produced using Dice similarity
indices with 5000 bootstraps (Darwin 5.0, http://darwin.cirad.fr/Home.php).
Each matrix holds the similarity values between all pairs of strains. The
matrix was then used in the association tests to correct for the relatedness
among individuals.
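For readers unfamiliar with the Dice index, the sketch below computes it for
binary AFLP profiles and assembles a pairwise similarity matrix. It does not
reproduce the Darwin software or its bootstrap procedure; the strain labels
and band profiles are hypothetical.

```python
def dice_similarity(x, y):
    """Dice index 2a / (2a + b + c) for two 0/1 band profiles.

    a = bands present in both profiles; b, c = bands present in only one.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of markers")
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    total = sum(x) + sum(y)  # equals 2a + b + c
    return 2 * shared / total if total else 1.0


def similarity_matrix(profiles):
    """Pairwise Dice similarities, in the order of the profile names."""
    names = list(profiles)
    return names, [
        [dice_similarity(profiles[p], profiles[q]) for q in names]
        for p in names
    ]


# Hypothetical profiles over four AFLP markers.
profiles = {
    "strain_1": [1, 1, 0, 1],
    "strain_2": [1, 0, 0, 1],
    "strain_3": [0, 0, 1, 0],
}
names, matrix = similarity_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```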
GWAS. The EMMA (Efficient Mixed-Model Association) model was used to run the
GWAS (Kang et al. 2008). This model corrects for the structure of the sample:
EMMA controls for population structure using a matrix of random effects
(kinship matrix). The linear function of EMMA is:
$$Y = \beta X + u + \varepsilon$$
where Y is the phenotype, X is a vector of genotypes at the marker being
tested (AFLP or SNP markers), and u and ε are random effects, which capture
the variance due to background genetic factors and to the environment,
respectively. EMMA is implemented in an R software package
(http://mouse.cs.ucla.edu/emma); for this association the function
emma.ML.LRT was used.
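The role of the kinship matrix becomes clearer when the variance structure of
the mixed model is written out. The block below is a sketch following the
usual presentation of EMMA (Kang et al. 2008) rather than a formula given in
this work; K is the kinship matrix, and the variance components
$\sigma_g^2$, $\sigma_e^2$ and the log-likelihoods $\ell_1$, $\ell_0$ are
notation introduced here for illustration.

```latex
\begin{aligned}
Y &= \beta X + u + \varepsilon, \\
u &\sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
\varepsilon \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right), \\
\operatorname{Var}(Y) &= \sigma_g^2 K + \sigma_e^2 I, \\
\Lambda &= 2\left[\ell_1(\hat{\beta}, \hat{\sigma}_g^2, \hat{\sigma}_e^2)
          - \ell_0(\hat{\sigma}_g^2, \hat{\sigma}_e^2)\right]
\end{aligned}
```

Here $\ell_1$ is the maximized log-likelihood with the marker effect $\beta$
and $\ell_0$ the one without it. In a likelihood ratio test of a single
marker, $\Lambda$ is typically compared with a $\chi^2$ distribution with one
degree of freedom to obtain the P-value.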
This analysis involves many tests (as many tests as markers), so a
correction for multiple testing is necessary to remove false positives (tests
with p < 0.05 that are unlikely to reflect a true association; the more tests
are performed, the higher the probability of encountering false positives). To
limit the false positive rate without neglecting false negatives, an FDR
(False Discovery Rate) of 10% was used for the correction (Benjamini &
Hochberg 1995).
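The Benjamini-Hochberg step-up procedure can be sketched in a few lines. This
is a generic illustration of the method cited above, not the code used in this
work; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return one boolean per test: True if significant at the given FDR.

    Step-up rule: find the largest rank k such that p_(k) <= k * fdr / m,
    then declare the k smallest P-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * fdr / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


# Hypothetical P-values for five markers.
p_values = [0.9, 0.001, 0.5, 0.04, 0.02]
print(benjamini_hochberg(p_values, fdr=0.10))
# [False, True, False, True, True]
```

Because the rule is step-up, a P-value that misses its own rank threshold can
still be declared significant when a larger P-value passes a later threshold.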

RESULTS
Variation on Enzymatic Activities and Motility of Xcc
Enzymatic activities (Figure 1) and motility (Figure 2) were successfully
phenotyped. These traits play an important role in the pathogenicity of
Xanthomonas campestris (Ryan et al. 2011). Extracellular enzymes produced by
Xcc are able to degrade specific substrates added to the plates. A collection
of 50 strains of Xcc was tested; differences in the diameter of the halo
(Figure 1) were measured among the strains, and relative enzymatic activities
were inferred from these observations. This suggests that the strains differ
in their enzymatic activities. Similarly, swimming and swarming motilities
were variable among the strains (Figure 2).

Figure 1 Examples of degradation halos observed for different enzymatic
activities. A. Amylase, B. Polygalacturonase, C. Protease, D.
Endoglucanase.


Figure 2 Examples of motility halos. A. swimming after 24h incubation (0.2%
agar), B. swarming after 48h incubation (0.7% agar).
In all experiments, strain Xcc8004 was used as a covariate. Therefore, this
strain was included in all plates to standardize the results. All of the strains were
phenotyped with a total of nine replicates for each trait. Figure 3 shows
boxplots of the phenotypic data obtained for each trait. Dispersion
(variation) was observed for all traits, and some outliers were present, for
example in amylase, endoglucanase and swarming motility. These outliers will
be considered later in the manuscript to explain some of the results of the
association studies.

Heritability of the traits
The broad-sense heritability (H²), the portion of phenotypic variance among
the 50 Xcc strains attributable to genetic factors (Falconer & Mackay 1996),
was estimated for each trait. The variance observed among replicates within an
individual strain is entirely environmental in origin, whereas the variance
between strains is partly environmental and partly genetic (both are merged).
Phenotypic variance was therefore calculated within and between strains to
estimate the heritability of each trait. Most phenotypes have a low
within-strain variance and a high between-strain variance (Annexe 3), with an
average heritability of 0.9. This result indicates that the phenotyping
experiments were accurate and reproducible, and that sufficient technical and
biological replicates were performed.
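One common way to turn within-strain and between-strain variances into a
broad-sense heritability is a one-way ANOVA on balanced data. The exact
estimator used in this work is not stated here, so the sketch below is an
assumption for illustration; the strain names and values are hypothetical.

```python
def broad_sense_heritability(values_by_strain):
    """H2 = Vg / (Vg + Ve) from a balanced one-way ANOVA.

    Ve is the within-strain mean square; Vg is estimated as
    (MS_between - MS_within) / n, with n replicates per strain.
    """
    groups = list(values_by_strain.values())
    k = len(groups)
    n = len(groups[0]) if groups else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 strains with equal replicate counts (>= 2)")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (k * (n - 1))
    v_g = max((ms_between - ms_within) / n, 0.0)  # negative estimates set to 0
    total = v_g + ms_within
    return v_g / total if total > 0 else 0.0


# Hypothetical trait values: strains differ far more than replicates do.
data = {"strain_1": [9.0, 10.0, 11.0], "strain_2": [19.0, 20.0, 21.0]}
print(round(broad_sense_heritability(data), 3))
```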

Figure 3 Boxplots of phenotypic data for each trait. A. Enzymatic activities,
B. Motility (the Y axis shows arbitrary units of enzyme activity).
Correlation among Traits
Correlation patterns between traits, based on Pearson's correlation test, are
shown in Table 2. Some traits are positively correlated with others. For
instance, protease activity is positively correlated with polygalacturonase
activity. This correlation is both highly significant (P = 2.06e-07) and
strong (r = 0.658). It will be interesting to see in further study whether the
two enzymes share associated markers. In contrast, protease activity is
negatively correlated with endoglucanase activity. Amylase activity is also
positively correlated with swimming, and swimming is negatively correlated
with swarming. However, these three correlations are less significant.
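For reference, Pearson's coefficient and the t statistic on which its test is
based can be computed as below. The sketch stops at the t statistic (the
P-value requires the t distribution with n − 2 degrees of freedom); the
function names are hypothetical, and n = 50 assumes that all 50 strains
contributed to the correlation.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length (>= 3)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Using the protease/polygalacturonase value from Table 2 (r = 0.660).
print(round(t_statistic(0.660, 50), 2))
```

A coefficient of 0.660 on 50 strains gives a t statistic of about 6, which is
consistent with the very small P-value reported for this pair of traits.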

Connection between the Traits and the Phylogenetic Tree
Population structure can generate spurious genotype-phenotype associations.
It is therefore necessary to verify whether the phylogeny of the strains
affects the traits, and to correct for relatedness among the strains to avoid
spurious associations. Table 3 shows Pearson's correlation between traits and
clade. No significant correlation was found, except for swimming motility.
Thus, one should be cautious when interpreting GWAS results for this trait. In
addition, a mixed-model approach that includes the estimated kinship was used
to limit the effect of phylogeny on the association. The kinship matrix
produced from the dissimilarity matrix of AFLP markers was used in the
association analysis.

Table 2 Pearson's product-moment correlation between traits
Trait              Amylase  Endogluc  Polygal     Protease  Swimming
Amylase   Pearson  1
          P-val
Endogluc  Pearson  0.220    1
          P-val    0.126
Polygal   Pearson  0.150    0.050     1
          P-val    0.290    0.727
Protease  Pearson  0.090    -0.292    0.660       1
          P-val    0.504    0.039**   2.06E-07**
Swimming  Pearson  0.420    0.270     0.260       0.050     1
          P-val    0.002    0.058     0.067       0.695
Swarming  Pearson  -0.190   -0.230    0.004       0.080     -0.290
          P-val    0.184    0.105     0.974       0.590     0.038
** Significant at the 95% confidence level.

Table 3 Pearson's product-moment correlation between traits and clade
Traits     Clades (Cor)  P-val
Amylase    -0.021        0.894
Endogluc   -0.234        0.144
Polygal    0.185         0.250
Protease   0.300         0.059
Swimming   0.591         5.931e-05**
Swarming   0.019         0.903
** Significant at the 95% confidence level.

Figure 4 shows the distribution of phenotypes against the phylogenetic tree
of the 50 Xcc strains. This phylogenetic tree was generated using 1513 AFLP
neutral markers, and the phenotypic data were generated using the covariate
model. Visually, the distribution of phenotypes does not seem to follow the
phylogenetic tree except for swimming, which is in agreement with the results
obtained by Pearson's correlation analysis (Table 3). These data sets were
then used to run the association studies.

GWAS
A genome-wide association study was performed using 1513 AFLP markers
(when 50 strains were considered), 1462 AFLP markers (when 37 strains were
considered), and 201527 SNP markers. With AFLP markers, all 50 strains could
be included. For the SNP data, extracted from whole genome sequencing
projects, only 37 strains could be included. For AFLP markers, the GWAS was
performed with both datasets (50 and 37 strains). The same AFLP-derived
kinship matrices were included in the EMMA analysis for both marker sets.

EMMA with AFLP Markers. All polymorphic markers with a MAF
(Minor Allele Frequency) greater than 5% were included. Figure 5 shows
Manhattan plots of the EMMA results for enzymatic activities among the 50 Xcc
strains using AFLP markers. Some markers stand out at the top of the plot
compared to the others. These markers are expected to have significant
p-values and to be associated with the traits. To be cautious in setting the
significance threshold, an FDR correction at 10% was applied.

[Figure 4 image: color-coded phenotypes for the traits Endoglucanase, Amylase,
Polygalacturonase, Protease, Swimming and Swarming, with a color scale from
Min to Max. Strain names as listed in the figure: HRI3811, B100, CFBP4953,
HRI7283, CN01, CFBP1869, CFBP1119, CN11, CFBP5130, HRI7805, Xcc147, HRI7758,
CN19, CN10, HRI1279A, CFBP5817, HRI6189, CN20, CFBP4956, HRI3851A, CFBP1124,
CFBP6863, CN07, ATCC33913, 8004, CN05, CN13, HRI3883, HRI6382, CFBP4955,
CFBP6865, HRI3880, CFBP4954, HRI6412, CFBP5683, CFBP1712, CFBP1713, CFBP12824,
CN08, CN06, CN12, CN02, CN18, CN17, HRI6185, HRI6181, CN03, CN16, CN14, CN15.]

Figure 4 Distribution of phenotypic results used for GWA studies in 50 Xcc
strains. The color gradient shows the level of activity, from the lowest
activity (white) to the highest activity (dark brown).

Figure 5 Manhattan plots for enzymatic activities on 50 Xcc strains using AFLP
markers. The Y axis shows the distribution of -log10 (P-value). The X
axis shows the positions (random) of the AFLP markers. The dashed red
line shows the threshold for significant P-values after FDR correction.
Manhattan plots of the associations between AFLP markers and motility traits
are shown in Figure 6. The line pattern observed in the EMMA results for
swarming motility seems spurious. Several reasons could explain this line: the
effect of population structure might not have been corrected sufficiently, and
some outliers in the phenotypic data could also contribute. EMMA results with
37 Xcc strains are shown in Annexe 2.
The Manhattan plots for motility are similar in both data sets (AFLP with 50
and 37 strains). In contrast, the Manhattan plots for enzymatic activities
look different. This observation will be considered later in the manuscript,
after the analysis of the numbers of significant markers.

Figure 6 Manhattan plots for motility (swimming and swarming) on 50 Xcc strains
using AFLP markers. The Y axis shows the distribution of -log10
(P-value). The X axis shows the positions (random) of the AFLP markers.
The dashed red line shows the threshold for significant P-values after
FDR correction.
Table 4 summarizes the number of significant markers for each trait passing
FDR